.TH RUN_ABUNDANCE.PY "1" "September 2021" "run_abundance.py" "User Commands"
.SH NAME
run_abundance.py \- helper script to estimate the abundance at a given taxonomic level
.SH DESCRIPTION
usage: run_abundance.py [\-h] [\-v] [\-A N] [\-P N] [\-F N] [\-\-distance DISTANCE]
[\-M DIAMETER] [\-S DECOMP] [\-p DIR] [\-rt] [\-o OUTPUT]
[\-d OUTPUT_DIR] [\-c CONFIG] [\-t TREE] [\-r RAXML]
[\-a ALIGN] [\-f FRAG] [\-m MOLECULE] [\-\-ignore\-overlap]
[\-x N] [\-cp CHCK_FILE] [\-cpi N] [\-seed N] [\-bt N]
[\-at N] [\-pt N] [\-g N] [\-b N] [\-no_trim] [\-bin N] [\-D]
[\-C N] [\-G GENES]
.PP
This script runs the SEPP algorithm on an input tree, alignment, fragment
file, and RAxML info file.
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-v\fR, \fB\-\-version\fR
show program's version number and exit
.SS "DECOMPOSITION OPTIONS:"
.IP
These options determine the alignment decomposition size and taxon
insertion size. If neither is given, both default to 10% of the total
number of taxa. The alignment decomposition size must not be larger than
the taxon insertion size.
.TP
\fB\-A\fR N, \fB\-\-alignmentSize\fR N
Maximum alignment subset size of N [default: the placement
subset size if given, otherwise 10% of the total number
of taxa]
.TP
\fB\-P\fR N, \fB\-\-placementSize\fR N
Maximum placement subset size of N [default: 10% of the
total number of taxa or the alignment subset size,
whichever is bigger]
.TP
\fB\-F\fR N, \fB\-\-fragmentChunkSize\fR N
Maximum fragment chunk size of N; helps control
memory usage. [default: 20000]
.TP
\fB\-\-distance\fR DISTANCE
Minimum p\-distance before stopping the
decomposition [default: 1]
.TP
\fB\-M\fR DIAMETER, \fB\-\-diameter\fR DIAMETER
Maximum tree diameter before stopping the
decomposition [default: None]
.TP
\fB\-S\fR DECOMP, \fB\-\-decomp_strategy\fR DECOMP
decomposition strategy [default: using tree branch
length]
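.PP
As an illustration of how the \fB\-A\fR and \fB\-P\fR defaults interact,
the rules stated above can be sketched in Python. This is a minimal
sketch, not SEPP's own code: the helper \fIdefault_subset_sizes\fR is
hypothetical, "10%" is assumed to be rounded down to an integer, the
\fB\-P\fR default is assumed to compare 10% of the taxa with the alignment
subset size given by \fB\-A\fR, and an alignment size larger than the
placement size is rejected.
.PP
.nf
def default_subset_sizes(total_taxa, alignment_size=None,
                         placement_size=None):
    # Hypothetical helper; assumes "10%" rounds down to an integer.
    ten_percent = total_taxa // 10
    if alignment_size is None:
        # \-A: the placement size if given, otherwise 10% of the taxa
        if placement_size is not None:
            alignment_size = placement_size
        else:
            alignment_size = ten_percent
    if placement_size is None:
        # \-P: 10% of the taxa or the alignment size, whichever is bigger
        placement_size = max(ten_percent, alignment_size)
    if alignment_size > placement_size:
        raise ValueError("alignment size exceeds placement size")
    return alignment_size, placement_size

print(default_subset_sizes(1000))                     # (100, 100)
print(default_subset_sizes(1000, alignment_size=50))  # (50, 100)
print(default_subset_sizes(1000, placement_size=300)) # (300, 300)
.fi
.PP
Because \fB\-A\fR falls back to the placement size when only \fB\-P\fR is
given, the two sizes may be equal; the sketch therefore only rejects an
alignment size that exceeds the placement size.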
.SS "OUTPUT OPTIONS:"
.IP
These options control output.
.TP
\fB\-p\fR DIR, \fB\-\-tempdir\fR DIR
Temporary files will be written to DIR. Full path
required. [default: /tmp/sepp]
.TP
\fB\-rt\fR, \fB\-\-remtemp\fR
Remove the temporary file directory. [default: disabled]
.TP
\fB\-o\fR OUTPUT, \fB\-\-output\fR OUTPUT
Use OUTPUT as the prefix for output files. [default: output]
.TP
\fB\-d\fR OUTPUT_DIR, \fB\-\-outdir\fR OUTPUT_DIR
Write output files to the OUTPUT_DIR directory. Full path
required. [default: .]
.SS "INPUT OPTIONS:"
.IP
These options control input. To run SEPP, the following inputs are
required: a backbone tree (in newick format); a RAxML_info file (the file
generated by RAxML while estimating the backbone tree, which pplacer uses
to set model parameters); a backbone alignment file (in fasta format);
and a fasta file containing the fragments. The input sequences are
assumed to be DNA unless specified otherwise.
.TP
\fB\-c\fR CONFIG, \fB\-\-config\fR CONFIG
A config file including options used to run SEPP.
Options given as command\-line arguments override
config file values for those options. [default: None]
.TP
\fB\-t\fR TREE, \fB\-\-tree\fR TREE
Input tree file (newick format) [default: None]
.TP
\fB\-r\fR RAXML, \fB\-\-raxml\fR RAXML
RAxML_info file, generated by RAxML, that includes
the model parameters [default: None]
.TP
\fB\-a\fR ALIGN, \fB\-\-alignment\fR ALIGN
Aligned fasta file [default: None]
.TP
\fB\-f\fR FRAG, \fB\-\-fragment\fR FRAG
Fragment file (fasta format) [default: None]
.TP
\fB\-m\fR MOLECULE, \fB\-\-molecule\fR MOLECULE
Molecule type of sequences. Can be amino, dna, or rna
[default: dna]
.TP
\fB\-\-ignore\-overlap\fR
When a query sequence has the same name as a backbone
sequence, ignore the query sequence and keep the
backbone sequence [default: False]
.SS "OTHER OPTIONS:"
.IP
These options control how SEPP is run.
.TP
\fB\-x\fR N, \fB\-\-cpu\fR N
Use N CPUs [default: the number of CPUs available on
the machine]
.TP
\fB\-cp\fR CHCK_FILE, \fB\-\-checkpoint\fR CHCK_FILE
checkpoint file [default: no checkpointing]
.TP
\fB\-cpi\fR N, \fB\-\-interval\fR N
Interval (in seconds) between checkpoint writes. Takes
effect only when \fB\-cp\fR is provided. [default: 3600]
.TP
\fB\-seed\fR N, \fB\-\-randomseed\fR N
random seed number. [default: 297834]
.SS "TIPP OPTIONS:"
.IP
These options set parameters specific to TIPP.
.TP
\fB\-bt\fR N, \fB\-\-blastThreshold\fR N
Minimum query coverage for a BLAST hit to map a read to
a marker. This should be a number greater than 0 [default: 50]
.TP
\fB\-at\fR N, \fB\-\-alignmentThreshold\fR N
Enough alignment subsets are selected to reach a
cumulative probability of N. This should be a number
between 0 and 1 [default: 0.95]
.TP
\fB\-pt\fR N, \fB\-\-placementThreshold\fR N
Enough placements are selected to reach a cumulative
probability of N. This should be a number between 0
and 1 [default: 0.95]
.TP
\fB\-g\fR N, \fB\-\-gene\fR N
Classify using only the specified gene.
.TP
\fB\-b\fR N, \fB\-\-blast_file\fR N
BLAST file with the fragments already binned.
.TP
\fB\-no_trim\fR, \fB\-\-do_not_trim_after_blast\fR
Do not trim the query sequence when it extends outside the
marker (BLAST only). By default, such sequences are trimmed.
.TP
\fB\-bin\fR N, \fB\-\-bin_using\fR N
Use blast or hmmer for binning [default: blast]
.TP
\fB\-D\fR, \fB\-\-dist\fR
Treat fragments as a distribution.
.TP
\fB\-C\fR N, \fB\-\-cutoff\fR N
Placement probability requirement to count toward the
distribution. This should be a number between 0 and 1
[default: 0.0]
.TP
\fB\-G\fR GENES, \fB\-\-genes\fR GENES
Use the markers or cogs gene set [default: markers\-v3]
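.PP
The cumulative\-probability rule described for \fB\-at\fR and \fB\-pt\fR
can be sketched in Python as follows. This is a minimal illustration
under stated assumptions, not TIPP's own code: the helper
\fIselect_until_cumulative\fR is hypothetical, and candidates (alignment
subsets or placements) are assumed to be taken in decreasing order of
probability until their summed probability reaches the threshold.
.PP
.nf
def select_until_cumulative(probabilities, threshold=0.95):
    # Hypothetical helper: rank candidates by probability, highest first.
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += probabilities[i]
        # Stop once the selected candidates reach the threshold.
        if total >= threshold:
            break
    return chosen

print(select_until_cumulative([0.02, 0.9, 0.08]))      # [1, 2]
print(select_until_cumulative([0.1, 0.7, 0.25], 0.5))  # [1]
.fi
.PP
Taking the most probable candidates first keeps the selected set as small
as possible for a given threshold; a threshold closer to 1 selects more
candidates.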
.SH "SEE ALSO"
\fBrun_sepp.py\fR(1), \fBrun_tipp.py\fR(1)