.TH RUN_TIPP.PY "1" "September 2021" "run_tipp.py" "User Commands"
.SH NAME
run_tipp.py \- a taxonomic identification and phylogenetic profiling tool
.SH DESCRIPTION
usage: run_tipp.py [\-h] [\-v] [\-A N] [\-P N] [\-F N] [\-\-distance DISTANCE]
.IP
[\-M DIAMETER] [\-S DECOMP] [\-p DIR] [\-rt] [\-o OUTPUT]
[\-d OUTPUT_DIR] [\-c CONFIG] [\-t TREE] [\-r RAXML] [\-a ALIGN]
[\-f FRAG] [\-m MOLECULE] [\-\-ignore-overlap] [\-x N]
[\-cp CHCK_FILE] [\-cpi N] [\-seed N] [\-R N] [\-at N] [\-D]
[\-pt N] [\-PD N] [\-tx TAXONOMY] [\-txm MAPPING] [\-adt TREE]
[\-C N]
.PP
This script runs the SEPP algorithm on an input tree, alignment, fragment
file, and RAxML info file. It uses a reference dataset, which must be
downloaded from
\fBhttps://obj.umiacs.umd.edu/tipp/tipp2-refpkg.tar.gz\fR
.PP
If the local administrator has not set the path to this reference dataset in
/etc/tipp/tipp.config, copy that file to ~/.tipp/ and set the path
to the dataset in the \fBreference\fR section of your copy;
see \fBtipp.config\fR(5).
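.PP
A minimal sketch of that per\-user setup follows. The dataset location is a
placeholder, and the key name \fBpath\fR inside the \fBreference\fR section is
an assumption made here for illustration; \fBtipp.config\fR(5) is the
authority on the exact syntax.
.IP
.nf
mkdir \-p ~/.tipp
cp /etc/tipp/tipp.config ~/.tipp/
.fi
.PP
Then edit ~/.tipp/tipp.config so that its \fBreference\fR section points at
the unpacked dataset, for example:
.IP
.nf
[reference]
path=/path/to/unpacked/refpkg
.fi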
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-v\fR, \fB\-\-version\fR
show program's version number and exit
.SS "DECOMPOSITION OPTIONS:"
.IP
These options determine the alignment decomposition size and taxon
insertion size. If neither is given, the default is to align/place at
10% of total taxa. The alignment decomposition size must be less than the
taxon insertion size.
.TP
\fB\-A\fR N, \fB\-\-alignmentSize\fR N
max alignment subset size of N [default: 10% of the
total number of taxa or the placement subset size if
given]
.TP
\fB\-P\fR N, \fB\-\-placementSize\fR N
max placement subset size of N [default: 10% of the
total number of taxa or the alignment length
(whichever is bigger)]
.TP
\fB\-F\fR N, \fB\-\-fragmentChunkSize\fR N
maximum fragment chunk size of N. Helps control
memory use. [default: 20000]
.TP
\fB\-\-distance\fR DISTANCE
minimum p\-distance before stopping the
decomposition [default: 1]
.TP
\fB\-M\fR DIAMETER, \fB\-\-diameter\fR DIAMETER
maximum tree diameter before stopping the
decomposition [default: None]
.TP
\fB\-S\fR DECOMP, \fB\-\-decomp_strategy\fR DECOMP
decomposition strategy [default: using tree branch
length]
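.PP
As an illustration of how the two subset sizes relate, the following hedged
sketch asks for alignment subsets of at most 100 taxa and placement subsets
of at most 1000 taxa, which keeps the alignment size below the placement
size. The numbers and all file names are arbitrary placeholders, not
recommendations, and whether a given run needs every input shown depends on
the configuration in use.
.IP
.nf
run_tipp.py \-A 100 \-P 1000 \e
    \-t backbone.tree \-a backbone.fasta \-r RAxML_info.backbone \e
    \-f fragments.fasta \-tx taxonomy.csv \-txm mapping.csv
.fi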
.SS "OUTPUT OPTIONS:"
.IP
These options control output.
.TP
\fB\-p\fR DIR, \fB\-\-tempdir\fR DIR
Temporary files will be written to DIR. Full path
required. [default: /tmp/sepp]
.TP
\fB\-rt\fR, \fB\-\-remtemp\fR
Remove temporary directory. [default: disabled]
.TP
\fB\-o\fR OUTPUT, \fB\-\-output\fR OUTPUT
output files with prefix OUTPUT. [default: output]
.TP
\fB\-d\fR OUTPUT_DIR, \fB\-\-outdir\fR OUTPUT_DIR
Output to OUTPUT_DIR directory. Full path required.
[default: .]
.SS "INPUT OPTIONS:"
.IP
These options control input. Running SEPP requires the following: a
backbone tree (in newick format); a RAxML_info file (the file
generated by RAxML during estimation of the backbone tree, which pplacer
uses to set model parameters); a backbone alignment file (in
fasta format); and a fasta file containing the fragments. The input
sequences are assumed to be DNA unless specified otherwise.
.TP
\fB\-c\fR CONFIG, \fB\-\-config\fR CONFIG
A config file, including options used to run SEPP.
Options provided as command line arguments overwrite
config file values for those options. [default: None]
.TP
\fB\-t\fR TREE, \fB\-\-tree\fR TREE
Input tree file (newick format) [default: None]
.TP
\fB\-r\fR RAXML, \fB\-\-raxml\fR RAXML
RAxML_info file including model parameters, generated
by RAxML. [default: None]
.TP
\fB\-a\fR ALIGN, \fB\-\-alignment\fR ALIGN
Aligned fasta file [default: None]
.TP
\fB\-f\fR FRAG, \fB\-\-fragment\fR FRAG
fragment file [default: None]
.TP
\fB\-m\fR MOLECULE, \fB\-\-molecule\fR MOLECULE
Molecule type of sequences. Can be amino, dna, or rna
[default: dna]
.TP
\fB\-\-ignore\-overlap\fR
When a query sequence has the same name as a backbone
sequence, ignore the query sequence and keep the
backbone sequence [default: False]
.SS "OTHER OPTIONS:"
.IP
These options control how SEPP is run.
.TP
\fB\-x\fR N, \fB\-\-cpu\fR N
Use N CPUs [default: number of CPUs available on the
machine]
.TP
\fB\-cp\fR CHCK_FILE, \fB\-\-checkpoint\fR CHCK_FILE
checkpoint file [default: no checkpointing]
.TP
\fB\-cpi\fR N, \fB\-\-interval\fR N
Interval (in seconds) between checkpoint writes. Has
effect only with \fB\-cp\fR provided. [default: 3600]
.TP
\fB\-seed\fR N, \fB\-\-randomseed\fR N
random seed number. [default: 297834]
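.PP
For example, a long run might combine the options above as in the following
hedged sketch: eight CPUs, a checkpoint written every 30 minutes (1800
seconds), and an explicit seed. The checkpoint path, the seed value, and
my_run.config (a hypothetical config file assumed to carry the input and
TIPP options) are placeholders.
.IP
.nf
run_tipp.py \-c my_run.config \-x 8 \e
    \-cp /home/user/tipp_run/run.chk \-cpi 1800 \-seed 12345
.fi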
.SS "TIPP OPTIONS:"
.IP
These options control settings specific to TIPP.
.TP
\fB\-R\fR N, \fB\-\-reference_pkg\fR N
Use a pre\-computed reference package [default: None]
.TP
\fB\-at\fR N, \fB\-\-alignmentThreshold\fR N
Enough alignment subsets are selected to reach a
cumulative probability of N. This should be a number
between 0 and 1 [default: 0.95]
.TP
\fB\-D\fR, \fB\-\-dist\fR
Treat fragments as a distribution.
.TP
\fB\-pt\fR N, \fB\-\-placementThreshold\fR N
Enough placements are selected to reach a cumulative
probability of N. This should be a number between 0
and 1 [default: 0.95]
.TP
\fB\-PD\fR N, \fB\-\-push_down\fR N
Whether to classify based on children below or above
insertion point. [default: True]
.TP
\fB\-tx\fR TAXONOMY, \fB\-\-taxonomy\fR TAXONOMY
A file describing the taxonomy. This is a comma\-separated
text file that has the following fields:
taxon_id,parent_id,taxon_name,rank. If there are other
columns, they are ignored. The first line is also
ignored.
.TP
\fB\-txm\fR MAPPING, \fB\-\-taxonomyNameMapping\fR MAPPING
A comma\-separated text file mapping alignment sequence
names to taxonomic ids. Format (each line):
sequence_name,taxon_id. If there are other columns,
they are ignored. The first line is also ignored.
.TP
\fB\-adt\fR TREE, \fB\-\-alignmentDecompositionTree\fR TREE
A newick tree file used for decomposing taxa into
alignment subsets. [default: the backbone tree]
.TP
\fB\-C\fR N, \fB\-\-cutoff\fR N
Placement probability requirement to count toward the
distribution. This should be a number between 0 and 1
[default: 0.0]
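.PP
The two taxonomy inputs described above can be sketched as follows. All
identifiers, taxon names, and file names are made up for illustration, and
the way the root row refers to its own parent is an assumption. In both
files the first line is ignored, so it can hold column names.
.PP
taxonomy.csv (for \fB\-tx\fR):
.IP
.nf
taxon_id,parent_id,taxon_name,rank
1,1,root,root
10,1,Examplebacteria,phylum
100,10,Examplebacter primus,species
.fi
.PP
mapping.csv (for \fB\-txm\fR):
.IP
.nf
sequence_name,taxon_id
seq_0001,100
.fi
.PP
A run that supplies these files together with the backbone inputs might then
look like the following. The thresholds shown are simply the documented
defaults written out explicitly, and the output prefix and directory are
placeholders.
.IP
.nf
run_tipp.py \-t backbone.tree \-a backbone.fasta \-r RAxML_info.backbone \e
    \-f fragments.fasta \-tx taxonomy.csv \-txm mapping.csv \e
    \-at 0.95 \-pt 0.95 \-o sample1 \-d /home/user/tipp_out
.fi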
.SH "SEE ALSO"
\fBrun_sepp.py\fR(1), \fBtipp.config\fR(5)