.TH RUN_TIPP.PY "1" "September 2021" "run_tipp.py" "User Commands"
.SH NAME
run_tipp.py \- an identification and phylogenetic profiling tool
.SH DESCRIPTION
usage: run_tipp.py [\-h] [\-v] [\-A N] [\-P N] [\-F N] [\-\-distance DISTANCE]
.IP
[\-M DIAMETER] [\-S DECOMP] [\-p DIR] [\-rt] [\-o OUTPUT] [\-d OUTPUT_DIR]
[\-c CONFIG] [\-t TREE] [\-r RAXML] [\-a ALIGN] [\-f FRAG] [\-m MOLECULE]
[\-\-ignore\-overlap] [\-x N] [\-cp CHCK_FILE] [\-cpi N] [\-seed N] [\-R N]
[\-at N] [\-D] [\-pt N] [\-PD N] [\-tx TAXONOMY] [\-txm MAPPING] [\-adt TREE]
[\-C N]
.PP
This script runs the SEPP algorithm on an input tree, alignment, fragment
file, and RAxML info file.
It uses a reference dataset, which has to be downloaded from
\fBhttps://obj.umiacs.umd.edu/tipp/tipp2-refpkg.tar.gz\fR
.PP
If the local administrator has not set the path to this reference dataset in
/etc/tipp/tipp.config, copy this file to ~/.tipp/ and put the path to the
dataset in the \fBreference\fR section of the configuration file; see
\fBtipp.config\fR(5).
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-v\fR, \fB\-\-version\fR
show program's version number and exit
.SS "DECOMPOSITION OPTIONS:"
.IP
These options determine the alignment decomposition size and taxon insertion
size.
If None is given, then the default is to align/place at 10% of total taxa.
The alignment decomposition size must be less than the taxon insertion size.
.TP
\fB\-A\fR N, \fB\-\-alignmentSize\fR N
max alignment subset size of N [default: 10% of the total number of taxa or
the placement subset size if given]
.TP
\fB\-P\fR N, \fB\-\-placementSize\fR N
max placement subset size of N [default: 10% of the total number of taxa or
the alignment length (whichever is bigger)]
.TP
\fB\-F\fR N, \fB\-\-fragmentChunkSize\fR N
maximum fragment chunk size of N. Helps control memory usage.
[default: 20000]
.TP
\fB\-\-distance\fR DISTANCE
minimum p\-distance before stopping the decomposition [default: 1]
.TP
\fB\-M\fR DIAMETER, \fB\-\-diameter\fR DIAMETER
maximum tree diameter before stopping the decomposition [default: None]
.TP
\fB\-S\fR DECOMP, \fB\-\-decomp_strategy\fR DECOMP
decomposition strategy [default: using tree branch length]
.SS "OUTPUT OPTIONS:"
.IP
These options control output.
.TP
\fB\-p\fR DIR, \fB\-\-tempdir\fR DIR
Temporary files will be written to DIR. Full path required.
[default: /tmp/sepp]
.TP
\fB\-rt\fR, \fB\-\-remtemp\fR
Remove temporary directory. [default: disabled]
.TP
\fB\-o\fR OUTPUT, \fB\-\-output\fR OUTPUT
output files with prefix OUTPUT. [default: output]
.TP
\fB\-d\fR OUTPUT_DIR, \fB\-\-outdir\fR OUTPUT_DIR
output to OUTPUT_DIR directory. Full path required. [default: .]
.SS "INPUT OPTIONS:"
.IP
These options control input.
To run SEPP, the following are required: a backbone tree (in newick format),
a RAxML_info file (the file generated by RAxML during estimation of the
backbone tree; pplacer uses this info file to set model parameters), a
backbone alignment file (in fasta format), and a fasta file including
fragments.
The input sequences are assumed to be DNA unless specified otherwise.
.TP
\fB\-c\fR CONFIG, \fB\-\-config\fR CONFIG
A config file, including options used to run SEPP.
Options provided as command line arguments overwrite config file values for
those options. [default: None]
.TP
\fB\-t\fR TREE, \fB\-\-tree\fR TREE
Input tree file (newick format) [default: None]
.TP
\fB\-r\fR RAXML, \fB\-\-raxml\fR RAXML
RAxML_info file including model parameters, generated by RAxML.
[default: None]
.TP
\fB\-a\fR ALIGN, \fB\-\-alignment\fR ALIGN
Aligned fasta file [default: None]
.TP
\fB\-f\fR FRAG, \fB\-\-fragment\fR FRAG
fragment file [default: None]
.TP
\fB\-m\fR MOLECULE, \fB\-\-molecule\fR MOLECULE
Molecule type of sequences.
Can be amino, dna, or rna [default: dna]
.TP
\fB\-\-ignore\-overlap\fR
When a query sequence has the same name as a backbone sequence, ignore the
query sequence and keep the backbone sequence [default: False]
.SS "OTHER OPTIONS:"
.IP
These options control how SEPP is run.
.TP
\fB\-x\fR N, \fB\-\-cpu\fR N
Use N CPUs [default: number of CPUs available on the machine]
.TP
\fB\-cp\fR CHCK_FILE, \fB\-\-checkpoint\fR CHCK_FILE
checkpoint file [default: no checkpointing]
.TP
\fB\-cpi\fR N, \fB\-\-interval\fR N
Interval (in seconds) between checkpoint writes.
Has effect only with \fB\-cp\fR provided. [default: 3600]
.TP
\fB\-seed\fR N, \fB\-\-randomseed\fR N
random seed number. [default: 297834]
.SS "TIPP OPTIONS:"
.IP
These arguments control settings specific to TIPP.
.TP
\fB\-R\fR N, \fB\-\-reference_pkg\fR N
Use a pre\-computed reference package [default: None]
.TP
\fB\-at\fR N, \fB\-\-alignmentThreshold\fR N
Enough alignment subsets are selected to reach a cumulative probability of N.
This should be a number between 0 and 1 [default: 0.95]
.TP
\fB\-D\fR, \fB\-\-dist\fR
Treat fragments as a distribution
.TP
\fB\-pt\fR N, \fB\-\-placementThreshold\fR N
Enough placements are selected to reach a cumulative probability of N.
This should be a number between 0 and 1 [default: 0.95]
.TP
\fB\-PD\fR N, \fB\-\-push_down\fR N
Whether to classify based on children below or above insertion point.
[default: True]
.TP
\fB\-tx\fR TAXONOMY, \fB\-\-taxonomy\fR TAXONOMY
A file describing the taxonomy.
This is a comma\-separated text file that has the following fields:
taxon_id,parent_id,taxon_name,rank.
If there are other columns, they are ignored.
The first line is also ignored.
.TP
\fB\-txm\fR MAPPING, \fB\-\-taxonomyNameMapping\fR MAPPING
A comma\-separated text file mapping alignment sequence names to taxonomic
ids.
Format (each line): sequence_name,taxon_id.
If there are other columns, they are ignored.
The first line is also ignored.
.TP
\fB\-adt\fR TREE, \fB\-\-alignmentDecompositionTree\fR TREE
A newick tree file used for decomposing taxa into alignment subsets.
[default: the backbone tree]
.TP
\fB\-C\fR N, \fB\-\-cutoff\fR N
Placement probability requirement to count toward the distribution.
This should be a number between 0 and 1 [default: 0.0]
.SH "SEE ALSO"
\fBrun_sepp.py\fR(1), \fBtipp.config\fR(5)
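.SH EXAMPLES
The following is an illustrative sketch only.
All file names, taxon ids, taxon names, and sequence names are hypothetical
and are not shipped with TIPP; only the options and file layouts documented
on this page are assumed.
.PP
A taxonomy file for \fB\-tx\fR with the four documented fields.
The first line is ignored, so it can hold a header.
How the parent of the root taxon is encoded is an assumption here.
.PP
.nf
.RS
taxon_id,parent_id,taxon_name,rank
1,1,root,root
10,1,Bacteria,superkingdom
20,10,Proteobacteria,phylum
.RE
.fi
.PP
A matching mapping file for \fB\-txm\fR, again with a header on the ignored
first line:
.PP
.nf
.RS
sequence_name,taxon_id
seqA,20
seqB,20
.RE
.fi
.PP
A possible invocation that passes the backbone tree, alignment, RAxML info
file, and both taxonomy files explicitly, and writes results with the prefix
\fBsample\fR.
Whether this set of options is sufficient may depend on the local
configuration; the two thresholds are simply set to their documented
defaults.
.PP
.nf
.RS
run_tipp.py \-t backbone.tree \-a backbone.fasta \-r RAxML_info.backbone \e
    \-tx taxonomy.csv \-txm mapping.csv \-f fragments.fasta \e
    \-o sample \-d /home/user/tipp_out \-at 0.95 \-pt 0.95
.RE
.fi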