.\" Man page generated from reStructuredText.
.
.TH "OBIADDTAXIDS" "1" "Jul 27, 2019" " 1.02 13" "OBITools"
.SH NAME
obiaddtaxids \- description of obiaddtaxids
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.sp
The \fI\%obiaddtaxids\fP command annotates sequence records with a \fItaxid\fP based on
a taxon scientific name stored in the sequence record header.
.sp
Taxonomic information linking a \fItaxid\fP to a taxon scientific name is stored in a
database formatted as an ecoPCR database (see obitaxonomy) or an NCBI taxdump
(see NCBI ftp site).
.sp
The way the taxon scientific name is extracted from the sequence record header
can be specified as follows:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
By default, the sequence identifier is used. Underscore characters (\fB_\fP) are
replaced by spaces before the taxon scientific name is looked up in the
taxonomic database.
.IP \(bu 2
If the input file is in the \fBOBITools\fP extended fasta format, the \fB\-k\fP option
specifies the attribute containing the taxon scientific name.
.IP \(bu 2
If the input file is a fasta file imported from the UNITE or the SILVA
web site, the \fB\-f\fP option specifies this source so that the associated
taxonomic information is parsed correctly.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
For each sequence record, \fI\%obiaddtaxids\fP tries to match the extracted taxon
scientific name with those stored in the taxonomic database.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If a match is found, the sequence record is annotated with the corresponding \fItaxid\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Otherwise,
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If the \fB\-g\fP option is set, the taxon name is composed of two words, and only
the first one is found in the taxonomic database at the \(aqgenus\(aq rank,
\fI\%obiaddtaxids\fP considers that it has found the genus associated with this
sequence record and stores the record in the file specified by the \fB\-g\fP option.
.IP \(bu 2
If the \fB\-u\fP option is set and no taxonomic information can be retrieved from
the taxon scientific name, the sequence record is stored in the file specified
by the \fB\-u\fP option.
.UNINDENT
.sp
\fIExample\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
> obiaddtaxids \-k species_name \-g genus_identified.fasta \e
               \-u unidentified.fasta \-d my_ecopcr_database \e
               my_sequences.fasta > identified.fasta
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This command tries to match the value associated with the \fBspecies_name\fP key of
each sequence record from the \fBmy_sequences.fasta\fP file with a taxon name from
the ecoPCR database \fBmy_ecopcr_database\fP\&.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If there is an exact match, the sequence record is stored in the
\fBidentified.fasta\fP file.
.IP \(bu 2
If there is no exact match and the \fBspecies_name\fP value is composed of two words,
\fI\%obiaddtaxids\fP treats the first word as a genus name and looks for it in
the taxonomic database.
.INDENT 2.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If a genus is found, the sequence record is stored in the
\fBgenus_identified.fasta\fP file.
.IP \(bu 2
Otherwise the sequence record is stored in the \fBunidentified.fasta\fP file.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SH OBIADDTAXIDS SPECIFIC OPTIONS
.INDENT 0.0
.TP
.B \-f , \-\-format=
Format of the sequence file. Possible formats are:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fBraw\fP: for regular \fBOBITools\fP extended fasta files (default value).
.IP \(bu 2
\fBUNITE\fP: for fasta files downloaded from the \fI\%UNITE web site\fP\&.
.IP \(bu 2
\fBSILVA\fP: for fasta files downloaded from the \fI\%SILVA web site\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-k , \-\-key\-name=
Key of the attribute containing the taxon name in sequence files in the \fBOBITools\fP
extended fasta format.
.UNINDENT
.INDENT 0.0
.TP
.B \-a , \-\-restricting_ancestor=
Restricts the search for \fItaxids\fP to taxa under a specified ancestor.
.sp
The value of this option can be a \fItaxid\fP (integer) or a key (string).
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If it is a \fItaxid\fP, this \fItaxid\fP is used to restrict the search for all the
sequence records.
.IP \(bu 2
If it is a key, \fI\%obiaddtaxids\fP looks for the ancestor \fItaxid\fP in the
corresponding attribute of each sequence record. This allows a different
ancestor restriction for each sequence record.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-g , \-\-genus_found=
File used to store sequences for which a match was found at the genus level only.
.sp
\fBCAUTION:\fP
.INDENT 7.0
.INDENT 3.5
This option is not valid with the UNITE format.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-u , \-\-unidentified=
File used to store sequences with no taxonomic match found.
.UNINDENT
.SH TAXONOMY RELATED OPTIONS
.INDENT 0.0
.TP
.B \-d , \-\-database=
ecoPCR taxonomy Database name
.UNINDENT
.INDENT 0.0
.TP
.B \-t , \-\-taxonomy\-dump=
NCBI Taxonomy dump repository name
.UNINDENT
.SH COMMON OPTIONS
.INDENT 0.0
.TP
.B \-h, \-\-help
Shows this help message and exits.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-DEBUG
Sets logging in debug mode.
.UNINDENT
.SH OBIADDTAXIDS ADDED SEQUENCE ATTRIBUTE
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
taxid
.UNINDENT
.UNINDENT
.UNINDENT
.SH AUTHOR
The OBITools Development Team - LECA
.SH COPYRIGHT
2019 - 2015, OBITools Development Team
.\" Generated by docutils manpage writer.
.
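.SH ILLUSTRATIVE SKETCH
.sp
The matching rules described at the top of this page can be summarised by the
minimal Python sketch below. It is not the \fI\%obiaddtaxids\fP source code: the
function names \fBnormalize_name\fP and \fBclassify\fP, and the two in\-memory
containers standing in for the taxonomic database, are hypothetical and serve
only as an illustration. The sketch assumes that both the \fB\-g\fP and \fB\-u\fP
options are set and that no ancestor restriction is used.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def normalize_name(identifier):
    # Underscores are replaced by spaces before the lookup.
    return identifier.replace("_", " ")


def classify(name, names_to_taxid, genus_names):
    """Return (destination, taxid) for one taxon scientific name.

    names_to_taxid: hypothetical dict mapping scientific names to taxids.
    genus_names:    hypothetical set of names known at the genus rank.
    Both stand in for the real taxonomic database.
    """
    name = normalize_name(name)

    # Exact match: the record would be annotated with the taxid.
    if name in names_to_taxid:
        return ("identified", names_to_taxid[name])

    # Two-word name whose first word is a known genus: genus fallback.
    words = name.split()
    if len(words) == 2 and words[0] in genus_names:
        return ("genus_identified", None)

    # Nothing could be retrieved from the name.
    return ("unidentified", None)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
With a name table containing only "Homo sapiens" (taxid 9606) and a genus set
containing "Homo", the identifier Homo_sapiens is classified as identified,
"Homo sp" falls back to the genus file, and any other name is unidentified.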