.\" Man page generated from reStructuredText.
.
.TH "OBIADDTAXIDS" "1" "Jul 27, 2019" " 1.02 13" "OBITools"
.SH NAME
obiaddtaxids \- annotates sequence records with a taxid based on a taxon scientific name
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.sp
The \fI\%obiaddtaxids\fP command annotates sequence records with a \fItaxid\fP based on 
a taxon scientific name stored in the sequence record header.
.sp
Taxonomic information linking a \fItaxid\fP to a taxon scientific name is stored in a 
database formatted as an ecoPCR database (see obitaxonomy) or 
an NCBI taxdump (see the NCBI FTP site).
.sp
By default, the taxon scientific name is taken from the sequence identifier. Two options
(\fB\-k\fP and \fB\-f\fP) change how it is extracted from the sequence record header:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
By default, the sequence identifier is used. Underscore characters (\fB_\fP) are replaced
by spaces before the taxon scientific name is looked up in the taxonomic
database.
.IP \(bu 2
If the input file is in the \fBOBITools\fP extended fasta format, the \fB\-k\fP option
specifies the attribute containing the taxon scientific name.
.IP \(bu 2
If the input file is a fasta file downloaded from the UNITE or SILVA web site,
the \fB\-f\fP option specifies this source so that the associated
taxonomic information is parsed correctly.
.UNINDENT
.UNINDENT
.UNINDENT
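.sp
The default rule and the \fB\-k\fP rule above can be sketched in Python. This is only an
illustration, not the \fI\%obiaddtaxids\fP implementation: the function name
\fBextract_taxon_name\fP and the dictionary holding the record attributes are assumptions
made for this example.
.sp
.nf
.ft C
```python
def extract_taxon_name(identifier, attributes, key=None):
    """Return the taxon name to look up for one sequence record.

    identifier -- sequence identifier taken from the record header
    attributes -- dict of header attributes (hypothetical representation)
    key        -- attribute name, playing the role of the -k option
    """
    if key is not None:
        # With a key, the name is read from that attribute.
        return attributes[key]
    # Default: use the identifier, with underscores replaced by spaces.
    return identifier.replace("_", " ")


print(extract_taxon_name("Homo_sapiens", {}))
print(extract_taxon_name("seq1", {"species_name": "Canis lupus"}, key="species_name"))
```
.ft P
.fi
.sp
The first call follows the default rule and yields the name Homo sapiens. The second reads
the name from the \fBspecies_name\fP attribute and yields Canis lupus.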
.sp
For each sequence record, \fI\%obiaddtaxids\fP tries to match the extracted taxon scientific name 
with those stored in the taxonomic database.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If a match is found, the sequence record is annotated with the corresponding \fItaxid\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Otherwise,
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If the \fB\-g\fP option is set, the taxon name is composed of two words, and only the
first one is found in the taxonomic database at the ‘genus’ rank, \fI\%obiaddtaxids\fP
considers that it has found the genus associated with this sequence record and stores the
record in the file specified by the \fB\-g\fP option.
.IP \(bu 2
If the \fB\-u\fP option is set and no taxonomic information can be retrieved from the
taxon scientific name, the sequence record is stored in the file specified by the
\fB\-u\fP option.
.UNINDENT
.sp
\fIExample\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
> obiaddtaxids \-k species_name \-g genus_identified.fasta \e
               \-u unidentified.fasta \-d my_ecopcr_database \e
               my_sequences.fasta > identified.fasta
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This command tries to match the value associated with the \fBspecies_name\fP key of each sequence record
from the \fBmy_sequences.fasta\fP file with a taxon name from the ecoPCR database \fBmy_ecopcr_database\fP\&.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If there is an exact match, the sequence record is stored in the \fBidentified.fasta\fP file.
.IP \(bu 2
If not, and the \fBspecies_name\fP value is composed of two words, \fI\%obiaddtaxids\fP
treats the first word as a genus name and looks it up in the taxonomic database.
.INDENT 2.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If a genus is found, the sequence record is stored in the \fBgenus_identified.fasta\fP
file.
.IP \(bu 2
Otherwise the sequence record is stored in the \fBunidentified.fasta\fP file.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
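.sp
The decision cascade described above (exact match, then genus match, then unidentified) can
be sketched in Python. This is a minimal sketch under stated assumptions, not the actual
\fI\%obiaddtaxids\fP code: the function \fBclassify\fP, the bucket names and the two
dictionaries standing in for the taxonomic database are hypothetical, and
Homo incognitus is a made\-up name used only to trigger the genus branch.
.sp
.nf
.ft C
```python
def classify(name, species_taxids, genus_taxids):
    """Return (bucket, taxid) for one extracted taxon name.

    species_taxids and genus_taxids are plain dicts standing in for the
    taxonomic database (an assumption made for this sketch).
    """
    if name in species_taxids:
        # Exact match: the record is annotated with this taxid.
        return ("identified", species_taxids[name])
    words = name.split()
    if len(words) == 2 and words[0] in genus_taxids:
        # Only the first word matches, at the genus rank.
        return ("genus_identified", genus_taxids[words[0]])
    # Nothing could be retrieved from the name.
    return ("unidentified", None)


species = {"Homo sapiens": 9606}   # NCBI taxid of Homo sapiens
genera = {"Homo": 9605}            # NCBI taxid of the genus Homo
for name in ("Homo sapiens", "Homo incognitus", "Unknown organism"):
    print(name, classify(name, species, genera))
```
.ft P
.fi
.sp
The three names fall into the identified, genus_identified and unidentified buckets
respectively, mirroring the standard output, the \fB\-g\fP file and the \fB\-u\fP file.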
.SH OBIADDTAXIDS SPECIFIC OPTIONS
.INDENT 0.0
.TP
.B \-f <FORMAT>, \-\-format=<FORMAT>
Format of the sequence file. Possible formats are:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fBraw\fP: for regular \fBOBITools\fP extended fasta files (default value).
.IP \(bu 2
\fBUNITE\fP: for fasta files downloaded from the \fI\%UNITE web site\fP\&.
.IP \(bu 2
\fBSILVA\fP: for fasta files downloaded from the \fI\%SILVA web site\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-k <KEY>, \-\-key\-name=<KEY>
Key of the attribute containing the taxon name in sequence files in the \fBOBITools\fP extended
fasta format.
.UNINDENT
.INDENT 0.0
.TP
.B \-a <ANCESTOR>, \-\-restricting_ancestor=<ANCESTOR>
Restricts the search for \fItaxids\fP to taxa under a specified ancestor.
.sp
\fB<ANCESTOR>\fP can be a \fItaxid\fP (integer) or a key (string).
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If it is a \fItaxid\fP, this \fItaxid\fP is used to restrict the search for all the sequence
records.
.IP \(bu 2
If it is a key, \fI\%obiaddtaxids\fP looks for the ancestor \fItaxid\fP in the
corresponding attribute. This allows having a different ancestor restriction
for each sequence record.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
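.sp
The two forms of \fB<ANCESTOR>\fP accepted by the \fB\-a\fP option can be sketched in Python.
This is a hedged illustration, not the \fI\%obiaddtaxids\fP implementation: the functions
\fBresolve_ancestor\fP and \fBis_under\fP, the child\-to\-parent dictionary and the small
integer taxids are all made up for this example.
.sp
.nf
.ft C
```python
def resolve_ancestor(ancestor, attributes):
    """Return the ancestor taxid to use for one sequence record."""
    if ancestor.isdigit():
        # An integer is a taxid applied to every record.
        return int(ancestor)
    # Otherwise it is a key: the taxid is read from that attribute.
    return int(attributes[ancestor])


def is_under(taxid, ancestor, parent):
    """Tell whether taxid lies under ancestor in a child-to-parent dict."""
    while taxid != ancestor:
        if taxid not in parent or parent[taxid] == taxid:
            # Reached the root (or an unknown taxid) without meeting ancestor.
            return False
        taxid = parent[taxid]
    return True


# Toy taxonomy with made-up taxids: 3 is a child of 2, 2 a child of the root 1.
parent = {3: 2, 2: 1, 1: 1}
print(is_under(3, resolve_ancestor("2", {}), parent))
print(is_under(2, resolve_ancestor("clade", {"clade": "3"}), parent))
```
.ft P
.fi
.sp
The first call uses a fixed ancestor for all records and reports that taxon 3 lies under
taxon 2. The second reads the ancestor from a hypothetical \fBclade\fP attribute and reports
that taxon 2 does not lie under taxon 3. In this sketch a taxon counts as lying under itself.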
.INDENT 0.0
.TP
.B \-g <FILENAME>, \-\-genus_found=<FILENAME>
File used to store sequences with a match found for the genus.
.sp
\fBCAUTION:\fP
.INDENT 7.0
.INDENT 3.5
This option is not valid with the UNITE format.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-u <FILENAME>, \-\-unidentified=<FILENAME>
File used to store sequences with no taxonomic match found.
.UNINDENT
.SH TAXONOMY RELATED OPTIONS
.INDENT 0.0
.TP
.B \-d <FILENAME>, \-\-database=<FILENAME>
Name of the ecoPCR taxonomy database.
.UNINDENT
.INDENT 0.0
.TP
.B \-t <FILENAME>, \-\-taxonomy\-dump=<FILENAME>
Name of the NCBI taxonomy dump repository.
.UNINDENT
.SH COMMON OPTIONS
.INDENT 0.0
.TP
.B \-h, \-\-help
Shows this help message and exits.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-DEBUG
Sets logging in debug mode.
.UNINDENT
.SH OBIADDTAXIDS ADDED SEQUENCE ATTRIBUTE
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
taxid
.UNINDENT
.UNINDENT
.UNINDENT
.SH AUTHOR
The OBITools Development Team - LECA
.SH COPYRIGHT
2015 - 2019, OBITools Development Team
.\" Generated by docutils manpage writer.
.