.TH SUMATRA "1" "August 2015" "sumatra 1.0.03" "User Commands"
.SH NAME
sumatra \- fast and exact comparison and clustering of sequences
.SH SYNOPSIS
.B sumatra
\fI[options] dataset1 [dataset2]\fR
.SH DESCRIPTION
Sumatra computes all the pairwise LCS (Longest Common Subsequence) scores
within one nucleotide dataset or between two nucleotide datasets.
.SH OPTIONS
.TP
\fB\-h\fR
[H]elp \- print help.
.TP
\fB\-l\fR
Reference sequence length is the shortest.
.TP
\fB\-L\fR
Reference sequence length is the largest.
.TP
\fB\-a\fR
Reference sequence length is the alignment length (default).
.TP
\fB\-n\fR
Score is normalized by reference sequence length (default).
.TP
\fB\-r\fR
Raw score, not normalized.
.TP
\fB\-d\fR
Score is expressed in distance (default: score is expressed in similarity).
.TP
\fB\-t\fR \fI##.##\fR
Score threshold.
If the score is normalized and expressed in similarity (default), it is an identity, e.g. 0.95 for an identity of 95%.
If the score is normalized and expressed in distance, it is (1.0 \- identity), e.g. 0.05 for an identity of 95%.
If the score is not normalized and expressed in similarity, it is the length of the Longest Common Subsequence.
If the score is not normalized and expressed in distance, it is (reference length \- LCS length).
.br
Only sequence pairs with a similarity above ##.## are printed.
Default: 0.00 (no threshold).
.TP
\fB\-p\fR \fI##\fR
Number of threads used for computation (default=1).
.TP
\fB\-g\fR
n's are replaced with a's (default: sequences with n's are discarded).
.TP
\fB\-x\fR
Adds four extra columns with the count and length of both sequences.
.TP
\fIdataset1\fR
(First argument) the nucleotide dataset to analyze
.TP
\fIdataset2\fR
(Second argument) optionally the second nucleotide dataset
.SH RESULTS
Results table description
.br
column 1 : Identifier sequence 1
.br
column 2 : Identifier sequence 2
.br
column 3 : Score
.br
column 4 : Count of sequence 1 (only with option \fB\-x\fR)
.br
column 5 : Count of sequence 2 (only with option \fB\-x\fR)
.br
column 6 : Length of sequence 1 (only with option \fB\-x\fR)
.br
column 7 : Length of sequence 2 (only with option \fB\-x\fR)
.SH SEE ALSO
http://metabarcoding.org/sumatra
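.SH EXAMPLES
The four score conventions described under \fB\-t\fR (normalized or raw, similarity or distance) can be illustrated with a minimal Python sketch.
It is not sumatra's implementation: the function names \fIlcs_length\fR and \fIscore\fR and their parameters are hypothetical, the LCS is computed with the textbook dynamic programming recurrence, and only the shortest (\fB\-l\fR) and largest (\fB\-L\fR) reference lengths are modelled.
The alignment length reference (\fB\-a\fR) is left out, and both sequences are assumed to be non-empty.
.PP
.nf
```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    # Textbook dynamic programming, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[len(b)]


def score(a, b, reference="largest", normalized=True, distance=False):
    """Score of a pair under the conventions listed for the threshold.

    reference  -- "shortest" or "largest" (hypothetical names)
    normalized -- divide by the reference length when True
    distance   -- report (reference length - LCS length) when True
    """
    lcs = lcs_length(a, b)
    if reference == "shortest":
        ref = min(len(a), len(b))
    elif reference == "largest":
        ref = max(len(a), len(b))
    else:
        raise ValueError("reference must be shortest or largest")
    value = ref - lcs if distance else lcs
    return value / ref if normalized else value


if __name__ == "__main__":
    a, b = "acgtacgt", "acgtccgt"
    print(score(a, b))                                   # identity
    print(score(a, b, distance=True))                    # 1.0 - identity
    print(score(a, b, normalized=False))                 # LCS length
    print(score(a, b, normalized=False, distance=True))  # ref - LCS
```
.fi
.PP
In this sketch, a pair with a normalized similarity of 0.95 has a normalized distance of 0.05, which is why the value given to \fB\-t\fR has to follow whichever convention is selected with \fB\-n\fR, \fB\-r\fR and \fB\-d\fR.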