.\" Man page generated from reStructuredText.
.
.TH "OBISAMPLE" "1" "Jul 27, 2019" " 1.02 13" "OBITools"
.SH NAME
obisample \- description of obisample
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.sp
\fI\%obisample\fP randomly resamples sequence records with or without replacement.
.SH OBISAMPLE SPECIFIC OPTIONS
.INDENT 0.0
.TP
.B \-s ###, \-\-sample\-size ###
.INDENT 7.0
.INDENT 3.5
Specifies the size of the generated sample.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
without the \fB\-a\fP option, sample size is expressed as the exact number of sequence records to be sampled (default: number of sequence records in the input file).
.IP \(bu 2
with the \fB\-a\fP option, sample size is expressed as a fraction of the number of sequence records in the input file (a number between 0 and 1).
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fIExample:\fP
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
> obisample \-s 1000 seq1.fasta > seq2.fasta
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Randomly samples 1000 sequence records from the \fBseq1.fasta\fP file, with replacement, and saves them in the \fBseq2.fasta\fP file.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-a, \-\-approx\-sampling
.INDENT 7.0
.INDENT 3.5
Switches the resampling algorithm to an approximate one, useful for large files.
.sp
The default algorithm selects exactly the number of sequence records specified with the \fB\-s\fP option. When the \fB\-a\fP option is set, each sequence record is selected with a probability that depends on the \fBcount\fP attribute of the sequence record and on the \fB\-s\fP fraction.
.UNINDENT
.UNINDENT
.sp
\fIExample:\fP
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
> obisample \-s 0.5 \-a seq1.fastq > seq2.fastq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Randomly samples half of the sequence records of the \fBseq1.fastq\fP file, without replacement, and saves them in the \fBseq2.fastq\fP file.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B \-w, \-\-without\-replacement
.INDENT 7.0
.INDENT 3.5
Asks for sampling without replacement.
.UNINDENT
.UNINDENT
.sp
\fIExample:\fP
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
> obisample \-s 1000 \-w seq1.fasta > seq2.fasta
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Randomly samples 1000 sequence records from the \fBseq1.fasta\fP file, without replacement (the input file must contain at least 1000 sequences), and saves them in the \fBseq2.fasta\fP file.
.UNINDENT
.UNINDENT
.UNINDENT
.SH OPTIONS TO SPECIFY INPUT FORMAT
.SS Restrict the analysis to a sub\-part of the input file
.INDENT 0.0
.TP
.B \-\-skip
The first N sequence records of the file are discarded from the analysis and not reported to the output file.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-only
Only the next N sequence records of the file are analyzed. The following sequences in the file are neither analyzed nor reported to the output file. This option can be used together with the \fI\-\-skip\fP option.
.UNINDENT
.SS Sequence annotated format
.INDENT 0.0
.TP
.B \-\-genbank
Input file is in genbank format.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-embl
Input file is in embl format.
.UNINDENT
.SS fasta related format
.INDENT 0.0
.TP
.B \-\-fasta
Input file is in fasta format (including OBITools fasta extensions).
.UNINDENT
.SS fastq related format
.INDENT 0.0
.TP
.B \-\-sanger
Input file is in Sanger fastq format (standard fastq used by HiSeq/MiSeq sequencers).
.UNINDENT
.INDENT 0.0
.TP
.B \-\-solexa
Input file is in fastq format produced by Solexa (GA IIx) sequencers.
.UNINDENT
.SS ecoPCR related format
.INDENT 0.0
.TP
.B \-\-ecopcr
Input file is in ecoPCR format.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-ecopcrdb
Input is an ecoPCR database.
.UNINDENT
.SS Specifying the sequence type
.INDENT 0.0
.TP
.B \-\-nuc
Input file contains nucleic acid sequences.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-prot
Input file contains protein sequences.
.UNINDENT
.SH COMMON OPTIONS
.INDENT 0.0
.TP
.B \-h, \-\-help
Shows this help message and exits.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-DEBUG
Sets logging in debug mode.
.UNINDENT
.SH OBISAMPLE USED SEQUENCE ATTRIBUTE
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
count
.UNINDENT
.UNINDENT
.UNINDENT
.SH AUTHOR
The OBITools Development Team - LECA
.SH COPYRIGHT
2019 - 2015, OBITools Development Team
.\" Generated by docutils manpage writer.
.
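.SH SAMPLING MODES SKETCH
.sp
The difference between the default (exact) mode and the \fB\-a\fP (approximate) mode can be illustrated with a minimal Python sketch. This is not the obisample implementation. The function names, the dictionary record layout and the reading of the approximate mode (each of the \fBcount\fP occurrences of a record is kept independently with a probability equal to the \fB\-s\fP fraction) are assumptions made for illustration only.
.sp
.nf
.ft C
```python
import random


def sample_exact(records, size, replace=True, rng=None):
    """Return exactly `size` records drawn uniformly at random.

    Hypothetical helper: mirrors the default mode, with or without
    replacement. All records must be in memory.
    """
    if rng is None:
        rng = random.Random()
    if replace:
        return [rng.choice(records) for _ in range(size)]
    if size > len(records):
        raise ValueError("sample size exceeds the number of records")
    return rng.sample(records, size)


def sample_approx(records, fraction, rng=None):
    """Keep a random share of each record, one record at a time.

    Assumption: every one of the `count` occurrences of a record
    survives independently with probability `fraction`. Records whose
    surviving count is zero are dropped. The output size is only
    approximately fraction * total, but the input can be streamed.
    """
    if rng is None:
        rng = random.Random()
    kept = []
    for rec in records:
        count = rec.get("count", 1)  # a missing count is treated as 1
        survivors = sum(1 for _ in range(count) if rng.random() < fraction)
        if survivors > 0:
            kept.append(dict(rec, count=survivors))
    return kept
```
.ft P
.fi
.sp
In this sketch the exact mode needs the whole record list before drawing, whereas the approximate mode decides on each record as it is read, which is one plausible reason for recommending it on large files.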