.TH RABEMA_BUILD_GOLD_STANDARD 1 "" "rabema_build_gold_standard 1.2.9 [tarball]" ""
.SH NAME
rabema_build_gold_standard \- RABEMA Gold Standard Builder
.SH SYNOPSIS
\fBrabema_build_gold_standard\fP [\fIOPTIONS\fP] \fB--out-gsi\fP \fIOUT.gsi\fP \fB--reference\fP \fIREF.fa\fP \fB--in-bam\fP \fIPERFECT.{sam,bam}\fP
.SH DESCRIPTION
This program builds a RABEMA gold standard. The input is a reference FASTA file and a perfect SAM/BAM file (e.g. created using RazerS 3 in full-sensitivity mode).
.sp
The input SAM/BAM file must be \fIsorted by coordinate\fP. The program will create a FASTA index file \fIREF.fa.fai\fP for fast random access to the reference.
.SH OPTIONS
.TP
\fB-h\fP, \fB--help\fP
Display the help message.
.TP
\fB--version\fP
Display version information.
.TP
\fB-v\fP, \fB--verbose\fP
Enable verbose output.
.TP
\fB-vv\fP, \fB--very-verbose\fP
Enable even more verbose output.
.SS Input / Output:
.TP
\fB-o\fP, \fB--out-gsi\fP \fIOUTPUT_FILE\fP
Path to write the resulting GSI file to. Valid filetype is: \fI.gsi[.*]\fP, where * is \fIgz\fP for transparent (de)compression.
.TP
\fB-r\fP, \fB--reference\fP \fIINPUT_FILE\fP
Path to load reference FASTA from. Valid filetypes are: \fI.sam[.*]\fP, \fI.raw[.*]\fP, \fI.gbk[.*]\fP, \fI.frn[.*]\fP, \fI.fq[.*]\fP, \fI.fna[.*]\fP, \fI.ffn[.*]\fP, \fI.fastq[.*]\fP, \fI.fasta[.*]\fP, \fI.faa[.*]\fP, \fI.fa[.*]\fP, \fI.embl[.*]\fP, and \fI.bam\fP, where * is any of the following extensions: \fIgz\fP, \fIbz2\fP, and \fIbgzf\fP for transparent (de)compression.
.TP
\fB-b\fP, \fB--in-bam\fP \fIINPUT_FILE\fP
Path to load the "perfect" SAM/BAM file from. Valid filetypes are: \fI.sam[.*]\fP and \fI.bam\fP, where * is any of the following extensions: \fIgz\fP, \fIbz2\fP, and \fIbgzf\fP for transparent (de)compression.
.SS Gold Standard Parameters:
.TP
\fB--oracle-mode\fP
Enable oracle mode.
This is used for simulated data when the input SAM/BAM file gives exactly one position that is considered the true sample position.
.TP
\fB--match-N\fP
When set, N matches all characters without penalty.
.TP
\fB--distance-metric\fP \fISTRING\fP
Set distance metric. One of \fIhamming\fP and \fIedit\fP. Default: \fIedit\fP.
.TP
\fB-e\fP, \fB--max-error\fP \fIINTEGER\fP
Maximal error rate to build the gold standard for, in percent. This parameter is an integer and is relative to the read length. In oracle mode, the error rate of the read at the sampling position is used and \fIINTEGER\fP serves as a cutoff threshold. Default: \fI0\fP.
.SH RETURN VALUES
A return value of 0 indicates success; any other value indicates an error.
.SH EXAMPLES
.TP
\fBrabema_build_gold_standard\fP \fB-e\fP \fI4\fP \fB-o\fP \fIOUT.gsi\fP \fB-b\fP \fIIN.sam\fP \fB-r\fP \fIREF.fa\fP
Build gold standard from a SAM file \fIIN.sam\fP with all mapping locations and a FASTA reference \fIREF.fa\fP to GSI file \fIOUT.gsi\fP with a maximal error rate of \fI4\fP percent.
.TP
\fBrabema_build_gold_standard\fP \fB--distance-metric\fP \fIhamming\fP \fB-e\fP \fI4\fP \fB-o\fP \fIOUT.gsi\fP \fB-b\fP \fIIN.bam\fP \fB-r\fP \fIREF.fa\fP
Same as above, but using Hamming instead of edit distance and BAM as the input.
.TP
\fBrabema_build_gold_standard\fP \fB--oracle-mode\fP \fB-o\fP \fIOUT.gsi\fP \fB-b\fP \fIIN.sam\fP \fB-r\fP \fIREF.fa\fP
Build gold standard from a SAM file \fIIN.sam\fP with the original sample position, e.g. as exported by the read simulator Mason.
.SH MEMORY REQUIREMENTS
From version 1.1, great care has been taken to keep the memory requirements as low as possible. The memory required is two times the size of the largest chromosome plus some constant memory for each match.
.sp
For example, the memory usage for 100bp human genome reads at 5% error rate was 1.7GB. Of this, roughly 400MB came from the chromosome and 1.3GB from the matches.
.SH REFERENCES
M.
Holtgrewe, A.-K. Emde, D. Weese and K. Reinert. A Novel And Well-Defined Benchmarking Method For Second Generation Read Mapping, BMC Bioinformatics 2011, 12:210.
.TP
\fIhttp://www.seqan.de/rabema\fP
RABEMA Homepage
.TP
\fIhttp://www.seqan.de/mason\fP
Mason Homepage
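.SH NOTES
The rule of thumb given under MEMORY REQUIREMENTS (two times the size of the largest chromosome) can be approximated from the FASTA index \fIREF.fa.fai\fP. The following is a minimal sketch and not part of RABEMA: the function name \fIestimate_chromosome_bytes\fP is hypothetical, and the sketch assumes one byte per reference character, which this page does not state. It relies only on the second column of the index holding each sequence length. The per-match memory is not covered.
.sp
.nf
```shell
#!/bin/sh
# Hypothetical helper (not part of RABEMA): estimate the chromosome share of
# the memory footprint as two times the longest sequence in a FASTA index.
# Assumption: one byte per reference character.
estimate_chromosome_bytes() {
    # Column 2 of a .fai file holds the sequence length in bases.
    awk '$2 + 0 > max { max = $2 + 0 } END { print 2 * max }' "$1"
}

# Usage: estimate_chromosome_bytes REF.fa.fai
```
.fi
.sp
The result is a byte count under the stated assumption and should be read as a rough lower bound, since the memory for the matches comes on top.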