.TH MASON_FRAG_SEQUENCING 1 "" "mason_frag_sequencing 2.0.9 [tarball]" ""
.SH NAME
mason_frag_sequencing \- Fragment Sequencing Simulation
.SH SYNOPSIS
\fBmason_frag_sequencing\fP [\fIOPTIONS\fP] \fB-i\fP \fIIN.fa\fP \fB-o\fP \fIOUT.{fa,fq}\fP [\fB-or\fP \fIOUT2.{fa,fq}\fP]
.SH DESCRIPTION
Given a FASTA file with fragments, simulate sequencing of these fragments.
.sp
This program is a more lightweight version of mason_sequencing without support for the application of VCF and fragment sampling. Output of SAM is also not available. However, it uses the same code for the simulation of the reads as the more powerful mason_simulator.
.sp
You can use mason_frag_sequencing if you want to implement your own fragmentation behaviour, e.g. if you have implemented your own bias models.
.SH OPTIONS
.TP
\fB-h\fP, \fB--help\fP
Display the help message.
.TP
\fB--version\fP
Display version information.
.TP
\fB-q\fP, \fB--quiet\fP
Low verbosity.
.TP
\fB-v\fP, \fB--verbose\fP
Higher verbosity.
.TP
\fB-vv\fP, \fB--very-verbose\fP
Highest verbosity.
.TP
\fB--seed\fP \fIINTEGER\fP
Seed to use for random number generator. Default: \fI0\fP.
.TP
\fB-i\fP, \fB--in\fP \fIINPUT_FILE\fP
Path to input file. Valid filetypes are: \fI.sam[.*]\fP, \fI.raw[.*]\fP, \fI.gbk[.*]\fP, \fI.frn[.*]\fP, \fI.fq[.*]\fP, \fI.fna[.*]\fP, \fI.ffn[.*]\fP, \fI.fastq[.*]\fP, \fI.fasta[.*]\fP, \fI.faa[.*]\fP, \fI.fa[.*]\fP, \fI.embl[.*]\fP, and \fI.bam\fP, where * is any of the following extensions: \fIgz\fP, \fIbz2\fP, and \fIbgzf\fP for transparent (de)compression.
.TP
\fB-o\fP, \fB--out\fP \fIOUTPUT_FILE\fP
Output of single-end/left-end reads. Valid filetypes are: \fI.sam[.*]\fP, \fI.raw[.*]\fP, \fI.frn[.*]\fP, \fI.fq[.*]\fP, \fI.fna[.*]\fP, \fI.ffn[.*]\fP, \fI.fastq[.*]\fP, \fI.fasta[.*]\fP, \fI.faa[.*]\fP, \fI.fa[.*]\fP, and \fI.bam\fP, where * is any of the following extensions: \fIgz\fP, \fIbz2\fP, and \fIbgzf\fP for transparent (de)compression.
.TP
\fB-or\fP, \fB--out-right\fP \fIOUTPUT_FILE\fP
Output of right reads.
Giving this option enables paired-end simulation. Valid filetypes are: \fI.sam[.*]\fP, \fI.raw[.*]\fP, \fI.frn[.*]\fP, \fI.fq[.*]\fP, \fI.fna[.*]\fP, \fI.ffn[.*]\fP, \fI.fastq[.*]\fP, \fI.fasta[.*]\fP, \fI.faa[.*]\fP, \fI.fa[.*]\fP, and \fI.bam\fP, where * is any of the following extensions: \fIgz\fP, \fIbz2\fP, and \fIbgzf\fP for transparent (de)compression.
.TP
\fB--force-single-end\fP
Force single-end simulation even if \fB--out-right\fP is given.
.SS Global Read Simulation Options:
.TP
\fB--seq-technology\fP \fISTRING\fP
Set sequencing technology to simulate. One of \fIillumina\fP, \fI454\fP, and \fIsanger\fP. Default: \fIillumina\fP.
.TP
\fB--seq-mate-orientation\fP \fISTRING\fP
Orientation for paired reads. See section READ ORIENTATION below. One of \fIFR\fP, \fIRF\fP, \fIFF\fP, and \fIFF2\fP. Default: \fIFR\fP.
.TP
\fB--seq-strands\fP \fISTRING\fP
Strands to simulate from, only applicable to paired sequencing simulation. One of \fIforward\fP, \fIreverse\fP, and \fIboth\fP. Default: \fIboth\fP.
.TP
\fB--embed-read-info\fP
Whether or not to embed read information.
.TP
\fB--read-name-prefix\fP \fISTRING\fP
Read names will have this prefix. Default: \fIsimulated.\fP.
.SS BS-Seq Options:
.TP
\fB--enable-bs-seq\fP
Enable BS-seq simulation.
.TP
\fB--bs-seq-protocol\fP \fISTRING\fP
Protocol to use for BS-Seq simulation. One of \fIdirectional\fP and \fIundirectional\fP. Default: \fIdirectional\fP.
.TP
\fB--bs-seq-conversion-rate\fP \fIDOUBLE\fP
Conversion rate for unmethylated Cs to become Ts. In range [0..1]. Default: \fI0.99\fP.
.SS Illumina Options:
.TP
\fB--illumina-read-length\fP \fIINTEGER\fP
Read length for Illumina simulation. In range [1..inf]. Default: \fI100\fP.
.TP
\fB--illumina-error-profile-file\fP \fIINPUT_FILE\fP
Path to file with Illumina error profile. The file must be a text file with floating point numbers separated by space, each giving a positional error rate. Valid filetype is: \fI.txt\fP.
.TP
\fB--illumina-prob-insert\fP \fIDOUBLE\fP
Per-base probability of insertion in Illumina sequencing. In range [0..1]. Default: \fI0.00005\fP.
.TP
\fB--illumina-prob-deletion\fP \fIDOUBLE\fP
Per-base probability of deletion in Illumina sequencing. In range [0..1]. Default: \fI0.00005\fP.
.TP
\fB--illumina-prob-mismatch-scale\fP \fIDOUBLE\fP
Scaling factor for Illumina mismatch probability. In range [0..inf]. Default: \fI1.0\fP.
.TP
\fB--illumina-prob-mismatch\fP \fIDOUBLE\fP
Average per-base mismatch probability in Illumina sequencing. In range [0.0..1.0]. Default: \fI0.004\fP.
.TP
\fB--illumina-prob-mismatch-begin\fP \fIDOUBLE\fP
Per-base mismatch probability of first base in Illumina sequencing. In range [0.0..1.0]. Default: \fI0.002\fP.
.TP
\fB--illumina-prob-mismatch-end\fP \fIDOUBLE\fP
Per-base mismatch probability of last base in Illumina sequencing. In range [0.0..1.0]. Default: \fI0.012\fP.
.TP
\fB--illumina-position-raise\fP \fIDOUBLE\fP
Point, relative to the read length, at which the error curve rises. In range [0.0..1.0]. Default: \fI0.66\fP.
.TP
\fB--illumina-quality-mean-begin\fP \fIDOUBLE\fP
Mean PHRED quality for non-mismatch bases of first base in Illumina sequencing. Default: \fI40.0\fP.
.TP
\fB--illumina-quality-mean-end\fP \fIDOUBLE\fP
Mean PHRED quality for non-mismatch bases of last base in Illumina sequencing. Default: \fI39.5\fP.
.TP
\fB--illumina-quality-stddev-begin\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for non-mismatch bases of first base in Illumina sequencing. Default: \fI0.05\fP.
.TP
\fB--illumina-quality-stddev-end\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for non-mismatch bases of last base in Illumina sequencing. Default: \fI10.0\fP.
.TP
\fB--illumina-mismatch-quality-mean-begin\fP \fIDOUBLE\fP
Mean PHRED quality for mismatch bases of first base in Illumina sequencing. Default: \fI40.0\fP.
.TP
\fB--illumina-mismatch-quality-mean-end\fP \fIDOUBLE\fP
Mean PHRED quality for mismatch bases of last base in Illumina sequencing. Default: \fI30.0\fP.
.TP
\fB--illumina-mismatch-quality-stddev-begin\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for mismatch bases of first base in Illumina sequencing. Default: \fI3.0\fP.
.TP
\fB--illumina-mismatch-quality-stddev-end\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for mismatch bases of last base in Illumina sequencing. Default: \fI15.0\fP.
.TP
\fB--illumina-left-template-fastq\fP \fIINPUT_FILE\fP
FASTQ file to use for a template for left-end reads. Valid filetypes are: \fI.sam[.*]\fP, \fI.raw[.*]\fP, \fI.gbk[.*]\fP, \fI.frn[.*]\fP, \fI.fq[.*]\fP, \fI.fna[.*]\fP, \fI.ffn[.*]\fP, \fI.fastq[.*]\fP, \fI.fasta[.*]\fP, \fI.faa[.*]\fP, \fI.fa[.*]\fP, \fI.embl[.*]\fP, and \fI.bam\fP, where * is any of the following extensions: \fIgz\fP, \fIbz2\fP, and \fIbgzf\fP for transparent (de)compression.
.TP
\fB--illumina-right-template-fastq\fP \fIINPUT_FILE\fP
FASTQ file to use for a template for right-end reads. Valid filetypes are: \fI.sam[.*]\fP, \fI.raw[.*]\fP, \fI.gbk[.*]\fP, \fI.frn[.*]\fP, \fI.fq[.*]\fP, \fI.fna[.*]\fP, \fI.ffn[.*]\fP, \fI.fastq[.*]\fP, \fI.fasta[.*]\fP, \fI.faa[.*]\fP, \fI.fa[.*]\fP, \fI.embl[.*]\fP, and \fI.bam\fP, where * is any of the following extensions: \fIgz\fP, \fIbz2\fP, and \fIbgzf\fP for transparent (de)compression.
.SS Sanger Sequencing Options:
.TP
\fB--sanger-read-length-model\fP \fISTRING\fP
The model to use for sampling the Sanger read length. One of \fInormal\fP and \fIuniform\fP. Default: \fInormal\fP.
.TP
\fB--sanger-read-length-min\fP \fIINTEGER\fP
The minimal read length when the read length is sampled uniformly. In range [0..inf]. Default: \fI400\fP.
.TP
\fB--sanger-read-length-max\fP \fIINTEGER\fP
The maximal read length when the read length is sampled uniformly. In range [0..inf]. Default: \fI600\fP.
.TP
\fB--sanger-read-length-mean\fP \fIDOUBLE\fP
The mean read length when the read length is sampled with normal distribution. In range [0..inf]. Default: \fI400\fP.
.TP
\fB--sanger-read-length-error\fP \fIDOUBLE\fP
The read length standard deviation when the read length is sampled with normal distribution. In range [0..inf]. Default: \fI40\fP.
.TP
\fB--sanger-prob-mismatch-scale\fP \fIDOUBLE\fP
Scaling factor for Sanger mismatch probability. In range [0..inf]. Default: \fI1.0\fP.
.TP
\fB--sanger-prob-mismatch-begin\fP \fIDOUBLE\fP
Per-base mismatch probability of first base in Sanger sequencing. In range [0.0..1.0]. Default: \fI0.005\fP.
.TP
\fB--sanger-prob-mismatch-end\fP \fIDOUBLE\fP
Per-base mismatch probability of last base in Sanger sequencing. In range [0.0..1.0]. Default: \fI0.001\fP.
.TP
\fB--sanger-prob-insertion-begin\fP \fIDOUBLE\fP
Per-base insertion probability of first base in Sanger sequencing. In range [0.0..1.0]. Default: \fI0.0025\fP.
.TP
\fB--sanger-prob-insertion-end\fP \fIDOUBLE\fP
Per-base insertion probability of last base in Sanger sequencing. In range [0.0..1.0]. Default: \fI0.005\fP.
.TP
\fB--sanger-prob-deletion-begin\fP \fIDOUBLE\fP
Per-base deletion probability of first base in Sanger sequencing. In range [0.0..1.0]. Default: \fI0.0025\fP.
.TP
\fB--sanger-prob-deletion-end\fP \fIDOUBLE\fP
Per-base deletion probability of last base in Sanger sequencing. In range [0.0..1.0]. Default: \fI0.005\fP.
.TP
\fB--sanger-quality-match-start-mean\fP \fIDOUBLE\fP
Mean PHRED quality for non-mismatch bases of first base in Sanger sequencing. Default: \fI40.0\fP.
.TP
\fB--sanger-quality-match-end-mean\fP \fIDOUBLE\fP
Mean PHRED quality for non-mismatch bases of last base in Sanger sequencing. Default: \fI39.5\fP.
.TP
\fB--sanger-quality-match-start-stddev\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for non-mismatch bases of first base in Sanger sequencing. Default: \fI0.1\fP.
.TP
\fB--sanger-quality-match-end-stddev\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for non-mismatch bases of last base in Sanger sequencing. Default: \fI2\fP.
.TP
\fB--sanger-quality-error-start-mean\fP \fIDOUBLE\fP
Mean PHRED quality for erroneous bases of first base in Sanger sequencing. Default: \fI30\fP.
.TP
\fB--sanger-quality-error-end-mean\fP \fIDOUBLE\fP
Mean PHRED quality for erroneous bases of last base in Sanger sequencing. Default: \fI20\fP.
.TP
\fB--sanger-quality-error-start-stddev\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for erroneous bases of first base in Sanger sequencing. Default: \fI2\fP.
.TP
\fB--sanger-quality-error-end-stddev\fP \fIDOUBLE\fP
Standard deviation of PHRED quality for erroneous bases of last base in Sanger sequencing. Default: \fI5\fP.
.SS 454 Sequencing Options:
.TP
\fB--454-read-length-model\fP \fISTRING\fP
The model to use for sampling the 454 read length. One of \fInormal\fP and \fIuniform\fP. Default: \fInormal\fP.
.TP
\fB--454-read-length-min\fP \fIINTEGER\fP
The minimal read length when the read length is sampled uniformly. In range [0..inf]. Default: \fI10\fP.
.TP
\fB--454-read-length-max\fP \fIINTEGER\fP
The maximal read length when the read length is sampled uniformly. In range [0..inf]. Default: \fI600\fP.
.TP
\fB--454-read-length-mean\fP \fIDOUBLE\fP
The mean read length when the read length is sampled with normal distribution. In range [0..inf]. Default: \fI400\fP.
.TP
\fB--454-read-length-stddev\fP \fIDOUBLE\fP
The read length standard deviation when the read length is sampled with normal distribution. In range [0..inf]. Default: \fI40\fP.
.TP
\fB--454-no-sqrt-in-std-dev\fP
For the error model: if set, (sigma = k * r) is used; otherwise (sigma = k * sqrt(r)).
.TP
\fB--454-proportionality-factor\fP \fIDOUBLE\fP
Proportionality factor for calculating the standard deviation proportional to the read length. In range [0..inf]. Default: \fI0.15\fP.
.TP
\fB--454-background-noise-mean\fP \fIDOUBLE\fP
Mean of lognormal distribution to use for the noise.
In range [0..inf]. Default: \fI0.23\fP.
.TP
\fB--454-background-noise-stddev\fP \fIDOUBLE\fP
Standard deviation of lognormal distribution to use for the noise. In range [0..inf]. Default: \fI0.15\fP.
.SH SEQUENCING SIMULATION
Simulation of base qualities is disabled when writing out FASTA files. Simulation of paired-end sequencing is enabled when specifying two output files.
.SH READ ORIENTATION
You can use the \fB--seq-mate-orientation\fP option to set the relative orientation of the reads when doing paired-end sequencing. The valid values are listed below.
.TP
FR
Reads are inward-facing, the same as Illumina paired-end reads: R1 --> <-- R2.
.TP
RF
Reads are outward-facing, the same as Illumina mate-pair reads: R1 <-- --> R2.
.TP
FF
Reads are on the same strand: R1 --> --> R2.
.TP
FF2
Reads are on the same strand but the "right" reads are sequenced to the left of the "left" reads, same as 454 paired: R2 --> --> R1.
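.PP
The orientation diagrams above can be illustrated with a short Python sketch. It is only one reading of the diagrams, not the actual implementation of mason_frag_sequencing: the names revcomp and mate_pair are hypothetical, and it assumes that both reads have the same length, that they are cut from the two ends of the fragment, and that a left-pointing arrow means the read is the reverse complement of the fragment sequence at that end.
.PP
.nf
```python
# Illustrative sketch only: one interpretation of the R1/R2 diagrams,
# not the code used by mason_frag_sequencing.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mate_pair(fragment, read_length, orientation="FR"):
    """Return (R1, R2) cut from the two ends of a fragment.

    Assumes read_length >= 1; reads are clipped to the fragment length.
    """
    n = min(read_length, len(fragment))
    left, right = fragment[:n], fragment[-n:]
    if orientation == "FR":     # R1 --> <-- R2
        return left, revcomp(right)
    if orientation == "RF":     # R1 <-- --> R2
        return revcomp(left), right
    if orientation == "FF":     # R1 --> --> R2
        return left, right
    if orientation == "FF2":    # R2 --> --> R1
        return right, left
    raise ValueError("unknown orientation: " + orientation)
```
.fi
.PP
Under these assumptions, a fragment AAACCCGGGT with a read length of 3 gives R1 = AAA and R2 = ACC for FR, and R1 = GGT and R2 = AAA for FF2.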