RAZERS3(1)

NAME

razers3 - Faster, fully sensitive read mapping

SYNOPSIS

razers3 [OPTIONS] <GENOME FILE> <READS FILE>
razers3 [OPTIONS] <GENOME FILE> <PE-READS FILE1> <PE-READS FILE2>

DESCRIPTION

RazerS 3 is a versatile, fully sensitive read mapper based on k-mer counting and seeding filters. It supports single-end and paired-end mapping and shared-memory parallelism, and it optimally parametrizes the filter based on a user-defined minimal sensitivity. See http://www.seqan.de/projects/razers for more information.

Input to RazerS 3 is a reference genome file and either one file with single-end reads or two files containing the left and right mates of paired-end reads. Use - to read single-end reads from stdin.
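The two invocation forms from the SYNOPSIS can be assembled programmatically. The following is a minimal sketch, not part of RazerS 3: the helper name build_razers3_command and its parameters are hypothetical, and only flags documented on this page (-i, -tc, -o) are used. It builds the argument list without running the program.

```python
def build_razers3_command(genome, reads, output=None,
                          percent_identity=None, threads=None):
    """Assemble an argv list for razers3 (hypothetical helper for illustration)."""
    # One read file means single-end mapping, two files mean paired-end mapping.
    if len(reads) not in (1, 2):
        raise ValueError("expected one (single-end) or two (paired-end) read files")
    argv = ["razers3"]
    if percent_identity is not None:
        argv += ["-i", str(percent_identity)]
    if threads is not None:
        argv += ["-tc", str(threads)]
    if output is not None:
        argv += ["-o", output]
    # Positional arguments: genome file first, then the read file(s).
    argv.append(genome)
    argv.extend(reads)
    return argv


if __name__ == "__main__":
    print(" ".join(build_razers3_command(
        "hg18.fa", ["reads_1.fq", "reads_2.fq"], output="mapped.razers")))
```

The resulting list could be handed to subprocess.run on a system where razers3 is installed.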

REQUIRED ARGUMENTS

ARGUMENT 0 INPUT_FILE: A reference genome file. Valid filetypes are: .sam[.*], .raw[.*], .gbk[.*], .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent (de)compression.
READS List of INPUT_FILE's: Either one (single-end) or two (paired-end) read files. Valid filetypes are: .sam[.*], .raw[.*], .gbk[.*], .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent (de)compression.

OPTIONS

-h, --help: Display the help message.
--version: Display version information.

Main Options:

-i, --percent-identity DOUBLE: Percent identity threshold. In range [50..100]. Default: 95.
-rr, --recognition-rate DOUBLE: Percent recognition rate. In range [80..100]. Default: 100.
-ng, --no-gaps: Allow only mismatches, no indels. Default: allow both.
-f, --forward: Map reads only to forward strands.
-r, --reverse: Map reads only to reverse strands.
-m, --max-hits INTEGER: Output only <NUM> of the best hits. In range [1..inf]. Default: 100.
--unique: Output only unique best matches (-m 1 -dr 0 -pa).
-tr, --trim-reads INTEGER: Trim reads to given length. Default: off. In range [14..inf].
-o, --output OUTPUT_FILE: Mapping result filename (use - to dump to stdout in razers format). Default: <READS FILE>.razers. Valid filetypes are: .sam, .razers, .gff, .fasta, .fa, .eland, .bam, and .afg.
-v, --verbose: Verbose mode.
-vv, --vverbose: Very verbose mode.

Paired-end Options:

-ll, --library-length INTEGER: Paired-end library length. In range [1..inf]. Default: 220.
-le, --library-error INTEGER: Paired-end library length tolerance. In range [0..inf]. Default: 50.

Output Format Options:

-a, --alignment: Dump the alignment for each match (only razer or fasta format).
-pa, --purge-ambiguous: Purge reads with more than <max-hits> best matches.
-dr, --distance-range INTEGER: Only consider matches with at most NUM more errors compared to the best. Default: output all.
-gn, --genome-naming INTEGER: Select how genomes are named (see Naming section below). In range [0..1]. Default: 0.
-rn, --read-naming INTEGER: Select how reads are named (see Naming section below). In range [0..3]. Default: 0.
--full-readid: Use the whole read id (don't clip after whitespace).
-so, --sort-order INTEGER: Select how matches are sorted (see Sorting section below). In range [0..1]. Default: 0.
-pf, --position-format INTEGER: Select begin/end position numbering (see Coordinate section below). In range [0..1]. Default: 0.
-ds, --dont-shrink-alignments: Disable alignment shrinking in SAM. This is required for generating a gold mapping for Rabema.
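
The --unique shorthand combines -m 1, -dr 0, and -pa. As a rough illustration of what that combination means for the output, the sketch below filters a list of matches in plain Python. It is an assumption-laden model, not RazerS 3 code: the tuple layout (read id, genome position, errors) and the function name are hypothetical.

```python
from collections import defaultdict


def unique_best_matches(matches):
    """Keep a read's best match only if that best match is unambiguous.

    matches: iterable of (read_id, genome_pos, errors) tuples (hypothetical layout).
    """
    by_read = defaultdict(list)
    for match in matches:
        by_read[match[0]].append(match)

    kept = []
    for hits in by_read.values():
        best = min(h[2] for h in hits)
        # -dr 0: only matches with as few errors as the best one are considered.
        best_hits = [h for h in hits if h[2] == best]
        # -m 1 with -pa: a read with more than one best match is purged entirely.
        if len(best_hits) == 1:
            kept.append(best_hits[0])
    return kept


if __name__ == "__main__":
    demo = [("r1", 100, 0), ("r1", 500, 2), ("r2", 10, 1), ("r2", 90, 1)]
    print(unique_best_matches(demo))
```

In this model, read r1 keeps its single zero-error match, while r2 is dropped because two matches tie for best.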

Filtration Options:

-fl, --filter STRING: Select k-mer filter. One of pigeonhole and swift. Default: pigeonhole.
-mr, --mutation-rate DOUBLE: Set the percent mutation rate (pigeonhole). In range [0..20]. Default: 5.
-ol, --overlap-length INTEGER: Manually set the overlap length of adjacent k-mers (pigeonhole). In range [0..inf].
-pd, --param-dir STRING: Read user-computed parameter files in the directory <DIR> (swift).
-t, --threshold INTEGER: Manually set minimum k-mer count threshold (swift). In range [1..inf].
-tl, --taboo-length INTEGER: Set taboo length (swift). In range [1..inf]. Default: 1.
-s, --shape STRING: Manually set k-mer shape.
-oc, --overabundance-cut INTEGER: Set k-mer overabundance cut ratio. In range [0..1]. Default: 1.
-rl, --repeat-length INTEGER: Skip simple-repeats of length <NUM>. In range [1..inf]. Default: 1000.
-lf, --load-factor DOUBLE: Set the load factor for the open addressing k-mer index. In range [1..inf]. Default: 1.6.
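
The idea behind a pigeonhole filter can be illustrated with a simplified, non-overlapping variant: if a read is cut into e + 1 pieces and aligns with at most e errors, at least one piece must match exactly. The sketch below is a textbook illustration under that assumption, restricted to mismatches against an equal-length window. It is not the RazerS 3 implementation, which additionally supports overlapping k-mers (-ol) and gapped shapes (-s). All function names are hypothetical.

```python
def pigeonhole_seeds(read, max_errors):
    """Split read into max_errors + 1 non-overlapping seeds as (offset, seed) pairs."""
    pieces = max_errors + 1
    if pieces > len(read):
        raise ValueError("read is too short for this many seeds")
    base, extra = divmod(len(read), pieces)
    seeds, offset = [], 0
    for i in range(pieces):
        # Spread the remainder over the first `extra` seeds.
        length = base + (1 if i < extra else 0)
        seeds.append((offset, read[offset:offset + length]))
        offset += length
    return seeds


def exact_seed_hits(read, window, max_errors):
    """Return the seeds that match an equal-length window exactly, in place."""
    return [(off, seed) for off, seed in pigeonhole_seeds(read, max_errors)
            if window[off:off + len(seed)] == seed]


if __name__ == "__main__":
    # Two mismatches (first and last seed), so the middle seed must still match.
    print(exact_seed_hits("ACGTACGTACGT", "AAGTACGTACGA", 2))
```

A window that yields no exact seed hit cannot contain a match within the error bound, so it can be skipped without verification.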

Verification Options:

-mN, --match-N: N matches all other characters. Default: N matches nothing.
-ed, --error-distr STRING: Write error distribution to FILE.
-mf, --mismatch-file STRING: Write mismatch patterns to FILE.

Misc Options:

-cm, --compact-mult DOUBLE: Multiply compaction threshold by this value after reaching and compacting. In range [0..inf]. Default: 2.2.
-ncf, --no-compact-frac DOUBLE: Don't compact if in this last fraction of genome. In range [0..1]. Default: 0.05.

Parallelism Options:

-tc, --thread-count INTEGER: Set the number of threads to use (0 to force sequential mode). In range [0..inf]. Default: 1.
-pws, --parallel-window-size INTEGER: Collect candidates in windows of this length. In range [1..inf]. Default: 500000.
-pvs, --parallel-verification-size INTEGER: Verify candidates in packages of this size. In range [1..inf]. Default: 100.
-pvmpc, --parallel-verification-max-package-count INTEGER: Largest number of packages to create for verification per thread-1. In range [1..inf]. Default: 100.
-amms, --available-matches-memory-size INTEGER: Bytes of main memory available for storing matches. In range [-1..inf]. Default: 0.
-mhst, --match-histo-start-threshold INTEGER: When to start histogram. In range [1..inf]. Default: 5.

FORMATS, NAMING, SORTING, AND COORDINATE SCHEMES

RazerS 3 supports various output formats. The output format is detected automatically from the file name suffix.

.razers: Razer format
.fa, .fasta: Enhanced Fasta format
.eland: Eland format
.gff: GFF format
.sam: SAM format
.bam: BAM format
.afg: Amos AFG format
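
The suffix-to-format mapping above can be written down as a small lookup table. This is a sketch for illustration only: the function name is hypothetical, and raising an error for an unrecognized suffix is an assumption, since this page does not say how RazerS 3 itself reacts to an unknown suffix.

```python
import os

# Suffix table taken from the list above.
OUTPUT_FORMATS = {
    ".razers": "Razer",
    ".fa": "Enhanced Fasta",
    ".fasta": "Enhanced Fasta",
    ".eland": "Eland",
    ".gff": "GFF",
    ".sam": "SAM",
    ".bam": "BAM",
    ".afg": "Amos AFG",
}


def output_format(filename):
    """Look up the output format implied by the file name suffix."""
    suffix = os.path.splitext(filename)[1]
    try:
        return OUTPUT_FORMATS[suffix]
    except KeyError:
        # Assumption: treat anything outside the documented list as an error.
        raise ValueError("unrecognized output suffix: %r" % suffix)


if __name__ == "__main__":
    print(output_format("mapped.razers"))
```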

By default, reads and contigs are referred to by their Fasta ids given in the input files. With the -gn and -rn options this behaviour can be changed:

0: Use Fasta id.
1: Enumerate beginning with 1.
2: Use the read sequence (only for short reads!).
3: Use the Fasta id, do NOT append /L or /R for mate pairs.

The way matches are sorted in the output file can be changed with the -so option for the following formats: razers, fasta, sam, and afg. Primary and secondary sort keys are:

0: 1. read number, 2. genome position
1: 1. genome position, 2. read number
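
The two sort orders correspond to swapping the primary and secondary keys. A minimal sketch, assuming each match is reduced to a hypothetical (read_number, genome_position) pair:

```python
def sort_matches(matches, sort_order=0):
    """Sort (read_number, genome_position) pairs the way -so 0 or -so 1 describes."""
    if sort_order == 0:
        # Primary key: read number, secondary key: genome position.
        return sorted(matches, key=lambda m: (m[0], m[1]))
    if sort_order == 1:
        # Primary key: genome position, secondary key: read number.
        return sorted(matches, key=lambda m: (m[1], m[0]))
    raise ValueError("sort_order must be 0 or 1")


if __name__ == "__main__":
    demo = [(2, 50), (1, 900), (1, 10), (2, 5)]
    print(sort_matches(demo, 0))
    print(sort_matches(demo, 1))
```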

The coordinate space used for begin and end positions can be changed with the -pf option for the razer and fasta formats:

0: Gap space. Gaps between characters are counted from 0.
1: Position space. Characters are counted from 1.
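
To make the two coordinate schemes concrete, the sketch below converts between them. It assumes the usual reading of the descriptions above: a gap-space interval names the gap before the first character and the gap after the last one (0-based, half-open), while a position-space interval names the first and last character (1-based, inclusive). The function names are hypothetical.

```python
def gap_to_position(begin, end):
    """Convert a gap-space interval (-pf 0) to position space (-pf 1)."""
    # The character after gap g is character g + 1; the character before
    # gap g is character g, so only the begin coordinate shifts.
    return begin + 1, end


def position_to_gap(begin, end):
    """Convert a position-space interval (-pf 1) to gap space (-pf 0)."""
    return begin - 1, end


if __name__ == "__main__":
    # The first four characters of a sequence in both schemes.
    print(gap_to_position(0, 4))
    print(position_to_gap(1, 4))
```

Under this reading, the match length is end - begin in gap space and end - begin + 1 in position space.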

EXAMPLES

razers3 -i 96 -tc 12 -o mapped.razers hg18.fa reads.fq: Map single-end reads with 4% error rate using 12 threads.
razers3 -i 95 --no-gaps -o mapped.razers hg18.fa reads.fq.gz: Map single-end gzipped reads with 5% error rate and no indels.
razers3 -i 94 -rr 95 -tc 12 -ll 280 -le 80 -o mapped.razers hg18.fa reads_1.fq reads_2.fq: Map paired-end reads with up to 6% errors, 95% sensitivity, 12 threads, and only output aligned pairs with an outer distance of 200-360bp.
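
The numbers quoted in these examples follow from simple arithmetic on the option values. The sketch below reproduces them. The helper names are hypothetical, and rounding the per-read error budget down is an assumption that this page does not confirm.

```python
import math


def max_errors(read_length, percent_identity):
    """Errors allowed per read for a given -i value (assumes rounding down)."""
    error_percent = 100.0 - percent_identity
    # The small epsilon guards against floating-point results like 3.9999999.
    return math.floor(read_length * error_percent / 100.0 + 1e-9)


def library_range(library_length, library_error):
    """Accepted outer distances for given -ll and -le values."""
    return library_length - library_error, library_length + library_error


if __name__ == "__main__":
    print(max_errors(100, 96))     # -i 96 on a 100 bp read
    print(library_range(280, 80))  # -ll 280 -le 80
```

With -i 96, a 100 bp read may carry 4 errors (a 4% error rate), and -ll 280 -le 80 accepts outer distances from 200 to 360 bp, matching the descriptions above.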

razers3 3.5.8 [tarball]

Source file:	razers3.1.en.gz (from seqan-apps 2.4.0+dfsg-15)
Source last updated:	2022-12-06T14:52:21Z