
RACON(1) User Commands RACON(1)

NAME

racon - consensus module for raw de novo DNA assembly of long uncorrected reads

SYNOPSIS

racon [options ...] <sequences> <overlaps> <target sequences>

DESCRIPTION

Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods that do not include a consensus step. The goal of Racon is to generate a genomic consensus of similar or better quality than the output of assembly methods that employ both error-correction and consensus steps, while running several times faster than those methods. It supports data produced by both Pacific Biosciences and Oxford Nanopore Technologies.

Racon can be used as a polishing tool after assembly with either Illumina data or data produced by third-generation sequencing. The type of input data is detected automatically.

Racon takes as input only three files: contigs in FASTA/FASTQ format, reads in FASTA/FASTQ format and overlaps/alignments between the reads and the contigs in MHAP/PAF/SAM format. Output is a set of polished contigs in FASTA format printed to stdout. All input files can be compressed with gzip.
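Because every input may be plain or gzip-compressed, and sequence files may be FASTA or FASTQ, a tool reading them has to cope with all four combinations. The sketch below is a hypothetical helper, not part of racon: it detects gzip by its two magic bytes and guesses FASTA versus FASTQ from the first record character. How racon itself recognizes formats (for example by file extension) is not described in this page, so content sniffing here is an assumption.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file for reading, transparently decompressing gzip input.

    gzip streams always start with the magic bytes 0x1f 0x8b.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def sniff_sequence_format(path):
    """Return 'fasta' or 'fastq' based on the first non-empty line (hypothetical helper)."""
    with open_maybe_gzip(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            break
    raise ValueError(f"{path}: does not look like FASTA or FASTQ")
```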

Racon can also be used as a read error-correction tool. In this mode, the MHAP/PAF/SAM file must contain pairwise overlaps between the reads, including dual overlaps.
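Dual overlaps are understood here as each read pair appearing in both directions: if read A overlaps read B, the file also holds the B-to-A record. Under that assumption, the hypothetical sketch below completes a PAF file by swapping the query and target columns. Swapping is valid in PAF because query and target coordinates are both given on their own forward strands, so the relative strand column stays the same. Only the 12 mandatory PAF columns are kept; optional tags such as an embedded CIGAR would no longer be oriented correctly after the swap.

```python
def swap_paf_line(line):
    """Return the same PAF overlap with the query and target roles exchanged."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("PAF lines need at least 12 columns")
    query, strand, target = cols[0:4], cols[4], cols[5:9]
    rest = cols[9:12]  # residue matches, alignment block length, mapping quality
    return "\t".join(target + [strand] + query + rest)


def add_dual_overlaps(lines):
    """Yield every overlap together with its reverse, skipping exact duplicates."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue
        forward = "\t".join(line.rstrip("\n").split("\t")[:12])
        for record in (forward, swap_paf_line(forward)):
            if record not in seen:
                seen.add(record)
                yield record
```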

A wrapper script is also available to make racon easier to use on large datasets. It has the same interface as racon but adds two features on top of it. Sequences can be subsampled to decrease the total execution time (possibly at the cost of lower accuracy), and target sequences can be split into smaller chunks that are processed sequentially to decrease memory consumption. Both features can be used at the same time.
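The two wrapper features can be illustrated with a small, hypothetical sketch. The wrapper's actual strategies are not described in this page, so both choices below are assumptions: targets are grouped greedily into chunks whose total length stays under a limit (a sequence is never cut, so one longer than the limit gets a chunk of its own), and reads are subsampled uniformly at random until roughly `genome_size * coverage` bases are kept.

```python
import random


def split_into_chunks(targets, max_bases):
    """Greedily group (name, sequence) pairs into chunks of at most max_bases each."""
    chunks, current, size = [], [], 0
    for name, seq in targets:
        # Start a new chunk when adding this sequence would exceed the limit.
        if current and size + len(seq) > max_bases:
            chunks.append(current)
            current, size = [], 0
        current.append((name, seq))
        size += len(seq)
    if current:
        chunks.append(current)
    return chunks


def subsample_reads(reads, genome_size, coverage, seed=0):
    """Randomly keep reads until about genome_size * coverage bases are selected."""
    budget = genome_size * coverage
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    kept, total = [], 0
    for name, seq in shuffled:
        if total >= budget:
            break
        kept.append((name, seq))
        total += len(seq)
    return kept
```

Each chunk would then be polished in its own run, which bounds peak memory by the largest chunk rather than by the whole assembly.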

OPTIONS

<sequences>

input file in FASTA/FASTQ format (can be compressed with gzip) containing sequences used for correction

<overlaps>

input file in MHAP/PAF/SAM format (can be compressed with gzip) containing overlaps between sequences and target sequences

<target sequences>

input file in FASTA/FASTQ format (can be compressed with gzip) containing sequences which will be corrected

optional

-u, --include-unpolished

output unpolished target sequences

-f, --fragment-correction

perform fragment correction instead of contig polishing (overlaps file should contain dual/self overlaps!)

-w, --window-length <int>

default: 500
size of the window on which POA (partial order alignment) is performed
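To picture what the window length controls, the sketch below assumes that each target sequence is cut into consecutive, non-overlapping windows of `window_length` bases (the last one shorter), and that an overlap contributes to every window its target interval touches. Racon's exact boundary handling is not specified here; the function names are hypothetical.

```python
def split_into_windows(target_length, window_length=500):
    """Return half-open (start, end) intervals covering the target without overlap."""
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    return [(start, min(start + window_length, target_length))
            for start in range(0, target_length, window_length)]


def windows_touched(tstart, tend, window_length=500):
    """Indices of the windows intersected by the target interval [tstart, tend)."""
    if tend <= tstart:
        return []
    return list(range(tstart // window_length, (tend - 1) // window_length + 1))
```

Smaller windows mean more, cheaper POA problems; larger windows give each consensus more context at a higher per-window cost.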

-q, --quality-threshold <float>

default: 10.0
threshold for average base quality of windows used in POA
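The quality threshold compares an average Phred score against `-q`. The sketch below assumes standard FASTQ quality strings with an ASCII offset of 33 (Sanger and Illumina 1.8+ encoding) and treats a window-sized slice of a read as passing when its mean score is at least the threshold; whether racon keeps or drops a slice that exactly equals the threshold is an assumption.

```python
def average_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string."""
    if not quality_string:
        raise ValueError("empty quality string")
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def passes_quality(quality_string, threshold=10.0):
    """Hypothetical filter: keep a slice whose mean quality reaches the threshold."""
    return average_phred(quality_string) >= threshold
```

A Phred score of 10 corresponds to an estimated 10% per-base error probability, so the default keeps slices whose average quality is at least that good.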

-e, --error-threshold <float>

default: 0.3
maximum allowed error rate used for filtering overlaps
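This page does not define how an overlap's error rate is estimated. The sketch below assumes one simple estimate from PAF coordinates alone: one minus the ratio of the shorter aligned span to the longer one, so overlaps whose query and target spans differ greatly in length are discarded. Another common estimate is 1 minus residue matches divided by alignment block length (PAF columns 10 and 11). Function names are hypothetical.

```python
def overlap_error(qstart, qend, tstart, tend):
    """Assumed error estimate: 1 - shorter aligned span / longer aligned span."""
    q_span, t_span = qend - qstart, tend - tstart
    longest = max(q_span, t_span)
    if longest <= 0:
        return 1.0  # degenerate overlap, treat as maximally erroneous
    return 1.0 - min(q_span, t_span) / longest


def filter_paf(lines, error_threshold=0.3):
    """Yield PAF lines whose estimated error rate does not exceed error_threshold."""
    for line in lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        qstart, qend = int(cols[2]), int(cols[3])
        tstart, tend = int(cols[7]), int(cols[8])
        if overlap_error(qstart, qend, tstart, tend) <= error_threshold:
            yield line
```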

-m, --match <int>

default: 5
score for matching bases

-x, --mismatch <int>

default: -4
score for mismatching bases

-g, --gap <int>

default: -8
gap penalty (must be negative)
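To show how the `-m`, `-x` and `-g` scores interact, the sketch below computes a plain pairwise global alignment score (Needleman-Wunsch with a linear gap penalty) using the default values. This is a swapped-in, simplified technique: racon applies these scores inside partial order alignment, which generalizes the same recurrence to aligning a sequence against a graph, and the exact alignment mode it uses is not stated here.

```python
def global_alignment_score(a, b, match=5, mismatch=-4, gap=-8):
    """Needleman-Wunsch score with a linear gap penalty (one `gap` per gapped base)."""
    if gap >= 0:
        raise ValueError("gap penalty must be negative")
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

With the defaults, a single mismatch (-4) is always preferred over opening two gaps (-16), so substitutions are not turned into indel pairs.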

-t, --threads <int>

default: 1
number of threads

--version

prints the version number

-h, --help

prints the usage

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and may be used by others.
June 2018 racon 1.3.1