DESCRIPTION
-c [string]
cognate sequence (encoded using gt encseq encode)
-map [string]
mapping of reads to the cognate sequence; it must be in
SAM/BAM format and sorted by coordinate (this can be prepared e.g. using:
samtools sort)
-sam [yes|no]
mapping file is in SAM format (default: BAM)
-aggressive [yes|no]
correct as much as possible
-moderate [yes|no]
mediate between sensitivity and precision
-conservative [yes|no]
correct only most likely errors
-expert [yes|no]
manually select correction criteria
-reads
uncorrected read file(s) in FastQ format; the corrected
reads are output in the current working directory, in files named like
the input files, each prepended by a prefix (see -outprefix option). -reads
allows one to output the reads in the same order as in the input and is
mandatory if the SAM contains more than a single primary alignment for each
read (e.g. output of bwasw). See also the -o option as an alternative.
-outprefix [string]
prefix for output filenames (corrected reads); when -reads
is specified, the prefix is prepended to each input filename (default:
hop_)
-o [string]
output file for corrected reads (see also
-reads/-outprefix); if -o is used, reads are output in a single file in the
order they are found in the SAM file (which usually differs from the original
order). This will only work if the reads were aligned with software which
includes only 1 alignment for each read (e.g. bwa) (default: undefined)
-hmin [value]
minimal homopolymer length in cognate sequence (default:
3)
-read-hmin [value]
minimal homopolymer length in reads (default: 2)
-qmax [value]
maximal average quality of homopolymer in a read
(default: 120)
-altmax [value]
max support of alternate homopol. length; e.g. 0.8 means:
do not correct any read if the homop. length in more than 80% of the reads has
the same value, different from the cognate; if altmax is set to 1.0, reads are
always corrected (default: 0.800000)
-cogmin [value]
min support of cognate sequence homopol. length; e.g. 0.1
means: do not correct any read if the cognate homop. length is not present in at
least 10% of the reads; if cogmin is set to 0.0, reads are always
corrected
-mapqmin [value]
minimal mapping quality (default: 21)
-covmin [value]
minimal coverage; e.g. 5 means: do not correct any read
if coverage (number of reads mapped over the whole homopolymer) is less than 5; if
covmin is set to 1, reads are always corrected (default: 1)
-allow-muliple [yes|no]
allow multiple corrections in a read (default: no)
-clenmax [value]
maximal correction length (default: unlimited)
-ann [string]
annotation of cognate sequence; it must be sorted by
coordinates on the cognate sequence (this can be done e.g. using: gt gff3
-sort). If -ann is used, corrections will be limited to homopolymers starting or
ending inside the feature type indicated by the -ft option. Format: sorted GFF3
(default: undefined)
-ft [string]
feature type to use when -ann option is specified
(default: CDS)
-v [yes|no]
be verbose (default: no)
-help
display help for basic options and exit
-help+
display help for all options and exit
-version
display version information and exit
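The three support criteria (-altmax, -cogmin and -covmin) can be read as a filter on the homopolymer lengths observed in the reads that cover one position. The Python sketch below only illustrates the option descriptions above; it is not the tool's implementation. The function name, its arguments, and the choice to test only the best-supported alternate length are assumptions made for illustration, and the cogmin default of 0.1 is taken from the conservative preset.

```python
from collections import Counter


def passes_support_criteria(read_lengths, cognate_length,
                            altmax=0.8, cogmin=0.1, covmin=1):
    """Return True if a homopolymer position may be corrected.

    read_lengths: homopolymer length observed in each read mapped over
    the whole homopolymer (hypothetical input, one entry per read).
    cognate_length: homopolymer length in the cognate sequence.
    """
    coverage = len(read_lengths)
    # -covmin: too few reads cover the homopolymer.
    if coverage == 0 or coverage < covmin:
        return False

    counts = Counter(read_lengths)
    cognate_support = counts[cognate_length] / coverage
    # Support of the most frequent length that differs from the cognate.
    alternate_counts = [n for length, n in counts.items()
                        if length != cognate_length]
    alternate_support = max(alternate_counts, default=0) / coverage

    # -altmax: an alternate length is supported by too many reads.
    if alternate_support > altmax:
        return False
    # -cogmin: the cognate length is supported by too few reads.
    if cognate_support < cogmin:
        return False
    return True
```

With the settings of the aggressive preset (altmax 1.0, cogmin 0.0, covmin 1), neither fraction test can fail, which matches the "reads are always corrected" remarks in the option descriptions.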
Correction mode:
One of the options -aggressive, -moderate,
-conservative or -expert must be selected.
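As an illustration of how the mandatory pieces fit together, the following Python sketch assembles an argument list with exactly one correction mode and predicts the output filenames produced with -reads. It assumes the tool is invoked as "gt hop" (the command name is not stated in this section); the helper names and the example file names are hypothetical, and the list is only built, not executed.

```python
import os

MODES = ("aggressive", "moderate", "conservative", "expert")


def build_hop_command(cognate, mapping, mode, reads,
                      outprefix="hop_", sam=False):
    """Build an argument list; 'gt hop' as command name is an assumption."""
    if mode not in MODES:
        raise ValueError("mode must be one of: " + ", ".join(MODES))
    cmd = ["gt", "hop", "-c", cognate, "-map", mapping, "-" + mode]
    if sam:
        # The mapping file is SAM rather than the default BAM.
        cmd.append("-sam")
    cmd += ["-outprefix", outprefix]
    # -reads is placed last because it takes one or more files.
    cmd += ["-reads"] + list(reads)
    return cmd


def expected_output_names(reads, outprefix="hop_"):
    """Corrected reads go to the current directory as prefix + input name."""
    return [outprefix + os.path.basename(path) for path in reads]
```

The resulting list could be passed to subprocess.run once the cognate sequence has been encoded and the mapping sorted as described in the -c and -map entries.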
The -aggressive, -moderate and -conservative
modes are presets of the criteria by which it is decided if an observed
discrepancy in homopolymer length between cognate sequence and a read shall
be corrected or not. A description of the individual criteria is provided by
using the -help+ option. The presets are equivalent to the following
settings:
                   -aggressive   -moderate   -conservative
-hmin              3             3           3
-read-hmin         1             1           2
-altmax            1.00          0.99        0.80
-cogmin            0.00          0.00        0.10
-mapqmin           0             10          21
-covmin            1             1           1
-clenmax           unlimited     unlimited   unlimited
-allow-multiple    yes           yes         no
The aggressive mode tries to maximize the sensitivity, the
conservative mode to minimize the false positives. An even more conservative
set of corrections can be achieved using the -ann option (see
-help+).
The -expert mode allows one to manually set each parameter;
the default values are the same as in the -conservative mode.
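To see at a glance what changes between two presets, the preset table can be written down as data. The Python sketch below is a transcription of that table for illustration only; the dictionary keys and the helper name are hypothetical, "cogmin" denotes the cognate-support threshold (the -cogmin option), and None stands for "unlimited".

```python
# Preset values transcribed from the table in the "Correction mode" section.
PRESETS = {
    "aggressive": {
        "hmin": 3, "read_hmin": 1, "altmax": 1.00, "cogmin": 0.00,
        "mapqmin": 0, "covmin": 1, "clenmax": None, "allow_multiple": True,
    },
    "moderate": {
        "hmin": 3, "read_hmin": 1, "altmax": 0.99, "cogmin": 0.00,
        "mapqmin": 10, "covmin": 1, "clenmax": None, "allow_multiple": True,
    },
    "conservative": {
        "hmin": 3, "read_hmin": 2, "altmax": 0.80, "cogmin": 0.10,
        "mapqmin": 21, "covmin": 1, "clenmax": None, "allow_multiple": False,
    },
}


def diff_presets(first, second):
    """Map each differing criterion to its (first, second) pair of values."""
    a, b = PRESETS[first], PRESETS[second]
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

For example, diff_presets("aggressive", "moderate") reports only altmax and mapqmin, while the step from moderate to conservative additionally tightens read_hmin and cogmin and disables multiple corrections per read. Since the -expert defaults equal the conservative preset, the "conservative" entry also describes the starting point for manual tuning.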
(Finally, for evaluation purposes only, the -state-of-truth
mode can be used: this mode assumes that the sequenced genome has been
specified as cognate sequence and outputs an ideal list of corrections.)