KMER-MASK(1) User Commands KMER-MASK(1)
NAME
kmer-mask - mask and filter a set of nucleotide sequences by kmer content
SYNOPSIS
kmer-mask {-novel|-confirmed} [-mdb
mer-database] [-ms mer-size] [-edb
exist-database] [-m min-size] [-e
extend-size] [-lowthreshold l] [-highthreshold
h] [-t threads] [-v] [-h histogram]
[-promote|-demote|-discard]
-1 in.1.fastq [-2 in.2.fastq]
-o output-prefix
DESCRIPTION
Mask and filter a set of sequences (presumed to be reads) by kmer content. Masking
can be done to retain novel sequence not in the database, or to retain
confirmed sequence present in the database. Filtering segregates sequences
into fully masked, partially masked, and retained (unmasked) groups.
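The difference between the two masking modes can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function retained_mask and its parameters (database_kmers, k, mode) are hypothetical names chosen for illustration, and the sketch ignores the -m and -e options as well as any reverse-complement handling.

```python
def retained_mask(read, database_kmers, k, mode):
    """Return one boolean per base: True if that base is retained.

    Hypothetical illustration only; names and behaviour are assumptions,
    not the kmer-mask implementation.
    """
    covered = [False] * len(read)
    # Mark every base that lies inside a kmer found in the database.
    for i in range(len(read) - k + 1):
        if read[i:i + k] in database_kmers:
            for j in range(i, i + k):
                covered[j] = True
    if mode == "confirmed":
        # Retain sequence present in the database.
        return covered
    if mode == "novel":
        # Retain sequence not present in the database.
        return [not c for c in covered]
    raise ValueError("mode must be 'novel' or 'confirmed'")


if __name__ == "__main__":
    db = {"ACG", "CGT"}
    print(retained_mask("ACGTTTGA", db, 3, "confirmed"))
    print(retained_mask("ACGTTTGA", db, 3, "novel"))
```

In this toy case the first four bases are covered by database kmers, so the confirmed mode retains them and the novel mode retains the remaining four.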
OPTIONS
- -mdb mer-database
- load masking kmers from meryl(1) mer-database
- -ms mer-size
- -edb exist-database
- save masking kmers to an existDB(1) file exist-database for
faster restarts
- -1 in.1.fastq
- -2 in.2.fastq
- input read files in fastq, fastq.gz, fastq.bz2 or fastq.xz format. The
second file is optional, but the output classification is disrupted if it is
not present.
- -o out
- prefix for output reads
- out.fullymasked.[12].fastq
- reads with less than 'lowthreshold' of their bases retained
- out.partiallymasked.[12].fastq
- reads between the two thresholds
- out.retained.[12].fastq
- reads with more than 'highthreshold' of their bases retained
- out.discarded.[12].fastq
- reads with conflicting status
- -m min-size
- ignore database hits below this many consecutive kmers (default: 0)
- -e extend-size
- extend database hits across this many missing kmers (default: 0)
- -novel
- RETAIN novel sequence not present in the database
- -confirmed
- RETAIN confirmed sequence present in the database
- -promote
- promote the less RETAINED read to the status of the more RETAINED read,
e.g. read1=fullymasked and read2=partiallymasked -> both are
partiallymasked
- -demote
- demote the more RETAINED read to the status of the less RETAINED read,
e.g. read1=fullymasked and read2=partiallymasked -> both are
fullymasked
- -discard
- discard pairs with conflicting status (DEFAULT), e.g. read1=fullymasked and
read2=partiallymasked -> both are discarded
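The three pair policies above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: the function resolve_pair, the RANK table and the "discarded" label are hypothetical names, and the ordering fullymasked < partiallymasked < retained is inferred from the examples given for -promote and -demote.

```python
# Assumed ordering from least to most RETAINED (hypothetical table).
RANK = {"fullymasked": 0, "partiallymasked": 1, "retained": 2}


def resolve_pair(status1, status2, policy="discard"):
    """Return the final (read1, read2) status for a pair of reads."""
    if status1 == status2:
        # No conflict: both reads keep their shared status.
        return status1, status2
    if policy == "promote":
        best = max(status1, status2, key=RANK.__getitem__)
        return best, best
    if policy == "demote":
        worst = min(status1, status2, key=RANK.__getitem__)
        return worst, worst
    if policy == "discard":
        return "discarded", "discarded"
    raise ValueError("policy must be 'promote', 'demote' or 'discard'")


if __name__ == "__main__":
    for policy in ("promote", "demote", "discard"):
        print(policy, resolve_pair("fullymasked", "partiallymasked", policy))
```

Run as a script, this prints the outcome of each policy for the read1=fullymasked, read2=partiallymasked case used in the option descriptions.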
stats on stderr, number of sequences with amount RETAINED:
- -lowthreshold t
- (default: 0.3333)
- -highthreshold t
- (default: 0.6667)
- -h histogram
- write a histogram of the amount of sequence RETAINED
- -t t
- use t compute threads
- -v
- show progress
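As a rough illustration of how -lowthreshold and -highthreshold relate to the fullymasked, partiallymasked and retained output files, the following Python sketch classifies a read by the fraction of its bases retained. The function name classify and its parameters are hypothetical, and the handling of values exactly equal to a threshold is an assumption; the tool itself may treat boundaries differently.

```python
def classify(retained_fraction, low=0.3333, high=0.6667):
    """Map a retained fraction in [0, 1] to an output class.

    Hypothetical sketch; boundary handling is an assumption.
    """
    if retained_fraction < low:
        return "fullymasked"
    if retained_fraction > high:
        return "retained"
    return "partiallymasked"


if __name__ == "__main__":
    for fraction in (0.10, 0.50, 0.90):
        print(fraction, classify(fraction))
```

With the thresholds shown, a read with 10% of its bases retained is classed as fullymasked, one with 50% as partiallymasked, and one with 90% as retained.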