.TH KMER-MASK "1" "May 2015" "kmer-mask 0~20150520+2004" "User Commands"
.SH NAME
kmer-mask \- mask and filter a set of nucleotide sequences by kmer content
.SH SYNOPSIS
.B kmer\-mask
.RB { \-novel | \-confirmed }
.RB [ \-mdb
.IR mer-database ]
.RB [ \-ms
.IR mer-size ]
.RB [ \-edb
.IR exist-database ]
.RB [ \-m
.IR min-size ]
.RB [ \-e
.IR extend-size ]
.RB [ \-lowthreshold
.IR l ]
.RB [ \-highthreshold
.IR h ]
.RB [ \-t
.IR threads ]
.RB [ \-v ]
.RB [ \-h
.IR histogram ]
.RB [ \-promote | \-demote | \-discard ]
.BI \-1 \0in.1.fastq
.RB [ \-2
.IR in.2.fastq ]
.BI \-o \0output-prefix
.SH DESCRIPTION
Mask and filter a set of sequences (presumed to be reads) by kmer content.
Masking can be done to retain novel sequence not in the database,
or to retain confirmed sequence present in the database.
Filtering segregates sequences into fully masked, partially masked and unmasked sets.
.SH OPTIONS
.TP
.BI \-mdb \0mer\-database
load masking kmers from
.BR meryl (1)
.I mer\-database
.TP
.BI \-ms \0mer\-size
.TP
.BI \-edb \0exist\-database
save masking kmers to an
.BR existDB (1)
file
.I exist\-database
for faster restarts
.TP
.BI \-1 \0in.1.fastq
.TP
.BI \-2 \0in.2.fastq
input read files in fastq, fastq.gz, fastq.bz2 or fastq.xz format.
The second file is optional, but omitting it disrupts the output classification.
.TP
.BI \-o \0out
.RS
prefix for output reads
.TP
\fIout\fR.fullymasked.[12].fastq
reads with less than 'lowthreshold' of their bases retained
.TP
.IR out .partiallymasked.[12].fastq
reads in between
.TP
.IR out .retained.[12].fastq
reads with more than 'highthreshold' of their bases retained
.TP
.IR out .discarded.[12].fastq
reads with conflicting status
.RE
.TP
.BI \-m \0min\-size
ignore database hits below this many consecutive kmers (0)
.TP
.BI \-e \0extend\-size
extend database hits across this many missing kmers (0)
.TP
.B \-novel
RETAIN novel sequence not present in the database
.TP
.B \-confirmed
RETAIN confirmed sequence present in the database
.TP
.B \-promote
promote the lesser RETAINED read to the status of the more RETAINED read
.br
read1=fullymasked and read2=partiallymasked \-> both are partiallymasked
.TP
.B \-demote
demote the more RETAINED read to the status of the lesser RETAINED read
.br
read1=fullymasked and read2=partiallymasked \-> both are fullymasked
.TP
.B \-discard
discard pairs with conflicting status (DEFAULT)
.br
read1=fullymasked and read2=partiallymasked \-> both are discarded
.SS "stats on stderr, number of sequences with amount RETAINED:"
.TP
.BI \-lowthreshold \0t
(0.3333)
.TP
.BI \-highthreshold \0t
(0.6667)
.TP
.BI \-h \0histogram
write a histogram of the amount of sequence RETAINED
.TP
.BI \-t \0t
use
.I t
compute threads
.TP
\fB\-v\fR
show progress
.SH SEE ALSO
.BR meryl (1),
.BR existDB (1)
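.SH EXAMPLES
A minimal sketch of a paired\-end run that retains novel sequence,
using only the options documented on this page.
The database name
.IR reads.meryl ,
the input files
.I reads.1.fastq
and
.IR reads.2.fastq ,
the output prefix
.I masked
and the mer size of 22 are all hypothetical; that
.B \-ms
has to be given alongside
.B \-mdb
is likewise an assumption.
.PP
.nf
kmer\-mask \-novel \-mdb reads.meryl \-ms 22 \e
    \-1 reads.1.fastq \-2 reads.2.fastq \-o masked
.fi
.PP
Under the naming scheme described for
.BR \-o ,
such a run would be expected to write its classified reads to files such as
masked.retained.1.fastq and masked.fullymasked.1.fastq.
Since no pair\-handling option is given, pairs with conflicting status
would fall under the default
.B \-discard
behaviour.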