
MASH-SCREEN(1)   MASH-SCREEN(1)

NAME

mash-screen - determine whether query sequences are within a larger pool of sequences

SYNOPSIS

mash screen [options] <queries>.msh <pool> [<pool>] ...

DESCRIPTION

Determine how well query sequences are contained within a pool of sequences. The queries must be formatted as a single Mash sketch file (.msh), created with the mash sketch command. The <pool> files can be contigs or reads, in FASTA or FASTQ format, gzipped or not; "-" can be given for <pool> to read from standard input. The <pool> sequences are assumed to be nucleotides and will be 6-frame translated if the <queries> are amino acids. The output fields are [identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment], where median-multiplicity is the median number of times the shared hashes were observed within the pool.
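The output fields listed above can be loaded into a small record for downstream filtering. The following is a minimal sketch, not part of Mash itself. It assumes the output is tab-separated and that shared-hashes is written as "shared/total" (for example 991/1000); both are assumptions about the output layout rather than statements from this page. The names ScreenHit and parse_screen_line are hypothetical.

```python
from typing import NamedTuple


class ScreenHit(NamedTuple):
    """One row of mash screen output (hypothetical helper type)."""
    identity: float
    shared_hashes: int
    total_hashes: int
    median_multiplicity: float
    p_value: float
    query_id: str
    query_comment: str


def parse_screen_line(line: str) -> ScreenHit:
    # Assumption: fields are tab-separated, in the order given in DESCRIPTION.
    fields = line.rstrip("\n").split("\t", 5)
    identity, shared, multiplicity, p_value, query_id = fields[:5]
    # The comment field may be empty or absent.
    comment = fields[5] if len(fields) > 5 else ""
    # Assumption: shared-hashes is written as "shared/total".
    shared_count, total_count = shared.split("/")
    return ScreenHit(
        float(identity),
        int(shared_count),
        int(total_count),
        float(multiplicity),
        float(p_value),
        query_id,
        comment,
    )
```

A typical use would be to iterate over the lines of a saved result file and keep only the hits of interest, for example those with a high identity.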

OPTIONS

-h

Display help.

-p <int>

Parallelism. This many threads will be spawned for processing.

-w

Winner-takes-all strategy for identity estimates. After counting hashes for each query, hashes that appear in multiple queries will be removed from all except the one with the best identity (ties broken by larger query), and other identities will be reduced. This removes output redundancy, providing a rough compositional outline.
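The winner-takes-all idea can be illustrated with a toy model. This is a simplified sketch, not Mash's implementation: it uses the fraction of a query's hashes found in the pool as a stand-in for the identity estimate, and the function name winner_takes_all and its data layout (a dict of hash sets) are hypothetical.

```python
def winner_takes_all(queries, pool_hashes):
    """Toy winner-takes-all over hash sets.

    queries: dict mapping query name -> set of hashes in that query.
    pool_hashes: set of hashes observed in the pool.
    Returns a dict mapping query name -> reduced score.
    """
    # Hashes of each query that were seen in the pool.
    shared = {name: hashes & pool_hashes for name, hashes in queries.items()}

    # Stand-in for identity: fraction of the query's hashes that are shared.
    def fraction(count, name):
        return count / len(queries[name]) if queries[name] else 0.0

    # Rank by initial score; ties are broken by the larger query.
    rank = {
        name: (fraction(len(shared[name]), name), len(queries[name]))
        for name in queries
    }

    # Each shared hash is owned by the best-ranked query that contains it.
    owner = {}
    for name, hashes in shared.items():
        for h in hashes:
            if h not in owner or rank[name] > rank[owner[h]]:
                owner[h] = name

    # Recompute scores from the hashes each query still owns.
    return {
        name: fraction(sum(1 for h in shared[name] if owner[h] == name), name)
        for name in queries
    }
```

In this model, a query that shares most of its hits with a better-matching query ends up with a much lower score, which is the redundancy-removing effect the option describes.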

Output

-i <num>

Minimum identity to report. Inclusive unless set to zero, in which case only identities greater than zero (i.e. with at least one shared hash) will be reported. Set to -1 to output everything.
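The threshold rule described for -i can be written as a small predicate. This is only a sketch of the rule as stated here, useful when re-filtering saved output; the function name passes_min_identity is hypothetical.

```python
def passes_min_identity(identity: float, min_identity: float) -> bool:
    """Return True if a hit would be reported under the given -i value."""
    if min_identity == 0:
        # Zero is the special case: only identities strictly above zero.
        return identity > 0
    # Otherwise the threshold is inclusive; -1 therefore admits everything.
    return identity >= min_identity
```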

-v <num>

Maximum p-value to report.
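When calling mash screen from a script, the options above can be assembled into an argument list. The sketch below only builds the list and uses only the flags documented on this page; the function name build_screen_command and the file names in the usage note are hypothetical.

```python
def build_screen_command(queries, pools, threads=None, winner_takes_all=False,
                         min_identity=None, max_p_value=None):
    """Build an argv list for: mash screen [options] <queries>.msh <pool> ..."""
    cmd = ["mash", "screen"]
    if threads is not None:
        cmd += ["-p", str(threads)]       # parallelism
    if winner_takes_all:
        cmd.append("-w")                  # winner-takes-all strategy
    if min_identity is not None:
        cmd += ["-i", str(min_identity)]  # minimum identity to report
    if max_p_value is not None:
        cmd += ["-v", str(max_p_value)]   # maximum p-value to report
    cmd.append(queries)                   # single Mash sketch file (.msh)
    cmd.extend(pools)                     # one or more pool files, or "-"
    return cmd
```

The resulting list, for instance from build_screen_command("refs.msh", ["reads.fastq.gz"], threads=4, winner_takes_all=True), can be passed to subprocess.run with capture_output=True and text=True to collect the report from standard output.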

SEE ALSO

mash(1)

2019-12-13