
MIRABAIT(1) User Commands MIRABAIT(1)

NAME

mirabait - a grep-like tool to select reads with kmers of up to 256 bp

SYNOPSIS

mirabait [options] {-b baitfile [-b ...] | -B file | -j joblibrary} {-p file_1 file_2 | -P file3}* [file4 ...]

DESCRIPTION

mirabait selects reads from a read collection that are partly similar or equal to sequences defined as target baits. Similarity is defined by a user-adjustable number of k-mers (sequences of k consecutive bases) shared between the bait sequences and the screened sequences, searched either in forward direction only or in both forward and reverse complement direction. Adding a DUST-like repeat filter for repeats of up to 4 bases is optional.

When used on paired files, mirabait selects sequences where at least one mate matches.
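The selection principle described above can be sketched in a few lines of Python. This is only an illustration of k-mer baiting under simple assumptions (exact k-mer matches, an in-memory set as index); it is not mirabait's implementation, and all function names are hypothetical.

```python
# Illustrative sketch only: names and data structures are assumptions,
# not taken from the mirabait source.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Yield every k-mer (k consecutive bases) of seq, upper-cased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_bait_index(baits, k, both_strands=True):
    """Collect the k-mers of all bait sequences into a set.

    With both_strands=True the reverse complement of each bait is indexed
    too, so reads are found in either orientation.
    """
    index = set()
    for bait in baits:
        index.update(kmers(bait, k))
        if both_strands:
            index.update(kmers(reverse_complement(bait), k))
    return index


def count_hits(read, index, k):
    """Count the k-mer positions of read whose k-mer occurs in the index."""
    return sum(1 for kmer in kmers(read, k) if kmer in index)


def read_matches(read, index, k, min_hits=1):
    """A read is selected when it shares at least min_hits k-mers with the baits."""
    return count_hits(read, index, k) >= min_hits


def pair_matches(mate1, mate2, index, k, min_hits=1):
    """A pair is selected when at least one mate matches."""
    return (read_matches(mate1, index, k, min_hits)
            or read_matches(mate2, index, k, min_hits))
```

For example, with the bait "AAAAACCCCC" and k=5, the read "TTAAACCCTT" shares two k-mers with the bait, and its reverse complement is only found when the index was built for both strands.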

OPTIONS

Main options:

-b file  Load bait sequences from file (multiple -b allowed)
-B file  Load baits from kmer statistics file, not from sequence files. Only one -B allowed, cannot be combined with -b. (see -K for creating such a file)
-j joblibrary  Set options for predefined job from supplied MIRA library. Currently available jobs:
rrna Bait rRNA sequences
-p file1 file2  Load paired sequences to search from file1 and file2. Files must contain same number of sequences, sequence names must be in same order. Multiple -p allowed, but must come before non-paired files.
-P file  Load paired sequences from file. File must be interleaved: pairs must follow each other, non-pairs are not allowed. Multiple -p allowed, but must come before non-paired files.
-k int  kmer length of bait in bases (<=256, default=31)
If >0: minimum number of k-mer baits needed (default=1). If <=0: allowed number of missed kmers over sequence length.
-d  Do not use kmers with microrepeats (DUST-like, see also -D)
-D int  Set length of microrepeats in kmers to discard from bait.
- int > 0: microrepeat length as a percentage of the kmer length. E.g.: -k 17 -D 67 --> 11.39 bases --> 12 bases.
- int < 0: microrepeat length in bases.
- int != 0 implies -d, int=0 turns DUST filter off.
Selects sequences that do not hit bait
Selects sequences that hit and do not hit bait (to different files)
No checking of reverse complement direction
Number of threads to use (default=0 -> up to 4 CPU cores)
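The numeric conventions of the match threshold and the microrepeat length can be made concrete with a short Python sketch. The function names are hypothetical. Rounding up is inferred from the documented example (11.39 bases become 12), and the reading of the "<=0" threshold as "all k-mers of the read except up to that many" is an assumption about the intended meaning, not confirmed behaviour.

```python
# Illustrative sketch only: helper names and the exact rounding and
# threshold semantics are assumptions derived from the option text.

def microrepeat_length(k, d):
    """Translate a -D style value d into a microrepeat length in bases.

    d > 0  : percentage of the kmer length k, rounded up
             (k=17, d=67 -> 11.39 -> 12 bases)
    d < 0  : length given directly in bases
    d == 0 : DUST filter off, reported as None
    """
    if d == 0:
        return None
    if d < 0:
        return -d
    # Integer ceiling of k * d / 100, avoiding floating point.
    return -(-(k * d) // 100)


def required_hits(n, read_length, k):
    """Minimum number of matching k-mers a read needs for a threshold n.

    n > 0  : n matching k-mers are needed
    n <= 0 : assumed to mean that at most |n| of the read's k-mers
             may be missed
    """
    total_kmers = max(read_length - k + 1, 0)
    if n > 0:
        return n
    return max(total_kmers + n, 0)
```

Under these assumptions, a 100-base read screened with 31-mers has 70 k-mers, so a threshold of -5 would require 65 of them to match.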

Options for output definition:

Normally mirabait writes separate result files (named 'bait_match_*' and 'bait_miss_*') for each input to the current directory. To change this behaviour and other output-related settings, use these options:

No case change of sequence to denote bait hits
length of a line (FASTA only, default 0=unlimited)
-K file  Save kmer statistics to 'file' (see also -B)
Change the prefix 'bait' to <name>. Has no effect if -o/-O is used and targets are not directories.
-o path  Save sequences matching bait to path. If path is a directory, write separate files into this directory. If not, combine all matching sequences from the input file(s) into a single file specified by the path.
-O path  Like -o, but for sequences not matching

Other options:

Use 'dir' as directory for temporary files instead of current working directory.
Memory to use for computing kmer statistics
0..100 = use percentage of free system memory
>100 = amount of MiB to use (e.g. 16384 for 16 GiB)
Default 75 (75% of free system memory).
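The two ranges of the memory value can be read as follows. This is a minimal Python sketch with a hypothetical function name; how mirabait measures free system memory is not covered here, so the free amount is simply passed in.

```python
# Illustrative sketch only: the function name is an assumption.

def kmer_memory_budget_mib(value, free_mib):
    """Interpret the memory option value.

    0..100 : percentage of the free system memory (free_mib, in MiB)
    >100   : absolute amount in MiB
    """
    if value < 0:
        raise ValueError("memory value must not be negative")
    if value <= 100:
        return free_mib * value // 100
    return value
```

With 16000 MiB free, the default of 75 would give a budget of 12000 MiB, while a value of 16384 asks for 16 GiB regardless of what is free.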

Defining file types to load/save:

Normally mirabait recognises file types from the file extension (even when packed). If you need to force a certain file type because the file extension is non-standard, use the EMBOSS notation: <filetype>::<name_of_file>. E.g., to declare that "somefile.dat" is FASTQ, use: fastq::somefile.dat. Recognised types are: caf, fasta, fastq, gbf, gbk, gbff, maf and phd.
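A minimal Python sketch of how such a file specification could be resolved is shown here. The function name, the set of packed suffixes and the rule that the extension must equal the type name are assumptions made for illustration; mirabait itself may accept more extensions.

```python
# Illustrative sketch only: helper name, packed suffixes and the
# extension-to-type rule are assumptions, not mirabait's actual logic.
import os

RECOGNISED_TYPES = {"caf", "fasta", "fastq", "gbf", "gbk", "gbff", "maf", "phd"}
PACKED_SUFFIXES = {".gz", ".bz2", ".xz"}  # assumed compression suffixes


def split_type_and_name(spec):
    """Return (filetype, filename) for a file specification.

    'fastq::somefile.dat' forces the type; otherwise the type is taken
    from the file extension, looking past one packed suffix.
    """
    if "::" in spec:
        ftype, name = spec.split("::", 1)
        ftype = ftype.lower()
        if ftype not in RECOGNISED_TYPES:
            raise ValueError("unknown file type: " + ftype)
        return ftype, name

    root, ext = os.path.splitext(spec)
    if ext.lower() in PACKED_SUFFIXES:
        # Packed file: the real extension is one level further in.
        root, ext = os.path.splitext(root)
    ftype = ext.lstrip(".").lower()
    if ftype not in RECOGNISED_TYPES:
        raise ValueError("cannot infer file type of " + spec)
    return ftype, spec
```

In this sketch, "reads.fastq.gz" resolves to FASTQ from its extension, whereas "somefile.dat" is rejected unless it is written as fastq::somefile.dat.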

MIRABAIT will write files in the same file type as the corresponding input files. Examples:

SEE ALSO

mira(1), miraconvert(1)

More extensive documentation is provided in the MIRA manual, available online at

http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html

On Debian, this can be installed with the mira-doc package and can then be found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html. On other systems, you may want to check in /usr/local/share/mira/doc or run "locate DefinitiveGuideToMIRA" to find it locally.

You can also subscribe to one of the MIRA mailing lists at

http://www.chevreux.org/mira_mailinglists.html

After subscribing, mail general questions to the MIRA talk mailing list:

mira_talk@freelists.org

BUGS

To report bugs or ask for features, please use the ticketing system at:

http://sourceforge.net/projects/mira-assembler/

AUTHOR

Bastien Chevreux <bach@chevreux.org>

This manual page was written by Bastien Chevreux <bach@chevreux.org> but can be freely used for any documentation purpose.

May 2016 mirabait 5.0.x