.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.2.
.TH MIRABAIT "1" "May 2016" "mirabait 5.0.x" "User Commands"
.SH NAME
mirabait \- a 'grep' like tool to select reads with kmers up to 256 bp
.SH SYNOPSIS
mirabait [options] {\-b baitfile [\-b ...] | \fB\-B\fR file | \fB\-j\fR joblibrary} {\-p file_1 file_2 | \fB\-P\fR file3}* [file4 ...]
.SH DESCRIPTION
mirabait selects reads from a read collection which are partly similar
or equal to sequences defined as target baits. Similarity is defined by
finding a user\-adjustable number of common k\-mers (sequences of k
consecutive bases) which are the same in the bait sequences and the
screened sequences to be selected, either in forward or forward/reverse
complement direction. Adding a DUST\-like repeat filter for repeats up
to 4 bases is optional.
.PP
When used on paired files, selects sequences where at least one mate matches.
.SH OPTIONS
.SS "Main options:"
.TP
\fB\-b\fR file
Load bait sequences from file (multiple \fB\-b\fR allowed)
.TP
\fB\-B\fR file
Load baits from kmer statistics file, not from sequence files.
Only one \fB\-B\fR allowed, cannot be combined with \fB\-b\fR.
(see \fB\-K\fR for creating such a file)
.TP
\fB\-j\fR job
Set options for predefined job from supplied MIRA library.
Currently available jobs:
.IP
rrna: Bait rRNA sequences
.TP
\fB\-p\fR file1 file2
Load paired sequences to search from file1 and file2.
Files must contain same number of sequences, sequence names must be in
same order.
Multiple \fB\-p\fR allowed, but must come before non\-paired files.
.TP
\fB\-P\fR file
Load paired sequences from file.
File must be interleaved: pairs must follow each other, non\-pairs are
not allowed.
Multiple \fB\-P\fR allowed, but must come before non\-paired files.
.TP
\fB\-k\fR int
kmer length of bait in bases (<=256, default=31)
.TP
\fB\-n\fR int
If >0: minimum number of k\-mer baits needed (default=1).
.br
If <=0: allowed number of missed kmers over sequence length.
.TP
\fB\-d\fR
Do not use kmers with microrepeats (DUST\-like, see also \fB\-D\fR)
.TP
\fB\-D\fR int
Set length of microrepeats in kmers to discard from bait.
.br
\- int > 0: microrepeat length as a percentage of kmer length.
E.g.: \fB\-k\fR 17 \fB\-D\fR 67 \-\-> 11.39 bases \-\-> 12 bases.
.br
\- int < 0: microrepeat length in bases.
.br
\- int != 0 implies \fB\-d\fR, int=0 turns DUST filter off.
.TP
\fB\-i\fR
Selects sequences that do not hit bait
.TP
\fB\-I\fR
Selects sequences that hit and do not hit bait (to different files)
.TP
\fB\-r\fR
No checking of reverse complement direction
.TP
\fB\-t\fR
Number of threads to use (default=0 \-> up to 4 CPU cores)
.SS "Options for output definition:"
Normally mirabait writes separate result files (named 'bait_match_*' and
\&'bait_miss_*') for each input to the current directory. To change this
behaviour and other output\-related settings, use these options:
.TP
\fB\-c\fR
No case change of sequence to denote bait hits
.TP
\fB\-l\fR int
length of a line (FASTA only, default 0=unlimited)
.TP
\fB\-K\fR file
Save kmer statistics to 'file' (see also \fB\-B\fR)
.TP
\fB\-N\fR name
Change the prefix 'bait' to 'name'.
Has no effect if \fB\-o\fR/\fB\-O\fR is used and targets are not directories.
.TP
\fB\-o\fR path
Save sequences matching bait to path.
If path is a directory, write separate files into this directory.
If not, combine all matching sequences from the input file(s) into a
single file specified by the path.
.TP
\fB\-O\fR path
Like \fB\-o\fR, but for sequences not matching
.SS "Other options:"
.TP
\fB\-T\fR dir
Use 'dir' as directory for temporary files instead of current working directory.
.TP
\fB\-m\fR integer
Memory to use for computing kmer statistics
.br
0..100 = use percentage of free system memory
.br
>100 = amount of MiB to use (e.g.
16384 for 16 GiB)
.br
Default 75 (75% of free system memory).
.SH "Defining file types to load/save:"
Normally mirabait recognises the file types according to the file
extension (even when packed). In cases where you need to force a certain
file type because the file extension is non\-standard, use the EMBOSS
notation to force a type: \fItype\fR::\fIfilename\fR.
E.g., to tell that "somefile.dat" is FASTQ, use: fastq::somefile.dat
.br
Recognised types are: caf, fasta, fastq, gbf, gbk, gbff, maf and phd.
.PP
mirabait will write files in the same file type as the corresponding
input files.
.SS "Examples:"
.TP
mirabait \fB\-b\fR b.fasta file.fastq
.TP
mirabait \fB\-I\fR \fB\-j\fR rrna \fB\-p\fR file_1.fastq file_2.fastq
.TP
mirabait \fB\-b\fR b1.fasta \fB\-b\fR b2.gbk file.fastq
.TP
mirabait \fB\-b\fR fasta::baits.dat \fB\-p\fR fastq::file_1.dat fastq::file_2.dat
.TP
mirabait \fB\-b\fR b.fasta \fB\-p\fR file_1.fastq file_2.fastq \fB\-P\fR file3.fasta file4.caf
.TP
mirabait \fB\-I\fR \fB\-b\fR b.fasta \fB\-p\fR file_1.fastq file_2.fastq \fB\-P\fR file3.fasta file4.caf
.TP
mirabait \fB\-k\fR 27 \fB\-n\fR 10 \fB\-b\fR b.fasta file.fastq
.TP
mirabait \fB\-b\fR fasta::b.dat fastq::file.dat
.TP
mirabait \fB\-o\fR /dev/shm/ \fB\-b\fR b.fasta \fB\-p\fR file_1.fastq file_2.fastq
.TP
mirabait \fB\-o\fR \fI\,/dev/shm/match\/\fP \fB\-b\fR b.fasta \fB\-p\fR file_1.fastq file_2.fastq
.TP
mirabait \fB\-b\fR human_genome.fasta \fB\-K\fR HG_kmerstats.mhs.gz \fB\-p\fR file1.fastq file2.fastq
.TP
mirabait \fB\-B\fR HG_kmerstats.mhs.gz \fB\-p\fR file1.fastq file2.fastq
.TP
mirabait \fB\-d\fR \fB\-B\fR HG_kmerstats.mhs.gz \fB\-p\fR file1.fastq file2.fastq
.SH "SEE ALSO"
mira(1), miraconvert(1)
.PP
More extensive documentation is provided in the MIRA manual, available online at
.IP
http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html
.PP
On Debian, this can be installed with the mira-doc package and can then
be found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html.
On other systems, you may want to check in /usr/local/share/mira/doc or
run "locate DefinitiveGuideToMIRA" to find it locally.
.PP
You can also subscribe to one of the MIRA mailing lists at
.IP
http://www.chevreux.org/mira_mailinglists.html
.PP
After subscribing, mail general questions to the MIRA talk mailing list:
.IP
mira_talk@freelists.org
.SH BUGS
To report bugs or ask for features, please use the ticketing system at:
.IP
http://sourceforge.net/projects/mira-assembler/
.SH AUTHOR
Bastien Chevreux
.PP
This manual page was written by Bastien Chevreux but can be freely used
for any documentation purpose.