.TH SOAPaligner/soap2 1 "25 May 2009" SOAPaligner-2.1X "Bioinformatics tool"
.SH NAME
.PP
SOAPaligner/soap2 \- Short Oligonucleotide Analysis Package aligner
.SH SYNOPSIS
.PP
.B soap
-D reference.index -a short_reads.fast[a|q] -o alignment.out [options]
.SH DESCRIPTION
.PP
SOAPaligner/soap2 is a member of the SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program provides fast and accurate alignment of the huge numbers of short reads generated by the Illumina/Solexa Genome Analyzer. It is an order of magnitude faster than SOAP v1, requiring only 2 minutes to align one million single-end reads to the human reference genome. Another notable improvement is that SOAPaligner now supports a wide range of read lengths.
.PP
SOAPaligner owes its gains in time and space efficiency to a redesign of its basic data structures and algorithms. The core algorithms and the indexing data structure (2way-BWT) were developed by the algorithms research group of the Department of Computer Science, the University of Hong Kong (T.W. Lam, Alan Tam, Simon Wong, Edward Wu and S.M. Yiu).
.SH COMMAND AND OPTIONS
.PP
.B soap
-D <in.fasta.index> -a <query.file.a> [-b <query.file.b>] -o <alignment.output> [-2 <unpaired.output>] [options]
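.PP
As a sketch of how these options combine, the hypothetical Python helper below assembles a soap argument list for single-end or paired-end alignment. The function name soap_argv and all file names are illustrative assumptions; the helper only builds the list and does not run soap.
.nf
```python
from typing import List, Optional


def soap_argv(index: str, query_a: str, output: str,
              query_b: Optional[str] = None,
              unpaired: Optional[str] = None,
              min_insert: int = 400, max_insert: int = 600,
              threads: int = 1) -> List[str]:
    """Hypothetical helper: build a soap command line from the documented options."""
    # Required options: index prefix, first query file, alignment output.
    argv = ["soap", "-D", index, "-a", query_a, "-o", output]
    if query_b is not None:
        # PE mode: second query file plus the allowed insert-size window.
        argv += ["-b", query_b, "-m", str(min_insert), "-x", str(max_insert)]
        if unpaired is not None:
            # Mapped but unpaired reads go to a separate file.
            argv += ["-2", unpaired]
    argv += ["-p", str(threads)]
    return argv
```
.fi
.PP
If soap is on the PATH, the returned list can be passed to Python's subprocess.run(argv, check=True).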
.P
.B OPTIONS:
.RS
.TP
.B -D STR
Prefix of the reference index files [*.index]; see
.B APPENDIX
for how to build the reference index.
.TP
.B -a STR
Query file: single-end (SE) reads, or one end of paired-end (PE) reads
.TP
.B -b STR
Query file for the other end of PE reads
.TP
.B -o STR
Output file for alignment results
.TP
.B -2 STR
Output file for reads that are mapped but not paired, in PE alignment
.TP
.B -u STR
Output file for unmapped reads, [none]
.TP
.B -m INT
Minimum insert size allowed for PE alignment, [400]
.TP
.B -x INT
Maximum insert size allowed for PE alignment, [600]
.TP
.B -n INT
Filter out low-quality reads containing more than INT Ns, [5]
.TP
.B -t
Output read IDs instead of read names, [none]
.TP
.B -r INT
How to report repeat hits: 0 = none; 1 = one random hit; 2 = all hits, [1]
.TP
.B -R
Use RF alignment for PE data with a long insert size (>= 2 kbp), [none, i.e. FR alignment]
.TP
.B -l INT
For long reads with a high error rate at the 3' end that cannot be aligned over their whole length, first align the 5' INT bp subsequence as a seed, [256, i.e. use the whole length of the read]
.TP
.B -s INT
Minimum alignment length (for soft clipping)
.TP
.B -v INT
Total number of mismatches allowed in one read when a subsequence is used as a seed, [5]
.TP
.B -g INT
Gap size allowed in one read, [0]
.TP
.B -M INT
Match mode for each read, or for the seed part of a read; no more than 2 mismatches are allowed, [4]
.RS
.TP
.B 0
exact match only
.TP
.B 1
1-mismatch match only
.TP
.B 2
2-mismatch match only
.TP
.B 4
find the best hits
.RE
.TP
.B -p INT
Number of threads, [1]
.RE
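.PP
The -n filter can be pictured with the minimal Python sketch below. The function name passes_n_filter is hypothetical, and both the case-insensitive counting and the reading of "more INT Ns" as strictly more than INT are assumptions.
.nf
```python
def passes_n_filter(seq: str, max_n: int = 5) -> bool:
    """Hypothetical sketch of -n: keep reads with at most max_n Ns.

    Assumption: a read is dropped only when it has strictly more than
    max_n ambiguous bases, counted case-insensitively.
    """
    return seq.upper().count("N") <= max_n
```
.fi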
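.PP
The -m and -x options bound the insert size of a read pair. The Python sketch below, with the hypothetical name insert_size_ok, assumes the insert size is measured from the leftmost base of one mate to the rightmost base of the other in 1-based coordinates; soap's exact definition, and the FR/RF orientation selected by -R, are not modeled.
.nf
```python
def insert_size_ok(left_start: int, right_end: int,
                   min_insert: int = 400, max_insert: int = 600) -> bool:
    """Hypothetical sketch: is the pair's insert size within [-m, -x]?"""
    # Assumption: span from the leftmost mate's first base to the
    # rightmost mate's last base, inclusive, in 1-based coordinates.
    insert = right_end - left_start + 1
    return min_insert <= insert <= max_insert
```
.fi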
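.PP
The three -r modes can be modeled as a selection over the equally good hits of one read. This is a hypothetical Python sketch (report_repeat_hits is not part of soap), and it assumes that a read with a single hit is always reported, whatever the mode.
.nf
```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def report_repeat_hits(hits: Sequence[T], mode: int = 1,
                       rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical sketch of -r: choose which repeat hits to report."""
    if len(hits) <= 1:
        # Assumption: a unique (or absent) hit is not a repeat and is
        # reported unchanged in every mode.
        return list(hits)
    if mode == 0:
        return []  # 0 = none: reads with repeat hits are not reported
    if mode == 1:
        rng = rng or random.Random()
        return [rng.choice(list(hits))]  # 1 = one random hit
    if mode == 2:
        return list(hits)  # 2 = all hits
    raise ValueError(f"unsupported -r mode: {mode}")
```
.fi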
.SH OUTPUT FORMAT
.PP
The SOAP2 output format contains the following columns:
.PP
1. read name, or read ID (if -t is set)
.P
2. read sequence (if the read aligns to the reverse strand, this is the reverse complement of the original read)
.P
3. quality string (if the input reads are in FASTA format, this column is all 'h'; if the read maps to the reverse strand, the quality string is reversed)
.P
4. 
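.PP
The rules for columns 2 and 3 can be sketched in Python as follows. The function soap_seq_and_qual is hypothetical, and it assumes that reverse-strand reads are reported as the reverse complement with the quality string reversed; the remaining columns are not covered.
.nf
```python
from typing import Optional, Tuple

# Watson-Crick complements; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def soap_seq_and_qual(seq: str, qual: Optional[str],
                      reverse_strand: bool) -> Tuple[str, str]:
    """Hypothetical sketch of output columns 2 (sequence) and 3 (quality)."""
    if qual is None:
        # FASTA input carries no qualities: the column is all 'h'.
        qual = "h" * len(seq)
    if reverse_strand:
        # Assumption: column 2 is the reverse complement, column 3 is reversed.
        return seq.translate(_COMPLEMENT)[::-1], qual[::-1]
    return seq, qual
```
.fi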
.SH APPENDIX
.PP
Before soap2 can be used for alignment, the reference index must be built with 2bwt-builder.
.P
.RS
.B 2bwt-builder
<reference.fasta>
.P
.B NOTE:
1. The reference input must be in FASTA format.
.P
2. The program automatically generates the index files in the directory containing the FASTA file, so make sure you have write permission there before running it.
.RE
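.PP
The permission check suggested in the note can be scripted before 2bwt-builder is run. This Python sketch is illustrative: index_prefix is a hypothetical helper, and it assumes the -D prefix is the FASTA path followed by .index, in line with the [*.index] default shown for -D.
.nf
```python
import os


def index_prefix(fasta_path: str) -> str:
    """Hypothetical helper: check write access, return the assumed -D prefix."""
    # The index files are written into the FASTA file's own directory.
    directory = os.path.dirname(os.path.abspath(fasta_path))
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"cannot write index files to {directory}")
    # Assumption: 2bwt-builder names its output after the FASTA path plus ".index".
    return fasta_path + ".index"
```
.fi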
.SH ENVIRONMENT
.PP
The data structure is incompatible with 32-bit systems, so the program cannot be ported to any 32-bit platform.
Because parts of the code are optimized with MMX instructions, the current version runs only on the
.B x86_64
platform.
We will provide a universal version for most 64-bit platforms later.
.TP
.B HARDWARE REQUIREMENT
.RS
1. 8 GB RAM (for a genome as large as the human genome)
.P
2. At least 8 GB of hard disk space to store the index (for a genome as large as the human genome)
.RE
.TP
.B SYSTEM REQUIREMENT
.RS
Linux x86_64
.RE
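.PP
A quick pre-flight check of these requirements might look like the following Python sketch. soap2_platform_ok is a hypothetical helper; it checks only the operating system, the architecture string and the interpreter's pointer width, not RAM or disk space.
.nf
```python
import platform
import sys


def soap2_platform_ok() -> bool:
    """Hypothetical helper: True on 64-bit Linux x86_64, as this page requires."""
    # sys.maxsize exceeds 2**32 only on a 64-bit interpreter.
    is_64bit = sys.maxsize > 2**32
    return (platform.system() == "Linux"
            and platform.machine() == "x86_64"
            and is_64bit)
```
.fi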
.SH SEE ALSO
.PP
Website for SOAP <http://soap.genomics.org.cn>
.P
Google Group for SOAP <http://groups.google.com/group/bgi-soap>
.TP
.BR Publication:
"SOAP: short oligonucleotide alignment program." Bioinformatics, Vol. 24, No. 5 (2008), pages 713\-714
.SH AUTHOR
.PP
.B BGI Shenzhen
SOAP team. The core algorithm, Bidirect-BWT, was written by Prof. T.W. Lam and his team at the University of Hong Kong.
.SH REPORT BUGS
.PP
Report bugs to <soap@genomics.org.cn>
.SH ACKNOWLEDGEMENTS
.PP
We thank Prof. T.W. Lam, Alan Tam, Simon Wong, Edward Wu and S.M. Yiu for their outstanding work on Bidirect-BWT.