.\"                              hey, Emacs:   -*- nroff -*-
.\"
.\"  Copyright © 2009-2016 -- LIRMM/CNRS                                      
.\"                           (Laboratoire d'Informatique, de Robotique et de 
.\"                           Microélectronique de Montpellier /              
.\"                           Centre National de la Recherche Scientifique)   
.\"                           LIFL/INRIA                                      
.\"                           (Laboratoire d'Informatique Fondamentale de     
.\"                           Lille / Institut National de Recherche en       
.\"                           Informatique et Automatique)                    
.\"                           LITIS                                           
.\"                           (Laboratoire d'Informatique, du Traitement de   
.\"                           l'Information et des Systèmes).                 
.\"                                                                           
.\"  Copyright © 2011-2016 -- IRB/INSERM                                      
.\"                           (Institut de Recherches en Biothérapie /        
.\"                           Institut National de la Santé et de la Recherche
.\"                           Médicale).                                      
.\"                                                                           
.\"  Copyright © 2015-2016 -- AxLR/SATT                                       
.\"                           (Lanquedoc Roussilon /                          
.\"                            Societe d'Acceleration de Transfert de         
.\"                            Technologie).	                            
.\"                                                                           
.\"  Programmeurs/Progammers:                                                 
.\"                    Nicolas PHILIPPE <nphilippe.resear@gmail.com>          
.\"                    Mikaël SALSON    <mikael.salson@lifl.fr>               
.\"                    Jérôme Audoux    <jerome.audoux@gmail.com>             
.\"   with additional contribution for the packaging of:	                    
.\"                    Alban MANCHERON  <alban.mancheron@lirmm.fr>            
.\"                                                                           
.\"   Contact:         CRAC list   <crac-bugs@lists.gforge.inria.fr>          
.\"   Paper:           CRAC: An integrated RNA-Seq read analysis              
.\"                    Philippe N., Salson M., Commes T., Rivals E.           
.\"                    Genome Biology 2013; 14:R30.                           
.\"                                                                           
.\"  -------------------------------------------------------------------------
.\"                                                                           
.\"   This File is part of the CRAC program.                                  
.\"                                                                           
.\"   This program is free software: you can redistribute it and/or modify    
.\"   it under the terms of the GNU General Public License as published by    
.\"   the Free Software Foundation, either version 3 of the License, or (at   
.\"   your option) any later version.  This program is distributed in the     
.\"   hope that it will be useful, but WITHOUT ANY WARRANTY; without even     
.\"   the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR     
.\"   PURPOSE.  See the GNU General Public License for more details.  You     
.\"   should have received a copy of the GNU General Public License along     
.\"   with this program.  If not, see <http://www.gnu.org/licenses/>.         
.\"                                                                             

.TH crac 1 "2020-12-23"
.\" Please update the above date whenever this man page is modified.
.\"
.\" Some roff macros, for reference:
.\" .nh        disable hyphenation
.\" .hy        enable hyphenation
.\" .ad l      left justify
.\" .ad b      justify to both left and right margins (default)
.\" .nf        disable filling
.\" .fi        enable filling
.\" .br        insert line break
.\" .sp <n>    insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
.SH NAME
crac v. 2.5.2 \- Crac is a tool to analyse RNA-Seq data provided by NGS.

.SH SYNOPSIS
.B crac [ options ] \fB\-i\fR <index_file> \fB\-r\fR <reads_file1> [reads_file2]
\fB\-k\fR <int> \fB\-o\fR <output_file>
.PP
.B crac \fB\-h\fR|\fB\-\-help\fR
.br
.B crac \fB\-f\fR|\fB\-\-full\-help\fR
.br
.B crac \fB\-v\fR|\fB\-\-version\fR
		
.SH DESCRIPTION
\fBcrac\fP CRAC: an integrated approach to the analysis of RNA-seq reads
.PP
Whatever the biological questions it addresses, each RNA-seq analysis
requires a computational prediction of either small scale mutations,
indels, splice junctions or fusion RNAs. This prediction is currently
performed using complex pipelines involving multiple tools for
mapping, coverage computation, and prediction at distinct steps.  We
propose a novel way of analyzing reads that integrates genomic
locations and local coverage, and delivers all above mentioned
predictions in a single step. Our program, CRAC, uses a double k-mer
profiling approach to detect candidate mutations, indels, splice or
fusion junctions in each single read.  Compared to existing tools,
CRAC provides state of the art sensitivity and improved precision for
all types of predictions, yielding high rates of true positive
candidates (99.5% for splice junctions). When applied to four breast
cancer libraries, CRAC recovered 74% of validated fusion RNAs and
predicted reccurrent fusion junctions that were overseen in previous
studies. Importantly, CRAC improves its predictive performance when
supplied with e.g. 200 nt reads and should fit future needs of read
analyses.
.SH SPECIAL OPTIONS
\fBcrac\fP As a lot of softwares, there are many optional parameters in CRAC but only three are mandatory. This document is intended to guide users of CRAC to choose the more appropriate parameters according to their needs:
.TP
\fB\-h\fR, \fB\-\-help\fR
to print the principal help page of CRAC
.TP
\fB\-f\fR, \fB\-\-full\-help\fR
to print the complete help page of CRAC
.TP
\fB\-v\fR, \fB\-\-version\fR
to print version of CRAC
.SH USUAL ARGUMENTS
.SS
Mandatory options
All these flags must be set.
.TP
\fB\-i\fR \fI<index_file>\fP 
is the name of the index previously built with the crac-index binary
file. Note that crac-index construct the structure <index_file.ssa>
with its configuration <index_file.conf> so only the prefix
<index_file> must be specified (without extension) to consider the
structure and the configuration files both in CRAC
.TP
\fB\-r\fR \fI<reads_file1> [reads_file2]\fP
is the source(s) of the FASTA or FASTQ file(s) containing the
reads. Note that the number of files depends if single or
paired-reads. The input file may also be compressed using gzip
.TP
\fB\-k\fR \fI<int>\fP
is the length of the k-mer to be used to map the reads on the
reference <index_file>. Note that the condition (k < m) is necessary
and reads (or both paired reads) are ignored if m < k. It must be
chosen to ensure (as much as possible) that a k-mer has a very high
probability to occur a single time on the genome
.TP
\fB\-o\fR, \fB\-\-sam\fR \fI<output\_file>\fP
is the output file in SAM format (see the Documentation of SAM format
in CRAC for more details) or print on STDOUT with "\-o \-" argument
.SS
Optional parameters
.TP
\fB\-\-stranded\fP
must be specificied if reads are produced by a stranded protocol of RNA-Seq (not stranded by default)
.TP
\fB\-\-fr/\-\-rf/\-\-ff\fP
set the mates alignement orientation (\-\-rf by default)
.TP
\fB\-m\fR, \fB\-\-reads\-length\fR, \fB\-m\fR \fI<int>\fP
must be specified for reads of fixed length. If the read length is
fixed, we deeply recommend you to specify the read length, by using
the -m parameter. CRAC will therefore be much faster. \-\-reads-length
<int> is specified for variable or longer reads, reads shorter are
ignored and reads longer are trimmed
.TP
\fB\-\-treat\-multiple\fR \fI<int>\fP 
display alignments with multiple locations (with a fixed limit) rather
than a single alignment per read in the SAM file
.TP
\fB\-\-nb\-threads\fR \fI<int>\fP
is the number of threads to run crac, computational time is almost
divided by the number of threads (one thread by default)
.TP
\fB\-\-max\-locs\fR \fI<int>\fP
corresponds to the max number of occurrences retrieved in the index
for a given k-mer: smaller is faster, but with a small value, you may
miss some locations that would help CRAC detecting the right cause
.TP
\fB\-\-no\-ambiguity\fR \fI<none>\fP
discard biological events (splice, svn, indel,
chimera) which have several matches on the reference index.
Indeed, if crac has identified a biological cause in the read that can
match in differents places of the genome we classify this cause as a
biological undetermined event.
.SS
Optional output arguments
.TP
\fB\-\-gz\fR \fI<none>\fP
all output files specified after this argument are gzipped (included for the sam file if \-o/\-\-sam argument is specified after)
.TP
\fB\-\-bam\fR \fI<none>\fP
sam output is encode in binary format(BAM)
.TP
\fB\-\-summary\fR \fI<output\_file>\fP
save some statistics about mapping and classification
.TP
\fB\-\-show\-progressbar\fR \fI<none>\fP
show a progress bar for the process times on STDERR
.TP
\fB\-\-use\-x\-in\-cigar\fR \fI<none>\fP
use X cigar operator when CRAC identifies a mismatch
.SS
Optional output homemade file formats
.TP
\fB\-\-all\fR \fI<base\_filename>\fP
set output base filename for all causes following. Note that only a
base\_filename must be specified. Then, the appropriate file extension
is added for each cause (SNP, chimera, splice, etc) set output base
filename for all causes following
.TP
\fB\-\-normal\fR \fI<output\_file>\fP
save reads that do not contain any break
.TP
\fB\-\-almost\-normal\fR \fI<output\_file>\fP
save reads that do not contain any break but with a variable support
.TP
\fB\-\-single\fR \fI<output\_file>\fP
save reads which are located in this way: at least
\-\-min\-percent\-single\-loc <float> of k\-mers are once located on the
reference index
.TP
\fB\-\-duplicate\fR \fI<output\_file>\fP
save reads which are located in this way: at least
\-\-min\-percent\-duplication\-loc <float> of k\-mers are a few times on the
reference index (ie. between \-\-min\-duplication <int> and
\-\-max\-duplication <int> of locations)
.TP
\fB\-\-multiple\fR \fI<output\_file>\fP
save reads which are located in this way: at least
\-\-min\-percent\-multiple\-loc <float> of k\-mers are a many times on the
reference index (ie. more than \-\-max\-duplication <int> of locations)
.TP
\fB\-\-none\fR \fI<output\_file>\fP
 save reads which are not located on the reference index
.TP
\fB\-\-snv\fR \fI<output\_file>\fP
save reads that contain at least a snv
.TP
\fB\-\-indel\fR \fI<output\_file>\fP
save reads that contain at least a biological indel
.TP
\fB\-\-splice\fR \fI<output\_file>\fP
save reads that contain at least a splicing junction
.TP
\fB\-\-weak\-splice\fR \fI<output\_file>\fP
save reads that contain at least a low coverage splicing junction
.TP
\fB\-\-chimera\fR \fI<output\_file>\fP
save reads that contain at least a chimera junction (junction on
different chromosomes, strands or genes)
.TP
\fB\-\-paired\-end\-chimera\fR \fI<output\_file>\fP
paired-end-chimera <output\_file>= save paired-end reads that contains a
chimera in the non-sequenced part of the original fragment.
.TP
\fB\-\-biological\fR \fI<output\_file>\fP
save reads that contain a biological cause but for which there is not
enough informations to be more specific. Note that the biological
cause is described for each read
.TP
\fB\-\-errors\fR \fI<output\_file>\fP
save reads that contain at least a sequence error
.TP
\fB\-\-repeat\fR \fI<output\_file>\fP
 save reads that contain a repeated sequence: at least
 \-\-min\-percent\-repetition\-loc <float> percent of k-mers of a given
 read are located at least \-\-min\-repetition <int> occurrences on the
 reference index
.TP
\fB\-\-undetermined\fR \fI<output\_file>\fP
save reads that contain an undetermined error: some k-mers are not
located on the genome, but the reason for that could not be
determined. Note that the error is described for each read
.TP
\fB\-\-nothing\fR \fI<output\_file>\fP
save reads that are unclassified
.SS
Optional process for specific research
.TP
\fB\-\-deep\-snv\fR
must be specified to increase sensitivity to find SNVs at the cost of
more computations (only substitution, no indels YET). That process
searches for SNV in border cases reads. Those reads would otherwise be
classified in bioundetermined
.TP
\fB\-\-stringent\-chimera\fR
must be specified to increase accuracy to find chimera junctions in
exchange of sensitivity and computational times
.SS
Optional process launcher (once must be selected)
.TP
\fB\-\-emt\fR 
launch an exact matching processing of reads on the
index. Either the argument specified with -k is equal to 0 which means that the
entire read is perfectly mapped on the genome or only a factor of
length k per read is mapped (the first one with a location) and the
rest is sofclipped. With this process, reads are not indexed and it
provides a low memory consumption. Note this kind of method is very
useful for DGE reads mapping.

.TP
\fB\-\-server\fR
launch a server to query a given read more precisely. That process is
useful for debugging. Note that the output arguments will not be taken
into account. Give an \-\-input\-name\-server <string> to set the input
fifo name (classify.fifo by default) and give an \-\-output\-name\-server
<string> to set the output fifo name (classify.out.fifo by
default). The server can then be used through a client crac\-client
.SS
Additional settings for users
.TP
\fB\-\-detailed\-sam\fR
more informations are added in SAM output file. See the Documentation
of SAM format in CRAC for more details
.TP
\fB\-\-min\-percent\-single\-loc\fR \fI<float>\fR
is, to consider a given read as uniquely mapped, the minimum
proportion of k-mers that are uniquely mapped on the index (0.15 by
default)
.TP
\fB\-\-min\-duplication\fR \fI<int>\fR
is the minimum number of location to consider a duplicated k-mer (2 by
default)
.TP
\fB\-\-max\-duplication\fR \fI<int>\fR
is the maximum number of location to consider a duplicated k-mer (9 by
default)
.TP
\fB\-\-min\-percent\-duplication\-loc\fR \fI<float>\fR
is, to consider a given read as duplicated, the minimum proportion of
k-mers that are duplicated on the index (0.15 by default)
.TP
\fB\-\-min\-percent\-multiple\-loc\fR \fI<float>\fR
is, to consider a given read as “multiple”, the minimum proportion of
k-mers that are multiple mapped on the index (0.50 by default)
.TP
\fB\-\-min\-repetition\fR \fI<int>\fR
is the minimum number of locations to consider a repeated k-mer (20 by
default)
.TP
\fB\-\-max\-percent\-repetition\-loc\fR \fI<float>\fR
is, for a given read, the minimum proportion of k-mers that are
repeated on the index to consider a repetition (0.20 by default)
.TP
\fB\-\-max\-splice\-length\fR \fI<int>\fR
is the threshold to consider a splice, ie. a splice is reported if the
junction length is below max\-splice\-length <int>, a chimera is
considered otherwise (distance by default is 300Kb)
.TP
\fB\-\-max\-bio\-indel\fR \fI<int>\fR
is the threshold to consider a biological indel, ie. an indel is
reported if the gap length is below max\-bio\-indel, a splice is
considered otherwise (distance by default is 15)
.TP
\fB\-\-max\-bases\-retrieved\fR \fI<int>\fR 
is the number of nucleotides to display in outputfile in case of
insertion (15 by default)
.TP
\fB\-\-min\-support\-no\-cover\fR \fI<float>\fR
is the minimum coverage to be able to report a biological cause. Note
that if a single read contains a given substitution, it is difficult
(if not impossible) to distinguish a sequence error and a biological
cause (1.30 by default)
.SS
Additional settings for advanced users
.TP
\fB\-\-min\-break\-length\fR \fI<int>\fR
is the minimal break length (as the percentage of k, the k-mer length)
so that a cause can be reported. Theoretically, for a given cause, the
break length is always >= (kmer_length \- 1). Otherwise, the break may
be merged with a close enough break, or the break will be considered
as undetermined. (0.5 by default)
.TP
\fB\-\-max\-bases\-randomly\-matched\fR \fI<int>\fR
A k-mer overlapping an exon-exon junction, for example, may still
match on the genome if the overlap is at the end of the read (without
loss of generality). This is due to the fact that the nucleotides
starting the second exon may be the same as the nucleotides starting
the intron. Theoretically, there is a 0.25 probability that we have
the same nucleotide at the first position of the intron and the
exon. This option specifies how many nucleotides may be matched
randomly at most
.TP
\fB\-\-max\-extension\-length\fR \fI<int>\fR
is the maximum number of k-mers extended at each side of a read
break. In fact, for a given break, k-mers with false locations can
generate false biological causes, so the consistency is checked for
each side of the break to discard false k-mers and readjust the good
boundaries of the break (10 by default)
.TP
\fB\-\-nb\-tags\-info\-stored\fR \fI<int>\fR
is a buffer to store informations for each thread during the computing
phase (1000 by default). This value must be increased if threads work
below their real capabilities. With \-\-nb\-threads 15, CPU usage must be
about 1400%
.TP
\fB\-\-reads\-index\fR \fI<string>\fR
the reads index data-structure uses by CRAC. Available reads index are: 
JELLYFISH and GKARRAYS. (JELLYFISH by default).
.TP
\fB\-\-nb\-nucleotides\-snp\-comparison\fR \fI<int>\fR
 is the minimum k\-mer length tolerated for the deep SNVs search (8 by
 default)
.TP
\fB\-\-max\-number\-of\-merges\fR \fI<int>\fR
 is the maximum number of merges tolerated during the break merge process 
 for the chimera detection (4 by default)
.TP
\fB\-\-min\-score\-chimera\-stringent\fR \fI<float>\fR
 is the mimimal score to consider a chimera event
 otherwise it is classify as a bioundetermined event (0.6 by default)


.PP
.SH "SEE ALSO"
The full documentation for
.B crac
is maintained as a org manual.  If the
.B info
and
.B crac
programs are properly installed at your site, the command
.IP
.B info crac
.PP
should give you access to the complete manual.
.SH AUTHOR
.SS
About the crac package.
You can contact
Nicolas PHILIPPE, Mikael SALSON, Jerome AUDOUX and Alban MANCHERON by sending an
e-mail to <crac-bugs@lists.gforge.inria.fr>.
.TP 0
Programming:
        Nicolas PHILIPPE <nphilippe.research@gmail.com>
        Mikaël SALSON    <mikael.salson@lifl.fr>
	Jérome AUDOUX    <jerome.audoux@gmail.com>
.br
with additional contribution for the packaging of:
        Alban MANCHERON  <alban.mancheron@lirmm.fr> 
.SS
About the crac publication.
You may cite the following paper if you use our tool:
.TP 0
Gk-arrays: Querying large read collections in main memory: a versatile
data structure
.br
.B Philippe N., Salson M., Lecroq T., Leonard M., Commes T., Rivals E.
.br
BMC Bioinformatics 2011, 12:242.
.TP 0
Crac: An integrated RNA-Seq read analysis
.br
.B Philippe N., Salson M., Commes T., Rivals E.
.br
Genome Biology 2013; 14:R30.