ROCKHOPPER(1) | General Commands Manual | ROCKHOPPER(1)
NAME
rockhopper - system for analyzing bacterial RNA-seq data (command line tool)
SYNOPSIS
rockhopper [options]
DESCRIPTION
rockhopper is a comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data. As input, it takes RNA sequencing reads output by high-throughput sequencing technology (FASTQ, QSEQ, FASTA, SAM, or BAM files).
REQUIRED ARGUMENTS
- exp1A.fastq,exp1B.fastq,exp1C.fastq exp2A.fastq,exp2B.fastq
- a comma-separated list of sequencing files (in FASTQ, QSEQ, FASTA, SAM, or BAM format) for replicate experiments, one list per experimental condition; lists are separated by spaces, and mate-pair files should be delimited by '%'
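The argument layout described above (',' between replicates, '%' between mate-pair files, one argument per condition) can be sketched in Python. This is only an illustration: the helper name encode_condition and the file names are hypothetical and are not part of rockhopper.

```python
def encode_condition(replicates):
    """Join one condition's replicates into a single rockhopper argument.

    Each replicate is either a file name (single-end) or a
    (mate1, mate2) tuple (paired-end); mates are joined with '%'.
    """
    parts = []
    for rep in replicates:
        if isinstance(rep, str):
            parts.append(rep)
        else:
            mate1, mate2 = rep
            parts.append(f"{mate1}%{mate2}")
    # Replicates of the same condition are separated by commas.
    return ",".join(parts)


# Hypothetical file names, for illustration only.
single_end = encode_condition(["exp1A.fastq", "exp1B.fastq", "exp1C.fastq"])
paired_end = encode_condition([("exp2A_1.fastq", "exp2A_2.fastq"),
                               ("exp2B_1.fastq", "exp2B_2.fastq")])
print(single_end)
print(paired_end)
```

Each returned string is then passed to rockhopper as one positional argument, one per experimental condition.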
REFERENCE BASED ASSEMBLY VS. DE NOVO ASSEMBLY
If the -g option is used, rockhopper aligns reads to one or more reference genomes; otherwise, it performs de novo transcript assembly.
- -g <DIR1,DIR2>
- a comma-separated list of directories, each containing a genome file (*.fna), gene file (*.ptt), and RNA file (*.rnt)
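Before passing a directory to -g, it can be useful to confirm that it holds the three file types listed above. The following is a minimal sketch; the function name missing_genome_files is hypothetical, and the check only looks at file extensions, not file contents.

```python
import tempfile
from pathlib import Path

# File patterns that each -g directory is expected to contain.
REQUIRED_PATTERNS = ("*.fna", "*.ptt", "*.rnt")


def missing_genome_files(directory):
    """Return the required patterns that match no file in the directory."""
    d = Path(directory)
    return [pattern for pattern in REQUIRED_PATTERNS if not any(d.glob(pattern))]


# Demonstration on a temporary directory that lacks the *.rnt file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "genome.fna").touch()
    (Path(tmp) / "genome.ptt").touch()
    result = missing_genome_files(tmp)
print(result)
```

An empty list means that all three file types are present.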
OPTIONAL ARGUMENTS FOR EITHER REFERENCE BASED ASSEMBLY OR DE NOVO ASSEMBLY
- -c <boolean>
- reverse complement single-end reads (default is false)
- -ff, -fr, -rf, -rr
- orientation of the two mate reads for paired-end reads, f=forward and r=reverse_complement (default is fr)
- -d <integer>
- maximum number of bases between mate pairs for paired-end reads (default is 500)
- -a <boolean>
- identify one alignment (true) or identify all optimal alignments (false) (default is true)
- -p <integer>
- number of processors (default is the number of processors detected automatically)
- -e <boolean>
- compute differential expression for transcripts in pairs of experimental conditions (default is true)
- -s <boolean>
- RNA-seq experiments are strand specific (true) or strand ambiguous (false) (default is true)
- -L <comma separated list>
- labels for each condition
- -o <DIR>
- directory where output files are written (default is Rockhopper_Results/)
- -v <boolean>
- verbose output including raw/normalized counts aligning to each gene (default is false)
- -SAM
- output a SAM format file
- -TIME
- output time taken to execute program
OPTIONAL ARGUMENTS FOR REFERENCE BASED ASSEMBLY ONLY
- -m <number>
- allowed mismatches as a fraction of read length (default is 0.15)
- -l <number>
- minimum seed as a fraction of read length (default is 0.33)
- -y <boolean>
- compute operons (default is true)
- -t <boolean>
- identify transcript boundaries including UTRs and ncRNAs (default is true)
- -z <number>
- minimum expression of UTRs and ncRNAs, a number in range [0.0, 1.0] (default is 0.5)
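As a worked example of the -m and -l defaults, the sketch below converts them into per-read values. It assumes that 0.15 and 0.33 are fractions of the read length and that fractional results are truncated; the rounding rule rockhopper actually applies is not stated here, and the function name per_read_limits is hypothetical.

```python
def per_read_limits(read_length, mismatch_fraction=0.15, seed_fraction=0.33):
    """Translate the -m and -l fractions into base counts for one read.

    Truncation toward zero is an assumption made for this illustration.
    """
    max_mismatches = int(read_length * mismatch_fraction)
    min_seed = int(read_length * seed_fraction)
    return max_mismatches, min_seed


# For a 100-base read with the default settings.
print(per_read_limits(100))  # (15, 33)
```

Under these assumptions, a 100-base read tolerates up to 15 mismatches and needs a seed of at least 33 bases.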
OPTIONAL ARGUMENTS FOR DE NOVO ASSEMBLY ONLY
- -k <integer>
- size of k-mer, range of values is 15 to 31 (default is 25)
- -j <integer>
- minimum length required to use a sequencing read after trimming/processing (default is 35)
- -n <integer>
- size of k-mer hashtable is ~ 2^n (default is 25). HINT: should normally be 25 or, if more memory is available, 26. WARNING: if increased above 25 then more than 1.2M of memory must be allocated
- -b <integer>
- minimum number of full length reads required to map to a de novo assembled transcript (default is 20)
- -u <integer>
- minimum length of de novo assembled transcripts (default is 2*k)
- -w <integer>
- minimum count of k-mer to use it to seed a new de novo assembled transcript (default is 50)
- -x <integer>
- minimum count of k-mer to use it to extend an existing de novo assembled transcript (default is 5)
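Two of the de novo defaults are derived values: the hashtable size is about 2^n, and the minimum transcript length defaults to 2*k. The sketch below simply evaluates those expressions and checks the documented range for k; the function name de_novo_derived is hypothetical and the code does not model rockhopper's memory use.

```python
def de_novo_derived(k=25, n=25):
    """Evaluate the values derived from -k and -n, as described above."""
    if not 15 <= k <= 31:
        raise ValueError("k must be between 15 and 31")
    return {
        "hashtable_slots": 2 ** n,       # size of the k-mer hashtable, ~2^n
        "min_transcript_length": 2 * k,  # default for -u
    }


print(de_novo_derived())
# {'hashtable_slots': 33554432, 'min_transcript_length': 50}
print(de_novo_derived(n=26)["hashtable_slots"])  # 67108864
```

Raising n from 25 to 26 doubles the number of hashtable slots, which is consistent with the memory warning given for -n.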
EXAMPLES
Reference based assembly with single-end reads:
% rockhopper <options> -g genome_DIR1,genome_DIR2 \
    aerobic_replicate1.fastq,aerobic_replicate2.fastq \
    anaerobic_replicate1.fastq,anaerobic_replicate2.fastq
De novo assembly with single-end reads:
% rockhopper <options> \
    aerobic_replicate1.fastq,aerobic_replicate2.fastq \
    anaerobic_replicate1.fastq,anaerobic_replicate2.fastq
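When rockhopper is driven from a script, the two invocation styles above differ only in whether -g is present. A minimal Python sketch follows; the helper build_command is hypothetical, the chosen option values are arbitrary examples, and the command is only printed, not executed.

```python
import shlex


def build_command(conditions, genome_dirs=None, options=()):
    """Assemble a rockhopper argument list.

    With genome_dirs, -g is added (reference based assembly);
    without it, the command requests de novo assembly.
    """
    argv = ["rockhopper", *options]
    if genome_dirs:
        argv += ["-g", ",".join(genome_dirs)]
    argv += list(conditions)
    return argv


conditions = [
    "aerobic_replicate1.fastq,aerobic_replicate2.fastq",
    "anaerobic_replicate1.fastq,anaerobic_replicate2.fastq",
]
reference_cmd = build_command(conditions,
                              genome_dirs=["genome_DIR1", "genome_DIR2"],
                              options=["-s", "false", "-o", "results/"])
de_novo_cmd = build_command(conditions, options=["-k", "25"])
print(shlex.join(reference_cmd))
print(shlex.join(de_novo_cmd))
```

The resulting list can be handed to subprocess.run once rockhopper is available on the PATH.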
SEE ALSO
February 2022