.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.38.2.
.TH RAY "1" "November 2012" "Ray version 2.1.0" "User Commands"
.SH NAME
Ray \- assemble genomes in parallel using the message\-passing interface
.SH SYNOPSIS
.B mpiexec
\fB\-n\fR NUMBER_OF_RANKS Ray \fB\-k\fR KMERLENGTH \fB\-p\fR l1_1.fastq l1_2.fastq \fB\-p\fR l2_1.fastq l2_2.fastq \fB\-o\fR test
.PP
.B mpiexec
\fB\-n\fR NUMBER_OF_RANKS Ray Ray.conf # with commands in a file
.SH DESCRIPTION
The Ray genome assembler is built on top of the RayPlatform, a generic plugin\-based distributed and parallel compute engine that communicates through the message\-passing interface.
.PP
Ray targets several applications:
.IP
\- de novo genome assembly (with Ray vanilla)
.br
\- de novo meta\-genome assembly (with Ray Meta)
.br
\- de novo transcriptome assembly (works, but not extensively tested)
.br
\- quantification of contig abundances
.br
\- quantification of microbiome consortia members (with Ray Communities)
.br
\- quantification of transcript expression
.br
\- taxonomy profiling of samples (with Ray Communities)
.br
\- gene ontology profiling of samples (with Ray Ontologies)
.HP
\fB\-help\fR
.IP
Displays this help page.
.HP
\fB\-version\fR
.IP
Displays Ray version and compilation options.
.IP
Using a configuration file
.IP
Ray can be launched with a configuration file:
.br
mpiexec \fB\-n\fR 16 Ray Ray.conf
.br
The configuration file can include comments (starting with #).
.IP
K\-mer length
.HP
\fB\-k\fR kmerLength
.IP
Selects the length of k\-mers.
The default value is 21.
It must be odd because reverse\-complement vertices are stored together.
The maximum length is defined at compilation by MAXKMERLENGTH.
Larger k\-mers use more memory.
.IP
Inputs
.HP
\fB\-p\fR leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
.IP
Provides two files containing paired\-end reads.
averageOuterDistance and standardDeviation are automatically computed if not provided.
.HP
\fB\-i\fR interleavedSequenceFile [averageOuterDistance standardDeviation]
.IP
Provides one file containing interleaved paired\-end reads.
averageOuterDistance and standardDeviation are automatically computed if not provided.
.HP
\fB\-s\fR sequenceFile
.IP
Provides a file containing single\-end reads.
.IP
Outputs
.HP
\fB\-o\fR outputDirectory
.IP
Specifies the directory for output files.
The default is RayOutput.
.IP
Assembly options (defaults work well)
.HP
\fB\-disable\-recycling\fR
.IP
Disables read recycling during the assembly.
Reads will be set free in 3 cases:
.br
1. the distance did not match for a pair
.br
2. the read has not met its mate
.br
3. the library population indicates a wrong placement
.br
See: Constrained traversal of repeats with paired sequences.
Sebastien Boisvert, Elenie Godzaridis, Francois Laviolette & Jacques Corbeil.
First Annual RECOMB Satellite Workshop on Massively Parallel Sequencing, March 26\-27 2011, Vancouver, BC, Canada.
.HP
\fB\-disable\-scaffolder\fR
.IP
Disables the scaffolder.
.HP
\fB\-minimum\-contig\-length\fR minimumContigLength
.IP
Changes the minimum contig length.
The default is 100 nucleotides.
.HP
\fB\-color\-space\fR
.IP
Runs in color\-space.
Needs csfasta files.
Activated automatically if csfasta files are provided.
.HP
\fB\-use\-maximum\-seed\-coverage\fR maximumSeedCoverageDepth
.IP
Ignores any seed with a coverage depth above this threshold.
The default is 4294967295.
.HP
\fB\-use\-minimum\-seed\-coverage\fR minimumSeedCoverageDepth
.IP
Sets the minimum seed coverage depth.
Any path with a coverage depth lower than this will be discarded.
The default is 0.
.IP
Distributed storage engine (all these values are for each MPI rank)
.HP
\fB\-bloom\-filter\-bits\fR bits
.IP
Sets the number of bits for the Bloom filter.
The default is 268435456 bits; 0 bits disables the Bloom filter.
.HP
\fB\-hash\-table\-buckets\fR buckets
.IP
Sets the initial number of buckets.
Must be a power of 2.
Default value: 268435456.
.HP
\fB\-hash\-table\-buckets\-per\-group\fR buckets
.IP
Sets the number of buckets per group for sparse storage.
Default value: 64. Must be >= 1 and <= 64.
.HP
\fB\-hash\-table\-load\-factor\-threshold\fR threshold
.IP
Sets the load factor threshold for real\-time resizing.
Default value: 0.75. Must be >= 0.5 and < 1.
.HP
\fB\-hash\-table\-verbosity\fR
.IP
Activates verbosity for the distributed storage engine.
.IP
Biological abundances
.HP
\fB\-search\fR searchDirectory
.IP
Provides a directory containing fasta files to be searched in the de Bruijn graph.
Biological abundances will be written to RayOutput/BiologicalAbundances.
See Documentation/BiologicalAbundances.txt
.HP
\fB\-one\-color\-per\-file\fR
.IP
Sets one color per file instead of one per sequence.
By default, each sequence in each file has a different color.
For files with large numbers of sequences, using a single color per file may be more efficient.
.IP
Taxonomic profiling with colored de Bruijn graphs
.HP
\fB\-with\-taxonomy\fR Genome\-to\-Taxon.tsv TreeOfLife\-Edges.tsv Taxon\-Names.tsv
.IP
Provides a taxonomy.
Computes and writes detailed taxonomic profiles.
See Documentation/Taxonomy.txt for details.
.TP
\fB\-gene\-ontology\fR OntologyTerms.txt Annotations.txt
.IP
Provides an ontology and annotations.
OntologyTerms.txt is fetched from http://geneontology.org
.br
Annotations.txt is a 2\-column file (EMBL_CDS handle & gene ontology identifier).
See Documentation/GeneOntology.txt
.IP
Other outputs
.HP
\fB\-enable\-neighbourhoods\fR
.IP
Computes contig neighborhoods in the de Bruijn graph.
Output file: RayOutput/NeighbourhoodRelations.txt
.HP
\fB\-amos\fR
.IP
Writes the AMOS file called RayOutput/AMOS.afg
.br
An AMOS file contains read positions on contigs.
It can be opened with software that has a graphical user interface.
.HP
\fB\-write\-kmers\fR
.IP
Writes the k\-mer graph to RayOutput/kmers.txt
.br
The resulting file is very large and is not used by Ray.
.HP
\fB\-write\-read\-markers\fR
.IP
Writes read markers to disk.
.HP
\fB\-write\-seeds\fR
.IP
Writes seed DNA sequences to RayOutput/Rank.RaySeeds.fasta
.HP
\fB\-write\-extensions\fR
.IP
Writes extension DNA sequences to RayOutput/Rank.RayExtensions.fasta
.HP
\fB\-write\-contig\-paths\fR
.IP
Writes contig paths with coverage values to RayOutput/Rank.RayContigPaths.txt
.HP
\fB\-write\-marker\-summary\fR
.IP
Writes marker statistics.
.IP
Memory usage
.HP
\fB\-show\-memory\-usage\fR
.IP
Shows memory usage.
Data is fetched from /proc on GNU/Linux.
Needs __linux__.
.HP
\fB\-show\-memory\-allocations\fR
.IP
Shows memory allocation events.
.IP
Algorithm verbosity
.HP
\fB\-show\-extension\-choice\fR
.IP
Shows the choice made (with other choices) during the extension.
.HP
\fB\-show\-ending\-context\fR
.IP
Shows the ending context of each extension.
Shows the children of the vertex where extension was too difficult.
.HP
\fB\-show\-distance\-summary\fR
.IP
Shows a summary of the outer distances used for an extension path.
.HP
\fB\-show\-consensus\fR
.IP
Shows the consensus when a choice is made.
.IP
Checkpointing
.HP
\fB\-write\-checkpoints\fR checkpointDirectory
.IP
Writes checkpoint files.
.HP
\fB\-read\-checkpoints\fR checkpointDirectory
.IP
Reads checkpoint files.
.HP
\fB\-read\-write\-checkpoints\fR checkpointDirectory
.IP
Reads and writes checkpoint files.
.IP
Message routing for large numbers of cores
.HP
\fB\-route\-messages\fR
.IP
Enables the Ray message router.
Disabled by default.
Messages will be routed so that any rank communicates directly with only a few others.
Without \fB\-route\-messages\fR, any rank can communicate directly with any other rank.
Files generated: Routing/Connections.txt, Routing/Routes.txt, Routing/RelayEvents.txt and Routing/Summary.txt
.HP
\fB\-connection\-type\fR type
.IP
Sets the connection type for routes.
Accepted values are debruijn, hypercube, polytope, group, random, kautz and complete.
Default is debruijn.
.IP
debruijn: a full de Bruijn graph for a given alphabet and diameter
.br
hypercube: a hypercube, alphabet is {0,1} and the number of vertices is a power of 2
.br
polytope: a convex regular polytope, alphabet is {0,1,...,B\-1} and the number of vertices is a power of B
.br
group: silly model where one representative per group can communicate with outsiders
.br
random: Erdos\-Renyi model
.br
kautz: a full Kautz graph, which is a subgraph of a de Bruijn graph
.br
complete: a full graph with all the possible connections
.IP
With the type debruijn, the number of ranks must be a power of an integer.
Examples: 256 = 16*16, 512 = 8*8*8, 49 = 7*7, and so on.
Otherwise, use another connection type instead of debruijn.
.br
With the type kautz, the number of ranks n must be n=(k+1)*k^(d\-1) for some k and d.
.HP
\fB\-routing\-graph\-degree\fR degree
.IP
Specifies the outgoing degree for the routing graph.
See Documentation/Routing.txt
.IP
Hardware testing
.HP
\fB\-test\-network\-only\fR
.IP
Tests the network and returns.
.HP
\fB\-write\-network\-test\-raw\-data\fR
.IP
Writes one additional file per rank detailing the network test.
.HP
\fB\-exchanges\fR NumberOfExchanges
.IP
Sets the number of exchanges.
.HP
\fB\-disable\-network\-test\fR
.IP
Skips the network test.
.IP
Debugging
.HP
\fB\-verify\-message\-integrity\fR
.IP
Checks message data reliability for any non\-empty message.
Add '\-D CONFIG_SSE_4_2' in the Makefile to use the hardware instruction (SSE 4.2).
.HP
\fB\-run\-profiler\fR
.IP
Runs the profiler as the code runs.
By default, only shows granularity warnings.
Running the profiler increases running times.
.HP
\fB\-with\-profiler\-details\fR
.IP
Shows the number of messages sent and received in each method during each time slice (epoch).
Needs \fB\-run\-profiler\fR.
.HP
\fB\-show\-communication\-events\fR
.IP
Shows all messages sent and received.
.HP
\fB\-show\-read\-placement\fR
.IP
Shows read placement in the graph during the extension.
.HP
\fB\-debug\-bubbles\fR
.IP
Debugs bubble code.
Bubbles can be due to heterozygous sites, sequencing errors, or other (unknown) events.
.HP
\fB\-debug\-seeds\fR
.IP
Debugs seed code.
Seeds are paths in the graph that are likely unique.
.HP
\fB\-debug\-fusions\fR
.IP
Debugs fusion code.
.HP
\fB\-debug\-scaffolder\fR
.IP
Debugs the scaffolder.
.PP
FILES
.IP
Input files
.IP
Note: the file format is determined from the file extension.
.IP
\&.fasta
.br
\&.fasta.gz (needs HAVE_LIBZ=y at compilation)
.br
\&.fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
.br
\&.fastq
.br
\&.fastq.gz (needs HAVE_LIBZ=y at compilation)
.br
\&.fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
.br
\&.sff (paired reads must be extracted manually)
.br
\&.csfasta (color\-space reads)
.IP
Output files
.IP
Scaffolds
.IP
RayOutput/Scaffolds.fasta
.IP
The scaffold sequences in FASTA format
.IP
RayOutput/ScaffoldComponents.txt
.IP
The components of each scaffold
.IP
RayOutput/ScaffoldLengths.txt
.IP
The length of each scaffold
.IP
RayOutput/ScaffoldLinks.txt
.IP
Scaffold links
.IP
Contigs
.IP
RayOutput/Contigs.fasta
.IP
Contiguous sequences in FASTA format
.IP
RayOutput/ContigLengths.txt
.IP
The lengths of contiguous sequences
.IP
Summary
.IP
RayOutput/OutputNumbers.txt
.IP
Overall numbers for the assembly
.IP
de Bruijn graph
.IP
RayOutput/CoverageDistribution.txt
.IP
The distribution of coverage values
.IP
RayOutput/CoverageDistributionAnalysis.txt
.IP
Analysis of the coverage distribution
.IP
RayOutput/degreeDistribution.txt
.IP
Distribution of ingoing and outgoing degrees
.IP
RayOutput/kmers.txt
.IP
k\-mer graph, required option: \fB\-write\-kmers\fR
.IP
The resulting file is very large and is not used by Ray.
.IP
Assembly steps
.IP
RayOutput/SeedLengthDistribution.txt
.IP
Distribution of seed lengths
.IP
RayOutput/Rank.OptimalReadMarkers.txt
.IP
Read markers.
.IP
RayOutput/Rank.RaySeeds.fasta
.IP
Seed DNA sequences, required option: \fB\-write\-seeds\fR
.IP
RayOutput/Rank.RayExtensions.fasta
.IP
Extension DNA sequences, required option: \fB\-write\-extensions\fR
.IP
RayOutput/Rank.RayContigPaths.txt
.IP
Contig paths with coverage values, required option: \fB\-write\-contig\-paths\fR
.IP
Paired reads
.IP
RayOutput/LibraryStatistics.txt
.IP
Estimation of outer distances for paired reads
.IP
RayOutput/Library.txt
.IP
Frequencies for observed outer distances (insert size + read lengths)
.IP
Partition
.IP
RayOutput/NumberOfSequences.txt
.IP
Number of reads in each file
.IP
RayOutput/SequencePartition.txt
.IP
Sequence partition
.IP
Ray software
.IP
RayOutput/RayVersion.txt
.IP
The version of Ray
.IP
RayOutput/RayCommand.txt
.IP
The exact command that was provided
.IP
AMOS
.IP
RayOutput/AMOS.afg
.IP
Assembly representation in AMOS format, required option: \fB\-amos\fR
.IP
Communication
.IP
RayOutput/MessagePassingInterface.txt
.IP
Number of messages sent
.IP
RayOutput/NetworkTest.txt
.IP
Latencies in microseconds
.IP
RayOutput/RankNetworkTestData.txt
.IP
Network test raw data
.PP
DOCUMENTATION
.IP
\- mpiexec \fB\-n\fR 1 Ray \fB\-help\fR|less (always up\-to\-date)
.br
\- This help page (always up\-to\-date)
.br
\- The directory Documentation/
.br
\- Manual (Portable Document Format): InstructionManual.tex (in Documentation)
.br
\- Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler\-users
.PP
AUTHOR
.IP
Written by Sebastien Boisvert.
.PP
REPORTING BUGS
.IP
Report bugs to denovoassembler\-users@lists.sourceforge.net
.br
Home page:
.PP
COPYRIGHT
.IP
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 of the License.
.IP
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
.IP
You have received a copy of the GNU General Public License along with this program (see LICENSE).
.PP
Ray 2.1.0
.PP
License for Ray: GNU General Public License version 3
.br
RayPlatform version: 1.1.0
.br
License for RayPlatform: GNU Lesser General Public License version 3
.PP
MAXKMERLENGTH: 32
.br
KMER_U64_ARRAY_SIZE: 1
.br
Maximum coverage depth stored by CoverageDepth: 4294967295
.br
MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes
.br
FORCE_PACKING = n
.br
ASSERT = n
.br
HAVE_LIBZ = y
.br
HAVE_LIBBZ2 = y
.br
CONFIG_PROFILER_COLLECT = n
.br
CONFIG_CLOCK_GETTIME = n
.br
__linux__ = y
.br
_MSC_VER = n
.br
__GNUC__ = y
.br
RAY_32_BITS = n
.br
RAY_64_BITS = y
.br
MPI standard version: MPI 2.1
.br
MPI library: Open\-MPI 1.4.2
.br
Compiler: GNU gcc/g++ 4.4.5
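.PP
EXAMPLES
.IP
The constraints stated for \fB\-k\fR, \fB\-hash\-table\-buckets\fR and the debruijn connection type can be checked before a job is launched.
The following is a minimal sketch in POSIX shell and is not part of Ray.
The function names are hypothetical, the default maximum of 32 is the MAXKMERLENGTH value reported for this build, and the rank count for debruijn routing is assumed to be an integer base raised to an exponent of at least 2, as in the listed examples (256, 512, 49).
.PP
.nf
```shell
#!/bin/sh
# Hypothetical pre-flight checks for Ray options; not part of Ray itself.

# Succeeds if k is odd and between 1 and the compiled-in maximum.
# The second argument defaults to 32 (MAXKMERLENGTH of this build).
valid_kmer_length() {
    k=$1
    max=${2:-32}
    [ "$k" -ge 1 ] && [ "$k" -le "$max" ] && [ $((k % 2)) -eq 1 ]
}

# Succeeds if n is a power of 2, as required for -hash-table-buckets.
is_power_of_two() {
    n=$1
    [ "$n" -ge 1 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# Succeeds if n = b^d for integers b >= 2 and d >= 2.
# Assumption: this is what debruijn routing needs for the rank count.
is_perfect_power() {
    n=$1
    b=2
    while [ $((b * b)) -le "$n" ]; do
        v=$((b * b))
        while [ "$v" -lt "$n" ]; do
            v=$((v * b))
        done
        [ "$v" -eq "$n" ] && return 0
        b=$((b + 1))
    done
    return 1
}
```
.fi
.IP
A launcher script could chain the checks before starting the assembly, for instance:
.br
valid_kmer_length 31 && is_perfect_power 256 && mpiexec \fB\-n\fR 256 Ray \fB\-k\fR 31 \fB\-route\-messages\fR \fB\-connection\-type\fR debruijn \fB\-p\fR l1_1.fastq l1_2.fastq \fB\-o\fR test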