General Commands Manual
grabix - random access on large compressed sequence data
grabix index bedfile.gz
grabix grab bedfile.gz linenumber
In biomedical research it is increasingly common to study the genetic basis of disease, which now frequently involves sequencing human DNA. The raw output of the sequencing machine, however, is redundant, and the real sequence is the one that best explains that redundancy. Data are exchanged only as compressed files, since they are too large and redundant to handle otherwise. Decompression should be avoided whenever possible.
grabix leverages the fantastic BGZF library of the samtools package to provide random access into text files that have been compressed with bgzip (from the tabix package). grabix creates its own index (.gbi) of the bgzipped file. Once the file is indexed, one can extract arbitrary lines from it with the grab command, or choose random lines with the, well, random command.
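The idea behind indexed line access can be illustrated with a small Python sketch. It is not grabix's implementation and does not reproduce the .gbi or BGZF formats. It only assumes that the data are compressed as independent blocks (here plain gzip members) and that an index records which lines each block holds. The function names, the index layout, and the block size are all hypothetical.

```python
import bisect
import gzip
import io


def compress_in_blocks(lines, lines_per_block=3):
    """Compress lines as independent gzip members and build a line index.

    Returns (data, index). Each index entry is a hypothetical tuple
    (first_line_number, byte_offset, byte_length); line numbers are 1-based.
    """
    out = io.BytesIO()
    index = []
    for start in range(0, len(lines), lines_per_block):
        chunk = "".join(line + "\n" for line in lines[start:start + lines_per_block])
        member = gzip.compress(chunk.encode("utf-8"))
        index.append((start + 1, out.tell(), len(member)))
        out.write(member)
    return out.getvalue(), index


def grab_line(data, index, line_number):
    """Return one line, decompressing only the block that contains it."""
    first_lines = [entry[0] for entry in index]
    pos = bisect.bisect_right(first_lines, line_number) - 1
    if pos < 0:
        raise IndexError("line numbers start at 1")
    first, offset, length = index[pos]
    block = gzip.decompress(data[offset:offset + length]).decode("utf-8")
    block_lines = block.splitlines()
    i = line_number - first
    if i >= len(block_lines):
        raise IndexError("line number past end of file")
    return block_lines[i]


if __name__ == "__main__":
    # Made-up BED-like records, for illustration only.
    records = [f"chr1\t{n * 100}\t{n * 100 + 50}" for n in range(7)]
    data, index = compress_in_blocks(records)
    print(grab_line(data, index, 5))
```

Because every block is compressed independently, fetching a line costs one index lookup and one block decompression, rather than a decompression of the whole file.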
July 18, 2013