dextract - extract the information needed for assembly from the source HDF5 files produced by a PacBio RS II sequencer


dextract [-vfaq] [-o[<path>]] [-l<int(500)>] [-s<int(750)>] <input:bax_h5> ...


Dextract takes a series of .bax.h5 or .subreads.[bs]am files as input and, depending on the option flag settings, produces:

(-f) a .fasta file containing subread sequences, each with a "standard" PacBio header consisting of the movie name, well number, pulse range, and read quality value.
(-a) a FASTA-format .arrow file containing the pulse width stream for each subread, with a header that contains the movie name and the 4 channel SNR values.
(-q) a FASTQ-like .quiva file containing, for each subread, the same header as the .fasta file above, except that it starts with an @-sign, followed by the 5 quality value streams used by Quiver, one per line, in this order: deletion QVs, deletion tags, insertion QVs, merge QVs, and substitution QVs.
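
The .quiva layout described above (an @-header followed by five stream lines) can be read with a few lines of Python. This is a minimal sketch and not part of dextract: the names QuivaRecord and read_quiva are hypothetical, and the code assumes that every stream occupies exactly one line and that blank lines appear only between records.

```python
from typing import Iterable, Iterator, NamedTuple


class QuivaRecord(NamedTuple):
    """One subread entry of a .quiva file (hypothetical helper type)."""
    header: str           # header text without the leading '@'
    deletion_qv: str      # stream 1: deletion QVs
    deletion_tag: str     # stream 2: deletion tags
    insertion_qv: str     # stream 3: insertion QVs
    merge_qv: str         # stream 4: merge QVs
    substitution_qv: str  # stream 5: substitution QVs


def read_quiva(lines: Iterable[str]) -> Iterator[QuivaRecord]:
    """Yield one QuivaRecord per 6-line entry (header plus 5 streams)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # assumption: tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        # The five streams follow in a fixed order, one per line.
        streams = [next(it, None) for _ in range(5)]
        if None in streams:
            raise ValueError(f"truncated record after header: {header!r}")
        yield QuivaRecord(header[1:], *streams)
```

Because the reader consumes a fixed count of five lines after each header, a stream line that happens to begin with an @ character is not mistaken for the start of a new record. Any iterable of lines works as input, including an open file handle.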

If the -v option is set, the program reports the processing of each PacBio input file; otherwise it runs silently. If none of the -f, -a, or -q flags is set, then by default -f is assumed. The destination of the extracted information is controlled by the -o parameter as follows:

If -o is absent, then for each input file X.bax.h5 or X.subreads.[bs]am, dextract will produce X.fasta, X.arrow, and/or X.quiva as per the option flags.
If -o is present and followed by a path Y, then the concatenation of the output for the input files is placed in Y.fasta, Y.arrow, and/or Y.quiva as per the option flags.
If -o is present but with no following path, then the output is sent to the standard output (to enable a UNIX pipe if desired). In this case only one of the flags -f, -a, or -q can be set.
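
The three -o cases above, together with the -f default, can be summarized as a small function that maps a command line to the names of the outputs it would produce. This is only an illustrative sketch of the rules as stated here, not dextract's own code: output_targets, its parameters, and the "<stdout>" marker are all hypothetical.

```python
import re
from typing import List, Optional, Sequence

# Extension written for each output flag, as described above.
EXTENSION = {"f": ".fasta", "a": ".arrow", "q": ".quiva"}

# Input suffixes that are stripped to obtain the root name X.
_INPUT_SUFFIX = re.compile(r"\.(bax\.h5|subreads\.[bs]am)$")


def output_targets(inputs: Sequence[str], flags: str = "",
                   o_given: bool = False,
                   o_path: Optional[str] = None) -> List[str]:
    """Return the output names implied by the flags and the -o setting."""
    # If none of -f, -a, -q is set, -f is assumed; other flags are ignored.
    selected = [c for c in "faq" if c in flags] or ["f"]
    exts = [EXTENSION[c] for c in selected]

    if o_given and o_path is None:
        # -o with no path: standard output, so only one stream is allowed.
        if len(exts) != 1:
            raise ValueError("only one of -f, -a, -q may be set with a bare -o")
        return ["<stdout>"]

    if o_given:
        # -o Y: output for all inputs is concatenated into Y.<ext>.
        return [o_path + ext for ext in exts]

    # -o absent: one set of outputs per input file X.
    targets = []
    for name in inputs:
        root, replaced = _INPUT_SUFFIX.subn("", name)
        if replaced == 0:
            raise ValueError(f"unrecognized input file name: {name!r}")
        targets.extend(root + ext for ext in exts)
    return targets
```

For example, under these rules output_targets(["X.bax.h5"], flags="fq") yields X.fasta and X.quiva, while passing o_given=True with o_path="Y" yields Y.fasta and Y.quiva regardless of how many input files are listed.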


