RG::Blast::Parser(3pm) User Contributed Perl Documentation RG::Blast::Parser(3pm)

NAME

RG::Blast::Parser - fast NCBI BLAST parser

SYNOPSIS

  use Carp qw(confess);
  use Data::Dumper;
  use RG::Blast::Parser;

  # Either read from STDIN ...
  my $parser = RG::Blast::Parser->new();

  # ... or read from EXAMPLE, using the name "converged.ali" in error messages
  open( EXAMPLE, '<', '/usr/share/doc/librg-blast-parser-perl/examples/converged.ali' ) || confess($!);
  $parser = RG::Blast::Parser->new( \*EXAMPLE, "converged.ali" );

  # parse() returns one result per call
  while( my $res = $parser->parse() )
  {
    print Dumper( $res );
  }

  # parse() die()s on parser errors; catch them with eval
  eval {
    my $res = $parser->parse();
    # ...
  };
  if( $@ && $@ =~ /^parser error/ ) { warn("failed to parse blast result - exception caught"); }

DESCRIPTION

This package contains a Perl binding for a very fast C/C++ library that parses NCBI BLAST -m 0 (default) output. BLAST results are returned in a convenient hash structure.
Multiple results may be concatenated in the input; one result is parsed and returned at a time.

CONSTRUCTOR

new( [FILEHANDLE, [NAME]] )
Creates an "RG::Blast::Parser". Blast results are read from FILEHANDLE, STDIN by default. The input stream may be named NAME in error messages (default: "STDIN").

METHODS

parse( [TRACE_PARSING, [TRACE_SCANNING]] )
Parses one complete BLAST result and returns it. If there are no results left on the input stream, it returns "undef". On a parser error it die()s with an (at present not very useful) error message.
 
The following structure is returned in a hash reference:
 
  {
    blast_version =>    STRING,
    references =>       [ STRING, ... ],
    rounds => [
        {
            oneline_idx =>      NUM,    # index of first one-line description of
                                        # round in "onelines"
            oneline_cnt =>      NUM,    # number of one-line descriptions of round
                                        # in "onelines"
            hit_idx =>          NUM,    # index of first hit of round in "hits"
            hit_cnt =>          NUM,    # number of hits of round in "hits"
            oneline_new_idx =>  NUM|undef, # index of first new (not-seen-before)
                                        # one-line description of round
            oneline_new_cnt =>  NUM     # number of new one-line descriptions of
                                        # round
        }, ...
    ],
    q_name =>       STRING,
    q_desc =>       STRING|undef,
    q_length =>     NUM,
    db_name =>      STRING,
    db_nseq =>      NUM,
    db_nletter =>   NUM,
    onelines =>     [                   # one-line descriptions from all rounds
        {
            name =>         STRING,
            desc =>         STRING|undef,
            bit_score =>    NUM,
            e_value =>      NUM
        }, ...
    ],
    converged =>    BOOLEAN,
    hits =>         [                   # hits from all rounds
        {
            name =>         STRING,
            desc =>         STRING|undef,
            length =>       NUM,
            hsps =>         [
                {
                    bit_score =>    NUM,
                    raw_score =>    NUM,
                    e_value =>      NUM,
                    method =>       STRING,
                    identities =>   NUM,
                    positives =>    NUM,
                    gaps =>         NUM,
                    q_strand =>     STRING|undef,
                    s_strand =>     STRING|undef,
                    q_frame =>      NUM|undef,
                    s_frame =>      NUM|undef,
                    q_start =>      NUM,
                    q_ali =>        STRING,
                    q_end =>        NUM,
                    match_line =>   STRING,
                    s_start =>      NUM,
                    s_ali =>        STRING,
                    s_end =>        NUM
                }, ...
            ]
        }, ...
    ],
    tail =>         STRING              # bulk text after the last hit / one-line
                                        # description
  }
    
 
To trace parsing or scanning, pass true values for TRACE_PARSING and TRACE_SCANNING in this call.
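Hits from all rounds share the single "hits" array, so each round's "hit_idx" and "hit_cnt" are needed to find the hits that belong to it. The following is a minimal sketch of that lookup. It assumes that "hit_idx" is a 0-based index into "hits", and it runs on a hand-written stand-in for a parse() result: the helper hits_of_round() and all names and numbers in the sample data are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Hand-written stand-in for a parse() result; only the keys used below are
# filled in. All values are invented for illustration.
my $res = {
    q_name => 'query1',
    rounds => [
        { hit_idx => 0, hit_cnt => 2 },
        { hit_idx => 2, hit_cnt => 1 },
    ],
    hits => [
        { name => 'hitA', hsps => [ { bit_score => 150, e_value => 1e-30 } ] },
        { name => 'hitB', hsps => [ { bit_score => 28,  e_value => 0.5   } ] },
        { name => 'hitA', hsps => [ { bit_score => 160, e_value => 1e-42 } ] },
    ],
};

# Hypothetical helper: return the hits of round $i (0-based) as a list of
# hash references. Assumes hit_idx is a 0-based index into $res->{hits}.
sub hits_of_round {
    my( $res, $i ) = @_;
    my $round = $res->{rounds}[$i];
    return () unless $round && $round->{hit_cnt};
    my $first = $round->{hit_idx};
    my $last  = $first + $round->{hit_cnt} - 1;
    return @{ $res->{hits} }[ $first .. $last ];
}

# Print every HSP of every hit, grouped by round.
for my $i ( 0 .. $#{ $res->{rounds} } ) {
    for my $hit ( hits_of_round( $res, $i ) ) {
        for my $hsp ( @{ $hit->{hsps} } ) {
            printf "round %d  %s  bits=%g  e=%g\n",
                $i + 1, $hit->{name}, $hsp->{bit_score}, $hsp->{e_value};
        }
    }
}
```

The same slicing should work for "onelines" with "oneline_idx" and "oneline_cnt", under the same 0-based assumption.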
result()
Returns the last BLAST result parsed, or "undef" if there is none.
get_trace_scanning()
Returns scan trace state as a Boolean value.
set_trace_scanning( BOOLEAN )
Sets the scan trace state. This is a debugging aid.
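Because parse() signals the end of input with "undef" but signals a parser error with die(), a loop over concatenated results has to tell the two apart. The sketch below shows one way to do that. It assumes the error message starts with "parser error", as the pattern in the SYNOPSIS suggests, and it stops at the first error because this page does not say whether a parser can be used after one. StubParser and parse_all() are hypothetical names that exist only so the example runs without the module; with the real module, pass an "RG::Blast::Parser" object to parse_all() instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for RG::Blast::Parser, so the sketch runs on its own:
# parse() hands out queued items, die()s on the string 'BAD', and returns
# undef once the queue is empty.
package StubParser;
sub new { my( $class, @queue ) = @_; return bless { queue => [ @queue ] }, $class; }
sub parse {
    my( $self ) = @_;
    my $next = shift @{ $self->{queue} };
    die "parser error: stub failure\n" if defined $next && $next eq 'BAD';
    return $next;
}

package main;

# Collect results until the end of input, stopping at the first parser error.
# Returns ( \@results, $error ); $error is undef after a clean run.
sub parse_all {
    my( $parser ) = @_;
    my @results;
    while( 1 ) {
        my $res = eval { $parser->parse() };
        if( $@ ) {
            return ( \@results, $@ ) if $@ =~ /^parser error/;
            die $@;                   # not a parser error: re-throw
        }
        last unless defined $res;     # undef without an exception: input exhausted
        push @results, $res;
    }
    return ( \@results, undef );
}

my( $ok, $err ) = parse_all( StubParser->new( { q_name => 'a' }, { q_name => 'b' } ) );
printf "parsed %d result(s)\n", scalar @$ok;

( $ok, $err ) = parse_all( StubParser->new( { q_name => 'a' }, 'BAD' ) );
printf "parsed %d result(s) before error: %s", scalar @$ok, $err;
```

Returning the partial result list together with the error lets the caller keep whatever was parsed before the failure.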

SEE ALSO

Zerg(3pm), Zerg::Report(3pm)

AUTHOR

Laszlo Kajan, <lkajan@rostlab.org>

COPYRIGHT AND LICENSE

Copyright (C) 2012 by Laszlo Kajan
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
2012-03-29 perl v5.14.2