.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Zerg 3pm"
.TH Zerg 3pm "2020-11-25" "perl v5.32.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Zerg \- a lexical scanner for BLAST reports.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
use Zerg;
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This manpage describes the Zerg library and its interface for use with
Perl.
.PP
The Zerg library contains a C/flex lexical scanner for \s-1BLAST\s0 reports
and a set of supporting functions. It is centered on a \*(L"get_token\*(R"
function that scans the input for specified lexical elements and, when
one is found, returns its code and value to the user.
.PP
It is intended to be fast: to that end we used flex, which provides
simple regular expression matching and input buffering in the
generated C scanner. It is also intended to be simple, in the sense of
providing just a lexical scanner, with no features whose support could
slow down its main function.
.SS "\s-1FUNCTIONS\s0"
.IX Subsection "FUNCTIONS"
\&\fBzerg_get_token()\fR is the core function of this module. Each time it is
called, it scans the input \s-1BLAST\s0 report for the next \*(L"interesting\*(R"
lexical element and returns its code and value. Codes are listed in
the section \*(L"\s-1EXPORTED CONSTANTS\s0 (\s-1TOKEN CODES\s0)\*(R". Code zero (not listed)
means end of file.
.PP
.Vb 1
\&  ($code, $value) = Zerg::zerg_get_token();
.Ve
.PP
zerg_open_file($filename) opens \f(CW$filename\fR in read-only mode and sets it
as the input to the scanner. If this function is not called, the
standard input is used.
.PP
.Vb 1
\&  Zerg::zerg_open_file($filename);
.Ve
.PP
\&\fBzerg_close_file()\fR closes the file opened with \fBzerg_open_file()\fR.
.PP
\&\fBzerg_get_token_offset()\fR returns the byte offset (relative to the
beginning of the file) of the last token read. (See section \s-1BUGS.\s0)
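.PP
As a minimal sketch of how these calls might fit together: the file name
report.blast is hypothetical, and the return value of zerg_open_file is
not checked because this manpage does not specify it.
.PP
.Vb 10
\&  use strict;
\&  use Zerg;
\&
\&  # Hypothetical input file; without zerg_open_file, stdin is scanned.
\&  Zerg::zerg_open_file("report.blast");
\&
\&  my ($code, $value);
\&  while((($code, $value) = Zerg::zerg_get_token()) && $code)
\&  {
\&      # Offset of the token just returned (see BUGS for caveats).
\&      my $offset = Zerg::zerg_get_token_offset();
\&      print "$offset\et$code\et$value\en";
\&  }
\&
\&  Zerg::zerg_close_file();
.Ve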
.PP
zerg_ignore($code) instructs zerg_get_token to skip tokens with code
\&\f(CW$code\fR instead of returning them.
.PP
\&\fBzerg_ignore_all()\fR applies zerg_ignore to all token codes.
.PP
zerg_unignore($code) instructs zerg_get_token to return tokens with
code \f(CW$code\fR when it finds them.
.PP
\&\fBzerg_unignore_all()\fR applies zerg_unignore to all token codes.
.PP
.Vb 4
\&  Example:
\&  Zerg::zerg_ignore_all();
\&  Zerg::zerg_unignore(QUERY_NAME);
\&  Zerg::zerg_unignore(SUBJECT_NAME);
.Ve
.SS "\s-1EXPORTED CONSTANTS\s0 (\s-1TOKEN CODES\s0)"
.IX Subsection "EXPORTED CONSTANTS (TOKEN CODES)"
.Vb 10
\&    ALIGNMENT_LENGTH    
\&    BLAST_VERSION               
\&    CONVERGED           
\&    DATABASE            
\&    DESCRIPTION_ANNOTATION      
\&    DESCRIPTION_EVALUE  
\&    DESCRIPTION_HITNAME         
\&    DESCRIPTION_SCORE   
\&    END_OF_REPORT               
\&    EVALUE                      
\&    GAPS                        
\&    HSP_METHOD          
\&    IDENTITIES          
\&    NOHITS                      
\&    PERCENT_IDENTITIES  
\&    PERCENT_POSITIVES   
\&    POSITIVES           
\&    QUERY_ALI           
\&    QUERY_ANNOTATION    
\&    QUERY_END           
\&    QUERY_FRAME                 
\&    QUERY_LENGTH                
\&    QUERY_NAME          
\&    QUERY_ORIENTATION   
\&    QUERY_START                 
\&    REFERENCE           
\&    ROUND_NUMBER                
\&    ROUND_SEQ_FOUND     
\&    ROUND_SEQ_NEW               
\&    SCORE                       
\&    SCORE_BITS          
\&    SEARCHING           
\&    SUBJECT_ALI                 
\&    SUBJECT_ANNOTATION  
\&    SUBJECT_END                 
\&    SUBJECT_FRAME               
\&    SUBJECT_LENGTH              
\&    SUBJECT_NAME                
\&    SUBJECT_ORIENTATION         
\&    SUBJECT_START               
\&    TAIL_OF_REPORT              
\&    UNMATCHED
.Ve
.SS "\s-1NOTES ON THE SCANNER\s0"
.IX Subsection "NOTES ON THE SCANNER"
Some \s-1BLAST\s0 parsers rely on simple regular expression matches to
determine token types and values. For example, an input line
matching /^Query=\es(\eS+)/ leads such a \*(L"loose\*(R" parser to infer
that a token was found, that it is a query name, and that its value is
\&\f(CW$1\fR. Although improbable, it is perfectly possible for an annotation
field to match /^Query=\es(\eS+)/. Worse, those parsers are often
unable to detect corrupt or truncated \s-1BLAST\s0
reports, and may therefore produce inaccurate information.
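.PP
The risk can be illustrated with a short sketch in plain Perl, without
Zerg. Both input lines are made up for illustration: the first stands
for a genuine query line, the second for text from some other part of a
report that merely looks like one. The loose pattern checks no context,
so it accepts both.
.PP
.Vb 10
\&  use strict;
\&  use warnings;
\&
\&  # Made up lines: a query header and a lookalike from elsewhere.
\&  my @lines = (
\&      "Query= seq1 hypothetical protein",
\&      "Query= not_a_real_query_line",
\&  );
\&
\&  for my $line (@lines) {
\&      # The "loose" test from the text: position in the report is ignored.
\&      print "query name: $1\en" if $line =~ /^Query=\es(\eS+)/;
\&  }
.Ve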
.PP
The scanner provided by this library is much more stringent: for a
token to match, it must be in its proper place in the context of a \s-1BLAST\s0
report. For example, in a single \s-1BLAST\s0 report, a \s-1QUERY_NAME\s0 cannot
follow another \s-1QUERY_NAME.\s0 The scanner can be thought of as, and in fact
is, a big regular expression that matches an entire \s-1BLAST\s0 report.
.PP
A special token code (\s-1UNMATCHED\s0) is provided for cases in which the input
text does not match any other lexical rule of the scanner. When an
unmatched character is found, either the report is corrupt or the
scanner has a bug.
.PP
If you are interested in only a few token codes, \fBzerg_ignore()\fR
as many of the other codes as you can. This avoids unnecessary function
calls that consume a lot of \s-1CPU\s0 time.
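.PP
A sketch of this advice, listing each subject name next to the query it
belongs to. It assumes that token codes compare numerically (code zero
marks end of file) and that each \s-1QUERY_NAME\s0 precedes its \s-1SUBJECT_NAME\s0
tokens in the report; the variable names are illustrative.
.PP
.Vb 10
\&  use strict;
\&  use Zerg;
\&
\&  # Return only the two codes of interest; skip everything else.
\&  Zerg::zerg_ignore_all();
\&  Zerg::zerg_unignore(QUERY_NAME);
\&  Zerg::zerg_unignore(SUBJECT_NAME);
\&
\&  my ($code, $value);
\&  my $query = "";
\&  while((($code, $value) = Zerg::zerg_get_token()) && $code)
\&  {
\&      if ($code == QUERY_NAME) {
\&          $query = $value;               # remember the current query
\&      } else {
\&          print "$query\et$value\en";    # only SUBJECT_NAME remains
\&      }
\&  }
.Ve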
.SH "EXAMPLES"
.IX Header "EXAMPLES"
This program prints the code and the value of each token it finds.
.PP
.Vb 3
\&  #!/usr/bin/perl \-w
\&  use strict;
\&  use Zerg;
\&
\&  my ($code, $value);
\&  while((($code, $value)= Zerg::zerg_get_token()) && $code)
\&  {
\&      print "$code\et$value\en";
\&  }
.Ve
.PP
The program below is a \*(L"syntax checker\*(R". The presence of \s-1UNMATCHED\s0
tokens is a strong indicator of problems in the \s-1BLAST\s0 report. (See section \s-1NOTES
ON THE SCANNER.\s0)
.PP
.Vb 3
\&  #!/usr/bin/perl \-w
\&  use strict;
\&  use Zerg;
\&
\&  my ($code, $value);
\&
\&  Zerg::zerg_ignore_all();
\&  Zerg::zerg_unignore(UNMATCHED);
\&
\&  while((($code, $value)= Zerg::zerg_get_token()) && $code)
\&  {
\&      print "UNMATCHED CHAR:\et$value\en";
\&  }
.Ve
.SH "BUGS"
.IX Header "BUGS"
The tokens \s-1DESCRIPTION_ANNOTATION, DESCRIPTION_SCORE\s0 and
\&\s-1DESCRIPTION_EVALUE\s0 are scanned all at once and released one by one on
user request. So, if the user wants to get any of these fields, they
must be unignored \s-1BEFORE\s0 scanning \s-1DESCRIPTION_ANNOTATION.\s0
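.PP
One way to stay clear of this restriction, shown here as a sketch rather
than a required pattern, is to unignore all three codes before the first
call to zerg_get_token:
.PP
.Vb 10
\&  use strict;
\&  use Zerg;
\&
\&  Zerg::zerg_ignore_all();
\&  # Unignore the three description codes before any scanning starts.
\&  Zerg::zerg_unignore(DESCRIPTION_ANNOTATION);
\&  Zerg::zerg_unignore(DESCRIPTION_SCORE);
\&  Zerg::zerg_unignore(DESCRIPTION_EVALUE);
\&
\&  my ($code, $value);
\&  while((($code, $value) = Zerg::zerg_get_token()) && $code)
\&  {
\&      print "$code\et$value\en";
\&  }
.Ve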
.PP
\&\fBzerg_get_token_offset()\fR may return incorrect values for these tokens
and those that are modified by the parser, namely: \s-1QUERY_LENGTH,
SUBJECT_LENGTH, EVALUE, GAPS.\s0
.SH "TODO"
.IX Header "TODO"
Add more tokens to the scanner as the need arises.
.SH "AUTHOR"
.IX Header "AUTHOR"
Apuã Paquola, IQ-USP Bioinformatics Lab, apua@iq.usp.br
.PP
Laszlo Kajan <lkajan@rostlab.org>, Technical University of Munich, Germany
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBperl\fR\|(1), \fBflex\fR\|(1), http://www.bioperl.org, http://www.ncbi.nlm.nih.gov/BLAST