.\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Bio::Search::SearchUtils 3pm"
.TH Bio::Search::SearchUtils 3pm "2020-10-28" "perl v5.30.3" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Bio::Search::SearchUtils \- Utility functions for Bio::Search:: objects
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  # This module is just a collection of subroutines, not an object.
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The SearchUtils.pm module is a collection of subroutines used
primarily by Bio::Search::Hit::HitI objects for additional
functionality such as \s-1HSP\s0 tiling. SearchUtils is currently a plain
collection of subroutines, not a class.
.SH "AUTHOR"
.IX Header "AUTHOR"
Steve Chervitz <sac@bioperl.org>
.SH "CONTRIBUTORS"
.IX Header "CONTRIBUTORS"
Sendu Bala, bix@sendu.me.uk
.SS "tile_hsps"
.IX Subsection "tile_hsps"
.Vb 10
\& Usage     : tile_hsps( $sbjct );
\&           : This is called automatically by methods in Bio::Search::Hit::GenericHit 
\&           : that rely on having tiled data.
\&           :
\&           : If you are interested in getting data about the constructed HSP contigs:
\&           : my ($qcontigs, $scontigs) = Bio::Search::SearchUtils::tile_hsps($hit);
\&           : if (ref $qcontigs) {
\&           :    print STDERR "Query contigs:\en";
\&           :    foreach (@{$qcontigs}) {
\&           :         print "contig start is $_\->{\*(Aqstart\*(Aq}\en";
\&           :         print "contig stop is $_\->{\*(Aqstop\*(Aq}\en";
\&           :    }
\&           : }
\&           : See below for more information about the contig data structure.
\&           :
\& Purpose   : Collect statistics about the aligned sequences in a set of HSPs.
\&           : Calculates the following data across all HSPs: 
\&           :    \-\- total alignment length 
\&           :    \-\- total identical residues 
\&           :    \-\- total conserved residues
\& Returns   : If there was only a single HSP (so no tiling was necessary)
\&               tile_hsps() returns a list of two non\-zero integers.
\&             If there were multiple HSPs, 
\&               tile_hsps() returns a list of two array references containing HSP contig data.
\&             The first array ref contains a list of HSP contigs on the query sequence.
\&             The second array ref contains a list of HSP contigs on the subject sequence.
\&             Each contig is a hash reference with the following data fields:
\&               \*(Aqstart\*(Aq => start coordinate of the contig
\&               \*(Aqstop\*(Aq  => stop coordinate of the contig
\&               \*(Aqiden\*(Aq  => number of identical residues in the contig
\&               \*(Aqcons\*(Aq  => number of conserved residues in the contig
\&               \*(Aqstrand\*(Aq=> strand of the contig
\&               \*(Aqframe\*(Aq => frame of the contig
\& Argument  : A Bio::Search::Hit::HitI object 
\& Throws    : n/a
\& Comments  :
\&           : This method performs more careful summing of data across
\&           : all HSPs in the Sbjct object. Only HSPs that are in the same strand 
\&           : and frame are tiled. Simply summing the data from all HSPs
\&           : in the same strand and frame will overestimate the actual 
\&           : length of the alignment if there is overlap between different HSPs 
\&           : (often the case).
\&           :
\&           : The strategy is to tile the HSPs and sum over the
\&           : contigs, collecting data separately from overlapping and
\&           : non\-overlapping regions of each HSP. To facilitate this, the
\&           : HSP.pm object now permits extraction of data from sub\-sections
\&           : of an HSP.
\&           : 
\&           : Additional useful information is collected from the results
\&           : of the tiling. It is possible that sub\-sequences in
\&           : different HSPs will overlap significantly. In this case, it
\&           : is impossible to create a single unambiguous alignment by
\&           : concatenating the HSPs. The ambiguity may indicate the
\&           : presence of multiple, similar domains in one or both of the
\&           : aligned sequences. This ambiguity is recorded using the
\&           : ambiguous_aln() method.
\&           : 
\&           : This method does not attempt to discern biologically
\&           : significant vs. insignificant overlaps. The allowable amount of 
\&           : overlap can be set with the overlap() method or with the \-OVERLAP
\&           : parameter used when constructing the Hit object.
\&           : 
\&           : For a given hit, both the query and the sbjct sequences are
\&           : tiled independently.
\&           : 
\&           :    \-\- If only query sequence HSPs overlap, 
\&           :          this may suggest multiple domains in the sbjct.
\&           :    \-\- If only sbjct sequence HSPs overlap, 
\&           :          this may suggest multiple domains in the query.
\&           :    \-\- If both query & sbjct sequence HSPs overlap, 
\&           :          this suggests multiple domains in both.
\&           :    \-\- If neither query nor sbjct sequence HSPs overlap, 
\&           :          this suggests either no multiple domains in either
\&           :          sequence OR that both sequences have the same
\&           :          distribution of multiple similar domains.
\&           : 
\&           : This method can handle the special case in which multiple
\&           : HSPs overlap exactly.
\&           : 
\&           : Efficiency concerns:
\&           :  Speed will be an issue for sequences with numerous HSPs.
\&           : 
\& Bugs      : Currently, tile_hsps() does not properly account for
\&           : the number of non\-tiled but overlapping HSPs, which becomes a problem
\&           : as overlap() grows. Large values of overlap() may thus lead to 
\&           : incorrect statistics for some hits. For best results, keep overlap()
\&           : below 5 (DEFAULT IS 2). For more about this, see the "HSP Tiling and
\&           : Ambiguous Alignments" section in Bio::Search::Hit::GenericHit.
.Ve
.PP
See Also   : _adjust_contigs(), Bio::Search::Hit::GenericHit
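.PP
The reason simple summing overestimates the alignment length can be
illustrated with a small standalone sketch. It is not the code used by
tile_hsps(): the coordinates are invented, the overlap() threshold is ignored,
and only the start and stop fields of a contig are modelled.
.PP
.Vb 10
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical HSP coordinates on one sequence (same strand and frame).
\&  my @hsps = ([1, 100], [80, 150], [300, 350]);
\&
\&  # Naive total: add up every HSP length, counting overlaps twice.
\&  my $naive = 0;
\&  $naive += $_\->[1] \- $_\->[0] + 1 for @hsps;
\&
\&  # Tiled total: merge overlapping HSPs into contigs, then sum the contigs.
\&  my @contigs;
\&  for my $hsp (sort { $a\->[0] <=> $b\->[0] } @hsps) {
\&      my ($start, $stop) = @$hsp;
\&      if (@contigs && $start <= $contigs[\-1]{stop}) {
\&          # Overlaps the current contig: extend it if this HSP ends later.
\&          $contigs[\-1]{stop} = $stop if $stop > $contigs[\-1]{stop};
\&      } else {
\&          push @contigs, { start => $start, stop => $stop };
\&      }
\&  }
\&  my $tiled = 0;
\&  $tiled += $_\->{stop} \- $_\->{start} + 1 for @contigs;
\&
\&  print "naive=$naive tiled=$tiled\en";
.Ve
.PP
With these coordinates the naive sum is 222 and the tiled sum is 201, because
positions 80 to 100 are covered by two HSPs.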
.SS "logical_length"
.IX Subsection "logical_length"
.Vb 10
\& Usage     : logical_length( $alg_name, $seq_type, $length );
\& Purpose   : Determine the logical length of an aligned sequence based on 
\&           : algorithm name and sequence type.
\& Returns   : integer representing the logical aligned length.
\& Argument  : $alg_name = name of algorithm (e.g., blastx, tblastn)
\&           : $seq_type = type of sequence (e.g., query or hit)
\&           : $length = physical length of the sequence in the alignment.
\& Throws    : n/a
\& Comments  : This function accounts for the fact that the numbers of identities 
\&             and conserved residues are reported in peptide space while the query 
\&             length (in the case of BLASTX and TBLASTX) and/or the hit length 
\&             (in the case of TBLASTN and TBLASTX) are in nucleotide space.
\&             The adjustment affects the values reported by the various frac_XXX 
\&             methods in GenericHit and GenericHSP.
.Ve
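.PP
The adjustment described above can be sketched as follows. This is a
simplified stand\-in rather than the module's own code: the name
logical_length_sketch is hypothetical, only three program names are
recognised, and it assumes that one translated residue corresponds to three
nucleotides.
.PP
.Vb 10
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical simplified rule: a nucleotide length is divided by 3
\&  # when that side of the alignment was translated by the program.
\&  sub logical_length_sketch {
\&      my ($alg_name, $seq_type, $length) = @_;
\&      my $alg = lc $alg_name;
\&      my $translated =
\&            $alg eq "tblastx" ? 1
\&          : $alg eq "blastx"  ? ($seq_type eq "query")
\&          : $alg eq "tblastn" ? ($seq_type eq "hit")
\&          :                     0;
\&      return $translated ? $length / 3 : $length;
\&  }
\&
\&  print logical_length_sketch("blastx",  "query", 300), "\en";   # 100
\&  print logical_length_sketch("blastx",  "hit",   300), "\en";   # 300
\&  print logical_length_sketch("tblastn", "hit",   300), "\en";   # 100
\&  print logical_length_sketch("blastp",  "query", 300), "\en";   # 300
.Ve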
.SS "get_exponent"
.IX Subsection "get_exponent"
.Vb 10
\& Usage     : &get_exponent( number );
\& Purpose   : Determines the power of 10 exponent of an integer, float, 
\&           : or scientific notation number.
\& Example   : &get_exponent("4.0e\-206");
\&           : &get_exponent("0.00032");
\&           : &get_exponent("10.");
\&           : &get_exponent("1000.0");
\&           : &get_exponent("e+83");
\& Argument  : Float, Integer, or scientific notation number
\& Returns   : Integer representing the exponent part of the number (+ or \-).
\&           : If argument == 0 (zero), return value is "\-999".
\& Comments  : Exponents are rounded up (less negative) if the mantissa is >= 5.
\&           : Exponents are rounded down (more negative) if the mantissa is <= \-5.
.Ve
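.PP
For orientation, the power of 10 of each example argument can be read off
with a short standalone sketch. The name exponent_sketch is hypothetical and
the approach (Perl's "%e" formatting) is an assumption, not the module's
implementation. It does not apply the mantissa rounding described under
Comments, so get_exponent() itself may return different values in some cases.
.PP
.Vb 10
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical stand\-in: reads the power of 10 from "%e" formatting.
\&  sub exponent_sketch {
\&      my ($number) = @_;
\&      return $1 + 0 if $number =~ /^e([+\-]?\ed+)$/i;   # bare exponent such as "e+83"
\&      return \-999 if $number == 0;                    # sentinel documented for zero
\&      my ($exp) = sprintf("%e", $number) =~ /e([+\-]?\ed+)$/;
\&      return $exp + 0;
\&  }
\&
\&  my @args = ("4.0e\-206", "0.00032", "10.", "1000.0", "e+83");
\&  print join(" ", map { exponent_sketch($_) } @args), "\en";
.Ve
.PP
The sketch prints \-206 \-4 1 3 83 for the five example arguments.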
.SS "collapse_nums"
.IX Subsection "collapse_nums"
.Vb 10
\& Usage     : @cnums = collapse_nums( @numbers );
\& Purpose   : Collapses a list of numbers into a set of ranges of consecutive terms.
\&           : Useful for condensing long lists of consecutive numbers.
\&           :  EXPANDED:
\&           :     1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
\&           :  COLLAPSED:
\&           :     1\-6 10 12\-15 17 18 20\-22 24 26 30\-32
\& Argument  : List of numbers sorted numerically.
\& Returns   : List of numbers mixed with ranges of numbers (see above).
\& Throws    : n/a
.Ve
.PP
See Also   : \fBBio::Search::Hit::BlastHit::seq_inds()\fR
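.PP
The collapsing shown in the example above can be reproduced with a short
standalone sketch. The name collapse_sketch is hypothetical and the code is
not taken from the module. It assumes, based only on the example (where 17 18
stays uncollapsed), that a range is written only for runs of three or more
consecutive numbers.
.PP
.Vb 10
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical stand\-in for collapse_nums(); expects a sorted list.
\&  sub collapse_sketch {
\&      my @nums = @_;
\&      my @out;
\&      my $i = 0;
\&      while ($i < @nums) {
\&          my $j = $i;
\&          # Extend the run while the next number is consecutive.
\&          $j++ while $j + 1 < @nums && $nums[$j + 1] == $nums[$j] + 1;
\&          if ($j \- $i >= 2) {
\&              push @out, join("\-", $nums[$i], $nums[$j]);   # run of three or more
\&          } else {
\&              push @out, @nums[$i .. $j];                   # one or two numbers stay as is
\&          }
\&          $i = $j + 1;
\&      }
\&      return @out;
\&  }
\&
\&  my @expanded = (1..6, 10, 12..15, 17, 18, 20..22, 24, 26, 30..32);
\&  print join(" ", collapse_sketch(@expanded)), "\en";
.Ve
.PP
For the expanded list from the example, this prints
1\-6 10 12\-15 17 18 20\-22 24 26 30\-32.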
.SS "strip_blast_html"
.IX Subsection "strip_blast_html"
.Vb 10
\& Usage     : $boolean = &strip_blast_html( string_ref );
\&           : This method is exported.
\& Purpose   : Removes HTML formatting from a supplied string.
\&           : Attempts to restore the Blast report to enable
\&           : parsing by Bio::SearchIO::blast.pm
\& Returns   : Boolean: true if string was stripped, false if not.
\& Argument  : string_ref = reference to a string containing the whole Blast
\&           :              report containing HTML formatting.
\& Throws    : Croaks if the argument is not a scalar reference.
\& Comments  : Based on code originally written by Alex Dong Li
\&           : (ali@genet.sickkids.on.ca).
\&           : This method does some Blast\-specific stripping 
\&           : (adds back a \*(Aq>\*(Aq character in front of each HSP 
\&           : alignment listing).
\&           :   
\&           : THIS METHOD IS VERY SENSITIVE TO BLAST FORMATTING CHANGES!
\&           :
\&           : Removal of the HTML tags and accurate reconstitution of the
\&           : non\-HTML\-formatted report is highly dependent on the structure of
\&           : the HTML\-formatted version. For example, it assumes that the first 
\&           : line of each alignment section (HSP listing) starts with a
\&           : <a name=..> anchor tag. This permits the reconstruction of the 
\&           : original report in which these lines begin with a ">".
\&           : This is required for parsing.
\&           :
\&           : If the structure of the Blast report itself is not intended to
\&           : be a standard, the structure of the HTML\-formatted version
\&           : is even less so. Therefore, the use of this method to
\&           : reconstitute parsable Blast reports from HTML\-format versions
\&           : should be considered a temporary solution.
.Ve
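.PP
The calling convention (a string reference modified in place, with a boolean
result) and the anchor handling described above can be sketched as follows.
The name strip_html_sketch, the two regular expressions, and the sample
report text are all assumptions made for illustration. The real
strip_blast_html() performs more Blast\-specific processing.
.PP
.Vb 10
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical stand\-in for strip_blast_html(); the patterns are assumptions.
\&  sub strip_html_sketch {
\&      my ($string_ref) = @_;
\&      die "Argument must be a scalar reference\en"
\&          unless ref($string_ref) eq "SCALAR";
\&      my $changed = 0;
\&      # Turn the anchor that opens each alignment section back into ">".
\&      $changed = 1 if $$string_ref =~ s/^<a\es+name\es*=[^>]*>/>/gim;
\&      # Remove every remaining tag.
\&      $changed = 1 if $$string_ref =~ s/<[^>]+>//g;
\&      return $changed;
\&  }
\&
\&  my $report = "<b>BLASTP</b>\en<a name=123></a><a href=x>gi|1</a> protein\en";
\&  my $stripped = strip_html_sketch(\e$report);
\&  print $report;
.Ve
.PP
After the call, the sample report consists of the two lines "BLASTP" and
">gi|1 protein", and the return value is true.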
.SS "result2hash"
.IX Subsection "result2hash"
.Vb 6
\& Title    : result2hash
\& Usage    : my %data = &Bio::Search::SearchUtils::result2hash($result)
\& Function : converts ResultI data to simple hash
\& Returns  : hash
\& Args     : ResultI
\& Note     : used mainly as a utility for running SearchIO tests
.Ve