.\" Automatically generated by Pod::Man 4.09 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.if !\nF .nr F 0
.if \nF>0 \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    if !\nF==2 \{\
.        nr % 0
.        nr F 2
.    \}
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Bio::Assembly::IO::sam 3pm"
.TH Bio::Assembly::IO::sam 3pm "2018-10-27" "perl v5.26.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Bio::Assembly::IO::sam \- An IO module for assemblies in Sam format *BETA*
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 4
\& use Bio::Assembly::IO;
\& $aio = Bio::Assembly::IO\->new( \-file  => "mysam.bam",
\&                               \-refdb => "myrefseqs.fas" );
\& $assy = $aio\->next_assembly;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This is a (currently) read-only \s-1IO\s0 module designed to convert
Sequence/Alignment Map (\s-1SAM\s0) formatted alignments to
Bio::Assembly::Scaffold representations containing
Bio::Assembly::Contig and Bio::Assembly::Singlet objects. It uses
lstein's Bio::DB::Sam to parse binary formatted \s-1SAM\s0 (.bam) files,
guided by a reference sequence fasta database.
.PP
\&\fB\s-1NB\s0\fR: \f(CW\*(C`Bio::DB::Sam\*(C'\fR is not a BioPerl module; it can be obtained via
\&\s-1CPAN.\s0 It in turn requires the \f(CW\*(C`libbam\*(C'\fR library, whose source must be
downloaded separately.
.SH "DETAILS"
.IX Header "DETAILS"
.IP "\(bu" 4
Required files
.Sp
A binary \s-1SAM\s0 (\f(CW\*(C`.bam\*(C'\fR) alignment and a reference sequence database in
\&\s-1FASTA\s0 format are required. Various required indexes (\f(CW\*(C`.fai\*(C'\fR, \f(CW\*(C`.bai\*(C'\fR) will be
created as necessary (via Bio::DB::Sam).
.IP "\(bu" 4
Compressed files
.Sp
\&...can be specified directly, if IO::Uncompress::Gunzip is installed. Get
it from your local \s-1CPAN\s0 mirror.
.IP "\(bu" 4
\&\s-1BAM\s0 vs. \s-1SAM\s0
.Sp
The input alignment should be in (possibly gzipped) binary \s-1SAM\s0 (\f(CW\*(C`.bam\*(C'\fR) format. If
it isn't, you will get a message explaining how to convert it, viz.:
.Sp
.Vb 1
\& $ samtools view \-Sb mysam.sam > mysam.bam
.Ve
.Sp
The bam file must also be sorted on coordinates. To sort it, do
.Sp
.Vb 1
\& $ samtools sort mysam.unsorted.bam > mysam.bam
.Ve
.IP "\(bu" 4
Contigs
.Sp
Contigs are calculated by this module, using the 'coverage' feature of
the Bio::DB::Sam object. A contig represents a contiguous portion of a
reference sequence having non-zero coverage at each base.
.Sp
The bwa aligner can assign read sequences to multiple reference
sequence locations. The present implementation assigns such reads only
to the first contig in which they appear.
.IP "\(bu" 4
Consensus sequences
.Sp
Consensus sequence and quality objects are calculated by this module,
using the \f(CW\*(C`pileup\*(C'\fR callback feature of \f(CW\*(C`Bio::DB::Sam\*(C'\fR. The consensus is
(currently) simply the residue at a position that has the maximum sum
of quality values. The consensus quality is the integer portion of the
simple average of quality values for the consensus residue.
.IP "\(bu" 4
SeqFeatures
.Sp
Read sequences stored in contigs are accompanied by the following
features:
.Sp
.Vb 2
\& contig : name of associated contig
\& cigar  : CIGAR string for this read
.Ve
.Sp
If the read is paired with a successfully mapped mate, these features
will also be available:
.Sp
.Vb 4
\& mate_start  : coordinate to which the mate was aligned
\& mate_len    : length of mate read
\& mate_strand : strand of mate (\-1 or 1)
\& insert_size : size of insert spanned by the mate pair
.Ve
.Sp
These features are obtained as follows:
.Sp
.Vb 9
\& @ids = $contig\->get_seq_ids;
\& $an_id = $ids[0]; # or whatever
\& $seq = $contig\->get_seq_by_name($an_id);
\& # Bio::LocatableSeq\*(Aqs aren\*(Aqt SeqFeature containers, so...
\& $feat = $contig\->get_seq_feat_by_tag(
\&    $seq, "_aligned_coord:".$seq\->id
\& );
\& ($cigar) = $feat\->get_tag_values(\*(Aqcigar\*(Aq);
\& # etc.
.Ve
.SH "TODO"
.IX Header "TODO"
.IP "\(bu" 4
Supporting both text \s-1SAM\s0 (\s-1TAM\s0) and binary \s-1SAM\s0 (\s-1BAM\s0)
.SH "FEEDBACK"
.IX Header "FEEDBACK"
.SS "Mailing Lists"
.IX Subsection "Mailing Lists"
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
.PP
.Vb 2
\&  bioperl\-l@bioperl.org                  \- General discussion
\&  http://bioperl.org/wiki/Mailing_lists  \- About the mailing lists
.Ve
.SS "Support"
.IX Subsection "Support"
Please direct usage questions or support issues to the mailing list:
.PP
bioperl\-l@bioperl.org
.PP
rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.
.SS "Reporting Bugs"
.IX Subsection "Reporting Bugs"
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
the web:
.PP
.Vb 1
\&  https://github.com/bioperl/bioperl\-live/issues
.Ve
.SH "AUTHOR \- Mark A. Jensen"
.IX Header "AUTHOR - Mark A. Jensen"
Email maj \-at\- fortinbras \-dot\- us
.SH "APPENDIX"
.IX Header "APPENDIX"
The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _.
.SH "Bio::Assembly::IO compliance"
.IX Header "Bio::Assembly::IO compliance"
.SS "\fInext_assembly()\fP"
.IX Subsection "next_assembly()"
.Vb 5
\& Title   : next_assembly
\& Usage   : my $scaffold = $asmio\->next_assembly();
\& Function: return the next assembly in the sam\-formatted stream
\& Returns : Bio::Assembly::Scaffold object
\& Args    : none
.Ve
.SS "\fInext_contig()\fP"
.IX Subsection "next_contig()"
.Vb 5
\& Title   : next_contig
\& Usage   : my $contig = $asmio\->next_contig();
\& Function: return the next contig or singlet from the sam stream
\& Returns : Bio::Assembly::Contig or Bio::Assembly::Singlet
\& Args    : none
.Ve
.SS "\fIwrite_assembly()\fP"
.IX Subsection "write_assembly()"
.Vb 5
\& Title   : write_assembly
\& Usage   :
\& Function: not implemented (module currently read\-only)
\& Returns :
\& Args    :
.Ve
.SH "Internal"
.IX Header "Internal"
.SS "\fI_store_contig()\fP"
.IX Subsection "_store_contig()"
.Vb 5
\& Title   : _store_contig
\& Usage   : my $contigobj = $self\->_store_contig(\e%contiginfo);
\& Function: create and load a contig object
\& Returns : Bio::Assembly::Contig object
\& Args    : Bio::DB::Sam::Segment object
.Ve
.SS "\fI_store_read()\fP"
.IX Subsection "_store_read()"
.Vb 5
\& Title   : _store_read
\& Usage   : my $readobj = $self\->_store_read($readobj, $contigobj);
\& Function: store information of a read belonging to a contig in a contig object
\& Returns : Bio::LocatableSeq
\& Args    : Bio::DB::Bam::AlignWrapper, Bio::Assembly::Contig
.Ve
.SS "\fI_store_singlet()\fP"
.IX Subsection "_store_singlet()"
.Vb 6
\& Title   : _store_singlet
\& Usage   : my $singletobj = $self\->_store_singlet($contigobj);
\& Function: convert a contig object containing a single read into
\&           a singlet object
\& Returns : Bio::Assembly::Singlet
\& Args    : Bio::Assembly::Contig (previously loaded with only one seq)
.Ve
.SH "REALLY Internal"
.IX Header "REALLY Internal"
.SS "\fI_init_sam()\fP"
.IX Subsection "_init_sam()"
.Vb 9
\& Title   : _init_sam
\& Usage   : $self\->_init_sam($fasfile)
\& Function: obtain a Bio::DB::Sam parsing of the associated sam file
\& Returns : true on success
\& Args    : [optional] name of the fasta reference db (scalar string)
\& Note    : The associated file can be plain text (.sam) or binary (.bam);
\&           If the fasta file is not specified, and no file is contained in
\&           the refdb() attribute, a .fas file with the same
\&           basename as the sam file will be searched for.
.Ve
.SS "\fI_get_contig_segs_from_coverage()\fP"
.IX Subsection "_get_contig_segs_from_coverage()"
.Vb 7
\& Title   : _get_contig_segs_from_coverage
\& Usage   :
\& Function: calculates separate contigs using coverage info
\&           in the segment
\& Returns : array of Bio::DB::Sam::Segment objects, representing
\&           each contig
\& Args    : Bio::DB::Sam::Segment object
.Ve
.SS "\fI_calc_consensus_quality()\fP"
.IX Subsection "_calc_consensus_quality()"
.Vb 7
\& Title   : _calc_consensus_quality
\& Usage   : @qual = $aio\->_calc_consensus_quality( $contig_seg );
\& Function: calculate an average or other data\-reduced quality
\&           over all sites represented by the features contained
\&           in a Bio::DB::Sam::Segment
\& Returns :
\& Args    : a Bio::DB::Sam::Segment object
.Ve
.SS "\fI_calc_consensus()\fP"
.IX Subsection "_calc_consensus()"
.Vb 6
\& Title   : _calc_consensus
\& Usage   : @qual = $aio\->_calc_consensus( $contig_seg );
\& Function: calculate a simple quality\-weighted consensus sequence
\&           for the segment
\& Returns : a SeqWithQuality object
\& Args    : a Bio::DB::Sam::Segment object
.Ve
.SS "\fIrefdb()\fP"
.IX Subsection "refdb()"
.Vb 6
\& Title   : refdb
\& Usage   : $obj\->refdb($newval)
\& Function: the name of the reference db fasta file
\& Example :
\& Returns : value of refdb (a scalar)
\& Args    : on set, new value (a scalar or undef, optional)
.Ve
.SS "\fI_segset()\fP"
.IX Subsection "_segset()"
.Vb 10
\& Title   : _segset
\& Usage   : $segset_hashref = $self\->_segset()
\& Function: hash container for the Bio::DB::Sam::Segment objects that
\&           represent each set of contigs for each seq_id
\&           { $seq_id => [@contig_segments], ... }
\& Example :
\& Returns : value of _segset (a hashref) if no arg,
\&           or the arrayref of contig segments, if arg == a seq id
\& Args    : [optional] seq id (scalar string)
\& Note    : readonly; hash elt set in _init_sam()
.Ve
.SS "\fI_current_refseq_id()\fP"
.IX Subsection "_current_refseq_id()"
.Vb 6
\& Title   : _current_refseq_id
\& Usage   : $obj\->_current_refseq_id($newval)
\& Function: the "current" reference sequence id
\& Example :
\& Returns : value of _current_refseq_id (a scalar)
\& Args    : on set, new value (a scalar or undef, optional)
.Ve
.SS "\fI_current_contig_seg_idx()\fP"
.IX Subsection "_current_contig_seg_idx()"
.Vb 6
\& Title   : _current_contig_seg_idx
\& Usage   : $obj\->_current_contig_seg_idx($newval)
\& Function: the "current" segment index in the "current" refseq
\& Example :
\& Returns : value of _current_contig_seg_idx (a scalar)
\& Args    : on set, new value (a scalar or undef, optional)
.Ve
.SS "\fIsam()\fP"
.IX Subsection "sam()"
.Vb 6
\& Title   : sam
\& Usage   : $obj\->sam($newval)
\& Function: store the associated Bio::DB::Sam object
\& Example :
\& Returns : value of sam (a Bio::DB::Sam object)
\& Args    : on set, new value (a scalar or undef, optional)
.Ve
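.SH "Illustrative sketch"
.IX Header "Illustrative sketch"
The contig and consensus rules described under \s-1DETAILS\s0 can be sketched
in plain Perl. This is a minimal illustration only, not this module's
implementation. The helper names \f(CW\*(C`contig_intervals\*(C'\fR and \f(CW\*(C`consensus_call\*(C'\fR
are hypothetical, as are the input layouts (a per-base coverage list, and a
list of [residue, quality] pairs for one position), and no Bio::DB::Sam
calls are made. Ties between residues are broken alphabetically here; that
tie-break is an assumption, since the rule above does not specify one.
.PP
.Vb 40
\& use strict;
\& use warnings;
\& use List::Util qw(sum);
\&
\& # Hypothetical helper: split per\-base coverage into contig intervals,
\& # i.e. maximal runs of non\-zero coverage (1\-based, inclusive ends).
\& sub contig_intervals {
\&     my @cov = @_;
\&     my (@contigs, $start);
\&     for my $i (0 .. $#cov) {
\&         if ($cov[$i] > 0) {
\&             $start = $i + 1 unless defined $start;   # run begins
\&         }
\&         elsif (defined $start) {
\&             push @contigs, [$start, $i];             # run ended at base $i
\&             undef $start;
\&         }
\&     }
\&     push @contigs, [$start, scalar @cov] if defined $start;
\&     return @contigs;
\& }
\&
\& # Hypothetical helper: consensus call for one position, given a list of
\& # [residue, quality] pairs observed there.
\& sub consensus_call {
\&     my %quals;
\&     push @{ $quals{ $_\->[0] } }, $_\->[1] for @_;
\&     # Residue with the maximum sum of qualities (ties: alphabetical).
\&     my ($best) = sort {
\&         sum(@{ $quals{$b} }) <=> sum(@{ $quals{$a} }) || $a cmp $b
\&     } keys %quals;
\&     # Integer portion of the simple average quality for that residue.
\&     my @q = @{ $quals{$best} };
\&     return ($best, int(sum(@q) / scalar @q));
\& }
\&
\& my @contigs = contig_intervals(0, 2, 3, 1, 0, 0, 4, 4);
\& print join(",", map { "$_\->[0]\-$_\->[1]" } @contigs), "\en";   # 2\-4,7\-8
\& my ($base, $qual) = consensus_call(["A", 30], ["A", 21], ["C", 40], ["A", 10]);
\& print "$base $qual\en";                                          # A 20
.Ve
.PP
In the second call, A wins because its qualities sum to 61 against 40 for
C, and the reported quality is the integer portion of 61/3.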