.\" Automatically generated by Pod::Man 4.10 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.
\}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "TFBS::Matrix::PFM 3pm"
.TH TFBS::Matrix::PFM 3pm "2018-11-02" "perl v5.28.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
TFBS::Matrix::PFM \- class for raw position frequency matrix patterns
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.IP "\(bu" 4
creating a TFBS::Matrix::PFM object manually:
.Sp
.Vb 10
\&    my $matrixref = [ [ 12,  3,  0,  0,  4,  0 ],
\&                      [  0,  0,  0, 11,  7,  0 ],
\&                      [  0,  9, 12,  0,  0,  0 ],
\&                      [  0,  0,  0,  1,  1, 12 ]
\&                    ];
\&    my $pfm = TFBS::Matrix::PFM\->new(\-matrix => $matrixref,
\&                                      \-name   => "MyProfile",
\&                                      \-ID     => "M0001"
\&                                     );
\&    # or
\&
\&    my $matrixstring =
\&        "12 3 0 0 4 0\en0 0 0 11 7 0\en0 9 12 0 0 0\en0 0 0 1 1 12";
\&
\&    my $pfm = TFBS::Matrix::PFM\->new(\-matrixstring => $matrixstring,
\&                                      \-name         => "MyProfile",
\&                                      \-ID           => "M0001"
\&                                     );
.Ve
.IP "\(bu" 4
retrieving a TFBS::Matrix::PFM object from a database:
.Sp
(See the documentation of the individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and
retrieve TFBS::Matrix::* objects from them.)
.Sp
.Vb 6
\&    my $db_obj = TFBS::DB::JASPAR2\->new
\&                    (\-connect => ["dbi:mysql:JASPAR2:myhost",
\&                                  "myusername", "mypassword"]);
\&    my $pfm = $db_obj\->get_Matrix_by_ID("M0001", "PFM");
\&    # or
\&    my $pfm = $db_obj\->get_Matrix_by_name("MyProfile", "PFM");
.Ve
.IP "\(bu" 4
retrieving a list of individual TFBS::Matrix::PFM objects
from a TFBS::MatrixSet object:
.Sp
(See the TFBS::MatrixSet documentation to learn how to create
objects for storage and manipulation of multiple matrices.)
.Sp
.Vb 1
\&    my @pfm_list = $matrixset\->all_patterns(\-sort_by=>"name");
.Ve
.IP "\(bu" 4
converting a raw frequency matrix to other matrix types:
.Sp
.Vb 2
\&    my $pwm = $pfm\->to_PWM(); # convert to position weight matrix
\&    my $icm = $pfm\->to_ICM(); # convert to information content matrix
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
TFBS::Matrix::PFM is a class whose instances are objects representing
raw position frequency matrices (PFMs). A \s-1PFM\s0 is derived from N
nucleotide patterns of fixed size, e.g. the set of sequences
.PP
.Vb 12
\&    AGGCCT
\&    AAGCCT
\&    AGGCAT
\&    AAGCCT
\&    AAGCCT
\&    AGGCAT
\&    AGGCCT
\&    AGGCAT
\&    AGGTTT
\&    AGGCAT
\&    AGGCCT
\&    AGGCCT
.Ve
.PP
will give the matrix:
.PP
.Vb 4
\&    A:[ 12  3  0  0  4  0 ]
\&    C:[  0  0  0 11  7  0 ]
\&    G:[  0  9 12  0  0  0 ]
\&    T:[  0  0  0  1  1 12 ]
.Ve
.PP
which contains the count of each nucleotide at each position in the
sequence. (If you have a set of sequences as above and want to
create a TFBS::Matrix::PFM object out of them, have a look at the
TFBS::PatternGen::SimplePFM module.)
.PP
PFMs are easily converted to other types of matrices, namely
information content matrices and position weight matrices. A
TFBS::Matrix::PFM object has the methods to_ICM and to_PWM, which do
just that, returning a TFBS::Matrix::ICM and a TFBS::Matrix::PWM
object, respectively.
.SH "FEEDBACK"
.IX Header "FEEDBACK"
Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard"
.IX Header "AUTHOR - Boris Lenhard"
Boris Lenhard
.SH "APPENDIX"
.IX Header "APPENDIX"
The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
.SS "new"
.IX Subsection "new"
.Vb 5
\& Title   : new
\& Usage   : my $pfm = TFBS::Matrix::PFM\->new(%args)
\& Function: constructor for the TFBS::Matrix::PFM object
\& Returns : a new TFBS::Matrix::PFM object
\& Args    : # you must specify exactly one of the following three:
\&
\&           \-matrix,       # reference to an array of arrays of integers
\&                          #or
\&           \-matrixstring, # a string containing four lines
\&                          # of tab\- or space\-delimited integers
\&                          #or
\&           \-matrixfile,   # the name of a file containing four lines
\&                          # of tab\- or space\-delimited integers
\&           #######
\&
\&           \-name,         # string, OPTIONAL
\&           \-ID,           # string, OPTIONAL
\&           \-class,        # string, OPTIONAL
\&           \-tags          # an array reference, OPTIONAL
\&Warnings : Warns if the matrix provided has columns with different
\&           sums. Columns with different sums contradict the usual
\&           origin of matrix data and, unless you are absolutely sure
\&           that column sums _should_ be different, it would be wise to
\&           check your matrices.
.Ve
.SS "column_sum"
.IX Subsection "column_sum"
.Vb 8
\& Title   : column_sum
\& Usage   : my $nr_sequences = $pfm\->column_sum()
\& Function: calculates the sum of elements of one column
\&           (the first one by default) which normally equals the
\&           number of sequences used to derive the PFM.
\& Returns : the sum of elements of one column (an integer)
\& Args    : column number (starting from 1), OPTIONAL \- you DO NOT
\&           need to specify it unless you are dealing with a matrix
.Ve
.SS "to_PWM"
.IX Subsection "to_PWM"
.Vb 9
\& Title   : to_PWM
\& Usage   : my $pwm = $pfm\->to_PWM()
\& Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
\&           to a position weight matrix. At present it assumes a uniform
\&           background distribution of nucleotide frequencies.
\& Returns : a new TFBS::Matrix::PWM object
\& Args    : none; in future releases, it should be able to accept
\&           a user\-defined background probability of the four
\&           nucleotides
.Ve
.SS "to_ICM"
.IX Subsection "to_ICM"
.Vb 7
\& Title   : to_ICM
\& Usage   : my $icm = $pfm\->to_ICM()
\& Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
\&           to an information content matrix. At present it assumes a uniform
\&           background distribution of nucleotide frequencies.
\& Returns : a new TFBS::Matrix::ICM object
\& Args    : \-small_sample_correction # undef (default), \*(Aqschneider\*(Aq or \*(Aqpseudocounts\*(Aq
.Ve
.PP
How a \s-1PFM\s0 is converted to \s-1ICM:\s0
.PP
For a \s-1PFM\s0 element PFM[i,k], the probability without
pseudocounts is estimated to be simply
.PP
.Vb 1
\&  p[i,k] = PFM[i,k] / Z
.Ve
.PP
where
.br
\&\- Z equals the column sum of the matrix, i.e. the number of motifs used
to construct the \s-1PFM\s0
.br
\&\- i is the column index (position in the motif)
.br
\&\- k is the row index (a letter in the alphabet; here k is one of
(A,C,G,T))
.PP
Here is how one normally calculates the pseudocount-corrected positional
probability p'[i,k]:
.PP
.Vb 1
\&  p\*(Aq[i,k] = (PFM[i,k] + 0.25*sqrt(Z)) / (Z + sqrt(Z))
.Ve
.PP
0.25 is for the flat distribution of nucleotides, and sqrt(Z) is the
recommended pseudocount weight. In the general case,
.PP
.Vb 1
\&  p\*(Aq[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)
.Ve
.PP
where q[k] is the background distribution of the letter (nucleotide) k,
and B is an arbitrary pseudocount value or expression (for no pseudocounts,
B=0).
.PP
For a given position i, the deviation from random distribution in bits
is calculated as (Baldi and Brunak eq.
1.9 (2ed) or 1.8 (1ed)):
.PP
\&\- for an arbitrary alphabet of A letters:
.PP
.Vb 1
\&  D[i] = log2(A) + sum_for_all_k(p[i,k]*log2(p[i,k]))
.Ve
.PP
\&\- special case for nucleotides (A=4):
.PP
.Vb 1
\&  D[i] = 2 + sum_for_all_k(p[i,k]*log2(p[i,k]))
.Ve
.PP
D[i] equals the information content of the position i in the motif. To
calculate the entire \s-1ICM,\s0 you have to calculate the contribution
of each nucleotide at a position i to D[i], i.e.
.PP
.Vb 1
\&  ICM[i,k] = p\*(Aq[i,k] * D[i]
.Ve
.SS "draw_logo"
.IX Subsection "draw_logo"
.Vb 10
\& Title   : draw_logo
\& Usage   : my $gd_image = $pfm\->draw_logo()
\& Function: draws a sequence logo; similar to the
\&           method in TFBS::Matrix::ICM, but can automatically calculate
\&           error bars for drawing
\& Returns : a GD image object (see documentation of GD module)
\& Args    : many; PFM\-specific options are:
\&           \-small_sample_correction # One of
\&                                    # "Schneider" (uses correction
\&                                    #   described by Schneider et al.
\&                                    #   (Schneider T. et al. (1986) J.Biol.Chem.
\&                                    # "pseudocounts" \- standard pseudocount
\&                                    #   correction, more suitable for
\&                                    #   PFMs with larger column sums
\&                                    # If the parameter is omitted, small
\&                                    # sample correction is not applied
\&
\&           \-draw_error_bars         # if true, adds error bars to each position
\&                                    # in the logo.
\&                                    # To calculate the error bars,
\&                                    # it uses the \-small_sample_correction
\&                                    # argument if explicitly set,
\&                                    # or "Schneider" by default
\&For other args, see draw_logo entry in TFBS::Matrix::ICM documentation
.Ve
.SS "add_PFM"
.IX Subsection "add_PFM"
.Vb 5
\& Title   : add_PFM
\& Usage   : $pfm\->add_PFM($another_pfm)
\& Function: adds the values of $another_pfm matrix to $pfm
\& Returns : reference to the updated $pfm object
\& Args    : a TFBS::Matrix::PFM object
.Ve
.SS "name"
.IX Subsection "name"
.SS "\s-1ID\s0"
.IX Subsection "ID"
.SS "class"
.IX Subsection "class"
.SS "matrix"
.IX Subsection "matrix"
.SS "length"
.IX Subsection "length"
.SS "revcom"
.IX Subsection "revcom"
.SS "rawprint"
.IX Subsection "rawprint"
.SS "prettyprint"
.IX Subsection "prettyprint"
The above methods are common to all matrix objects. Please consult
TFBS::Matrix to find out how to use them.
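.PP
As an illustration of the arithmetic described under to_ICM, the following
stand-alone Perl sketch applies those formulas to the example matrix from
the \s-1DESCRIPTION.\s0 It is not the module's implementation and does not
use the TFBS classes. The names pfm_to_icm and log2 are hypothetical
helpers introduced only for this example, the background is assumed to be
flat (q[k] = 0.25), and the same estimate p'[i,k] is assumed to be used
both inside D[i] and in the final product.
.PP
.Vb 10
\&    use strict;
\&    use warnings;
\&
\&    # Example PFM from the DESCRIPTION; rows are A, C, G, T.
\&    my $pfm = [ [ 12, 3,  0,  0, 4,  0 ],
\&                [  0, 0,  0, 11, 7,  0 ],
\&                [  0, 9, 12,  0, 0,  0 ],
\&                [  0, 0,  0,  1, 1, 12 ] ];
\&
\&    # Hypothetical helper: base\-2 logarithm.
\&    sub log2 { return log($_[0]) / log(2) }
\&
\&    # Hypothetical helper: takes a PFM (reference to four rows) and a
\&    # pseudocount weight B, returns an ICM with the same layout.
\&    # Assumption: flat background, q[k] = 0.25 for every nucleotide.
\&    sub pfm_to_icm {
\&        my ($m, $B) = @_;
\&        my @icm;
\&        for my $i (0 .. $#{ $m\->[0] }) {
\&            my $Z = 0;
\&            $Z += $_\->[$i] for @$m;                # column sum
\&            my @p = map { ($_\->[$i] + 0.25 * $B) / ($Z + $B) } @$m;
\&            my $D = 2;                             # log2(4) for nucleotides
\&            for my $pk (@p) {
\&                $D += $pk * log2($pk) if $pk > 0;  # 0*log2(0) taken as 0
\&            }
\&            $icm[$_][$i] = $p[$_] * $D for 0 .. 3;
\&        }
\&        return [@icm];
\&    }
\&
\&    my $icm = pfm_to_icm($pfm, 0);                 # B = 0: no pseudocounts
\&    my @letters = qw(A C G T);
\&    for my $k (0 .. 3) {
\&        printf "%s: %s\en", $letters[$k],
\&            join(" ", map { sprintf("%.3f", $_) } @{ $icm\->[$k] });
\&    }
.Ve
.PP
With B = 0, a position at which all sequences agree (position 1 here, all A)
yields the full 2 bits for that nucleotide. Passing a nonzero B, for example
sqrt(12) for this matrix, gives the pseudocount-corrected variant.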