.TH PBSSCOREMATRIX "1" "May 2016" "pbsScoreMatrix 1.4" "User Commands"
.SH NAME
pbsScoreMatrix \- Generate log\-odds score matrices for use in alignment of probabilistic biological sequences (PBSs)
.SH DESCRIPTION
Generate log\-odds score matrices for use in alignment of probabilistic
biological sequences (PBSs).  By default, generates a matrix for every
branch of the tree (as defined in tree.mod), but can also generate a
matrix for a given branch length (see \fB\-\-branch\-length\fR).  For a
code size of N, an N x N matrix is generated by default;
\fB\-\-half\-pbs\fR will produce an N x 4 matrix, and \fB\-\-no\-pbs\fR
will produce a 4 x 4 matrix (assuming a four\-character nucleotide
alphabet).
.PP
Two sequences are assumed to have evolved from a common ancestor by a
reversible continuous\-time Markov substitution process, and to be
separated by a branch of length t.  The conditional probability of a
base j in one sequence given a base i in the other, P(j | i, t), is
given by element (i, j) of the matrix
.IP
P(t) = exp(Qt)
.PP
where Q is the rate matrix defining the substitution process, and
element (i, j) of Q is the instantaneous rate at which base i changes
to base j.
.PP
Let S_t(i, j) be a log\-odds score for the alignment of two bases, i
and j, based on P(t):
.IP
S_t(i, j) = log [ P(i, j | t) / (pi(i) * pi(j)) ]
.IP
          = log [ P(j | i, t) * pi(i) / (pi(i) * pi(j)) ]
.IP
          = log [ P(j | i, t) / pi(j) ]                      (1)
.PP
where pi(x) is the "equilibrium" or "background" probability of base x.
Because of reversibility, S_t(i, j) = S_t(j, i), and the S_t(i, j) form
a symmetric 4 x 4 matrix.  This is the matrix that is generated by
pbsScoreMatrix with the \fB\-\-no\-pbs\fR option.
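.PP
As a rough numerical illustration of equation (1), the following Python
sketch computes P(t) = exp(Qt) and the resulting 4 x 4 score matrix.  It
is not the code used by pbsScoreMatrix.  The Jukes\-Cantor rate matrix,
the uniform background distribution, and the function names mat_mul,
mat_exp and score_matrix are all assumptions made for the example; a
real run would take Q and pi from the tree model file.
.nf
```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) by scaling and squaring a Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    # Start from the identity and add a^k / k! for k = 1 .. terms.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(q * t) = exp(q * t / 2^s) squared s times.
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def score_matrix(q, pi, t):
    """Return S_t(i, j) = log2(P(j | i, t) / pi(j)), as in equation (1)."""
    p = mat_exp(q, t)
    n = len(pi)
    return [[math.log2(p[i][j] / pi[j]) for j in range(n)] for i in range(n)]


# Assumed toy model: Jukes-Cantor, scaled so that t is measured in
# expected substitutions per site.
JC_Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
JC_PI = [0.25] * 4

S = score_matrix(JC_Q, JC_PI, 0.2)
for row in S:
    print(" ".join("%7.3f" % x for x in row))
```
.fi
.PP
Under these assumptions the printed matrix is symmetric, with positive
scores on the diagonal (matches) and negative scores elsewhere
(mismatches).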
.PP
If each "letter" in each sequence represents a probability distribution
over bases, as in a PBS, then the score for two letters k and l can be
shown to be
.IP
S'_t(k, l) = log sum_i sum_j p_k(i) p_l(j) exp S_t(i, j)    (2)
.PP
where the two sums are over the four bases, p_k(i) is the probability of
base i under the distribution for k, and p_l(j) is the probability of
base j under the distribution for l.
.PP
Notice that (2) reduces to (1) when p_k(i) = p_l(j) = 1 for some i and j,
and p_k(i') = p_l(j') = 0 for all other i' and j' (i.e., when all of the
probability mass is on a single base in both distributions and the PBS
reduces to an ordinary nucleotide sequence).  The special case in which
only p_l(j) = 1 is also of interest, for aligning a PBS with a
nucleotide sequence:
.IP
S''_t(k, j) = log sum_i p_k(i) exp S_t(i, j)    (3)
.PP
This is the matrix generated by pbsScoreMatrix with the
\fB\-\-half\-pbs\fR option.  Note: all logs are base 2.
.SH EXAMPLE
Generate an N x N matrix for every branch of the tree, using a code file
"code" (generated by pbsTrain) and a tree model file "mytree.mod"
(generated by phyloFit):
.IP
pbsScoreMatrix mytree.mod code > matrices.dat
.PP
Generate an N x N matrix for a branch length of 0.2 expected
substitutions per site:
.IP
pbsScoreMatrix \fB\-\-branch\-length\fR 0.2 mytree.mod code > matrix.dat
.PP
Generate an N x 4 matrix:
.IP
pbsScoreMatrix \fB\-\-branch\-length\fR 0.2 \fB\-\-half\-pbs\fR mytree.mod code > matrix.dat
.PP
Generate a 4 x 4 matrix:
.IP
pbsScoreMatrix \fB\-\-branch\-length\fR 0.2 \fB\-\-no\-pbs\fR mytree.mod > matrix.dat
.PP
(In this case, a code file is not needed.)
.SH OPTIONS
.HP
\fB\-\-branch\-length\fR, \fB\-t\fR
.IP
Output a matrix for a branch of the specified length, rather than a
matrix for every branch of the tree.  The given length must be
non\-negative and in units of expected substitutions per site.
.HP
\fB\-\-half\-pbs\fR, \fB\-H\fR
.IP
Output an N x 4 matrix, as described above.
.HP
\fB\-\-no\-pbs\fR, \fB\-N\fR
.IP
Output a 4 x 4 matrix, as described above.  With this option, a code
file is not needed.
.HP
\fB\-\-help\fR, \fB\-h\fR
.IP
Show this help message.
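.SH NOTES
Equations (2) and (3) in DESCRIPTION can be sketched numerically as
follows.  This Python fragment is only an illustration, not the code
used by pbsScoreMatrix.  It assumes that "exp" in (2) and (3) denotes
the inverse of the base\-2 logarithm, which is what makes (2) reduce to
(1).  The Jukes\-Cantor score matrix and the function names pbs_score
and half_pbs_score are likewise assumptions made for the example.
.nf
```python
import math


def pbs_score(s, p_k, p_l):
    """Equation (2): log2 of sum_i sum_j p_k(i) p_l(j) 2^S_t(i, j)."""
    total = sum(p_k[i] * p_l[j] * 2.0 ** s[i][j]
                for i in range(len(p_k)) for j in range(len(p_l)))
    return math.log2(total)


def half_pbs_score(s, p_k, j):
    """Equation (3): log2 of sum_i p_k(i) 2^S_t(i, j) for a plain base j."""
    return math.log2(sum(p_k[i] * 2.0 ** s[i][j] for i in range(len(p_k))))


# Assumed toy 4 x 4 score matrix: Jukes-Cantor at t = 0.2 with a uniform
# background, using the closed form for P(t) under that model.
T = 0.2
DECAY = math.exp(-4.0 * T / 3.0)
P_SAME = 0.25 + 0.75 * DECAY
P_DIFF = 0.25 - 0.25 * DECAY
S = [[math.log2((P_SAME if i == j else P_DIFF) / 0.25) for j in range(4)]
     for i in range(4)]

# A PBS letter that is mostly the first base, and one that is uniform.
mostly_first = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]

print("letter vs letter:", pbs_score(S, mostly_first, mostly_first))
print("letter vs base 0:", half_pbs_score(S, mostly_first, 0))
print("letter vs uniform:", pbs_score(S, mostly_first, uniform))
```
.fi
.PP
When both distributions place all of their mass on single bases,
pbs_score returns the corresponding entry of the 4 x 4 matrix, which
mirrors the reduction of (2) to (1).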