.\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . 
ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Genome::Model::Tools::Music::PathScan::CombinePvals 3pm"
.TH Genome::Model::Tools::Music::PathScan::CombinePvals 3pm "2020-11-06" "perl v5.30.3" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
CombinePvals \- combining probabilities from independent tests of
significance into a single aggregate figure
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use CombinePvals;
\&
\& my $obj = CombinePvals\->new ($reference_to_list_of_pvals);
\&
\& my $pval = $obj\->method_name;
\&
\& my $pval = $obj\->method_name (@arguments);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
There are a variety of circumstances under which one might have a number of
different kinds of tests and/or separate instances of the same kind of test
for one particular null hypothesis, where each of these tests returns a
p\-value. The problem is how to properly condense this list of probabilities
into a single value so as to be able to make a statistical inference, e.g.
whether to reject the null hypothesis. This problem was examined heavily
beginning in the 1930s, during which time numerous mathematical contingencies
were treated, e.g. dependence vs. independence of tests, optimality,
inter-test weighting, computational efficiency, continuous vs. discrete tests
and combinations thereof, etc. There is quite a large mathematical literature
on this topic (see \*(L"\s-1REFERENCES\*(R"\s0 below) and any one particular
situation might involve some of the above subtleties. This package
concentrates on some of the more straightforward scenarios, furnishing
various methods for combining p\-vals. The main consideration will usually be
the trade-off between the exactness of the p\-value (according to strict
frequentist modeling) and the computational efficiency, or even the actual
feasibility of the calculation. Methods should be chosen with this trade-off
in mind.
.PP
Note also that this scenario of combining p\-values (many tests of a single
hypothesis) is fundamentally different from that where a given hypothesis is
tested multiple times. The latter instance usually calls for some method of
multiple testing correction.
.SH "REFERENCES"
.IX Header "REFERENCES"
Here is an abbreviated list of the substantive works on the topic of
combining probabilities.
.IP "\(bu" 4
Birnbaum, A. (1954)
\&\fICombining Independent Tests of Significance\fR,
Journal of the American Statistical Association \fB49\fR(267), 559\-574.
.IP "\(bu" 4
David, F. N. and Johnson, N. L. (1950)
\&\fIThe Probability Integral Transformation When the Variable is
Discontinuous\fR,
Biometrika \fB37\fR(1/2), 42\-49.
.IP "\(bu" 4
Fisher, R. A. (1958)
\&\fIStatistical Methods for Research Workers\fR,
13th Ed. Revised, Hafner Publishing Co., New York.
.IP "\(bu" 4
Lancaster, H. O. (1949)
\&\fIThe Combination of Probabilities Arising from Data in Discrete
Distributions\fR,
Biometrika \fB36\fR(3/4), 370\-382.
.IP "\(bu" 4
Littell, R. C. and Folks, J. L. (1971)
\&\fIAsymptotic Optimality of Fisher's Method of Combining Independent
Tests\fR,
Journal of the American Statistical Association \fB66\fR(336), 802\-806.
.IP "\(bu" 4
Pearson, E. S. (1938)
\&\fIThe Probability Integral Transformation for Testing Goodness of Fit and
Combining Independent Tests of Significance\fR,
Biometrika \fB30\fR(12), 134\-148.
.IP "\(bu" 4
Pearson, E. S.
(1950)
\&\fIOn Questions Raised by the Combination of Tests Based on Discontinuous
Distributions\fR,
Biometrika \fB37\fR(3/4), 383\-398.
.IP "\(bu" 4
Pearson, K. (1933)
\&\fIOn a Method of Determining Whether a Sample Of Size N Supposed to Have
Been Drawn From a Parent Population Having a Known Probability Integral Has
Probably Been Drawn at Random\fR,
Biometrika \fB25\fR(3/4), 379\-410.
.IP "\(bu" 4
Van Valen, L. (1964)
\&\fICombining the Probabilities from Significance Tests\fR,
Nature \fB201\fR(4919), 642.
.IP "\(bu" 4
Wallis, W. A. (1942)
\&\fICompounding Probabilities from Independent Significance Tests\fR,
Econometrica \fB10\fR(3/4), 229\-248.
.IP "\(bu" 4
Zelen, M. and Joel, L. S. (1959)
\&\fIThe Weighted Compounding of Two Independent Significance Tests\fR,
Annals of Mathematical Statistics \fB30\fR(4), 885\-895.
.SH "AUTHOR"
.IX Header "AUTHOR"
Michael C. Wendl
.PP
mwendl@wustl.edu
.PP
Copyright (C) 2009 Washington University
.PP
This program is free software; you can redistribute it and/or modify it under
the terms of the \s-1GNU\s0 General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any
later version.
.PP
This program is distributed in the hope that it will be useful, but
\&\s-1WITHOUT ANY WARRANTY\s0; without even the implied warranty of
\&\s-1MERCHANTABILITY\s0 or \s-1FITNESS FOR A PARTICULAR PURPOSE.\s0 See the
\&\s-1GNU\s0 General Public License for more details.
.PP
You should have received a copy of the \s-1GNU\s0 General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place \- Suite 330, Boston, \s-1MA 02111\-1307, USA.\s0
.SH "GENERAL REMARKS ON METHODS"
.IX Header "GENERAL REMARKS ON METHODS"
The available methods are listed below. Each of these computational
techniques assumes that the tests, as well as their associated p\-values, are
independent of one another, and none considers any form of differential
weighting.
.SH "CONSTRUCTOR METHODS"
.IX Header "CONSTRUCTOR METHODS"
These methods return an object in the CombinePvals class.
.SS "new"
.IX Subsection "new"
This is the usual object constructor, which takes a mandatory, but otherwise
unordered, (reference to a) list of the p\-values obtained by a set of
independent tests.
.PP
.Vb 1
\& my $obj = CombinePvals\->new ([0.103, 0.078, 0.03, 0.2,...]);
.Ve
.PP
The method checks to make sure that all elements are actual p\-values, i.e.
that they are real numbers bounded by 0 and 1.
.SH "EXACT ENUMERATIVE PROCEDURES FOR STRICTLY DISCRETE DISTRIBUTIONS"
.IX Header "EXACT ENUMERATIVE PROCEDURES FOR STRICTLY DISCRETE DISTRIBUTIONS"
When all the individual p\-vals are derived from tests based on discrete
distributions, the \*(L"standard\*(R" continuum methods cannot be used in the
strictest sense. Both Wallis (1942) and Lancaster (1949) discuss the option
of full enumeration, which will only be feasible when there are a limited
number of p\-values and their range is not too large. Feasibility experiments
are suggested, since practicality depends upon the type of hardware and the
size of the calculation.
.SS "exact_enum_arbitrary"
.IX Subsection "exact_enum_arbitrary"
This routine is designed for combining p\-values from completely arbitrary
discrete probability distributions. It takes a list-of-lists data structure,
each list being the probability tails \fIordered from most extreme to least
extreme\fR (i.e. as a cumulative distribution function) associated with each
individual test.
However, the ordering of the lists themselves is not important. For instance,
Wallis (1942) gives the example of two binomials, a one-tailed test having
tail values of 0.0625, 0.3125, 0.6875, 0.9375, and 1, and a two-tailed test
having tail values 0.125, 0.625, and 1. We would then call this method using
.PP
.Vb 4
\& my $pval = $obj\->exact_enum_arbitrary (
\&     [0.0625, 0.3125, 0.6875, 0.9375, 1],
\&     [0.125, 0.625, 1]
\& );
.Ve
.PP
The internal computational method is relatively straightforward and is
described in detail by Wallis (1942). Note that this method does
\&\*(L"all-by-all\*(R" multiplication, so it is the least efficient, although
entirely exact.
.SS "exact_enum_identical"
.IX Subsection "exact_enum_identical"
This routine is designed for combining a set of p\-values that all come from
a single probability distribution.
.PP
.Vb 1
\& NOT IMPLEMENTED YET
.Ve
.SH "TRANSFORMS FOR CONTINUOUS DISTRIBUTIONS"
.IX Header "TRANSFORMS FOR CONTINUOUS DISTRIBUTIONS"
The mathematical literature furnishes several straightforward options for
combining p\-vals if all of the distributions underlying all of the
individual tests are continuous.
.SS "fisher_chisq_transform"
.IX Subsection "fisher_chisq_transform"
This routine implements R. A. Fisher's (1958, originally 1932) chi-square
transform method for combining p\-vals from continuous distributions, which
is essentially a CPU-efficient approximation of K. Pearson's log-based result
(see e.g. Wallis (1942), p. 232). Note that the specific underlying
distributions are not needed, so no arguments are passed.
.PP
.Vb 1
\& my $pval = $obj\->fisher_chisq_transform;
.Ve
.PP
This is certainly the fastest and easiest method for combining p\-vals, but
its accuracy for discrete distributions will not usually be very good. For
such cases, an exact or a corrected method is a better choice.
.SH "CORRECTION PROCEDURES FOR DISCRETE DISTRIBUTIONS: LANCASTER'S MODELS"
.IX Header "CORRECTION PROCEDURES FOR DISCRETE DISTRIBUTIONS: LANCASTER'S MODELS"
Enumerative procedures quickly become infeasible if the number of tests
and/or the support of each test grow large. A number of procedures have been
described for correcting the methodologies designed for continuum testing,
mostly in the context of applying so-called continuity corrections.
Essentially, these seek to \*(L"spread\*(R" discrete data out into a
pseudo-continuous configuration as appropriately as possible, and then apply
standard transforms. Accuracy varies and should be suitably established in
each case.
.PP
The methods in this section are due to H. O. Lancaster (1949), who discussed
two corrections based upon the idea of describing how a chi-square
transformed statistic varies between the points of a discrete distribution.
Unfortunately, these methods require one to pass some extra information to
the routines, i.e. not only the \s-1CDF\s0 (the p\-val of each test), but the
\&\s-1CDF\s0 value associated with the next-most-extreme statistic. These two
pieces of information are the basis of the interpolation. For example, if an
underlying distribution has the possible tail values of 0.0625, 0.3125,
0.6875, 0.9375, 1 and the test itself has a value of 0.6875, then you would
pass \fIboth\fR 0.3125 \fIand\fR 0.6875 to the routine.
\&\fIIn all cases, the lower value, i.e. the more extreme one, precedes the
higher value in the argument list.\fR While there generally will be some
extra inconvenience in obtaining this information, the accuracy is much
improved over Fisher's method.
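.PP
For instance, with the discrete tails quoted above, a test whose statistic
falls at 0.6875 contributes the pair (0.3125, 0.6875). The sketch below shows
one way the argument list for the Lancaster routines described next might be
assembled. It assumes, purely as an illustration, that the pairs are simply
flattened into one list (more extreme value first, as stated above); the
exact packaging should be confirmed against the module source.
.PP
.Vb 1
\& use CombinePvals;
\&
\& # Hypothetical per\-test data: each test supplies its own p\-value (its
\& # CDF value) and the CDF value of the next\-most\-extreme statistic.
\& my @tests = (
\&     { next_extreme => 0.3125, pval => 0.6875 },
\&     { next_extreme => 0.125,  pval => 0.625  },
\& );
\&
\& my $obj = CombinePvals\->new ([map { $_\->{pval} } @tests]);
\&
\& # Flatten into (lower, higher, lower, higher, ...) pairs \-\- an assumed
\& # packaging; the more extreme value always precedes the less extreme one.
\& my @cdf_pairs = map { ($_\->{next_extreme}, $_\->{pval}) } @tests;
\&
\& my $pval = $obj\->lancaster_mean_corrected_transform (@cdf_pairs);
.Ve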
.SS "lancaster_mean_corrected_transform"
.IX Subsection "lancaster_mean_corrected_transform"
This method is based on the mean value of the chi-squared transformed
statistic.
.PP
.Vb 1
\& my $pval = $obj\->lancaster_mean_corrected_transform (@cdf_pairs);
.Ve
.PP
Its accuracy is good, but the method is not strictly defined if one of the
tests has either the most extreme or second-to-most-extreme statistic.
.SS "lancaster_median_corrected_transform"
.IX Subsection "lancaster_median_corrected_transform"
This method is based on the median value of the chi-squared transformed
statistic.
.PP
.Vb 1
\& my $pval = $obj\->lancaster_median_corrected_transform (@cdf_pairs);
.Ve
.PP
Its accuracy may sometimes not be quite as good as that of the mean-based
correction, but the method is strictly defined for \fIall\fR values of the
statistic.
.SS "lancaster_mixed_corrected_transform"
.IX Subsection "lancaster_mixed_corrected_transform"
This method is a hybrid of the mean and median methods. Specifically, the
mean correction is used wherever it is well-defined; otherwise the median
correction is used.
.PP
.Vb 1
\& my $pval = $obj\->lancaster_mixed_corrected_transform (@cdf_pairs);
.Ve
.PP
This is a practical choice when some, but not all, of the tests take their
most extreme or second-to-most-extreme values, where the mean correction
alone is not defined.
.SS "additional methods"
.IX Subsection "additional methods"
The basic functionality of this package is encompassed in the methods
described above. However, some lower-level functions can also sometimes be
useful.
.PP
\fIexact_enum_arbitrary_2\fR
.IX Subsection "exact_enum_arbitrary_2"
.PP
Hard-wired precursor of \fIexact_enum_arbitrary\fR for 2 distributions. Does
no pre-checking, but may be useful for comparing to the output of the general
program.
.PP
\fIexact_enum_arbitrary_3\fR
.IX Subsection "exact_enum_arbitrary_3"
.PP
Hard-wired precursor of \fIexact_enum_arbitrary\fR for 3 distributions. Does
no pre-checking, but may be useful for comparing to the output of the general
program.
.PP
\fIbinom_coeffs\fR
.IX Subsection "binom_coeffs"
.PP
Calculates the binomial coefficients needed in the binomial (convolution)
approximate solution.
.PP
.Vb 1
\& $obj\->binom_coeffs;
.Ve
.PP
The internal data structure is essentially the symmetric half of an
appropriately-sized Pascal's triangle. Considerable memory is saved by not
storing the full triangle.
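.PP
As a closing sketch, the documented calls above can be combined to contrast
the exact enumeration with Fisher's continuous approximation on the same
data. The two observed p\-values below (0.6875 and 0.625) are hypothetical,
chosen only to match the Wallis (1942) tails quoted earlier.
.PP
.Vb 1
\& use CombinePvals;
\&
\& # Two discrete tests with hypothetical observed p\-values.
\& my $obj = CombinePvals\->new ([0.6875, 0.625]);
\&
\& # Exact, but all\-by\-all, enumeration over the two sets of discrete tails.
\& my $exact = $obj\->exact_enum_arbitrary (
\&     [0.0625, 0.3125, 0.6875, 0.9375, 1],
\&     [0.125, 0.625, 1]
\& );
\&
\& # Fast chi\-square transform; only approximate for discrete tests.
\& my $approx = $obj\->fisher_chisq_transform;
\&
\& printf "exact %.4f   fisher %.4f\en", $exact, $approx;
.Ve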