NAME
pymvpa2-crossval - cross-validation of a learner's performance
SYNOPSIS
pymvpa2 crossval [--version] [-h] -i DATASET [DATASET ...]
                 --learner LEARNER [--learner-space LEARNER_SPACE]
                 --partitioner PARTITIONER [--errorfx ERRORFX]
                 [--avg-datafold-results]
                 [--balance-training BALANCE_TRAINING]
                 [--sampling-repetitions SAMPLING_REPETITIONS]
                 [--permutations PERMUTATIONS] [--prob-tail {left,right}]
                 -o OUTPUT [--hdf5-compression TYPE]
DESCRIPTION
Cross-validation of a learner's performance
A learner is repeatedly trained and tested on partitions of an input dataset
that are generated by a configurable partitioning scheme. Each partition
usually consists of a training portion and a testing portion. The learner is
trained on the training portion of the dataset, and its generalization is
then assessed by comparing its predictions on the testing portion with the
known target values.
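The train/test cycle described above can be sketched in plain Python. This
is a minimal illustration under stated assumptions, not PyMVPA's
implementation: the dataset is represented by three plain lists (samples,
targets, chunks), the learner is a hypothetical toy nearest-class-mean
classifier, each chunk is held out once (leave-one-chunk-out), and the
per-fold error is the fraction of mispredicted test samples.

```python
from collections import defaultdict


def train_nearest_mean(samples, targets):
    """Toy learner (hypothetical): remember the mean feature vector per target."""
    sums = {}
    counts = defaultdict(int)
    for x, t in zip(samples, targets):
        if t not in sums:
            sums[t] = [0.0] * len(x)
        sums[t] = [s + v for s, v in zip(sums[t], x)]
        counts[t] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def predict_nearest_mean(model, samples):
    """Predict the target whose mean is closest (squared Euclidean distance)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(model, key=lambda t: dist(model[t], x)) for x in samples]


def cross_validate(samples, targets, chunks):
    """Leave-one-chunk-out cross-validation; returns one error rate per fold."""
    errors = []
    for test_chunk in sorted(set(chunks)):
        # Training portion: every sample outside the held-out chunk.
        train_idx = [i for i, c in enumerate(chunks) if c != test_chunk]
        test_idx = [i for i, c in enumerate(chunks) if c == test_chunk]
        model = train_nearest_mean([samples[i] for i in train_idx],
                                   [targets[i] for i in train_idx])
        # Testing portion: compare predictions with the known targets.
        predictions = predict_nearest_mean(model, [samples[i] for i in test_idx])
        mismatches = sum(p != targets[i] for p, i in zip(predictions, test_idx))
        errors.append(mismatches / len(test_idx))
    return errors


# Example data: class 'a' near the origin, class 'b' near (1, 1), 3 chunks.
samples = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [1.1, 1.0]]
targets = ["a", "b", "a", "b", "a", "b"]
chunks = [0, 0, 1, 1, 2, 2]
fold_errors = cross_validate(samples, targets, chunks)
```

With two well-separated classes, every held-out chunk is predicted
correctly, so each of the three fold errors is 0.0.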
A summary of the learner's performance is written to STDOUT. Depending on the
particular setup of the cross-validation analysis, either the learner's raw
predictions or summary statistics are returned in an output dataset.
If Monte-Carlo permutation testing is enabled (see --permutations), a second
output dataset with the corresponding p-values is stored as well (filename
suffix '_nullprob').
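Monte-Carlo permutation testing estimates an H0 distribution by recomputing
a result many times after the association between samples and targets has
been destroyed by shuffling. A minimal sketch, assuming that shuffling the
full target list is an acceptable stand-in; which targets PyMVPA actually
permutes (for example only those of the training portion, or only within
chunks) is not stated on this page. The function name permutation_null and
the statistic callable are hypothetical.

```python
import random


def permutation_null(statistic, targets, permutations, seed=None):
    """Monte-Carlo H0 distribution: recompute `statistic` on shuffled targets.

    `statistic` (hypothetical) maps a list of targets to a number, e.g. a
    function that reruns the whole cross-validation with those targets.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(permutations):
        shuffled = list(targets)  # copy so the caller's list stays intact
        rng.shuffle(shuffled)
        null.append(statistic(shuffled))
    return null


# Toy statistic: mismatch rate between fixed predictions and (shuffled) targets.
predictions = ["a", "a", "b", "b"]


def mismatch_rate(t):
    return sum(p != x for p, x in zip(predictions, t)) / len(t)


null = permutation_null(mismatch_rate, ["a", "a", "b", "b"], 100, seed=0)
```

Each of the 100 null values is a mismatch rate computed under a random
relabeling; an observed result is then judged against this distribution.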
OPTIONS
- --version
- show program's version and license information and exit
- -h, --help, --help-np
- show this help message and exit. --help-np forcefully disables the
use of a pager for displaying the help.
- -i DATASET [DATASET ...], --input DATASET [DATASET ...]
- path(s) to one or more PyMVPA dataset files. All datasets will be merged
into a single dataset (vstack'ed) in order of specification. In some cases
this option may need to be specified more than once if multiple, but
separate, input datasets are required.
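Merging several inputs (vstack'ing) amounts to stacking their samples, i.e.
rows, in the order of specification, together with the matching sample
attributes. A minimal sketch, assuming a hypothetical dict-based dataset
layout with a 'samples' row list and an 'sa' mapping of sample attributes;
real PyMVPA datasets carry more structure (e.g. feature attributes) than
this sketch models, and vstack_datasets is not a PyMVPA function.

```python
def vstack_datasets(*datasets):
    """Stack datasets along the sample axis, in the order given."""
    if not datasets:
        raise ValueError("need at least one dataset")
    # Every sample (row) across all inputs must have the same feature count.
    widths = {len(row) for ds in datasets for row in ds["samples"]}
    if len(widths) > 1:
        raise ValueError("all datasets must have the same number of features")
    attr_names = set(datasets[0]["sa"])
    merged = {"samples": [], "sa": {name: [] for name in attr_names}}
    for ds in datasets:
        if set(ds["sa"]) != attr_names:
            raise ValueError("all datasets must define the same sample attributes")
        merged["samples"].extend(ds["samples"])
        for name, values in ds["sa"].items():
            merged["sa"][name].extend(values)
    return merged


run1 = {"samples": [[1, 2], [3, 4]], "sa": {"targets": ["a", "b"], "chunks": [0, 0]}}
run2 = {"samples": [[5, 6]], "sa": {"targets": ["a"], "chunks": [1]}}
merged = vstack_datasets(run1, run2)
```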
Options for cross-validation setup:
- --learner LEARNER
- select a learner (trainable node) via its description in the learner
warehouse (see 'info' command for a listing), a colon-separated list of
capabilities, or by a file path to a Python script that creates a
classifier instance (advanced).
- --learner-space LEARNER_SPACE
- name of a sample attribute that defines the model to be learned by a
learner. By default this is an attribute named 'targets'.
- --partitioner PARTITIONER
- select a data folding scheme. Supported arguments are: 'half' for
split-half partitioning, 'oddeven' for partitioning into odd and even
chunks, 'group-X' (X being any positive integer) for partitioning into X
groups, and 'n-X' (X being any positive integer) for leave-X-chunks-out
partitioning. By default, partitioners operate on dataset chunks that are
defined by a 'chunks' sample attribute. The name of the "chunking"
attribute can be changed by appending a colon and the name of the
attribute (e.g. 'oddeven:run'). Optionally, the argument to this option
can also be a file path to a Python script that creates a custom
partitioner instance (advanced).
- --errorfx ERRORFX
- error function to be applied to the targets and predictions of each
cross-validation data fold. This can either be a name of any error
function in PyMVPA's mvpa2.misc.errorfx module, or a file path to a Python
script that creates a custom error function (advanced).
- --avg-datafold-results
- average result values across data folds generated by the partitioner. For
example, to compute a mean prediction error across all folds of a
cross-validation procedure.
- --balance-training BALANCE_TRAINING
- If enabled, training samples are balanced within each data fold. If the
keyword 'equal' is given as the argument, an equal number of random samples
is chosen for each unique target value. The number of samples per category
is determined by the category with the fewest samples in the respective
training set. An integer argument causes the corresponding number of
samples per category to be randomly selected. A floating point argument
(interval [0,1]) indicates what fraction of the available samples shall be
selected.
- --sampling-repetitions SAMPLING_REPETITIONS
- If training set balancing is enabled, the number of times random sample
selection is performed for each data fold. Default: 1
- --permutations PERMUTATIONS
- Number of Monte-Carlo permutation runs to be computed for estimating an H0
distribution for all cross-validation results. Enabling this option will
make reports of corresponding p-values available in the result summary and
output.
- --prob-tail {left,right}
- which tail of the probability distribution to report p-values from when
evaluating permutation test results. For example, a cross-validation
computing mean prediction error could report the left-tail p-value for a
single-sided test, because better-than-chance performance shows up as a
lower error.
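The partitioning schemes accepted by --partitioner can be sketched as
functions that return, for each data fold, the set of chunks held out for
testing. This is an illustration, not PyMVPA's partitioner code: how PyMVPA
orders folds, groups chunks for 'group-X', or splits an odd number of chunks
for 'half' is not stated on this page, so the contiguous grouping and floor
split used here are assumptions. All function names are hypothetical.

```python
from itertools import combinations


def leave_n_out(chunks, n):
    """'n-X' style: every combination of n unique chunks is held out once."""
    unique = sorted(set(chunks))
    return [set(combo) for combo in combinations(unique, n)]


def odd_even(chunks):
    """'oddeven' style: alternating chunks (1st, 3rd, ... vs. 2nd, 4th, ...)."""
    unique = sorted(set(chunks))
    return [set(unique[0::2]), set(unique[1::2])]


def split_half(chunks):
    """'half' style: first half of the chunks vs. second half."""
    unique = sorted(set(chunks))
    mid = len(unique) // 2
    return [set(unique[:mid]), set(unique[mid:])]


def n_groups(chunks, ngroups):
    """'group-X' style: unique chunks split into ngroups contiguous groups."""
    unique = sorted(set(chunks))
    size, extra = divmod(len(unique), ngroups)
    groups, start = [], 0
    for g in range(ngroups):
        stop = start + size + (1 if g < extra else 0)
        groups.append(set(unique[start:stop]))
        start = stop
    return groups


def folds(chunks, test_sets):
    """Turn held-out chunk sets into (train_indices, test_indices) pairs."""
    result = []
    for test in test_sets:
        train_idx = [i for i, c in enumerate(chunks) if c not in test]
        test_idx = [i for i, c in enumerate(chunks) if c in test]
        result.append((train_idx, test_idx))
    return result


def parse_spec(spec):
    """Split e.g. 'oddeven:run' into the scheme and the chunking attribute."""
    scheme, _, attr = spec.partition(":")
    return scheme, attr or "chunks"
```

For chunks labeled 1 to 4, 'n-1' yields four folds, 'n-2' yields six,
'oddeven' holds out {1, 3} and then {2, 4}, and 'oddeven:run' would read the
chunk labels from a 'run' attribute instead of 'chunks'.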
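The --errorfx and --avg-datafold-results options together determine what a
cross-validation returns: one error value per data fold, or a single mean
across folds. A minimal sketch, assuming an error function takes the
predictions and targets of one fold and returns a number. The first
function is named after PyMVPA's mean_mismatch_error for readability only;
it is an independent reimplementation, and its argument order is an
assumption. The fold_results helper is hypothetical.

```python
from statistics import mean


def mean_mismatch_error(predictions, targets):
    """Fraction of test samples whose prediction differs from the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum(p != t for p, t in zip(predictions, targets)) / len(targets)


def fold_results(folds, errorfx=mean_mismatch_error, avg_datafold_results=False):
    """Apply `errorfx` to each (predictions, targets) fold, optionally averaging."""
    errors = [errorfx(pred, targ) for pred, targ in folds]
    if avg_datafold_results:
        return [mean(errors)]  # collapse all folds into one summary value
    return errors


folds = [(["a", "b"], ["a", "b"]),   # fold 1: no errors
         (["a", "a"], ["a", "b"]),   # fold 2: one of two wrong
         (["b", "b"], ["a", "a"])]   # fold 3: both wrong
per_fold = fold_results(folds)
averaged = fold_results(folds, avg_datafold_results=True)
```

Any callable with the same shape can replace the default, which mirrors the
option's support for custom error functions.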
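The three argument forms of --balance-training, combined with
--sampling-repetitions, can be sketched as follows. This is an illustration
under assumptions, not PyMVPA's code: the function names are hypothetical,
and applying a floating point fraction separately to each target value is
an assumption, since the option text does not say whether the fraction
refers to each category or to the whole training set.

```python
import random
from collections import defaultdict


def balance_training(train_idx, targets, amount="equal", rng=None):
    """Return a sorted, class-balanced subset of the training indices.

    amount == 'equal' -> per target, as many samples as the smallest class has
    int amount        -> that many samples per target (ValueError if too few)
    float amount      -> that fraction of each target's samples (assumption)
    """
    if rng is None:
        rng = random.Random()
    by_target = defaultdict(list)
    for i in train_idx:
        by_target[targets[i]].append(i)
    smallest = min(len(v) for v in by_target.values())
    selected = []
    for indices in by_target.values():
        if amount == "equal":
            k = smallest
        elif isinstance(amount, float):
            k = int(round(amount * len(indices)))
        else:
            k = int(amount)
        # random.Random.sample raises ValueError if k exceeds len(indices).
        selected.extend(rng.sample(indices, k))
    return sorted(selected)


def repeated_balancing(train_idx, targets, amount="equal", repetitions=1, seed=None):
    """One independently drawn balanced subset per sampling repetition."""
    rng = random.Random(seed)
    return [balance_training(train_idx, targets, amount, rng)
            for _ in range(repetitions)]


targets = ["a"] * 6 + ["b"] * 3
train_idx = list(range(9))
subsets = repeated_balancing(train_idx, targets, "equal", repetitions=4, seed=1)
```

With six 'a' and three 'b' training samples, 'equal' keeps three of each,
so every repetition draws a different random subset of the 'a' samples
while always keeping all three 'b' samples.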
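The effect of --prob-tail can be illustrated by computing a p-value from an
H0 distribution: the left tail counts null values at or below the observed
result, the right tail those at or above it. A minimal sketch; the function
name is hypothetical, and the plain count ratio used here is one common
estimator (some implementations add one to numerator and denominator). The
exact estimator PyMVPA uses is not stated on this page.

```python
def permutation_pvalue(observed, null, tail="left"):
    """Fraction of H0 (null) values at least as extreme as the observed one.

    tail='left':  extreme means <= observed (e.g. a low prediction error)
    tail='right': extreme means >= observed (e.g. a high accuracy)
    """
    if not null:
        raise ValueError("null distribution is empty")
    if tail == "left":
        count = sum(v <= observed for v in null)
    elif tail == "right":
        count = sum(v >= observed for v in null)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return count / len(null)


# Hypothetical null distribution of mean prediction errors.
null = [0.5, 0.4, 0.6, 0.5, 0.45]
p_low_error = permutation_pvalue(0.1, null, tail="left")
```

An observed error of 0.1 lies below every null value, so its left-tail
p-value is 0.0; the same value tested in the right tail would be 1.0.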
Output options:
- -o OUTPUT, --output OUTPUT
- output filename ('.hdf5' extension is added automatically if necessary).
NOTE: The output format is suitable for data exchange between PyMVPA
commands, but is not recommended for long-term storage or exchange as its
specific content may vary depending on the actual software environment.
For long-term storage consider conversion into other data formats (see
'dump' command).
- --hdf5-compression TYPE
- compression type for HDF5 storage. Available values depend on the specific
HDF5 installation. Typical values are: 'gzip', 'lzf', 'szip', or integers
from 1 to 9 indicating gzip compression levels.
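The rule of the -o option that '.hdf5' is added to the output filename if
necessary can be sketched as a suffix check. This is an assumption about
what "if necessary" means (for example, whether other HDF5 extensions such
as '.h5' are also accepted is not stated), and output_filename is a
hypothetical name, not part of PyMVPA.

```python
def output_filename(name, extension=".hdf5"):
    """Append `extension` to `name` unless it already ends with it."""
    return name if name.endswith(extension) else name + extension


print(output_filename("results"))       # extension appended
print(output_filename("results.hdf5"))  # left unchanged
```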
AUTHOR
Written by Michael Hanke & Yaroslav Halchenko, and numerous other
contributors.
COPYRIGHT
Copyright © 2006-2014 PyMVPA developers
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.