mlpack_preprocess_split(26 December 2016)
NAME
mlpack_preprocess_split - split data
SYNOPSIS
mlpack_preprocess_split [-h] [-v]
DESCRIPTION
This utility takes a dataset and, optionally, labels, and splits them into a training set and a test set. Before the split, the points in the dataset are randomly reordered. The fraction of the dataset to be used as the test set can be specified with the --test_ratio (-r) option; the default is 0.2 (20%). The program does not modify the original file; instead, it writes the training and test sets to separate files, whose names must be specified with --training_file (-t) and --test_file (-T).
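The shuffle-then-split behaviour described above can be sketched in a few lines of Python. This is a hypothetical illustration, not mlpack's implementation: the helper name shuffle_split, its seed parameter, and the choice to truncate the test-set size toward zero are all assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(points: Sequence[T], test_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly reorder `points`, then cut off a test set.

    Hypothetical helper for illustration; not mlpack's code.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be in [0, 1]")
    shuffled = list(points)  # work on a copy; the input is left unmodified
    random.Random(seed).shuffle(shuffled)
    # Assumption: the test-set size is truncated toward zero.
    test_size = int(len(shuffled) * test_ratio)
    test = shuffled[:test_size]
    train = shuffled[test_size:]
    return train, test


if __name__ == "__main__":
    train, test = shuffle_split(list(range(10)), test_ratio=0.2)
    print(len(train), len(test))  # prints "8 2"
```

Copying the input before shuffling mirrors the fact that the program leaves the original file untouched.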
Optionally, labels can also be split along with the data by specifying the --input_labels_file (-I) option. Splitting labels works the same way as splitting the data. The output training and test labels will be saved to the files specified by --training_labels_file (-l) and --test_labels_file (-L), respectively.
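A minimal sketch of how labels can be split alongside the data, assuming that one shared random reordering is applied to both so that every label stays paired with its point. The function split_with_labels and its parameters are hypothetical; the return order simply mirrors the -t, -l, -T and -L outputs.

```python
import random
from typing import List, Sequence, Tuple


def split_with_labels(points: Sequence[list], labels: Sequence[int],
                      test_ratio: float = 0.2, seed: int = 0
                      ) -> Tuple[List[list], List[int], List[list], List[int]]:
    """Hypothetical helper: split points and labels with one permutation."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    # One shared permutation of row indices keeps each label with its point.
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    # Assumption: the test-set size is truncated toward zero.
    test_size = int(len(order) * test_ratio)
    test_idx, train_idx = order[:test_size], order[test_size:]
    train_x = [points[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [points[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y
```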
For example, to split dataset.csv into train.csv and test.csv with 60% of the data in the training set and 40% in the test set, we could run
$ mlpack_preprocess_split -i dataset.csv -t train.csv -T test.csv -r 0.4
If we had a dataset in dataset.csv and associated labels in labels.csv, and we wanted to split these into training_set.csv, training_labels.csv, test_set.csv, and test_labels.csv, with 30% of the data in the test set, we could run
$ mlpack_preprocess_split -i dataset.csv -I labels.csv -r 0.3 \
    -t training_set.csv -l training_labels.csv -T test_set.csv \
    -L test_labels.csv
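After running the command above, a quick sanity check is that the row counts agree: each output labels file should have as many rows as its data file, and the training and test sets together should have as many rows as the input. The sketch below assumes CSV files with no header line and one point (or one label) per row; the script name check_split.py and its functions are hypothetical, not part of mlpack.

```python
import csv
import sys
from typing import Iterable


def count_rows(lines: Iterable[str]) -> int:
    """Count non-empty CSV rows (assumes no header, one point per row)."""
    return sum(1 for row in csv.reader(lines) if row)


def count_file_rows(path: str) -> int:
    with open(path, newline="") as f:
        return count_rows(f)


def split_is_consistent(n_data: int, n_labels: int, n_train: int,
                        n_train_labels: int, n_test: int,
                        n_test_labels: int) -> bool:
    """True if labels match their data and no rows were lost or added."""
    return (n_data == n_labels
            and n_train == n_train_labels
            and n_test == n_test_labels
            and n_train + n_test == n_data)


# Expects six paths in the order: dataset, labels, training set,
# training labels, test set, test labels.
if __name__ == "__main__" and len(sys.argv) == 7:
    paths = sys.argv[1:]
    counts = [count_file_rows(p) for p in paths]
    for path, n in zip(paths, counts):
        print(f"{path}: {n} rows")
    sys.exit(0 if split_is_consistent(*counts) else 1)
```

For the example above it would be run as

$ python check_split.py dataset.csv labels.csv training_set.csv training_labels.csv test_set.csv test_labels.csv

and exits with status 0 when the counts are consistent.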
REQUIRED INPUT OPTIONS
- --input_file (-i) [string]
- File containing data.
OPTIONAL INPUT OPTIONS
- --help (-h)
- Default help info.
- --info [string]
- Get help on a specific module or option. Default value ''.
- --input_labels_file (-I) [string]
- File containing labels. Default value ''.
- --test_ratio (-r) [double]
- Fraction of the dataset to use as the test set. Default value 0.2.
- --verbose (-v)
- Display informational messages and the full list of parameters and timers at the end of execution.
- --version (-V)
- Display the version of mlpack.
OPTIONAL OUTPUT OPTIONS
- --test_file (-T) [string]
- File name to save test data. Default value ''.
- --test_labels_file (-L) [string]
- File name to save test labels. Default value ''.
- --training_file (-t) [string]
- File name to save training data. Default value ''.
- --training_labels_file (-l) [string]
- File name to save training labels. Default value ''.
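Programs that drive this utility can assemble its argument list from the options documented above. The helper below is a hypothetical sketch that uses only those long option names; its requirement that both label output files accompany --input_labels_file is a choice made by the sketch, not documented mlpack behaviour.

```python
import shlex
from typing import List, Optional


def build_split_command(input_file: str,
                        training_file: str,
                        test_file: str,
                        test_ratio: Optional[float] = None,
                        input_labels_file: Optional[str] = None,
                        training_labels_file: Optional[str] = None,
                        test_labels_file: Optional[str] = None,
                        verbose: bool = False) -> List[str]:
    """Hypothetical helper: build argv for mlpack_preprocess_split."""
    cmd = ["mlpack_preprocess_split",
           "--input_file", input_file,
           "--training_file", training_file,
           "--test_file", test_file]
    if test_ratio is not None:
        cmd += ["--test_ratio", str(test_ratio)]
    if input_labels_file is not None:
        # Choice of this sketch: insist on both label outputs.
        if training_labels_file is None or test_labels_file is None:
            raise ValueError("label output files are required with labels")
        cmd += ["--input_labels_file", input_labels_file,
                "--training_labels_file", training_labels_file,
                "--test_labels_file", test_labels_file]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    cmd = build_split_command("dataset.csv", "train.csv", "test.csv",
                              test_ratio=0.4)
    # prints "mlpack_preprocess_split --input_file dataset.csv
    # --training_file train.csv --test_file test.csv --test_ratio 0.4"
    print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run(cmd, check=True); because no shell is involved, file names containing spaces need no extra quoting. shlex.join requires Python 3.8 or later.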