mlpack_preprocess_split(1) | User Commands | mlpack_preprocess_split(1)
NAME
mlpack_preprocess_split - split data
SYNOPSIS
mlpack_preprocess_split -i string [-I string] [-s int] [-r double] [-V bool] [-T string] [-L string] [-t string] [-l string] [-h -v]
DESCRIPTION
This utility takes a dataset and, optionally, its labels, and splits them into a training set and a test set. Before the split, the points in the dataset are randomly reordered. The fraction of the dataset to be used as the test set can be specified with the '--test_ratio (-r)' parameter; the default is 0.2 (20%). The output training and test matrices may be saved with the '--training_file (-t)' and '--test_file (-T)' output parameters, respectively.
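To make the shuffle-then-split behavior concrete, here is a minimal Python sketch. It is not mlpack's implementation: the function name `split_points` is hypothetical, and the sketch assumes the test set receives floor(N * test_ratio) of the N points, with the rest going to the training set.

```python
import random


def split_points(points, test_ratio=0.2, rng=None):
    """Randomly reorder `points`, then split them into (train, test).

    Hypothetical illustration only; assumes the test set size is
    floor(len(points) * test_ratio), rounded down.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must lie in [0, 1]")
    rng = rng or random.Random()
    order = list(range(len(points)))
    rng.shuffle(order)  # random reordering happens before the split
    test_size = int(len(points) * test_ratio)  # truncates toward zero
    train_size = len(points) - test_size
    train = [points[i] for i in order[:train_size]]
    test = [points[i] for i in order[train_size:]]
    return train, test


if __name__ == "__main__":
    data = [[float(i), float(i) ** 2] for i in range(10)]
    train, test = split_points(data, test_ratio=0.4, rng=random.Random(42))
    print(len(train), len(test))  # 6 4
```

With 10 points and a ratio of 0.4, this sketch puts 6 points in the training set and 4 in the test set, the same 60%/40% proportion used in the first example.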
Optionally, labels can also be split along with the data by specifying the '--input_labels_file (-I)' parameter. Splitting labels works the same way as splitting the data. The output training and test labels may be saved with the '--training_labels_file (-l)' and '--test_labels_file (-L)' output parameters, respectively.
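One way to picture how labels are split "the same way" as the data is to apply a single random permutation to both inputs, so each label stays paired with its point. The following Python sketch illustrates that idea under the same rounding assumption as above; `split_with_labels` is a hypothetical name, not part of mlpack.

```python
import random


def split_with_labels(points, labels, test_ratio=0.2, rng=None):
    """Split points and labels using one shared random permutation.

    Hypothetical illustration only; returns
    (train_points, test_points, train_labels, test_labels).
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    rng = rng or random.Random()
    order = list(range(len(points)))
    rng.shuffle(order)  # one permutation, applied to both inputs
    test_size = int(len(points) * test_ratio)
    cut = len(points) - test_size
    train_idx, test_idx = order[:cut], order[cut:]
    return (
        [points[i] for i in train_idx],
        [points[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


if __name__ == "__main__":
    X = [[i, i + 0.5] for i in range(10)]
    y = [i % 2 for i in range(10)]  # label = parity of the first feature
    X_train, X_test, y_train, y_test = split_with_labels(X, y, 0.3, random.Random(1))
    # Every label still matches the parity of its own point.
    assert all(lab == pt[0] % 2 for pt, lab in zip(X_train + X_test, y_train + y_test))
    print(len(X_train), len(X_test))  # 7 3
```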
As a simple example, to split the dataset 'X.csv' into 'X_train.csv' and 'X_test.csv', with 60% of the data in the training set and 40% in the test set, we could run
$ mlpack_preprocess_split --input_file X.csv --training_file X_train.csv --test_file X_test.csv --test_ratio 0.4
If we had a dataset 'X.csv' and associated labels 'y.csv', and we wanted to split these into 'X_train.csv', 'y_train.csv', 'X_test.csv', and 'y_test.csv', with 30% of the data in the test set, we could run
$ mlpack_preprocess_split --input_file X.csv --input_labels_file y.csv --test_ratio 0.3 --training_file X_train.csv --training_labels_file y_train.csv --test_file X_test.csv --test_labels_file y_test.csv
REQUIRED INPUT OPTIONS
- --input_file (-i) [string]
- Matrix containing data.
OPTIONAL INPUT OPTIONS
- --help (-h) [bool]
- Display the default help information.
- --info [string]
- Get help on a specific module or option. Default value ''.
- --input_labels_file (-I) [string]
- Matrix containing labels. Default value ''.
- --seed (-s) [int]
- Random seed; if 0, the seed is taken from std::time(NULL). Default value 0.
- --test_ratio (-r) [double]
- Fraction of the dataset to use as the test set. Default value 0.2.
- --verbose (-v) [bool]
- Display informational messages and the full list of parameters and timers at the end of execution.
- --version (-V) [bool]
- Display the version of mlpack.
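The '--seed (-s)' option determines the random reordering, and therefore which points land in each set. The Python sketch below mimics the documented convention (0 means "seed from the current time", any other value is used as-is) to show why a fixed nonzero seed makes a split repeatable. The helper names `make_rng` and `shuffled_order` are hypothetical, and Python's generator differs from mlpack's, so the permutations it prints will not match the tool's output for the same seed.

```python
import random
import time


def make_rng(seed=0):
    """Build a generator following the --seed convention described above.

    Hypothetical illustration: seed 0 seeds from the current time
    (analogous to std::time(NULL)); any other value is used directly.
    """
    if seed == 0:
        return random.Random(int(time.time()))
    return random.Random(seed)


def shuffled_order(n, seed=0):
    """Return a random permutation of the indices 0..n-1 for a given seed."""
    order = list(range(n))
    make_rng(seed).shuffle(order)
    return order


if __name__ == "__main__":
    # The same nonzero seed reproduces the same reordering, hence the same split.
    assert shuffled_order(20, seed=7) == shuffled_order(20, seed=7)
    print(shuffled_order(5, seed=7))
```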
OPTIONAL OUTPUT OPTIONS
- --test_file (-T) [string]
- Matrix to save test data to. Default value ''.
- --test_labels_file (-L) [string]
- Matrix to save test labels to. Default value ''.
- --training_file (-t) [string]
- Matrix to save training data to. Default value ''.
- --training_labels_file (-l) [string]
- Matrix to save training labels to. Default value ''.
ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.
18 November 2018 | mlpack-3.0.4