.\" Text automatically generated by txt2man
.TH mlpack_decision_tree 1 "18 November 2018" "mlpack-3.0.4" "User Commands"
.SH NAME
\fBmlpack_decision_tree \fP- decision tree
.SH SYNOPSIS
.nf
.fam C
\fBmlpack_decision_tree\fP [\fB-m\fP \fIunknown\fP] [\fB-l\fP \fIstring\fP] [\fB-g\fP \fIdouble\fP] [\fB-n\fP \fIint\fP] [\fB-e\fP \fIbool\fP] [\fB-T\fP \fIstring\fP] [\fB-L\fP \fIstring\fP] [\fB-t\fP \fIstring\fP] [\fB-V\fP \fIbool\fP] [\fB-w\fP \fIstring\fP] [\fB-M\fP \fIunknown\fP] [\fB-p\fP \fIstring\fP] [\fB-P\fP \fIstring\fP] [\fB-h\fP \fB-v\fP]
.fam T
.fi
.SH DESCRIPTION
Train and evaluate a decision tree.
Given a dataset containing numeric or categorical features, and associated labels for each point in the dataset, this program can train a decision tree on that data.
.PP
The training set and associated labels are specified with the '\fB--training_file\fP (\fB-t\fP)' and '\fB--labels_file\fP (\fB-l\fP)' parameters, respectively.
The labels should be in the range [0, num_classes - 1].
If '\fB--labels_file\fP (\fB-l\fP)' is not specified, the labels are assumed to be the last dimension of the training dataset.
.PP
When a model is trained, the '\fB--output_model_file\fP (\fB-M\fP)' output parameter may be used to save the trained model.
A model may be loaded for predictions with the '\fB--input_model_file\fP (\fB-m\fP)' parameter.
The '\fB--input_model_file\fP (\fB-m\fP)' parameter may not be specified when the '\fB--training_file\fP (\fB-t\fP)' parameter is specified.
The '\fB--minimum_leaf_size\fP (\fB-n\fP)' parameter specifies the minimum number of training points that each leaf of the tree must contain.
The '\fB--minimum_gain_split\fP (\fB-g\fP)' parameter specifies the minimum gain that is needed for a node to split.
If '\fB--print_training_error\fP (\fB-e\fP)' is specified, the training error will be printed.
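.PP
The label requirement above can be checked before training.
The following is a minimal sketch, not part of mlpack: it assumes a comma-separated file with one point per row, so that the last dimension is the last column, and it assumes integer labels.
The file name, the sample values, and the variable names are hypothetical.
.PP
.nf
.fam C
```shell
# Hypothetical sample file: two numeric features, then an integer class label.
workdir=$(mktemp -d)
cat > "$workdir/data.csv" <<EOF
1.0,2.0,0
1.5,1.8,1
3.2,0.4,2
2.9,0.7,1
EOF

# Collect the number of distinct labels plus the smallest and largest label
# found in the last column.
label_summary=$(awk -F, '
  NR == 1 { min = $NF + 0; max = $NF + 0 }
  { seen[$NF] = 1
    if ($NF + 0 < min) min = $NF + 0
    if ($NF + 0 > max) max = $NF + 0 }
  END { n = 0; for (k in seen) n++; print n, min, max }' "$workdir/data.csv")

set -- $label_summary
num_classes=$1
min_label=$2
max_label=$3

# Integer labels are exactly 0 .. num_classes - 1 when they start at 0 and
# the largest label is one less than the number of distinct labels.
if [ "$min_label" -eq 0 ] && [ "$max_label" -eq $((num_classes - 1)) ]; then
  labels_ok=yes
else
  labels_ok=no
fi
echo "classes=$num_classes min=$min_label max=$max_label in_range=$labels_ok"
```
.fam T
.fi
.PP
If the check reports that the labels are out of range, they would need to be remapped before the file is passed with '\fB--training_file\fP (\fB-t\fP)'.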
.PP
Test data may be specified with the '\fB--test_file\fP (\fB-T\fP)' parameter, and if performance numbers are desired for that test set, labels may be specified with the '\fB--test_labels_file\fP (\fB-L\fP)' parameter.
Predictions for each test point may be saved via the '\fB--predictions_file\fP (\fB-p\fP)' output parameter.
Class probabilities for each prediction may be saved with the '\fB--probabilities_file\fP (\fB-P\fP)' output parameter.
.PP
For example, to train a decision tree with a minimum leaf size of 20 and a minimum gain of 0.001 on the dataset contained in 'data.arff' with labels 'labels.csv', saving the output model to 'tree.bin' and printing the training error, one could call
.PP
$ mlpack_decision_tree \fB--training_file\fP data.arff \fB--labels_file\fP labels.csv \fB--output_model_file\fP tree.bin \fB--minimum_leaf_size\fP 20 \fB--minimum_gain_split\fP 0.001 \fB--print_training_error\fP
.PP
Then, to use that model to classify the points in 'test_set.arff' and print the test error given the labels 'test_labels.csv', while saving the predictions for each point to 'predictions.csv', one could call
.PP
$ mlpack_decision_tree \fB--input_model_file\fP tree.bin \fB--test_file\fP test_set.arff \fB--test_labels_file\fP test_labels.csv \fB--predictions_file\fP predictions.csv
.SH OPTIONAL INPUT OPTIONS
.TP
.B
\fB--help\fP (\fB-h\fP) [\fIbool\fP]
Default help info.
.TP
.B
\fB--info\fP [\fIstring\fP]
Get help on a specific module or option.
Default value ''.
.TP
.B
\fB--input_model_file\fP (\fB-m\fP) [\fIunknown\fP]
Pre-trained decision tree, to be used with test points.
Default value ''.
.TP
.B
\fB--labels_file\fP (\fB-l\fP) [\fIstring\fP]
Training labels.
Default value ''.
.TP
.B
\fB--minimum_gain_split\fP (\fB-g\fP) [\fIdouble\fP]
Minimum gain for node splitting.
Default value 1e-07.
.TP
.B
\fB--minimum_leaf_size\fP (\fB-n\fP) [\fIint\fP]
Minimum number of points in a leaf.
Default value 20.
.TP
.B
\fB--print_training_error\fP (\fB-e\fP) [\fIbool\fP]
Print the training error.
.TP
.B
\fB--test_file\fP (\fB-T\fP) [\fIstring\fP]
Testing dataset (may be categorical).
Default value ''.
.TP
.B
\fB--test_labels_file\fP (\fB-L\fP) [\fIstring\fP]
Test point labels, if accuracy calculation is desired.
Default value ''.
.TP
.B
\fB--training_file\fP (\fB-t\fP) [\fIstring\fP]
Training dataset (may be categorical).
Default value ''.
.TP
.B
\fB--verbose\fP (\fB-v\fP) [\fIbool\fP]
Display informational messages and the full list of parameters and timers at the end of execution.
.TP
.B
\fB--version\fP (\fB-V\fP) [\fIbool\fP]
Display the version of mlpack.
.TP
.B
\fB--weights_file\fP (\fB-w\fP) [\fIstring\fP]
The weight of labels.
Default value ''.
.SH OPTIONAL OUTPUT OPTIONS
.TP
.B
\fB--output_model_file\fP (\fB-M\fP) [\fIunknown\fP]
Output for trained decision tree.
Default value ''.
.TP
.B
\fB--predictions_file\fP (\fB-p\fP) [\fIstring\fP]
Class predictions for each test point.
Default value ''.
.TP
.B
\fB--probabilities_file\fP (\fB-P\fP) [\fIstring\fP]
Class probabilities for each test point.
Default value ''.
.SH ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.
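.PP
Saved predictions can also be compared against known test labels by hand, as a rough cross-check of the reported test error.
The sketch below is an illustration only and does not call mlpack: it assumes that the predictions file and the test labels file each hold one integer label per line in the same order, which may not match the layout written by every mlpack version.
The file contents and variable names are hypothetical.
.PP
.nf
.fam C
```shell
# Hypothetical files: one integer class label per line, in the same order.
workdir=$(mktemp -d)
cat > "$workdir/predictions.csv" <<EOF
0
1
1
2
EOF
cat > "$workdir/test_labels.csv" <<EOF
0
1
2
2
EOF

# Pair line i of the predictions with line i of the labels.
paste -d, "$workdir/predictions.csv" "$workdir/test_labels.csv" > "$workdir/paired.csv"

# Count the rows where the predicted label differs from the true label.
errors=$(awk -F, '$1 != $2 { wrong++ } END { print wrong + 0 }' "$workdir/paired.csv")
total=$(awk 'END { print NR }' "$workdir/paired.csv")
echo "misclassified $errors of $total test points"
```
.fam T
.fi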