.\" Text automatically generated by txt2man
.TH mlpack_random_forest 1 "12 December 2020" "mlpack-3.4.2" "User Commands"
.SH NAME
\fBmlpack_random_forest \fP- random forests
.SH SYNOPSIS
.nf
.fam C
 \fBmlpack_random_forest\fP [\fB-m\fP \fIunknown\fP] [\fB-l\fP \fIstring\fP] [\fB-D\fP \fIint\fP] [\fB-g\fP \fIdouble\fP] [\fB-n\fP \fIint\fP] [\fB-N\fP \fIint\fP] [\fB-a\fP \fIbool\fP] [\fB-s\fP \fIint\fP] [\fB-d\fP \fIint\fP] [\fB-T\fP \fIstring\fP] [\fB-L\fP \fIstring\fP] [\fB-t\fP \fIstring\fP] [\fB-V\fP \fIbool\fP] [\fB-M\fP \fIunknown\fP] [\fB-p\fP \fIstring\fP] [\fB-P\fP \fIstring\fP] [\fB-h\fP \fB-v\fP] 
.fam T
.fi
.SH DESCRIPTION


This program is an implementation of the standard random forest classification
algorithm by Leo Breiman. A random forest can be trained and saved for later
use, or a previously saved random forest can be loaded to generate predictions
or class probabilities for points.
.PP
The training set and associated labels are specified with the '\fB--training_file\fP
(\fB-t\fP)' and '\fB--labels_file\fP (\fB-l\fP)' parameters, respectively. The labels should be
in the range [0, num_classes - 1]. If '\fB--labels_file\fP (\fB-l\fP)' is not
specified, the labels are assumed to be the last dimension of the training
dataset.
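.PP
Labels outside the range [0, num_classes - 1] can be remapped before training.
The following Python sketch is not part of mlpack, and the helper name
remap_labels is hypothetical. It maps arbitrary label values to contiguous
integers and keeps the original values so that predictions can be mapped back.
.PP
.nf
.fam C
```python
def remap_labels(labels):
    """Map arbitrary label values to integers 0..num_classes-1.

    Hypothetical helper, not part of mlpack. Returns the remapped
    labels and the sorted list of original values, so that index i
    of that list is the original value of class i.
    """
    classes = sorted(set(labels))
    index = {value: i for i, value in enumerate(classes)}
    return [index[value] for value in labels], classes


mapped, classes = remap_labels([3, 7, 7, 3, 10])
print(mapped)   # [0, 1, 1, 0, 2]
print(classes)  # [3, 7, 10]
```
.fam T
.fi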
.PP
When a model is trained, the '\fB--output_model_file\fP (\fB-M\fP)' output parameter may
be used to save the trained model. A model may be loaded for predictions with
the '\fB--input_model_file\fP (\fB-m\fP)' parameter. The '\fB--input_model_file\fP (\fB-m\fP)'
parameter may not be specified when the '\fB--training_file\fP (\fB-t\fP)' parameter is
specified. The '\fB--minimum_leaf_size\fP (\fB-n\fP)' parameter specifies the minimum
number of training points that each leaf node must contain.
The '\fB--num_trees\fP (\fB-N\fP)' parameter controls the number of trees in the random
forest. The
\(cq\fB--minimum_gain_split\fP (\fB-g\fP)' parameter controls the minimum required gain for a
decision tree node to split. Larger values will force higher-confidence
splits. The '\fB--maximum_depth\fP (\fB-D\fP)' parameter specifies the maximum depth of
the tree. The '\fB--subspace_dim\fP (\fB-d\fP)' parameter is used to control the number
of random dimensions chosen for an individual node's split. If
\(cq\fB--print_training_accuracy\fP (\fB-a\fP)' is specified, the calculated accuracy on the
training set will be printed.
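.PP
When '\fB--subspace_dim\fP (\fB-d\fP)' is 0, the square root of the data
dimensionality is selected automatically. The Python sketch below illustrates
that rule. It is not part of mlpack, the helper name is hypothetical, and
truncating the square root to an integer is an assumption, since this page does
not state how the value is rounded.
.PP
.nf
.fam C
```python
import math


def effective_subspace_dim(subspace_dim, num_dimensions):
    """Return the number of dimensions tried at each split.

    Hypothetical helper. A value of 0 selects the square root of the
    data dimensionality; truncation to an integer is an assumption.
    """
    if subspace_dim == 0:
        return math.isqrt(num_dimensions)
    return subspace_dim


print(effective_subspace_dim(0, 100))  # 10
print(effective_subspace_dim(5, 100))  # 5
```
.fam T
.fi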
.PP
Test data may be specified with the '\fB--test_file\fP (\fB-T\fP)' parameter, and if
performance measures are desired for that test set, labels for the test points
may be specified with the '\fB--test_labels_file\fP (\fB-L\fP)' parameter. Predictions
for each test point may be saved via the '\fB--predictions_file\fP (\fB-p\fP)' output
parameter. Class probabilities for each prediction may be saved with the
\(cq\fB--probabilities_file\fP (\fB-P\fP)' output parameter.
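.PP
Saved class probabilities can be turned back into class labels by taking the
most probable class for each point. The Python sketch below is not part of
mlpack, and the helper name is hypothetical. It assumes each row holds one
probability per class, ordered by class label; the layout of the saved file
should be checked before relying on this.
.PP
.nf
.fam C
```python
def most_likely_class(probabilities):
    """Return the index of the largest probability in one row.

    Hypothetical helper. Assumes the row holds one probability per
    class, ordered by class label; ties go to the lowest label.
    """
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


rows = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
print([most_likely_class(row) for row in rows])  # [0, 2]
```
.fam T
.fi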
.PP
For example, to train a random forest with a minimum leaf size of 20 using 10
trees on the dataset contained in 'data.csv' with labels 'labels.csv', saving
the output random forest to 'rf_model.bin' and printing the training accuracy,
one could call
.PP
$ \fBmlpack_random_forest\fP \fB--training_file\fP data.csv \fB--labels_file\fP labels.csv
\fB--minimum_leaf_size\fP 20 \fB--num_trees\fP 10 \fB--output_model_file\fP rf_model.bin
\fB--print_training_accuracy\fP \fB--verbose\fP
.PP
Then, to use that model to classify the points in 'test_set.csv', print the
test accuracy given the labels 'test_labels.csv', and save the predictions
for each point to 'predictions.csv', one could call
.PP
$ \fBmlpack_random_forest\fP \fB--input_model_file\fP rf_model.bin \fB--test_file\fP
test_set.csv \fB--test_labels_file\fP test_labels.csv \fB--predictions_file\fP
predictions.csv
.RE
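.PP
The saved predictions can also be compared against the true labels outside of
the program. The Python sketch below is not part of mlpack, and both helper
names are hypothetical. It assumes that each file holds one numeric label per
line, which should be verified against the files actually written.
.PP
.nf
.fam C
```python
def read_labels(path):
    """Read one label per line from a text file (assumed layout)."""
    with open(path) as handle:
        return [int(float(line)) for line in handle if line.strip()]


def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predictions and labels differ in length")
    if not actual:
        raise ValueError("no labels given")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


print(accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
```
.fam T
.fi
.PP
With the files from the example above, one could call
accuracy(read_labels("predictions.csv"), read_labels("test_labels.csv")).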
.PP

.SH OPTIONAL INPUT OPTIONS 

.TP
.B
\fB--help\fP (\fB-h\fP) [\fIbool\fP]
Default help info. 
.TP
.B
\fB--info\fP [\fIstring\fP]
Print help on a specific option. Default value ''. 
.TP
.B
\fB--input_model_file\fP (\fB-m\fP) [\fIunknown\fP]
Pre-trained random forest to use for classification. 
.TP
.B
\fB--labels_file\fP (\fB-l\fP) [\fIstring\fP]
Labels for training dataset. 
.TP
.B
\fB--maximum_depth\fP (\fB-D\fP) [\fIint\fP]
Maximum depth of the tree (0 means no limit).  Default value 0. 
.TP
.B
\fB--minimum_gain_split\fP (\fB-g\fP) [\fIdouble\fP]
Minimum gain needed to make a split when building a tree. Default value 0. 
.TP
.B
\fB--minimum_leaf_size\fP (\fB-n\fP) [\fIint\fP]
Minimum number of points in each leaf node.  Default value 1. 
.TP
.B
\fB--num_trees\fP (\fB-N\fP) [\fIint\fP]
Number of trees in the random forest. Default value 10. 
.TP
.B
\fB--print_training_accuracy\fP (\fB-a\fP) [\fIbool\fP]
If set, the accuracy of the model on the training set will be printed ('\fB--verbose\fP' must also be specified). 
.TP
.B
\fB--seed\fP (\fB-s\fP) [\fIint\fP]
Random seed. If 0, 'std::time(NULL)' is used.  Default value 0. 
.TP
.B
\fB--subspace_dim\fP (\fB-d\fP) [\fIint\fP]
Dimensionality of random subspace to use for each split. '0' will autoselect the square root of data dimensionality. Default value 0. 
.TP
.B
\fB--test_file\fP (\fB-T\fP) [\fIstring\fP]
Test dataset to produce predictions for. 
.TP
.B
\fB--test_labels_file\fP (\fB-L\fP) [\fIstring\fP]
Test dataset labels, if accuracy calculation is desired. 
.TP
.B
\fB--training_file\fP (\fB-t\fP) [\fIstring\fP]
Training dataset. 
.TP
.B
\fB--verbose\fP (\fB-v\fP) [\fIbool\fP]
Display informational messages and the full list of parameters and timers at the end of execution. 
.TP
.B
\fB--version\fP (\fB-V\fP) [\fIbool\fP]
Display the version of mlpack.  
.SH OPTIONAL OUTPUT OPTIONS 

.TP
.B
\fB--output_model_file\fP (\fB-M\fP) [\fIunknown\fP]
Model to save trained random forest to. 
.TP
.B
\fB--predictions_file\fP (\fB-p\fP) [\fIstring\fP]
Predicted classes for each point in the test set. 
.TP
.B
\fB--probabilities_file\fP (\fB-P\fP) [\fIstring\fP]
Predicted class probabilities for each point in the test set.
.SH ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with your
distribution of mlpack.