NAME
logistic_regression - L2-regularized logistic regression and prediction
SYNOPSIS
logistic_regression [-h] [-v] [-d double] [-i string] [-r string] [-l double] [-M int] [-m string] [-O string] [-o string] [-p string] [-s double] [-t string] [-T double] [-V]
DESCRIPTION
An implementation of L2-regularized logistic regression using either the L-BFGS
optimizer or SGD (stochastic gradient descent). This solves the regression
problem
y = 1 / (1 + e^-(X * b))
where y takes values 0 or 1. The model is trained by giving it labeled data and
iteratively refining the parameter vector b. The matrix of predictors (or
features) X is specified with the --input_file option, and the vector of
responses y is either the last column of the matrix given with --input_file or
a separate one-column vector given with the --input_responses option. After
training, the calculated b is saved to the file specified by --output_file. An
initial guess for b can be specified by giving the --model_file parameter
together with --input_file or --input_responses. The tolerance of the optimizer
can be set with --tolerance; the maximum number of iterations of the optimizer
can be set with --max_iterations; and the type of the optimizer (SGD or L-BFGS)
can be set with the --optimizer option. Both the SGD and L-BFGS optimizers have
more options, but the C++ interface must be used to set those. For the SGD
optimizer, the --step_size parameter controls the step size taken at each
iteration. If the objective function for your data oscillates between Inf and
0, the step size is probably too large.
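The model and the role of the step size can be illustrated with a minimal
pure-Python sketch. This is not mlpack's implementation: the function names
(sigmoid, predict_proba, sgd_epoch) and the toy data are hypothetical, and the
update shown is a plain per-point gradient step with no regularization and no
convergence check.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(X, b):
    """Evaluate y = 1 / (1 + e^-(X * b)) for each row of X."""
    return [sigmoid(sum(x_j * b_j for x_j, b_j in zip(row, b))) for row in X]

def sgd_epoch(X, y, b, step_size):
    """One pass over the data, updating b after every point."""
    b = list(b)
    for row, label in zip(X, y):
        # Gradient of the negative log-likelihood for one point is (p - label) * x.
        error = sigmoid(sum(x_j * b_j for x_j, b_j in zip(row, b))) - label
        for j, x_j in enumerate(row):
            b[j] -= step_size * error * x_j
    return b

# Toy data, made up for illustration; the leading column of ones acts as an intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

b = [0.0, 0.0]
for _ in range(1000):
    b = sgd_epoch(X, y, b, step_size=0.1)

probs = predict_proba(X, b)
print([round(p, 3) for p in probs])
```

Each update moves b by step_size times the per-point gradient, so a larger
step size scales every update proportionally; this is the quantity to reduce
when the objective oscillates.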
This implementation of logistic regression supports L2-regularization, which can
help keep the parameter vector b from overfitting. The regularization parameter
is specified with the --lambda option; by default, it is 0 (which means no
regularization is performed).
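As a rough illustration of what the regularization parameter does, the sketch
below evaluates a logistic regression objective with an L2 penalty. The
(lambda / 2) * ||b||^2 scaling is a common convention assumed here; this page
does not state the exact form mlpack uses, and the function name log_loss_l2 is
hypothetical.

```python
import math

def log_loss_l2(X, y, b, lam=0.0):
    """Negative log-likelihood plus an assumed (lam / 2) * ||b||^2 penalty."""
    total = 0.0
    for row, label in zip(X, y):
        z = sum(x_j * b_j for x_j, b_j in zip(row, b))
        # log(1 + e^z) - label * z is the negative log-likelihood of one point;
        # log(1 + e^z) is computed as max(z, 0) + log1p(e^-|z|) to avoid overflow.
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - label * z
    return total + 0.5 * lam * sum(b_j * b_j for b_j in b)

# Toy data, made up for illustration.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

unregularized = log_loss_l2(X, y, [1.0, 1.0], lam=0.0)
regularized = log_loss_l2(X, y, [1.0, 1.0], lam=2.0)
print(regularized - unregularized)
```

With lam set to 0 the penalty vanishes, which matches the default behaviour
described above; a positive value makes parameter vectors with large entries
more expensive.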
Optionally, the calculated value of b is used to predict the responses for
another matrix of data points, if --test_file is specified. The --test_file
option can be specified without --input_file, as long as an existing logistic
regression model is given with --model_file. The output predictions from the
logistic regression model are stored in the file given with
--output_predictions.
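The prediction step can be sketched as follows, assuming the thresholding rule
documented for --decision_boundary: a point whose logistic function value is
less than the boundary is assigned class 0, and any other point is assigned
class 1. The function name classify and the sample inputs are hypothetical.

```python
import math

def classify(X, b, decision_boundary=0.5):
    """Assign class 0 or 1 to each row of X by thresholding the logistic function."""
    labels = []
    for row in X:
        z = sum(x_j * b_j for x_j, b_j in zip(row, b))
        # Overflow-safe evaluation of 1 / (1 + e^-z).
        if z >= 0:
            p = 1.0 / (1.0 + math.exp(-z))
        else:
            p = math.exp(z) / (1.0 + math.exp(z))
        labels.append(0 if p < decision_boundary else 1)
    return labels

# One-feature toy points with a made-up parameter vector b = [1.0].
points = [[0.0], [10.0], [-10.0]]
default_labels = classify(points, [1.0])
strict_labels = classify(points, [1.0], decision_boundary=0.9)
print(default_labels, strict_labels)
```

Raising the boundary makes the sketch more reluctant to predict class 1, since
a point then needs a higher logistic function value to cross it.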
This implementation of logistic regression does not support the general
multi-class case but instead only the two-class case. Any responses must be
either 0 or 1.
REQUIRED OPTIONS
OPTIONS
- --decision_boundary (-d) [double]
- Decision boundary for prediction; if the logistic function for a point is
less than the boundary, the class is taken to be 0; otherwise, the class
is 1. Default value 0.5.
- --help (-h)
- Default help info.
- --info [string]
- Get help on a specific module or option. Default value ''.
- --input_file (-i) [string]
- File containing X (predictors). Default value ''.
- --input_responses (-r) [string]
- Optional file containing y (responses). If not given, the responses are
assumed to be the last column of the input file. Default value ''.
- --lambda (-l) [double]
- L2-regularization parameter for training. Default value 0.
- --max_iterations (-M) [int]
- Maximum iterations for optimizer (0 indicates no limit). Default value
0.
- --model_file (-m) [string]
- File containing existing model (parameters). Default value ''.
- --optimizer (-O) [string]
- Optimizer to use for training ('lbfgs' or 'sgd'). Default value
'lbfgs'.
- --output_file (-o) [string]
- File where parameters (b) will be saved. Default value ''.
- --output_predictions (-p) [string]
- If --test_file is specified, this file is where the predicted
responses will be saved. Default value 'predictions.csv'.
- --step_size (-s) [double]
- Step size for SGD optimizer. Default value 0.01.
- --test_file (-t) [string]
- File containing test dataset. Default value ''.
- --tolerance (-T) [double]
- Convergence tolerance for optimizer. Default value 1e-10.
- --verbose (-v)
- Display informational messages and the full list of parameters and timers
at the end of execution.
- --version (-V)
- Display the version of mlpack.
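The options above can be combined into a training run followed by a prediction
run. The sketch below only assembles the argument lists and does not execute
anything. The helper build_command and all file names are hypothetical, and
passing the file written by --output_file back in through --model_file is an
assumption about file compatibility rather than something this page states.

```python
import shlex

def build_command(options):
    """Turn a dict of long option names into a logistic_regression argument list."""
    argv = ["logistic_regression"]
    for name, value in options.items():
        argv.append("--" + name)
        if value is not True:  # flags such as --verbose take no value
            argv.append(str(value))
    return argv

# Train with SGD and L2 regularization (file names are placeholders).
train = build_command({
    "input_file": "train.csv",
    "output_file": "model.csv",
    "optimizer": "sgd",
    "step_size": 0.01,
    "lambda": 0.5,
    "verbose": True,
})

# Predict on new points using a previously saved model.
predict = build_command({
    "model_file": "model.csv",
    "test_file": "test.csv",
    "output_predictions": "predictions.csv",
    "decision_boundary": 0.5,
})

print(shlex.join(train))
print(shlex.join(predict))
```

A dict is used rather than keyword arguments because lambda is a reserved word
in Python and cannot be passed as a keyword.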
For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with your
distribution of mlpack.