.\" Text automatically generated by txt2man
.TH mlpack_approx_kfn 1 "12 December 2020" "mlpack-3.4.2" "User Commands"
.SH NAME
\fBmlpack_approx_kfn \fP- approximate furthest neighbor search
.SH SYNOPSIS
.nf
.fam C
 \fBmlpack_approx_kfn\fP [\fB-a\fP \fIstring\fP] [\fB-e\fP \fIbool\fP] [\fB-x\fP \fIstring\fP] [\fB-m\fP \fIunknown\fP] [\fB-k\fP \fIint\fP] [\fB-p\fP \fIint\fP] [\fB-t\fP \fIint\fP] [\fB-q\fP \fIstring\fP] [\fB-r\fP \fIstring\fP] [\fB-V\fP \fIbool\fP] [\fB-d\fP \fIstring\fP] [\fB-n\fP \fIstring\fP] [\fB-M\fP \fIunknown\fP] [\fB-h\fP \fB-v\fP] 
.fam T
.fi
.SH DESCRIPTION


This program implements two strategies for furthest neighbor search. These
strategies are:
.RS
.IP \(bu 3
The 'qdafn' algorithm from "Approximate Furthest Neighbor in High
Dimensions" by R. Pagh, F. Silvestri, J. Sivertsen, and M. Skala, in
Similarity Search and Applications 2015 (SISAP).
.IP \(bu 3
The 'DrusillaSelect' algorithm from "Fast approximate furthest neighbors
with data-dependent candidate selection", by R.R. Curtin and A.B. Gardner, in
Similarity Search and Applications 2016 (SISAP).
.RE
.PP
These two strategies give approximate results for the furthest neighbor search
problem and can be used as fast replacements for other furthest neighbor
techniques such as those found in the mlpack_kfn program. Note that
typically, the 'ds' algorithm requires far fewer tables and projections than
the 'qdafn' algorithm.
.PP
Specify a reference set (set to search in) with '\fB--reference_file\fP (\fB-r\fP)',
specify a query set with '\fB--query_file\fP (\fB-q\fP)', and specify algorithm parameters
with '\fB--num_tables\fP (\fB-t\fP)' and '\fB--num_projections\fP (\fB-p\fP)' (or don't and defaults
will be used). The algorithm to be used (either 'ds', the default, or 'qdafn')
may be specified with '\fB--algorithm\fP (\fB-a\fP)'. Also specify the number
of neighbors to search for with '\fB--k\fP (\fB-k\fP)'.
.PP
Note that for 'qdafn' in lower dimensions, '\fB--num_projections\fP (\fB-p\fP)' may need
to be set to a high value in order to return results for each query point.
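.PP
As a rough sketch of that adjustment, the commands below assemble a 'qdafn'
call with larger table and projection counts. The values 10 and 50 and all file
names are illustrative assumptions, not recommendations. The search is only run
when \fBmlpack_approx_kfn\fP and the reference file are actually present;
otherwise the call is just printed.
.PP
.nf
.fam C
```shell
# Hypothetical file names and parameter values, chosen only for illustration.
set -- --reference_file reference_set.csv --query_file query_set.csv
set -- "$@" --k 5 --algorithm qdafn
set -- "$@" --num_tables 10 --num_projections 50
set -- "$@" --neighbors_file neighbors.csv

# Run the search only if the tool and the input file exist.
if command -v mlpack_approx_kfn >/dev/null 2>&1 && [ -f reference_set.csv ]; then
  mlpack_approx_kfn "$@"
else
  echo "not run; the call would be: mlpack_approx_kfn $*"
fi
```
.fam T
.fi
.PP
If some query points still come back without results, raising
\fB--num_projections\fP further is the adjustment suggested above.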
.PP
If no query set is specified, the reference set will be used as the query set.
The '\fB--output_model_file\fP (\fB-M\fP)' output parameter may be used to store the
built model, and an input model may be loaded instead of specifying a
reference set with the '\fB--input_model_file\fP (\fB-m\fP)' option.
.PP
Results for each query point can be stored with the '\fB--neighbors_file\fP (\fB-n\fP)'
and '\fB--distances_file\fP (\fB-d\fP)' output parameters. Each row of these output
matrices holds the k distances or neighbor indices for one query point.
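.PP
A quick way to sanity-check that layout is to count rows and fields in a saved
CSV file. The sketch below uses a made-up stand-in file,
sample_neighbors.csv, with two query points and k=3; the indices in it are
invented, and a real file would come from \fB--neighbors_file\fP.
.PP
.nf
.fam C
```shell
# Made-up stand-in for a neighbors file: 2 query points, k = 3.
echo "7,2,9" > sample_neighbors.csv
echo "4,0,1" >> sample_neighbors.csv

# One row per query point, one comma-separated field per neighbor.
rows=$(awk "END { print NR }" sample_neighbors.csv)
cols=$(awk -F, "NR == 1 { print NF }" sample_neighbors.csv)
echo "query points: $rows, neighbors per point: $cols"
```
.fam T
.fi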
.PP
For example, to find the 5 approximate furthest neighbors with 'reference_set.csv'
as the reference set and 'query_set.csv' as the query set
using DrusillaSelect, storing the furthest neighbor indices to 'neighbors.csv'
and the furthest neighbor distances to 'distances.csv', one could call
.PP
$ \fBmlpack_approx_kfn\fP \fB--query_file\fP query_set.csv \fB--reference_file\fP
reference_set.csv \fB--k\fP 5 \fB--algorithm\fP ds \fB--neighbors_file\fP neighbors.csv
\fB--distances_file\fP distances.csv
.PP
and to perform approximate all-furthest-neighbors search with k=1 on the set 'data.csv',
storing only the furthest neighbor distances to 'distances.csv',
one could call
.PP
$ \fBmlpack_approx_kfn\fP \fB--reference_file\fP data.csv \fB--k\fP 1 \fB--distances_file\fP
distances.csv
.PP
A trained model can be re-used. If a model has been previously saved to 'model.bin',
then we may find 3 approximate furthest neighbors on a query set 'new_query_set.csv'
using that model and store the furthest neighbor indices
into 'neighbors.csv' by calling
.PP
$ \fBmlpack_approx_kfn\fP \fB--input_model_file\fP model.bin \fB--query_file\fP
new_query_set.csv \fB--k\fP 3 \fB--neighbors_file\fP neighbors.csv
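.PP
A model file such as 'model.bin' could come from an earlier run that passes
\fB--output_model_file\fP. The sketch below uses only options documented on
this page; the file names are placeholders, and \fB--k\fP is passed explicitly
because this page does not state whether it may be omitted when only building a
model. The call is executed only if the tool and the reference file are present.
.PP
.nf
.fam C
```shell
# Placeholder file names; adjust to the data at hand.
set -- --reference_file reference_set.csv --k 3
set -- "$@" --output_model_file model.bin

# Build and save the model only if the tool and the input file exist.
if command -v mlpack_approx_kfn >/dev/null 2>&1 && [ -f reference_set.csv ]; then
  mlpack_approx_kfn "$@"
else
  echo "not run; the call would be: mlpack_approx_kfn $*"
fi
```
.fam T
.fi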
.RE
.PP

.SH OPTIONAL INPUT OPTIONS 

.TP
.B
\fB--algorithm\fP (\fB-a\fP) [\fIstring\fP]
Algorithm to use: 'ds' or 'qdafn'. Default value 'ds'. 
.TP
.B
\fB--calculate_error\fP (\fB-e\fP) [\fIbool\fP]
If set, calculate the average distance error for the first furthest neighbor only. 
.TP
.B
\fB--exact_distances_file\fP (\fB-x\fP) [\fIstring\fP]
Matrix containing exact distances to furthest neighbors; this can be used to avoid explicit 
calculation when \fB--calculate_error\fP is set. 
.TP
.B
\fB--help\fP (\fB-h\fP) [\fIbool\fP]
Default help info. 
.TP
.B
\fB--info\fP [\fIstring\fP]
Print help on a specific option. Default value ''. 
.TP
.B
\fB--input_model_file\fP (\fB-m\fP) [\fIunknown\fP]
File containing input model. 
.TP
.B
\fB--k\fP (\fB-k\fP) [\fIint\fP]
Number of furthest neighbors to search for. Default value 0. 
.TP
.B
\fB--num_projections\fP (\fB-p\fP) [\fIint\fP]
Number of projections to use in each hash table. Default value 5. 
.TP
.B
\fB--num_tables\fP (\fB-t\fP) [\fIint\fP]
Number of hash tables to use. Default value 5. 
.TP
.B
\fB--query_file\fP (\fB-q\fP) [\fIstring\fP]
Matrix containing query points. 
.TP
.B
\fB--reference_file\fP (\fB-r\fP) [\fIstring\fP]
Matrix containing the reference dataset. 
.TP
.B
\fB--verbose\fP (\fB-v\fP) [\fIbool\fP]
Display informational messages and the full list of parameters and timers at the end of execution. 
.TP
.B
\fB--version\fP (\fB-V\fP) [\fIbool\fP]
Display the version of mlpack.  
.SH OPTIONAL OUTPUT OPTIONS 

.TP
.B
\fB--distances_file\fP (\fB-d\fP) [\fIstring\fP]
Matrix to save furthest neighbor distances to. 
.TP
.B
\fB--neighbors_file\fP (\fB-n\fP) [\fIstring\fP]
Matrix to save neighbor indices to. 
.TP
.B
\fB--output_model_file\fP (\fB-M\fP) [\fIunknown\fP]
File to save output model to.
.SH ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with your
distribution of mlpack.