NAME
lsh - all k-approximate-nearest-neighbor search with lsh
SYNOPSIS
lsh [-h] [-v] -k int -r string [-B int] [-d string] [-H double] [-n string] [-K int] [-q string] [-M int] [-s int] [-L int] [-V]
DESCRIPTION
This program will calculate the k approximate-nearest-neighbors of a set of
points using locality-sensitive hashing. You may specify a separate set of
reference points and query points, or just a reference set which will be used
as both the reference and query set.
For example, the following will return 5 neighbors from the data for each point
in 'input.csv' and store the distances in 'distances.csv' and the neighbors in
the file 'neighbors.csv':
$ lsh -k 5 -r input.csv -d distances.csv -n neighbors.csv
The output files are organized such that row i and column j of the neighbors
output file contains the index of the point in the reference set that is the
i'th nearest neighbor of the point in the query set with index j. Row i and
column j of the distances output file contains the distance between those two
points.
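As a minimal sketch of reading this layout, the following extracts the
neighbors and distances of a single query point with standard tools. The two
files are tiny made-up samples (k = 2, three points), not real lsh output, and
the sketch assumes that point indices are zero-based, so the point with index j
is found in field j + 1.

```shell
# Hypothetical sample output for k=2 and 3 points (made-up values).
tmpdir=$(mktemp -d)
printf '1,0,1\n2,2,0\n' > "$tmpdir/neighbors.csv"
printf '0.5,0.5,0.7\n0.9,0.7,0.9\n' > "$tmpdir/distances.csv"

j=1             # query point index (assumed zero-based)
col=$((j + 1))  # cut numbers fields starting at 1

# Row i of the selected column holds the i'th nearest neighbor of point j.
neighbors_of_j=$(cut -d, -f"$col" "$tmpdir/neighbors.csv" | paste -sd' ' -)
distances_of_j=$(cut -d, -f"$col" "$tmpdir/distances.csv" | paste -sd' ' -)

echo "neighbors of point $j: $neighbors_of_j"
echo "distances of point $j: $distances_of_j"

rm -rf "$tmpdir"
```

With the sample files above, the first value printed on each line belongs to
the nearest neighbor and the second to the next nearest.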
Because this is an approximate-nearest-neighbors search, results may differ
from run to run. The --seed option can be specified to set the random seed.
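One way to check this is to run the program twice with the same seed and
compare the output files. This is only a sketch: the seed value 42 and the file
names neighbors_a.csv and neighbors_b.csv are arbitrary choices, and the guard
simply skips the runs when lsh or input.csv is not available.

```shell
# Run lsh twice with the same (arbitrary) seed and compare the results.
status=skipped
if command -v lsh >/dev/null 2>&1 && [ -f input.csv ]; then
  lsh -k 5 -r input.csv -n neighbors_a.csv -s 42
  lsh -k 5 -r input.csv -n neighbors_b.csv -s 42
  if cmp -s neighbors_a.csv neighbors_b.csv; then
    status=identical
  else
    status=different
  fi
fi
echo "comparison of the two runs: $status"
```

Note that a seed of 0 selects a time-based seed (see --seed below), so a
nonzero value is needed for this comparison to be meaningful.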
REQUIRED OPTIONS
- --k (-k) [int]
- Number of nearest neighbors to find.
- --reference_file (-r) [string]
- File containing the reference dataset.
OPTIONS
- --bucket_size (-B) [int]
- The size of a bucket in the second level hash. Default value 500.
- --distances_file (-d) [string]
- File to output distances into. Default value ''.
- --hash_width (-H) [double]
- The hash width for the first-level hashing in the LSH preprocessing. If set
to 0 (the default), the LSH class automatically estimates a hash width for its
use. Default value 0.
- --help (-h)
- Default help info.
- --info [string]
- Get help on a specific module or option. Default value ''.
- --neighbors_file (-n) [string]
- File to output neighbors into. Default value ''.
- --projections (-K) [int]
- The number of hash functions for each table. Default value 10.
- --query_file (-q) [string]
- File containing query points (optional). Default value ''.
- --second_hash_size (-M) [int]
- The size of the second level hash table. Default value 99901.
- --seed (-s) [int]
- Random seed. If 0, 'std::time(NULL)' is used. Default value 0.
- --tables (-L) [int]
- The number of hash tables to be used. Default value 30.
- --verbose (-v)
- Display informational messages and the full list of parameters and timers
at the end of execution.
- --version (-V)
- Display the version of mlpack.
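The options above can be combined in a single invocation. The following sketch
searches a separate query set and overrides the number of projections and
tables. The file names reference.csv and query.csv and the values chosen for
-k, -K, -L and -s are illustrative assumptions, not recommended settings.

```shell
# Assemble an lsh command line from the documented short options.
reference=reference.csv   # hypothetical reference set
query=query.csv           # hypothetical query set
args="-k 3 -r $reference -q $query -K 12 -L 40 -s 7 -d distances.csv -n neighbors.csv -v"
cmd="lsh $args"
echo "$cmd"

# Run only when lsh and both input files are actually present.
if command -v lsh >/dev/null 2>&1 && [ -f "$reference" ] && [ -f "$query" ]; then
  # $args is intentionally unquoted so that it splits into separate options.
  lsh $args
fi
```

Options that are left out, such as -B, -H and -M here, keep their default
values.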
For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with your
distribution of mlpack.