other versions
- jessie 1.0.10-1
mlpack::neighbor::LSHSearch< SortPolicy >(3) | MLPACK | mlpack::neighbor::LSHSearch< SortPolicy >(3) |
NAME¶
mlpack::neighbor::LSHSearch< SortPolicy > - The LSHSearch class -- This class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries.SYNOPSIS¶
Public Member Functions¶
LSHSearch (const arma::mat &referenceSet, const arma::mat & querySet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500)
Private Member Functions¶
double BaseCase (const size_t queryIndex, const size_t referenceIndex)
Private Attributes¶
arma::Col< size_t > bucketContentSize
Detailed Description¶
template<typename SortPolicy = NearestNeighborSort>class mlpack::neighbor::LSHSearch< SortPolicy >¶
The LSHSearch class -- This class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries. Template Parameters:SortPolicy The sort policy for distances; see
NearestNeighborSort.
Definition at line 59 of file lsh_search.hpp.
Constructor & Destructor Documentation¶
template<typename SortPolicy = NearestNeighborSort> mlpack::neighbor::LSHSearch< SortPolicy >:: LSHSearch (const arma::mat &referenceSet, const arma::mat &querySet, const size_tnumProj, const size_tnumTables, const doublehashWidth = 0.0, const size_tsecondHashSize = 99901, const size_tbucketSize = 500)¶
This function initializes the LSH class. It builds the hash on the reference set with 2-stable distributions. See the individual functions performing the hashing for details on how the hashing is done. Parameters:referenceSet Set of reference points.
querySet Set of query points.
numProj Number of projections in each hash table (anything between 10-50
might be a decent choice).
numTables Total number of hash tables (anything between 10-20 should
suffice).
hashWidth The width of hash for every table. If 0 (the default) is
provided, then the hash width is automatically obtained by computing the
average pairwise distance of 25 pairs. This should be a reasonable upper bound
on the nearest-neighbor distance in general.
secondHashSize The size of the second hash table. This should be a large
prime number.
bucketSize The size of the bucket in the second hash table. This is the
maximum number of points that can be hashed into single bucket. Default values
are already provided here.
template<typename SortPolicy = NearestNeighborSort> mlpack::neighbor::LSHSearch< SortPolicy >:: LSHSearch (const arma::mat &referenceSet, const size_tnumProj, const size_tnumTables, const doublehashWidth = 0.0, const size_tsecondHashSize = 99901, const size_tbucketSize = 500)¶
This function initializes the LSH class. It builds the hash on the reference set with 2-stable distributions. See the individual functions performing the hashing for details on how the hashing is done. Parameters:referenceSet Set of reference points and the set
of queries.
numProj Number of projections in each hash table (anything between 10-50
might be a decent choice).
numTables Total number of hash tables (anything between 10-20 should
suffice).
hashWidth The width of hash for every table. If 0 (the default) is
provided, then the hash width is automatically obtained by computing the
average pairwise distance of 25 pairs. This should be a reasonable upper bound
on the nearest-neighbor distance in general.
secondHashSize The size of the second hash table. This should be a large
prime number.
bucketSize The size of the bucket in the second hash table. This is the
maximum number of points that can be hashed into single bucket. Default values
are already provided here.
Member Function Documentation¶
template<typename SortPolicy = NearestNeighborSort> double mlpack::neighbor::LSHSearch< SortPolicy >::BaseCase (const size_tqueryIndex, const size_treferenceIndex) [private]¶
This is a helper function that computes the distance of the query to the neighbor candidates and appropriately stores the best 'k' candidates. Parameters:queryIndex The index of the query in question
referenceIndex The index of the neighbor candidate in question
template<typename SortPolicy = NearestNeighborSort> void mlpack::neighbor::LSHSearch< SortPolicy >::BuildHash () [private]¶
This function builds a hash table with two levels of hashing as presented in the paper. This function first hashes the points with 'numProj' random projections to a single hash table creating (key, point ID) pairs where the key is a 'numProj'-dimensional integer vector. Then each key in this hash table is hashed into a second hash table using a standard hash. This function does not have any parameters and relies on parameters which are private members of this class, initialized during the class initialization.template<typename SortPolicy = NearestNeighborSort> void mlpack::neighbor::LSHSearch< SortPolicy >::InsertNeighbor (const size_tqueryIndex, const size_tpos, const size_tneighbor, const doubledistance) [private]¶
This is a helper function that efficiently inserts better neighbor candidates into an existing set of neighbor candidates. This function is only called by the 'BaseCase' function. Parameters:queryIndex This is the index of the query being
processed currently
pos The position of the neighbor candidate in the current list of
neighbor candidates.
neighbor The neighbor candidate that is being inserted into the list of
the best 'k' candidates for the query in question.
distance The distance of the query to the neighbor candidate.
template<typename SortPolicy = NearestNeighborSort> void mlpack::neighbor::LSHSearch< SortPolicy >::ReturnIndicesFromTable (const size_tqueryIndex, arma::uvec &referenceIndices, size_tnumTablesToSearch) [private]¶
This function takes a query and hashes it into each of the hash tables to get keys for the query and then the key is hashed to a bucket of the second hash table and all the points (if any) in those buckets are collected as the potential neighbor candidates. Parameters:queryIndex The index of the query currently being
processed.
referenceIndices The list of neighbor candidates obtained from hashing
the query into all the hash tables and eventually into multiple buckets of the
second hash table.
template<typename SortPolicy = NearestNeighborSort> void mlpack::neighbor::LSHSearch< SortPolicy >::Search (const size_tk, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_tnumTablesToSearch = 0)¶
Compute the nearest neighbors and store the output in the given matrices. The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for. Parameters:k Number of neighbors to search for.
resultingNeighbors Matrix storing lists of neighbors for each query
point.
distances Matrix storing distances of neighbors for each query point.
numTablesToSearch This parameter allows the user to have control over the
number of hash tables to be searched. This allows the user to pick the number
of tables it can afford for the time available without having to build hashing
for every table size. By default, this is set to zero in which case all tables
are considered.
template<typename SortPolicy = NearestNeighborSort> std::string mlpack::neighbor::LSHSearch< SortPolicy >::ToString () const¶
Member Data Documentation¶
template<typename SortPolicy = NearestNeighborSort> arma::Col<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::bucketContentSize [private]¶
The number of elements present in each hash bucket; should be secondHashSize. Definition at line 237 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> arma::Col<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::bucketRowInHashTable [private]¶
For a particular hash value, points to the row in secondHashTable corresponding to this value. Should be secondHashSize. Definition at line 241 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> const size_t mlpack::neighbor::LSHSearch< SortPolicy >::bucketSize [private]¶
The bucket size of the second hash. Definition at line 227 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> arma::mat* mlpack::neighbor::LSHSearch< SortPolicy >::distancePtr [private]¶
The pointer to the nearest neighbor distances. Definition at line 244 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> double mlpack::neighbor::LSHSearch< SortPolicy >::hashWidth [private]¶
The hash width. Definition at line 218 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> metric::SquaredEuclideanDistance mlpack::neighbor::LSHSearch< SortPolicy >::metric [private]¶
Instantiation of the metric. Definition at line 230 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> arma::Mat<size_t>* mlpack::neighbor::LSHSearch< SortPolicy >::neighborPtr [private]¶
The pointer to the nearest neighbor indices. Definition at line 247 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> const size_t mlpack::neighbor::LSHSearch< SortPolicy >::numProj [private]¶
The number of projections. Definition at line 206 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> const size_t mlpack::neighbor::LSHSearch< SortPolicy >::numTables [private]¶
The number of hash tables. Definition at line 209 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> arma::mat mlpack::neighbor::LSHSearch< SortPolicy >::offsets [private]¶
The list of the offset 'b' for each of the projection for each table. Definition at line 215 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> std::vector<arma::mat> mlpack::neighbor::LSHSearch< SortPolicy >::projections [private]¶
The std::vector containing the projection matrix of each table. Definition at line 212 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> const arma::mat& mlpack::neighbor::LSHSearch< SortPolicy >::querySet [private]¶
Query dataset (may not be given). Definition at line 203 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> const arma::mat& mlpack::neighbor::LSHSearch< SortPolicy >::referenceSet [private]¶
Reference dataset. Definition at line 200 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> const size_t mlpack::neighbor::LSHSearch< SortPolicy >::secondHashSize [private]¶
The big prime representing the size of the second hash. Definition at line 221 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> arma::Mat<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::secondHashTable [private]¶
The final hash table; should be (< secondHashSize) x bucketSize. Definition at line 233 of file lsh_search.hpp.template<typename SortPolicy = NearestNeighborSort> arma::vec mlpack::neighbor::LSHSearch< SortPolicy >::secondHashWeights [private]¶
The weights of the second hash. Definition at line 224 of file lsh_search.hpp.Author¶
Generated automatically by Doxygen for MLPACK from the source code.Tue Sep 9 2014 | Version 1.0.10 |