.TH "mlpack::kernel::PSpectrumStringKernel" 3 "Tue Sep 9 2014" "Version 1.0.10" "MLPACK" \" -*- nroff -*- .ad l .nh .SH NAME mlpack::kernel::PSpectrumStringKernel \- .PP The p-spectrum string kernel\&. .SH SYNOPSIS .br .PP .SS "Public Member Functions" .in +1c .ti -1c .RI "\fBPSpectrumStringKernel\fP (const std::vector< std::vector< std::string > > &\fBdatasets\fP, const size_t \fBp\fP)" .br .RI "\fIInitialize the \fBPSpectrumStringKernel\fP with the given string datasets\&. \fP" .ti -1c .RI "const std::vector< std::vector .br < std::map< std::string, int > > > & \fBCounts\fP () const " .br .RI "\fIAccess the lists of substrings\&. \fP" .ti -1c .RI "std::vector< std::vector .br < std::map< std::string, int > > > & \fBCounts\fP ()" .br .RI "\fIModify the lists of substrings\&. \fP" .ti -1c .RI "template double \fBEvaluate\fP (const VecType &a, const VecType &b) const " .br .RI "\fIEvaluate the kernel for the string indices given\&. \fP" .ti -1c .RI "size_t \fBP\fP () const " .br .RI "\fIAccess the value of p\&. \fP" .ti -1c .RI "size_t & \fBP\fP ()" .br .RI "\fIModify the value of p\&. \fP" .ti -1c .RI "std::string \fBToString\fP () const " .br .in -1c .SS "Private Attributes" .in +1c .ti -1c .RI "std::vector< std::vector .br < std::map< std::string, int > > > \fBcounts\fP" .br .RI "\fIMappings of the datasets to counts of substrings\&. \fP" .ti -1c .RI "const std::vector< std::vector .br < std::string > > & \fBdatasets\fP" .br .RI "\fIThe datasets\&. \fP" .ti -1c .RI "size_t \fBp\fP" .br .RI "\fIThe value of p to use in calculation\&. \fP" .in -1c .SH "Detailed Description" .PP The p-spectrum string kernel\&. Given a length p, the p-spectrum kernel finds the contiguous subsequence match count between two strings\&. The kernel will take every possible substring of length p of one string and count how many times it appears in the other string\&. .PP The string kernel, when created, must be passed a reference to a series of string datasets (std::vector >&)\&. This is because MLPACK only supports datasets which are Armadillo matrices -- and a dataset of variable-length strings cannot be easily cast into an Armadillo matrix\&. .PP Therefore, once the \fBPSpectrumStringKernel\fP is created with a reference to the string datasets, a 'fake' Armadillo data matrix must be created, which simply holds indices to the strings they represent\&. This 'fake' matrix has two rows and n columns (where n is the number of strings in the dataset)\&. The first row holds the index of the dataset (remember, the kernel can have multiple datasets), and the second row holds the index of the string\&. A fake matrix containing only strings from dataset 0 might look like this: .PP [[0 0 0 0 0 0 0 0 0] [0 1 2 3 4 5 6 7 8]] .PP This fake matrix is then given to the machine learning method, which will eventually call PSpectrumStringKernel::Evaluate(a, b), where a and b are two columns of the fake matrix\&. The string kernel will then map these fake columns back to the strings they represent, and then correctly evaluate the kernel\&. .PP Unfortunately, not every machine learning method will work with this kernel\&. Only machine learning methods which do not ever operate on the explicit representation of points can use this kernel\&. So, for instance, one cannot build a kd-tree on strings, because the BinarySpaceTree<> class will split the data according to the fake data matrix -- resulting in a meaningless tree\&. This kernel was originally written for the FastMKS method; so, at the very least, it will work with that\&. .PP Definition at line 74 of file pspectrum_string_kernel\&.hpp\&. .SH "Constructor & Destructor Documentation" .PP .SS "mlpack::kernel::PSpectrumStringKernel::PSpectrumStringKernel (const std::vector< std::vector< std::string > > &datasets, const size_tp)" .PP Initialize the \fBPSpectrumStringKernel\fP with the given string datasets\&. For more information on this, see the general class documentation\&. .PP \fBParameters:\fP .RS 4 \fIdatasets\fP Sets of string data\&. .br \fIp\fP The length of substrings to search\&. .RE .PP .SH "Member Function Documentation" .PP .SS "const std::vector > >& mlpack::kernel::PSpectrumStringKernel::Counts () const\fC [inline]\fP" .PP Access the lists of substrings\&. .PP Definition at line 102 of file pspectrum_string_kernel\&.hpp\&. .PP References counts\&. .SS "std::vector > >& mlpack::kernel::PSpectrumStringKernel::Counts ()\fC [inline]\fP" .PP Modify the lists of substrings\&. .PP Definition at line 105 of file pspectrum_string_kernel\&.hpp\&. .PP References counts\&. .SS "template double mlpack::kernel::PSpectrumStringKernel::Evaluate (const VecType &a, const VecType &b) const" .PP Evaluate the kernel for the string indices given\&. As mentioned in the class documentation, a and b should be 2-element vectors, where the first element contains the index of the dataset and the second element contains the index of the string\&. Therefore, if [2 3] is passed for a, the string used will be datasets[2][3] (datasets is of type std::vector >&)\&. .PP \fBParameters:\fP .RS 4 \fIa\fP Index of string and dataset for first string\&. .br \fIb\fP Index of string and dataset for second string\&. .RE .PP .SS "size_t mlpack::kernel::PSpectrumStringKernel::P () const\fC [inline]\fP" .PP Access the value of p\&. .PP Definition at line 109 of file pspectrum_string_kernel\&.hpp\&. .PP References p\&. .SS "size_t& mlpack::kernel::PSpectrumStringKernel::P ()\fC [inline]\fP" .PP Modify the value of p\&. .PP Definition at line 111 of file pspectrum_string_kernel\&.hpp\&. .PP References p\&. .SS "std::string mlpack::kernel::PSpectrumStringKernel::ToString () const\fC [inline]\fP" .PP Definition at line 116 of file pspectrum_string_kernel\&.hpp\&. .PP References datasets, and mlpack::util::Indent()\&. .SH "Member Data Documentation" .PP .SS "std::vector > > mlpack::kernel::PSpectrumStringKernel::counts\fC [private]\fP" .PP Mappings of the datasets to counts of substrings\&. Such a huge structure is not wonderful\&.\&.\&. .PP Definition at line 133 of file pspectrum_string_kernel\&.hpp\&. .PP Referenced by Counts()\&. .SS "const std::vector >& mlpack::kernel::PSpectrumStringKernel::datasets\fC [private]\fP" .PP The datasets\&. .PP Definition at line 129 of file pspectrum_string_kernel\&.hpp\&. .PP Referenced by ToString()\&. .SS "size_t mlpack::kernel::PSpectrumStringKernel::p\fC [private]\fP" .PP The value of p to use in calculation\&. .PP Definition at line 136 of file pspectrum_string_kernel\&.hpp\&. .PP Referenced by P()\&. .SH "Author" .PP Generated automatically by Doxygen for MLPACK from the source code\&.