NAME
Sphinx::Search - Sphinx search engine API Perl client
VERSION
Please note that you *MUST* install a version which is compatible with your
version of Sphinx.
Use version 0.28 for Sphinx-2.0.3-release (svn-r3043)
Use version 0.26.1 for Sphinx-2.0.1-beta (svn-r2792)
Use version 0.25_03 for Sphinx svn-r2575
Use version 0.24.1 for Sphinx-1.10-beta (svn-r2420)
Use version 0.23_02 for Sphinx svn-r2269 (experimental)
Use version 0.22 for Sphinx 0.9.9-rc2 and later (Please read the Compatibility
Note under SetEncoders regarding encoding changes)
Use version 0.15 for Sphinx 0.9.9-svn-r1674
Use version 0.12 for Sphinx 0.9.8
Use version 0.11 for Sphinx 0.9.8-rc1
Use version 0.10 for Sphinx 0.9.8-svn-r1112
Use version 0.09 for Sphinx 0.9.8-svn-r985
Use version 0.08 for Sphinx 0.9.8-svn-r871
Use version 0.06 for Sphinx 0.9.8-svn-r820
Use version 0.05 for Sphinx 0.9.8-cvs-20070907
Use version 0.02 for Sphinx 0.9.8-cvs-20070818
SYNOPSIS
use Sphinx::Search;
$sph = Sphinx::Search->new();
$results = $sph->SetMatchMode(SPH_MATCH_ALL)
->SetSortMode(SPH_SORT_RELEVANCE)
->Query("search terms");
DESCRIPTION
This is the Perl API client for the Sphinx open-source SQL full-text indexing
search engine, <http://www.sphinxsearch.com>.
CONSTRUCTOR
new
$sph = Sphinx::Search->new;
$sph = Sphinx::Search->new(\%options);
Create a new Sphinx::Search instance.
OPTIONS
- log
- Specify an optional logger instance. This can be any class that provides
error, warn, info, and debug methods (e.g. see Log::Log4perl). Logging is
disabled if no logger instance is provided.
- debug
- Debug flag. If set (and a logger instance is specified), debugging
messages will be generated.
METHODS
GetLastError
$error = $sph->GetLastError;
Returns the last error message (string).
GetLastWarning
$warning = $sph->GetLastWarning;
Returns the last warning message (string).
IsConnectError
$is_connect_error = $sph->IsConnectError;
Checks the connection error flag, to differentiate between network connection
errors and bad responses. Returns a true value on connection error.
SetEncoders
$sph->SetEncoders(\&encode_function, \&decode_function)
COMPATIBILITY NOTE:
SetEncoders() was introduced in version 0.17. Prior
to that, all strings were considered to be sequences of bytes which may have
led to issues with multi-byte characters. If you were previously
encoding/decoding strings external to Sphinx::Search, you will need to disable
encoding/decoding by setting Sphinx::Search to use raw values as explained
below (or modify your code and let Sphinx::Search do the recoding).
Set the string encoder/decoder functions for transferring strings between Perl
and Sphinx. The encoder should take the Perl internal representation and
convert it to the bytestream that searchd expects, and the decoder should take
the bytestream returned by searchd and convert it to Perl format.
The searchd format will depend on the 'charset_type' index setting in the Sphinx
configuration file.
The coders default to encode_utf8 and decode_utf8 respectively, which are
compatible with the 'utf8' charset_type.
If either the encoder or the decoder function is left undefined in the call to
SetEncoders, it reverts to its default value.
If you wish to send raw values (no encoding/decoding), supply a function that
simply returns its argument, e.g.
$sph->SetEncoders( sub { shift }, sub { shift });
Returns $sph.
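As a sketch of a non-default pair: if an index used the 'sbcs' charset_type with Latin-1 text (an assumption for illustration; check your own configuration), the core Encode module could supply the coders. The variable names are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encoder: Perl string -> Latin-1 bytes for searchd. Characters outside
# Latin-1 are replaced by a substitution character (Encode's default).
my $latin1_encode = sub { encode('iso-8859-1', shift) };

# Decoder: Latin-1 bytes from searchd -> Perl string.
my $latin1_decode = sub { decode('iso-8859-1', shift) };

# With a Sphinx::Search object you would install them with:
#   $sph->SetEncoders($latin1_encode, $latin1_decode);
```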
SetServer
$sph->SetServer($host, $port);
$sph->SetServer($path, $port);
In the first form, sets the host (string) and port (integer) details for the
searchd server using a network (INET) socket (default is localhost:9312).
In the second form, where $path is a local filesystem path (optionally prefixed
by 'unix://'), sets the client to access the searchd server via a local (UNIX
domain) socket at the specified path.
Returns $sph.
SetConnectTimeout
$sph->SetConnectTimeout($timeout)
Set server connection timeout (in seconds).
Returns $sph.
SetConnectRetries
$sph->SetConnectRetries($retries)
Set the number of server connection retries (in case of connection failure).
Returns $sph.
SetLimits
$sph->SetLimits($offset, $limit);
$sph->SetLimits($offset, $limit, $max);
Set match offset/limits, and optionally the max number of matches to return.
Returns $sph.
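A common use of SetLimits is paging through results. A minimal sketch of computing the offset for a 1-based page number; page_limits is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based page number and page size into the
# ($offset, $limit) pair expected by SetLimits.
sub page_limits {
    my ($page, $per_page) = @_;
    $page = 1 if !defined $page || $page < 1;   # clamp bad input to page 1
    return (($page - 1) * $per_page, $per_page);
}

# Usage with a Sphinx::Search object:
#   $sph->SetLimits(page_limits(3, 20));   # offset 40, limit 20
```

Pages that reach past the default maximum number of matches may need the optional $max argument as well.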
SetMaxQueryTime
$sph->SetMaxQueryTime($millisec);
Set maximum query time, in milliseconds, per index.
The value may not be negative; 0 means "do not limit".
Returns $sph.
SetMatchMode
$sph->SetMatchMode($mode);
Set match mode, which may be one of:
- SPH_MATCH_ALL
Match all words
- SPH_MATCH_ANY
Match any words
- SPH_MATCH_PHRASE
Exact phrase match
- SPH_MATCH_BOOLEAN
Boolean match, using AND (&), OR (|), NOT (!,-) and parenthetic
grouping.
- SPH_MATCH_EXTENDED
Extended match, which includes the Boolean syntax plus field, phrase and
proximity operators.
Returns $sph.
SetRankingMode
$sph->SetRankingMode($mode);
$sph->SetRankingMode(SPH_RANK_EXPR, $rank_exp);
Set ranking mode, which may be one of:
- SPH_RANK_PROXIMITY_BM25
Default mode, phrase proximity major factor and BM25 minor one
- SPH_RANK_BM25
Statistical mode, BM25 ranking only (faster but worse quality)
- SPH_RANK_NONE
No ranking, all matches get a weight of 1
- SPH_RANK_WORDCOUNT
Simple word-count weighting, rank is a weighted sum of per-field keyword
occurrence counts
- SPH_RANK_MATCHANY
Returns rank as it was computed in SPH_MATCH_ANY mode earlier, and is
internally used to emulate SPH_MATCH_ANY queries.
- SPH_RANK_FIELDMASK
Returns a 32-bit mask with N-th bit corresponding to N-th fulltext field,
numbering from 0. The bit will only be set when the respective field has
any keyword occurrences satisfying the query.
- SPH_RANK_SPH04
SPH_RANK_SPH04 is generally based on the default SPH_RANK_PROXIMITY_BM25
ranker, but additionally boosts the matches when they occur in the very
beginning or the very end of a text field.
- SPH_RANK_EXPR
Allows the ranking formula to be specified at run time. It exposes a number
of internal text factors and lets you define how the final weight should
be computed from those factors. $rank_exp should be set to the ranking
expression string, e.g. to emulate SPH_RANK_PROXIMITY_BM25, use
"sum(lcs*user_weight)*1000+bm25".
Returns $sph.
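With SPH_RANK_FIELDMASK the weight of each match is a bit mask rather than a score. A sketch of decoding it into field names, assuming you know the order of the full-text fields in your index (the field names used here are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the fields whose bit is set in a
# SPH_RANK_FIELDMASK weight. Bit N corresponds to field N, counting from 0.
sub fields_from_mask {
    my ($mask, @field_names) = @_;
    my @hit;
    for my $n (0 .. $#field_names) {
        push @hit, $field_names[$n] if $mask & (1 << $n);
    }
    return @hit;
}

# Usage, e.g. for each match in a Query() result:
#   my @fields = fields_from_mask($match->{weight}, qw(title body tags));
```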
SetSortMode
$sph->SetSortMode(SPH_SORT_RELEVANCE);
$sph->SetSortMode($mode, $sortby);
Set sort mode, which may be any of:
- SPH_SORT_RELEVANCE - sort by relevance
- SPH_SORT_ATTR_DESC, SPH_SORT_ATTR_ASC
- Sort by attribute descending/ascending. $sortby specifies the sorting
attribute.
- SPH_SORT_TIME_SEGMENTS
- Sort by time segments (last hour/day/week/month) in descending order, and
then by relevance in descending order. $sortby specifies the time
attribute.
- SPH_SORT_EXTENDED
- Sort by SQL-like syntax. $sortby is the sorting specification.
- SPH_SORT_EXPR
- Sort by an arithmetic expression. $sortby is the expression.
Returns $sph.
SetWeights
$sph->SetWeights([ 1, 2, 3, 4]);
This method is deprecated. Use SetFieldWeights instead.
Set per-field (integer) weights. The ordering of the weights corresponds to the
ordering of fields as indexed.
Returns $sph.
SetFieldWeights
$sph->SetFieldWeights(\%weights);
Set per-field (integer) weights by field name. The weights hash provides field
name to weight mappings.
Takes precedence over SetWeights.
Unknown names will be silently ignored. Missing fields will be given a weight of
1.
Returns $sph.
SetIndexWeights
$sph->SetIndexWeights(\%weights);
Set per-index (integer) weights. The weights hash is a mapping of index name to
integer weight.
Returns $sph.
SetIDRange
$sph->SetIDRange($min, $max);
Set the document ID range: only match those records where the document ID is
between $min and $max (including $min and $max).
Returns $sph.
SetFilter
$sph->SetFilter($attr, \@values);
$sph->SetFilter($attr, \@values, $exclude);
Sets the results to be filtered on the given attribute. Only results which have
attributes matching the given values will be returned.
This may be called multiple times with different attributes to select on
multiple attributes.
If 'exclude' is set, excludes results that match the filter.
Returns $sph.
SetFilterRange
$sph->SetFilterRange($attr, $min, $max);
$sph->SetFilterRange($attr, $min, $max, $exclude);
Sets the results to be filtered on a range of values for the given attribute.
Only those records where $attr column value is between $min and $max
(including $min and $max) will be returned.
If 'exclude' is set, excludes results that fall within the given range.
Returns $sph.
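To make the include/exclude and inclusive-bound rules of SetFilter and SetFilterRange concrete, the following pure-Perl predicates mirror what the text describes for a single attribute value. They are a local illustration only, with hypothetical names; the real filtering happens inside searchd.

```perl
use strict;
use warnings;

# SetFilter($attr, \@values, $exclude): keep the document if its value is
# one of @values, or, with $exclude set, if it is not.
sub passes_value_filter {
    my ($value, $values, $exclude) = @_;
    my $in = grep { $_ == $value } @$values;   # count of equal values
    return $exclude ? !$in : !!$in;
}

# SetFilterRange($attr, $min, $max, $exclude): both bounds are inclusive.
sub passes_range_filter {
    my ($value, $min, $max, $exclude) = @_;
    my $in = $value >= $min && $value <= $max;
    return $exclude ? !$in : $in;
}
```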
SetFilterFloatRange
$sph->SetFilterFloatRange($attr, $min, $max, $exclude);
Same as SetFilterRange, but allows floating point values.
Returns $sph.
SetGeoAnchor
$sph->SetGeoAnchor($attrlat, $attrlong, $lat, $long);
Set up the anchor point for geosphere distance calculations in filters and
sorting. Distances will be computed with respect to this point:
- $attrlat is the name of latitude attribute
- $attrlong is the name of longitude attribute
- $lat is anchor point latitude, in radians
- $long is anchor point longitude, in radians
Returns $sph.
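SetGeoAnchor expects radians, while coordinates are often stored in degrees. A sketch using the core Math::Trig module; the attribute names 'lat' and 'lng' and the coordinates are hypothetical:

```perl
use strict;
use warnings;
use Math::Trig qw(deg2rad);

# Anchor point in degrees (hypothetical example coordinates).
my ($lat_deg, $long_deg) = (51.5074, -0.1278);

# Convert to radians. The true second argument stops deg2rad from wrapping
# the result into [0, 2*pi), which would turn a negative longitude positive.
my ($lat, $long) = (deg2rad($lat_deg, 1), deg2rad($long_deg, 1));

# With a Sphinx::Search object:
#   $sph->SetGeoAnchor('lat', 'lng', $lat, $long);
```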
SetGroupBy
$sph->SetGroupBy($attr, $func);
$sph->SetGroupBy($attr, $func, $groupsort);
Sets the attribute and function used to group results.
In grouping mode, all matches are assigned to different groups based on the
grouping function value. Each group keeps track of the total match count, and
the best match (in this group) according to the current sorting function. The
final result set contains one best match per group, with the grouping function
value and match count attached.
$attr is any valid attribute. Use ResetGroupBy to disable grouping.
$func is one of:
- SPH_GROUPBY_DAY
Group by day (extracts a group value of the form YYYYMMDD from a timestamp
attribute)
- SPH_GROUPBY_WEEK
Group by week (extracts a group value of the form YYYYNNN from a timestamp
attribute)
- SPH_GROUPBY_MONTH
Group by month (extracts a group value of the form YYYYMM from a timestamp
attribute)
- SPH_GROUPBY_YEAR
Group by year (extracts a group value of the form YYYY from a timestamp
attribute)
- SPH_GROUPBY_ATTR
Group by attribute value
- SPH_GROUPBY_ATTRPAIR
Group by two attributes, being the given attribute and the attribute that
immediately follows it in the sequence of indexed attributes. The
specified attribute therefore must not be the last of the indexed
attributes.
Groups in the set of results can be sorted by any SQL-like sorting clause,
including both document attributes and the following special internal Sphinx
attributes:
- @id - document ID;
- @weight, @rank, @relevance - match weight;
- @group - group by function value;
- @count - number of matches in group.
The default mode is to sort by group-by value in descending order, i.e. by
"@group desc".
In the results set, "total_found" contains the total number of
matching groups over the whole index.
WARNING: grouping is done in fixed memory and thus its results are only
approximate; so there might be more groups reported in total_found than
actually present. @count might also be underestimated.
For example, if sorting by relevance and grouping by a "published"
attribute with SPH_GROUPBY_DAY function, then the result set will contain only
the most relevant match for each day when there were any matches published,
with day number and per-day match count attached, and sorted by day number in
descending order (i.e. recent days first).
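The grouping in this example can be emulated locally to show what the result set contains. This is an illustration only, not how searchd implements it (searchd groups in fixed memory, so its counts may be approximate); the match keys and the use of UTC for day boundaries are assumptions:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Local emulation of "sort by relevance, SPH_GROUPBY_DAY on 'published'".
# Each match is a hashref with hypothetical keys: doc, weight, published
# (a UNIX timestamp). Day boundaries are computed in UTC (an assumption).
sub group_by_day {
    my @matches = @_;
    my %groups;
    for my $m (@matches) {
        my $day = strftime('%Y%m%d', gmtime($m->{published}));
        my $g = $groups{$day} ||= { best => $m, count => 0 };
        $g->{count}++;                      # per-group match count
        $g->{best} = $m                     # keep the most relevant match
            if $m->{weight} > $g->{best}{weight};
    }
    # Default group sort is "@group desc": most recent day first.
    return map {
        +{ %{ $groups{$_}{best} }, group => $_, count => $groups{$_}{count} }
    } sort { $b <=> $a } keys %groups;
}
```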
SetGroupDistinct
$sph->SetGroupDistinct($attr);
Set count-distinct attribute for group-by queries.
SetRetries
$sph->SetRetries($count, $delay);
Set the distributed retry count and the delay between retries (in
milliseconds).
SetOverride
$sph->SetOverride($attrname, $attrtype, $values);
Set attribute values override. There can be only one override per attribute.
$values must be a hash reference that maps document IDs to attribute values.
SetSelect
$sph->SetSelect($select)
Set the select list (attributes or expressions), using SQL-like syntax.
ResetFilters
$sph->ResetFilters;
Clear all filters.
ResetGroupBy
$sph->ResetGroupBy;
Clear all group-by settings (for multi-queries).
ResetOverrides
$sph->ResetOverrides;
Clear all attribute value overrides (for multi-queries).
Query
$results = $sph->Query($query, $index);
Connect to the searchd server and run the given search query.
- $query is the query string
- $index is the name of the index to query; the default, "*", means query all
indexes. Use a space- or comma-separated list to search multiple indexes.
Returns undef on failure.
On success, returns a reference to a hash with the following keys:
- matches
- Array containing hashes with found documents ( "doc",
"weight", "group", "stamp" )
- total
- Total number of matches retrieved (up to SPH_MAX_MATCHES, see
sphinx.h)
- total_found
- Total number of matching documents in the index
- time
- Search time
- words
- Hash which maps query terms (stemmed!) to ( "docs",
"hits" ) hash
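A sketch of walking the returned hash using only the keys listed above; describe_results is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a Query() result hashref into printable lines,
# using only the documented keys (total, total_found, time, matches, words).
sub describe_results {
    my ($results) = @_;
    return ('no results (check GetLastError)') unless $results;
    my @lines = sprintf('retrieved %d of %d matches in %s',
        $results->{total}, $results->{total_found}, $results->{time});
    for my $match (@{ $results->{matches} || [] }) {
        push @lines, "doc $match->{doc}: weight $match->{weight}";
    }
    for my $word (sort keys %{ $results->{words} || {} }) {
        my $stats = $results->{words}{$word};
        push @lines, "'$word': $stats->{docs} docs, $stats->{hits} hits";
    }
    return @lines;
}
```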
AddQuery
$sph->AddQuery($query, $index);
Add a query to a batch request.
Batch queries enable searchd to perform internal optimizations where possible,
and reduce network connection overhead in all cases.
For instance, running exactly the same query with different group-by settings
will enable searchd to perform the expensive full-text search and ranking
operation only once, but compute multiple group-by results from its output.
Parameters are exactly the same as in the Query() call.
Returns the index of the corresponding result set in the array returned by the
RunQueries() call.
RunQueries
$results = $sph->RunQueries;
Run the batch of queries, as added by AddQuery.
Returns undef on network IO failure.
Returns an array of result sets on success.
Each result set in the returned array is a hash which contains the same keys as
the hash returned by Query, plus:
- error
Errors, if any, for this query.
- warning
Any warnings associated with the query.
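A sketch of the batch pattern: queue labelled queries, remember the position AddQuery returns for each, then pick each result set out of the batch. It assumes RunQueries returns a reference to the array of result sets (check this against your installed version); run_batch is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: queue several labelled queries, run them in one batch,
# and return a hashref of label => result set. ASSUMES RunQueries returns an
# array reference of result sets (undef on network IO failure).
sub run_batch {
    my ($sph, %queries) = @_;    # label => [ $query, $index ]
    my %slot;
    for my $label (sort keys %queries) {
        # AddQuery returns the position of this query's result set.
        $slot{$label} = $sph->AddQuery(@{ $queries{$label} });
    }
    my $sets = $sph->RunQueries
        or die 'RunQueries failed: ' . $sph->GetLastError . "\n";
    my %by_label;
    for my $label (sort keys %slot) {
        my $set = $sets->[ $slot{$label} ];
        # Each result set carries its own error/warning fields.
        warn "$label: $set->{error}\n"   if $set->{error};
        warn "$label: $set->{warning}\n" if $set->{warning};
        $by_label{$label} = $set;
    }
    return \%by_label;
}
```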
BuildExcerpts
$excerpts = $sph->BuildExcerpts($docs, $index, $words, $opts)
Generate document excerpts for the specified documents.
- docs
- An array reference of strings which represent the document contents
- index
- A string specifying the index whose settings will be used for stemming,
lexing and case folding
- words
- A string which contains the words to highlight
- opts
- A hash which contains additional optional highlighting parameters:
- before_match - a string to insert before a set of matching words, default
is "<b>"
- after_match - a string to insert after a set of matching words, default is
"</b>"
- chunk_separator - a string to insert between excerpt chunks, default is
" ... "
- limit - max excerpt size in symbols (codepoints), default is 256
- limit_passages - Limits the maximum number of passages that can be
included into the snippet. Integer, default is 0 (no limit).
- limit_words - Limits the maximum number of keywords that can be included
into the snippet. Integer, default is 0 (no limit).
- around - how many words to pick around each matching keyword block, default
is 5
- exact_phrase - whether to highlight exact phrase matches only, default is
false
- single_passage - whether to extract single best passage only, default is
false
- use_boundaries
- weight_order - Whether to sort the extracted passages in order of
relevance (decreasing weight), or in order of appearance in the document
(increasing position). Boolean, default is false.
- query_mode - Whether to handle $words as a query in extended syntax, or as
a bag of words (default behavior). For instance, in query mode ("one
two" | "three four") will only highlight and include those
occurrences "one two" or "three four" when the two words
from each pair are adjacent to each other. In default mode, any single
occurrence of "one", "two", "three", or
"four" would be highlighted. Boolean, default is false.
- force_all_words - Ignores the snippet length limit until it includes all
the keywords. Boolean, default is false.
- start_passage_id - Specifies the starting value of %PASSAGE_ID% macro
(that gets detected and expanded in before_match, after_match strings).
Integer, default is 1.
- load_files - Whether to handle $docs as data to extract snippets from
(default behavior), or to treat it as file names, and load data from
specified files on the server side. Boolean, default is false.
- html_strip_mode - HTML stripping mode setting. Defaults to
"index", which means that index settings will be used. The other
values are "none" and "strip", which forcibly skip or
apply stripping regardless of index settings; and "retain", which
retains HTML markup and protects it from highlighting. The
"retain" mode can only be used when highlighting full documents
and thus requires that no snippet size limits are set. String, allowed
values are "none", "strip", "index", and
"retain".
- allow_empty - Allows empty string to be returned as highlighting result
when a snippet could not be generated (no keywords match, or no passages fit
the limit). By default, the beginning of original text would be returned
instead of an empty string. Boolean, default is false.
- passage_boundary
- emit_zones
- load_files_scattered
Returns undef on failure.
Returns an array ref of string excerpts on success.
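A sketch of a typical call, passing a few of the options listed above as a hash reference and unpacking the returned array ref; highlight_docs and the chosen option values are illustrative, not defaults:

```perl
use strict;
use warnings;

# Hypothetical helper: highlight $words in each document with <em> tags,
# returning the list of excerpts (or dying with the last error).
sub highlight_docs {
    my ($sph, $index, $words, @docs) = @_;
    my %opts = (
        before_match    => '<em>',
        after_match     => '</em>',
        chunk_separator => ' ... ',
        limit           => 200,    # max excerpt size in codepoints
        around          => 3,
    );
    my $excerpts = $sph->BuildExcerpts(\@docs, $index, $words, \%opts);
    die 'BuildExcerpts failed: ' . $sph->GetLastError . "\n"
        unless defined $excerpts;
    return @$excerpts;    # one excerpt per input document
}
```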
BuildKeywords
$results = $sph->BuildKeywords($query, $index, $hits)
Generate a keyword list for a given query. Returns undef on failure. On
success, returns an array of hashes, where each hash describes a word in the
query with the following keys:
- tokenized
Tokenised term from query
- normalized
Normalised term from query
- docs
Number of docs in which word was found (if $hits is true)
- hits
Number of occurrences of word (if $hits is true)
EscapeString
$escaped = $sph->EscapeString('abcde!@#$%')
Inserts backslash before all non-word characters in the given string.
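The documented rule can be sketched in pure Perl. This mirrors the description (a backslash before every non-word character), not necessarily the module's exact implementation; escape_query is a hypothetical name:

```perl
use strict;
use warnings;

# Put a backslash before every non-word character (\W), so that query
# operators such as ! @ | - are treated as literal text.
sub escape_query {
    my ($string) = @_;
    (my $escaped = $string) =~ s/(\W)/\\$1/g;
    return $escaped;
}
```

Note that under this rule spaces are escaped too, so a multi-word string becomes a single literal term sequence.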
UpdateAttributes
$sph->UpdateAttributes($index, \@attrs, \%values);
$sph->UpdateAttributes($index, \@attrs, \%values, $mva);
Update specified attributes on specified documents.
- index
- Name of the index to be updated
- attrs
- Array of attribute name strings
- values
- A hash with document IDs as keys and, as values, array references holding
the new attribute values (in the same order as attrs)
Returns the number of actually updated documents (0 or more) on success.
Returns undef on failure.
Usage example:
$sph->UpdateAttributes("test1", [ qw/group_id/ ], { 1 => [ 456 ] });
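When the new values come from rows (for example from a database), the \%values argument can be built with a small helper. A sketch, assuming each row is a hashref holding an 'id' plus the attributes to update; build_update_values and the 'price' attribute are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: build the \%values argument for UpdateAttributes from
# a list of row hashrefs, each holding an 'id' plus the attributes to update.
sub build_update_values {
    my ($attrs, @rows) = @_;
    my %values;
    for my $row (@rows) {
        # Values must follow the same order as the attribute names in @$attrs.
        $values{ $row->{id} } = [ map { $row->{$_} } @$attrs ];
    }
    return \%values;
}

# Usage:
#   my @attrs  = qw(group_id price);
#   my $values = build_update_values(\@attrs,
#       { id => 1, group_id => 456, price => 10 });
#   my $updated = $sph->UpdateAttributes("test1", \@attrs, $values);
```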
Open
$sph->Open()
Opens a persistent connection for subsequent queries.
To reduce the network connection overhead of making Sphinx queries, you can call
$sph->Open(), then run any number of queries, and call $sph->Close() when
finished.
Returns 1 on success, 0 on failure.
Close
$sph->Close()
Closes a persistent connection.
Returns 1 on success, 0 on failure.
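A sketch of the Open/Close pattern that guarantees the connection is closed even if one of the queries dies; with_persistent_connection is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: open a persistent connection, run a block of queries,
# and always close the connection afterwards, even if the block dies.
sub with_persistent_connection {
    my ($sph, $code) = @_;
    $sph->Open or die 'Open failed: ' . $sph->GetLastError . "\n";
    my @ret = eval { $code->($sph) };
    my $err = $@;
    $sph->Close;
    die $err if $err;    # re-throw after cleanup
    return @ret;
}

# Usage:
#   my @results = with_persistent_connection($sph, sub {
#       my $s = shift;
#       return map { $s->Query($_) } 'first query', 'second query';
#   });
```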
Status
$status = $sph->Status()
Queries searchd status, and returns a hash of status variable name and value
pairs.
Returns undef on failure.
FlushAttributes
SEE ALSO
<http://www.sphinxsearch.com>
NOTES
There is (or was) a bundled Sphinx.pm in the contrib area of the Sphinx source
distribution, which was used as the starting point of Sphinx::Search.
Maintenance of that version appears to have lapsed at sphinx-0.9.7, so many of
the newer API calls are not available there. Sphinx::Search is mostly
compatible with the old Sphinx.pm except:
- On failure, Sphinx::Search returns undef rather than 0 or -1.
- Sphinx::Search 'Set' functions are cascadable, e.g. you can do
Sphinx::Search->new->SetMatchMode(SPH_MATCH_ALL)
->SetSortMode(SPH_SORT_RELEVANCE)->Query("search terms")
Sphinx::Search also provides documentation and unit tests, which were the main
motivations for branching from the earlier work.
AUTHOR
Jon Schutz
<http://notes.jschutz.net>
BUGS
Please report any bugs or feature requests to "bug-sphinx-search at
rt.cpan.org", or through the web interface at
<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sphinx-Search>. I will
be notified, and then you'll automatically be notified of progress on your bug
as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Sphinx::Search
You can also look for information at:
- AnnoCPAN: Annotated CPAN documentation
<http://annocpan.org/dist/Sphinx-Search>
- CPAN Ratings
<http://cpanratings.perl.org/d/Sphinx-Search>
- RT: CPAN's request tracker
<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Sphinx-Search>
- Search CPAN
<http://search.cpan.org/dist/Sphinx-Search>
ACKNOWLEDGEMENTS
This module is based on Sphinx.pm (not deployed to CPAN) for Sphinx version
0.9.7-rc1, by Len Kranendonk, which was in turn based on the Sphinx PHP API.
Thanks to Alexey Kholodkov for contributing a significant patch for handling
persistent connections.
COPYRIGHT & LICENSE
Copyright 2012 Jon Schutz, all rights reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License.