.\" Automatically generated by Pod::Man 4.09 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.if !\nF .nr F 0
.if \nF>0 \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    if !\nF==2 \{\
.        nr % 0
.        nr F 2
.    \}
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Lucy::Search::Compiler 3pm"
.TH Lucy::Search::Compiler 3pm "2017-08-02" "perl v5.26.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Lucy::Search::Compiler \- Query\-to\-Matcher compiler.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 3
\&    # (Compiler is an abstract base class.)
\&    package MyCompiler;
\&    use base qw( Lucy::Search::Compiler );
\&
\&    sub make_matcher {
\&        my $self = shift;
\&        return MyMatcher\->new( @_, compiler => $self );
\&    }
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The purpose of the Compiler class is to take a specification in the form of
a Query object and compile a Matcher object that can do real work.
.PP
The simplest Compiler subclasses \*(-- such as those associated with
constant-scoring Query types \*(-- might simply implement a \fImake_matcher()\fR
method which passes along information verbatim from the Query to the
Matcher's constructor.
.PP
However, it is common for the Compiler to perform some calculations which
affect its \*(L"weight\*(R" \*(-- a floating point multiplier that the Matcher
will factor into each document's score. If that is the case, then the
Compiler subclass may wish to override \fIget_weight()\fR,
\&\fIsum_of_squared_weights()\fR, and \fIapply_norm_factor()\fR.
.PP
Compiling a Matcher is a two-stage process.
.PP
The first stage takes place during the Compiler's construction, which is
where the Query object meets a Searcher object for the first time.
Searchers operate on a specific document collection and they can tell you
certain statistical information about the collection \*(-- such as how many
total documents are in the collection, or how many documents in the
collection a particular term is present in. Lucy's core Compiler classes
plug this information into the classic \s-1TF/IDF\s0 weighting algorithm to
adjust the Compiler's weight; custom subclasses might do something similar.
.PP
The second stage of compilation is the \fImake_matcher()\fR method, which is
where the Compiler meets a SegReader object. SegReaders are associated with
a single segment within a single index on a single machine, and are thus
lower-level than Searchers, which may represent a document collection
spread out over a search cluster (comprising several indexes and many
segments).
The Compiler object can use new information supplied by the SegReader \*(--
such as whether a term is missing from the local index even though it is
present within the larger collection represented by the Searcher \*(-- when
figuring out what to feed to the Matcher's constructor, or whether
\&\fImake_matcher()\fR should return a Matcher at all.
.SH "CONSTRUCTORS"
.IX Header "CONSTRUCTORS"
.SS "new( \fI[labeled params]\fP )"
.IX Subsection "new( [labeled params] )"
.Vb 6
\&    my $compiler = MyCompiler\->SUPER::new(
\&        parent     => $my_query,
\&        searcher   => $searcher,
\&        similarity => $sim,        # default: undef
\&        boost      => undef,       # default: see below
\&    );
.Ve
.PP
Abstract constructor.
.IP "\(bu" 4
\&\fBparent\fR \- The parent Query.
.IP "\(bu" 4
\&\fBsearcher\fR \- A Lucy::Search::Searcher, such as an IndexSearcher.
.IP "\(bu" 4
\&\fBsimilarity\fR \- A Similarity.
.IP "\(bu" 4
\&\fBboost\fR \- An arbitrary scoring multiplier. Defaults to the boost of the
parent Query.
.SH "ABSTRACT METHODS"
.IX Header "ABSTRACT METHODS"
.SS "make_matcher( \fI[labeled params]\fP )"
.IX Subsection "make_matcher( [labeled params] )"
Factory method returning a Matcher.
.IP "\(bu" 4
\&\fBreader\fR \- A SegReader.
.IP "\(bu" 4
\&\fBneed_score\fR \- Indicate whether the Matcher must implement \fIscore()\fR.
.PP
Returns: a Matcher, or undef if the Matcher would have matched no documents.
.SH "METHODS"
.IX Header "METHODS"
.SS "\fIget_weight()\fP"
.IX Subsection "get_weight()"
Return the Compiler's numerical weight, a scoring multiplier. By default,
returns the object's boost.
.SS "\fIsum_of_squared_weights()\fP"
.IX Subsection "sum_of_squared_weights()"
Compute and return a raw weighting factor. (This quantity is used by
\&\fInormalize()\fR.) By default, simply returns 1.0.
.SS "apply_norm_factor(factor)"
.IX Subsection "apply_norm_factor(factor)"
Apply a floating point normalization multiplier.
For a TermCompiler, this involves multiplying its own weight by the supplied
factor; combining classes such as ORCompiler would apply the factor
recursively to their children.
.PP
The default implementation is a no-op; subclasses may wish to multiply their
internal weight by the supplied factor.
.IP "\(bu" 4
\&\fBfactor\fR \- The multiplier.
.SS "\fInormalize()\fP"
.IX Subsection "normalize()"
Take a newly minted Compiler object and apply query-specific normalization
factors. Should be invoked by Query subclasses during \fImake_compiler()\fR
for top-level nodes.
.PP
For a TermQuery, the scoring formula is approximately:
.PP
.Vb 1
\&    (tf_d * idf_t / norm_d) * (tf_q * idf_t / norm_q)
.Ve
.PP
\&\fInormalize()\fR is theoretically concerned with applying the second half of
that formula to the Compiler's weight. What actually happens depends on how
the Compiler and Similarity methods called internally are implemented.
.SS "\fIget_parent()\fP"
.IX Subsection "get_parent()"
Accessor for the Compiler's parent Query object.
.SS "\fIget_similarity()\fP"
.IX Subsection "get_similarity()"
Accessor for the Compiler's Similarity object.
.SS "highlight_spans( \fI[labeled params]\fP )"
.IX Subsection "highlight_spans( [labeled params] )"
Return an array of Span objects, indicating where in the given field the
text that matches the parent Query occurs and how well each snippet matches.
The Span's offset and length are measured in Unicode code points.
.PP
The default implementation returns an empty array.
.IP "\(bu" 4
\&\fBsearcher\fR \- A Searcher.
.IP "\(bu" 4
\&\fBdoc_vec\fR \- A DocVector.
.IP "\(bu" 4
\&\fBfield\fR \- The name of the field.
.SH "INHERITANCE"
.IX Header "INHERITANCE"
Lucy::Search::Compiler isa Lucy::Search::Query isa Lucy::Object::Obj.
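.SH "EXAMPLE"
.IX Header "EXAMPLE"
The weighting protocol described under \fIsum_of_squared_weights()\fR,
\&\fIapply_norm_factor()\fR and \fInormalize()\fR can be illustrated with a minimal,
self-contained sketch in plain Perl. It does not load Lucy: ToyTermCompiler
and ToyORCompiler are hypothetical stand-ins, the idf values are made up, and
the use of 1/sqrt(sum of squared weights) as the normalization factor is an
assumption borrowed from classic \s-1TF/IDF\s0 query normalization, not a statement
of what Lucy's Similarity computes.
.PP
```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-ins for illustration only; these are NOT Lucy classes.
package ToyTermCompiler;

sub new {
    my ( $class, %args ) = @_;
    # Stage one: fold a collection-wide statistic (a made-up idf) into the weight.
    return bless { weight => $args{idf} * $args{boost} }, $class;
}

sub get_weight             { return $_[0]{weight} }
sub sum_of_squared_weights { return $_[0]{weight}**2 }

sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $self->{weight} *= $factor;    # a leaf multiplies its own weight
    return;
}

package ToyORCompiler;

sub new {
    my ( $class, @children ) = @_;
    return bless { children => [@children] }, $class;
}

sub sum_of_squared_weights {
    my $self = shift;
    my $sum  = 0;
    $sum += $_->sum_of_squared_weights for @{ $self->{children} };
    return $sum;
}

sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    # A combining node passes the factor down to its children.
    $_->apply_norm_factor($factor) for @{ $self->{children} };
    return;
}

# Assumption: normalize by 1/sqrt(sum of squared weights), guarding against zero.
sub normalize {
    my $self = shift;
    my $sum  = $self->sum_of_squared_weights;
    $self->apply_norm_factor( $sum > 0 ? 1 / sqrt($sum) : 1 );
    return;
}

package main;

my $left  = ToyTermCompiler->new( idf => 3, boost => 1 );
my $right = ToyTermCompiler->new( idf => 4, boost => 1 );
my $or    = ToyORCompiler->new( $left, $right );

$or->normalize;
say sprintf( '%.2f %.2f', $left->get_weight, $right->get_weight );
```
.PP
With child weights of 3 and 4 the sum of squares is 25, so the assumed factor
is 0.2 and the sketch prints 0.60 and 0.80. A real subclass would inherit from
Lucy::Search::Compiler and leave the choice of factor to its Similarity.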