NAME
KinoSearch1::InvIndexer - build inverted indexes
SYNOPSIS
    use KinoSearch1::InvIndexer;
    use KinoSearch1::Analysis::PolyAnalyzer;

    my $analyzer
        = KinoSearch1::Analysis::PolyAnalyzer->new( language => 'en' );

    my $invindexer = KinoSearch1::InvIndexer->new(
        invindex => '/path/to/invindex',
        create   => 1,
        analyzer => $analyzer,
    );

    $invindexer->spec_field(
        name  => 'title',
        boost => 3,
    );
    $invindexer->spec_field( name => 'bodytext' );

    while ( my ( $title, $bodytext ) = each %source_documents ) {
        my $doc = $invindexer->new_doc;

        $doc->set_value( title    => $title );
        $doc->set_value( bodytext => $bodytext );

        $invindexer->add_doc($doc);
    }

    $invindexer->finish;
DESCRIPTION
The InvIndexer class is KinoSearch1's primary tool for creating and modifying
inverted indexes, which may be searched using KinoSearch1::Searcher.
METHODS
new
    my $invindexer = KinoSearch1::InvIndexer->new(
        invindex => '/path/to/invindex',    # required
        create   => 1,                      # default: 0
        analyzer => $analyzer,              # default: no-op Analyzer
    );
Create an InvIndexer object.
- invindex - either a filepath, or an object belonging to an InvIndex
  subclass such as KinoSearch1::Store::FSInvIndex or
  KinoSearch1::Store::RAMInvIndex.
- create - create a new invindex, clobbering an existing one if
  necessary.
- analyzer - an object which subclasses
  KinoSearch1::Analysis::Analyzer, such as a PolyAnalyzer.
spec_field
    $invindexer->spec_field(
        name       => 'url',    # required
        boost      => 1,        # default: 1
        analyzer   => undef,    # default: analyzer spec'd in new()
        indexed    => 0,        # default: 1
        analyzed   => 0,        # default: 1
        stored     => 1,        # default: 1
        compressed => 0,        # default: 0
        vectorized => 0,        # default: 1
    );
Define a field.
- name - the field's name.
- boost - a multiplier which determines how much a field contributes
  to a document's score.
- analyzer - by default, all indexed fields are analyzed using the
  analyzer that was supplied to new(). Supplying an alternate for a
  given field overrides the primary analyzer.
- indexed - index the field, so that it can be searched later.
- analyzed - analyze the field, using the relevant Analyzer. Fields
  such as "category" or "product_number" might be indexed but not
  analyzed.
- stored - store the field, so that it can be retrieved when the
  document turns up in a search.
- compressed - compress the stored field, using the zlib compression
  algorithm.
- vectorized - store the field's "term vectors", which are required by
  KinoSearch1::Highlight::Highlighter for excerpt selection and
  search term highlighting.
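The options above combine into a few typical field profiles. The following is a minimal sketch rather than a recommended schema: the field names, the sample values, and the temporary index directory are assumptions made for illustration, and only calls shown elsewhere in this document are used (plus the core File::Temp module).

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use KinoSearch1::InvIndexer;
use KinoSearch1::Analysis::PolyAnalyzer;

# Throwaway index location (an assumption, so the sketch touches no real data).
my $path = tempdir( CLEANUP => 1 );

my $invindexer = KinoSearch1::InvIndexer->new(
    invindex => $path,
    create   => 1,
    analyzer => KinoSearch1::Analysis::PolyAnalyzer->new( language => 'en' ),
);

# Full-text field: every option left at its default except the boost.
$invindexer->spec_field( name => 'title', boost => 3 );

# Searchable field whose value bypasses the analyzer,
# like the "category" case described above.
$invindexer->spec_field(
    name       => 'category',
    analyzed   => 0,
    vectorized => 0,
);

# Display-only field: stored for retrieval but not indexed for searching.
$invindexer->spec_field(
    name       => 'url',
    indexed    => 0,
    analyzed   => 0,
    vectorized => 0,
);

my $doc = $invindexer->new_doc;
$doc->set_value( title    => 'Inverted indexes in brief' );
$doc->set_value( category => 'search' );
$doc->set_value( url      => 'http://www.example.com/inverted-indexes' );
$invindexer->add_doc($doc);

$invindexer->finish;
```

All spec_field calls are made before the first document is created, so that new_doc returns a Doc primed for every field.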
new_doc
    my $doc = $invindexer->new_doc;
Spawn an empty KinoSearch1::Document::Doc object, primed to accept values for
the fields spec'd by spec_field.
add_doc
    $invindexer->add_doc($doc);
Add a document to the invindex.
add_invindexes
    my $invindexer = KinoSearch1::InvIndexer->new(
        invindex => $invindex,
        analyzer => $analyzer,
    );
    $invindexer->add_invindexes( $another_invindex, $yet_another_invindex );
    $invindexer->finish;
Absorb existing invindexes into this one. May only be called once per
InvIndexer. add_invindexes() and add_doc() cannot be called on the same
InvIndexer.
delete_docs_by_term
    my $term = KinoSearch1::Index::Term->new( 'id', $unique_id );
    $invindexer->delete_docs_by_term($term);
Mark any document which contains the supplied term as deleted, so that it will
be excluded from search results. For more info, see "Deletions" in
KinoSearch1::Docs::FileFormat.
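One plausible use of delete_docs_by_term is replacing a document with a newer version. The sketch below assumes that a deletion and an addition may be issued in the same InvIndexer session, and that the 'id' field is indexed without analysis so the term matches the original value exactly. The field names, the id value, the temporary directory, and the open_invindexer helper are all hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use KinoSearch1::InvIndexer;
use KinoSearch1::Index::Term;
use KinoSearch1::Analysis::PolyAnalyzer;

my $path     = tempdir( CLEANUP => 1 );
my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new( language => 'en' );

# Hypothetical helper: open an InvIndexer with the same field specs each time.
sub open_invindexer {
    my ($create) = @_;
    my $invindexer = KinoSearch1::InvIndexer->new(
        invindex => $path,
        create   => $create,
        analyzer => $analyzer,
    );
    # Unanalyzed id field, so the whole value can be targeted by one term.
    $invindexer->spec_field( name => 'id', analyzed => 0, vectorized => 0 );
    $invindexer->spec_field( name => 'bodytext' );
    return $invindexer;
}

# First session: index the original version of the document.
my $invindexer = open_invindexer(1);
my $doc        = $invindexer->new_doc;
$doc->set_value( id       => 'doc-42' );
$doc->set_value( bodytext => 'first draft of the text' );
$invindexer->add_doc($doc);
$invindexer->finish;

# Second session: mark the old version as deleted, then add its replacement.
$invindexer = open_invindexer(0);
my $term = KinoSearch1::Index::Term->new( 'id', 'doc-42' );
$invindexer->delete_docs_by_term($term);

my $replacement = $invindexer->new_doc;
$replacement->set_value( id       => 'doc-42' );
$replacement->set_value( bodytext => 'revised text' );
$invindexer->add_doc($replacement);
$invindexer->finish;
```

A fresh InvIndexer is opened for the second session because finish invalidates the object it is called on.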
finish
    $invindexer->finish(
        optimize => 1,    # default: 0
    );
Finish the invindex. Invalidates the InvIndexer. Takes one hash-style parameter.
- optimize - if optimize is set to 1, the invindex will be collapsed
  to its most compact form, which will yield the fastest queries.
COPYRIGHT
Copyright 2005-2010 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.01.