NAME
Lucy::Index::Indexer - Build inverted indexes.
SYNOPSIS
    my $indexer = Lucy::Index::Indexer->new(
        schema => $schema,
        index  => '/path/to/index',
        create => 1,
    );
    while ( my ( $title, $content ) = each %source_docs ) {
        $indexer->add_doc({
            title   => $title,
            content => $content,
        });
    }
    $indexer->commit;
DESCRIPTION
The Indexer class is Apache Lucy's primary tool for managing the content of
inverted indexes, which may later be searched using IndexSearcher.
In general, only one Indexer at a time may write to an index safely. If a write
lock cannot be secured, new() will throw an exception.
If an index is located on a shared volume, each writer application must identify
itself by supplying an IndexManager with a unique "host" id to Indexer's
constructor; otherwise, index corruption will occur. See
Lucy::Docs::FileLocking for a detailed discussion.
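The locking requirements above might be handled along these lines. This is a minimal sketch rather than a documented procedure: the single-field schema, the temporary directory standing in for a shared volume, and the use of Sys::Hostname as the "host" id are all assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use Sys::Hostname qw( hostname );
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::IndexManager;
use Lucy::Index::Indexer;

# Hypothetical schema with a single exact-match field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );

# Stand-in for an index directory on a shared volume.
my $index_path = tempdir( CLEANUP => 1 ) . '/index';

# Assumption: the machine's hostname is unique among all writers.
my $manager = Lucy::Index::IndexManager->new( host => hostname() );

# new() throws if the write lock cannot be secured, so trap the exception.
my $indexer = eval {
    Lucy::Index::Indexer->new(
        schema  => $schema,
        index   => $index_path,
        create  => 1,
        manager => $manager,
    );
};
if ( !$indexer ) {
    warn "Could not open index for writing: $@";
}
else {
    $indexer->add_doc( { id => 'doc1' } );
    $indexer->commit;
}
```

Whether to retry, wait, or give up when the lock is unavailable is an application decision; the sketch only reports the failure.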
Note: at present, delete_by_term() and delete_by_query() only affect documents
which were previously committed to the index, and not any documents added
during this indexing session but not yet committed. This may change in a future
update.
CONSTRUCTORS
new( [labeled params] )
    my $indexer = Lucy::Index::Indexer->new(
        schema   => $schema,             # required at index creation
        index    => '/path/to/index',    # required
        create   => 1,                   # default: 0
        truncate => 1,                   # default: 0
        manager  => $manager,            # default: created internally
    );
- schema - A Schema. Required when index is being created; if not
  supplied, will be extracted from the index folder.
- index - Either a filepath to an index or a Folder.
- create - If true and the index directory does not exist, attempt to
  create it.
- truncate - If true, proceed with the intention of discarding all
  previous indexing data. The old data will remain intact and visible until
  commit() succeeds.
- manager - An IndexManager.
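As an illustration of how these parameters interact, the sketch below creates an index in one session and then reopens it with the schema omitted and truncate enabled. The "id" field, the temporary path, and the document values are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# Hypothetical schema with a single exact-match field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );

# A path that does not exist yet, so create => 1 has work to do.
my $index_path = tempdir( CLEANUP => 1 ) . '/index';

# First session: the index is being created, so schema is required.
my $indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $index_path,
    create => 1,
);
$indexer->add_doc( { id => 'old1' } );
$indexer->commit;

# Second session: schema is omitted and read back from the index folder.
# truncate => 1 replaces the earlier data once commit() succeeds.
$indexer = Lucy::Index::Indexer->new(
    index    => $index_path,
    truncate => 1,
);
$indexer->add_doc( { id => 'new1' } );
$indexer->commit;

# Count matches for each id after the truncating commit.
my $searcher = Lucy::Search::IndexSearcher->new( index => $index_path );
my %hits;
for my $id (qw( old1 new1 )) {
    my $query = Lucy::Search::TermQuery->new( field => 'id', term => $id );
    $hits{$id} = $searcher->hits( query => $query )->total_hits;
}
print "old1: $hits{old1}, new1: $hits{new1}\n";
```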
METHODS
add_doc(...)
    $indexer->add_doc($doc);
    $indexer->add_doc( { field_name => $field_value } );
    $indexer->add_doc(
        doc   => { field_name => $field_value },
        boost => 2.5,    # default: 1.0
    );
Add a document to the index. Accepts either a single argument or labeled params.
- doc - Either a Lucy::Document::Doc object, or a hashref (which will
  be attached to a Lucy::Document::Doc object internally).
- boost - A floating point weight which affects how this document
  scores.
add_index(index)
Absorb an existing index into this one. The two indexes must have matching
Schemas.
- index - Either an index path name or a Folder.
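A minimal sketch of merging one index into another follows. Both indexes are built from the same Schema object so that the schemas match; the build_index() helper, the paths, and the ids are hypothetical names used only for this example.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# One schema shared by both indexes, so that they match.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );

# Hypothetical helper: create an index at $path holding the given ids.
sub build_index {
    my ( $path, @ids ) = @_;
    my $indexer = Lucy::Index::Indexer->new(
        schema => $schema,
        index  => $path,
        create => 1,
    );
    $indexer->add_doc( { id => $_ } ) for @ids;
    $indexer->commit;
}

my $base = tempdir( CLEANUP => 1 );
my ( $main_path, $other_path ) = ( "$base/main", "$base/other" );
build_index( $main_path,  'a1' );
build_index( $other_path, 'b1', 'b2' );

# Absorb the second index into the first, then make the change permanent.
my $indexer = Lucy::Index::Indexer->new( index => $main_path );
$indexer->add_index($other_path);
$indexer->commit;

# A document from the absorbed index should now be found in the main one.
my $searcher = Lucy::Search::IndexSearcher->new( index => $main_path );
my $query    = Lucy::Search::TermQuery->new( field => 'id', term => 'b2' );
my $found    = $searcher->hits( query => $query )->total_hits;
print "Matches for b2 in merged index: $found\n";
```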
optimize()
Optimize the index for search-time performance. This may take a while, as it can
involve rewriting large amounts of data.
commit()
Commit any changes made to the index. Until this is called, none of the changes
made during an indexing session are permanent.
Calling commit() invalidates the Indexer, so if you want to make more changes
you'll need a new one.
prepare_commit()
Perform the expensive setup for commit() in advance, so that commit() completes
quickly. (If prepare_commit() is not called explicitly by the user, commit()
will call it internally.)
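The two-step sequence can be sketched as follows. The schema, path, and document are hypothetical, and the comment between the two calls only marks where an application might do other work of its own; nothing there is required by Lucy.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# Hypothetical schema with a single exact-match field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );

my $index_path = tempdir( CLEANUP => 1 ) . '/index';
my $indexer    = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $index_path,
    create => 1,
);
$indexer->add_doc( { id => 'doc1' } );

# Do the expensive part of the commit up front.
$indexer->prepare_commit;

# (Application-specific work could happen here before finalizing.)

# Finalize; the heavy lifting has already been done.
$indexer->commit;

# Confirm that the document is visible to a searcher opened afterwards.
my $searcher = Lucy::Search::IndexSearcher->new( index => $index_path );
my $query    = Lucy::Search::TermQuery->new( field => 'id', term => 'doc1' );
my $found    = $searcher->hits( query => $query )->total_hits;
print "Matches for doc1: $found\n";
```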
delete_by_term( [labeled params] )
Mark documents which contain the supplied term as deleted, so that they will be
excluded from search results and eventually removed altogether. The change is
not apparent to search apps until after commit() succeeds.
- field - The name of an indexed field. (If it is not specified as
  "indexed", an error will occur.)
- term - The term which identifies docs to be marked as deleted. If
  "field" is associated with an Analyzer, "term" will be
  processed automatically (so don't pre-process it yourself).
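Because deletions only affect previously committed documents, one way to use delete_by_term() is to add and commit in one session and delete in a later one. The sketch below assumes a hypothetical "id" field of type StringType, which has no Analyzer, so the term is matched exactly.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# Hypothetical schema with a single exact-match, indexed field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );

my $index_path = tempdir( CLEANUP => 1 ) . '/index';

# Session 1: add two documents and commit them.
my $indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $index_path,
    create => 1,
);
$indexer->add_doc( { id => $_ } ) for qw( doc1 doc2 );
$indexer->commit;

# Session 2: mark the committed document 'doc1' as deleted.
$indexer = Lucy::Index::Indexer->new( index => $index_path );
$indexer->delete_by_term( field => 'id', term => 'doc1' );
$indexer->commit;

# Searchers opened after the commit no longer see 'doc1'.
my $searcher = Lucy::Search::IndexSearcher->new( index => $index_path );
my %hits;
for my $id (qw( doc1 doc2 )) {
    my $query = Lucy::Search::TermQuery->new( field => 'id', term => $id );
    $hits{$id} = $searcher->hits( query => $query )->total_hits;
}
print "doc1: $hits{doc1}, doc2: $hits{doc2}\n";
```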
delete_by_query(query)
Mark documents which match the supplied Query as deleted.
- query - A Query.
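A short sketch of deleting several documents at once with a Query object follows. It uses a TermQuery purely as an example of a Query; the "category" field, its values, and the paths are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# Hypothetical schema: two exact-match fields sharing one field type.
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::StringType->new;
$schema->spec_field( name => 'id',       type => $type );
$schema->spec_field( name => 'category', type => $type );

my $index_path = tempdir( CLEANUP => 1 ) . '/index';

# Session 1: commit three documents, two of them drafts.
my $indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $index_path,
    create => 1,
);
$indexer->add_doc( { id => 'd1', category => 'draft' } );
$indexer->add_doc( { id => 'd2', category => 'draft' } );
$indexer->add_doc( { id => 'd3', category => 'final' } );
$indexer->commit;

# Session 2: delete every committed document matching the query.
my $drafts = Lucy::Search::TermQuery->new(
    field => 'category',
    term  => 'draft',
);
$indexer = Lucy::Index::Indexer->new( index => $index_path );
$indexer->delete_by_query($drafts);
$indexer->commit;

# Count what remains in each category.
my $searcher = Lucy::Search::IndexSearcher->new( index => $index_path );
my $finals   = Lucy::Search::TermQuery->new(
    field => 'category',
    term  => 'final',
);
my $draft_count = $searcher->hits( query => $drafts )->total_hits;
my $final_count = $searcher->hits( query => $finals )->total_hits;
print "drafts: $draft_count, finals: $final_count\n";
```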
INHERITANCE
Lucy::Index::Indexer isa Lucy::Object::Obj.