NAME
Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent
stripping
SYNOPSIS
    # $tokenizer and $stemmer are assumed to be constructed elsewhere.
    my $normalizer = Lucy::Analysis::Normalizer->new;
    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $normalizer, $tokenizer, $stemmer ],
    );
DESCRIPTION
Normalizer is an Analyzer which normalizes tokens to one of the Unicode
normalization forms. Optionally, it performs Unicode case folding and converts
accented characters to their base characters.
If you use highlighting, run Normalizer after tokenization. Normalization might
add or remove characters, so token offsets computed on already-normalized text
may no longer line up with the original text.
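The note about added or removed characters can be illustrated without Lucy at
all. The following is a minimal sketch that uses the core Unicode::Normalize
module (an assumption made for illustration; it says nothing about how
Normalizer is implemented internally) to show a string growing under NFKC and
shrinking under NFC:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKC);

# U+FB01 LATIN SMALL LIGATURE FI followed by "x": 2 characters.
my $ligature = "\x{FB01}x";

# "e" followed by U+0301 COMBINING ACUTE ACCENT: 2 characters.
my $combining = "e\x{301}";

# NFKC replaces the ligature with the two letters "f" and "i".
my $expanded = NFKC($ligature);

# NFC composes the base letter and the accent into the single character U+00E9.
my $composed = NFC($combining);

printf "ligature:  %d -> %d characters\n", length($ligature),  length($expanded);
printf "combining: %d -> %d characters\n", length($combining), length($composed);
# prints:
# ligature:  2 -> 3 characters
# combining: 2 -> 1 characters
```

Because the character count changes in both directions, an offset measured on
the normalized string does not necessarily point at the same place in the
original string.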
CONSTRUCTORS
new( [labeled params] )
    my $normalizer = Lucy::Analysis::Normalizer->new(
        normalization_form => 'NFKC',
        case_fold          => 1,
        strip_accents      => 0,
    );
- normalization_form - Unicode normalization form, can be one of
'NFC', 'NFKC', 'NFD', 'NFKD'. Defaults to 'NFKC'.
- case_fold - Perform case folding, default is true.
- strip_accents - Strip accents, default is false.
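To get a feel for what the three parameters do to a piece of text, the
following sketch approximates them with core Perl only. The helper
normalize_text is hypothetical and is not part of Lucy. It assumes that accent
stripping can be approximated by decomposing the text and removing nonspacing
marks, which may differ from what Normalizer actually does. It requires Perl
5.16 or later for fc().

```perl
use strict;
use warnings;
use feature 'fc';    # fc() performs Unicode case folding (Perl 5.16+)
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# Hypothetical helper, not part of Lucy. Parameter names and defaults mirror
# the constructor parameters documented above.
sub normalize_text {
    my ($text, %opt) = @_;
    my $form          = $opt{normalization_form} // 'NFKC';
    my $case_fold     = $opt{case_fold}          // 1;
    my $strip_accents = $opt{strip_accents}      // 0;

    my %forms = (NFC => \&NFC, NFD => \&NFD, NFKC => \&NFKC, NFKD => \&NFKD);
    my $normalize = $forms{$form}
        or die "Unknown normalization form: $form";

    $text = $normalize->($text);
    $text = fc($text) if $case_fold;

    if ($strip_accents) {
        # Assumption: decompose, then drop nonspacing marks (category Mn).
        $text = NFD($text);
        $text =~ s/\p{Mn}//g;
    }

    # Normalize again so the result is in the requested form.
    return $normalize->($text);
}

# "Caf" + U+00E9 (e with acute) + space + U+FB01 (fi ligature)
my $input = "Caf\x{E9} \x{FB01}";

my $default  = normalize_text($input);                      # "caf\x{E9} fi"
my $stripped = normalize_text($input, strip_accents => 1);  # "cafe fi"
my $nfc_only = normalize_text($input,
    normalization_form => 'NFC',
    case_fold          => 0,
);                                                          # unchanged
```

With the defaults, the ligature is expanded by NFKC and the text is case
folded, but the accent is kept. NFC alone leaves this input unchanged because
the ligature only has a compatibility decomposition.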
INHERITANCE
Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa
Lucy::Object::Obj.