Lucy::Analysis::Normalizer(3pm) | User Contributed Perl Documentation | Lucy::Analysis::Normalizer(3pm)
NAME
Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent stripping
SYNOPSIS
    my $normalizer = Lucy::Analysis::Normalizer->new;
    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $normalizer, $tokenizer, $stemmer ],
    );
DESCRIPTION
Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms. Optionally, it performs Unicode case folding and converts accented characters to their base character.
If you use highlighting, Normalizer should be run after tokenization because it might add or remove characters.
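To make the point about added or removed characters concrete, here is a minimal sketch. It uses the core Unicode::Normalize module rather than Lucy itself (an assumption made purely for illustration; Lucy's Normalizer has its own implementation) to show a string's length changing under normalization, which is what would shift the offsets that highlighting relies on.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC);

# U+FB01 LATIN SMALL LIGATURE FI is a single character;
# compatibility normalization (NFKC) expands it to "f" followed by "i".
my $ligature = "\x{FB01}";
my $expanded = NFKC($ligature);

# U+00E9 (e with acute) decomposes under NFD into
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
my $precomposed = "\x{E9}";
my $decomposed  = NFD($precomposed);

# Going the other way, NFC composes "e" + U+0301 into one character.
my $composed = NFC("e\x{301}");

printf "ligature under NFKC: %d -> %d characters\n",
    length($ligature), length($expanded);
printf "e-acute under NFD:   %d -> %d characters\n",
    length($precomposed), length($decomposed);
printf "e + accent under NFC: %d -> %d characters\n",
    length("e\x{301}"), length($composed);
```

If offsets are computed before text like this is normalized, they still refer to positions in the original text, which is presumably why tokenizing first is recommended when highlighting.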
CONSTRUCTORS
new( [labeled params] )
    my $normalizer = Lucy::Analysis::Normalizer->new(
        normalization_form => 'NFKC',
        case_fold          => 1,
        strip_accents      => 0,
    );
- normalization_form - Unicode normalization form; one of 'NFC', 'NFKC', 'NFD', or 'NFKD'. Defaults to 'NFKC'.
- case_fold - Perform case folding. Defaults to true.
- strip_accents - Strip accents. Defaults to false.
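As a rough illustration of what the three parameters mean, the sketch below emulates them with core Perl (Unicode::Normalize and the built-in `fc`). The function name `normalize_token` is hypothetical, and this is not Lucy's implementation. In particular, stripping accents by decomposing and then deleting combining marks is an assumption about the intended effect, as is the order in which the steps are applied.

```perl
use strict;
use warnings;
use feature 'fc';    # Unicode case folding, available since Perl 5.16
use Unicode::Normalize qw(normalize NFD);

# Hypothetical helper that mirrors the constructor's parameter names
# and documented defaults. Illustration only, not Lucy's code.
sub normalize_token {
    my ($text, %opt) = @_;
    my $form          = $opt{normalization_form} // 'NFKC';
    my $case_fold     = $opt{case_fold}          // 1;
    my $strip_accents = $opt{strip_accents}      // 0;

    if ($strip_accents) {
        # Decompose, then delete nonspacing combining marks (category Mn).
        $text = NFD($text);
        $text =~ s/\p{Mn}//g;
    }

    $text = fc($text) if $case_fold;

    # normalize() accepts 'NFC', 'NFD', 'NFKC' or 'NFKD' as the form name.
    return normalize($form, $text);
}

print normalize_token("Caf\x{E9}", strip_accents => 1), "\n";    # prints "cafe"
```

Normalizing last is a deliberate choice in this sketch: case folding can produce text that is no longer in the requested form, so applying the normalization form at the end keeps the output consistent.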
INHERITANCE
Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.
2016-07-07 | perl v5.24.1