.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "KinoSearch1::Analysis::Token 3pm"
.TH KinoSearch1::Analysis::Token 3pm 2024-03-10 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
KinoSearch1::Analysis::Token \- unit of text
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 1
\&    # private class \- no public API
.Ve
.SH "PRIVATE CLASS"
.IX Header "PRIVATE CLASS"
You can't instantiate a Token object at the Perl level; however, you can
affect individual Tokens within a TokenBatch by way of TokenBatch's
(experimental) API.
.SH DESCRIPTION
.IX Header "DESCRIPTION"
Token is the fundamental unit used by KinoSearch1's Analyzer subclasses.  Each
Token has four attributes: text, start_offset, end_offset, and pos_inc (short
for position increment).
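.PP
As an illustration only, the four attributes can be pictured as the fields of
a plain Perl hash.  The sketch assumes simple whitespace tokenization and uses
a hypothetical helper, \f(CW\*(C`tokenize\*(C'\fR, which is not part of
KinoSearch1.  Real Tokens live inside a TokenBatch and cannot be created this
way.
.Vb 23
    use strict;
    use warnings;
    # Hypothetical helper, not part of KinoSearch1: split a string on
    # whitespace and record each word as a plain hash holding the four
    # Token attributes described above.
    sub tokenize {
        my ($source) = @_;
        my @tokens;
        my $from = 0;
        for my $word (split " ", $source) {
            my $start = index($source, $word, $from);
            my $end   = $start + length $word;
            push @tokens, {
                text         => $word,
                start_offset => $start,
                end_offset   => $end,
                pos_inc      => 1,    # the default position increment
            };
            $from = $end;
        }
        return @tokens;
    }
    my @tokens = tokenize("beating a dead horse");
.Ve
.PP
For "beating a dead horse" this yields four hashes.  "beating" spans offsets
0 to 7 and "horse" spans offsets 15 to 20.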
.PP
The text of a Token is a string.
.PP
A Token's start_offset and end_offset locate it within a larger text, even if
the Token's text attribute gets modified (by stemming, for instance).  The
Token for "beating" in the text "beating a dead horse" begins life with a
start_offset of 0 and an end_offset of 7 (one past its last character); after
stemming, the text is "beat", but the end_offset is still 7.
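.PP
Here is a minimal sketch of that behavior, again modeling a Token as a plain
Perl hash.  The hypothetical \f(CW\*(C`toy_stem\*(C'\fR function is a
deliberately naive suffix stripper, not a KinoSearch1 analyzer.  It rewrites
only the text, so afterwards the text is "beat" while start_offset and
end_offset still describe "beating".
.Vb 22
    use strict;
    use warnings;
    # A Token modeled as a plain hash reference, for illustration only.
    my $token = {
        text         => "beating",
        start_offset => 0,
        end_offset   => 7,
        pos_inc      => 1,
    };
    # Hypothetical toy stemmer (not a KinoSearch1 class): it drops a
    # trailing "ing" from the text and leaves both offsets untouched.
    sub toy_stem {
        my ($tok) = @_;
        my $text = $$tok{text};
        my $at   = rindex($text, "ing");
        if ($at > 0 and $at + 3 == length $text) {
            $$tok{text} = substr($text, 0, $at);
        }
        return $tok;
    }
    toy_stem($token);
    # The text is now "beat"; the offsets still span "beating".
.Ve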
.PP
The position increment, which defaults to 1, is an advanced tool for
manipulating phrase matching.  Ordinarily, Tokens are assigned consecutive
position numbers: 0, 1, and 2 for "three blind mice".  However, if you set the
position increment for "blind" to, say, 1000, then the three tokens will end
up assigned to positions 0, 1, and 1001.  The resulting gap between "blind"
and "mice" means they will no longer produce a phrase match for the query
\&'"three blind mice"'.
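.PP
The sketch below reproduces the positions in that example under one stated
assumption: each Token receives the current position, and its pos_inc then
sets the distance to the following Token.  The functions
\f(CW\*(C`assign_positions\*(C'\fR and \f(CW\*(C`is_phrase\*(C'\fR are
hypothetical.  They model a phrase query as a check for consecutive positions
and are not KinoSearch1's indexing or matching code.
.Vb 32
    use strict;
    use warnings;
    use feature "say";
    # Assign positions the way the example above describes: the first
    # Token sits at position 0, and the pos_inc of each Token sets the
    # distance to the Token after it.  This reproduces the numbers in
    # the text; it is an assumption, not KinoSearch1 indexing code.
    sub assign_positions {
        my @tokens   = @_;
        my $position = 0;
        my @positions;
        for my $tok (@tokens) {
            push @positions, $position;
            $position += $$tok{pos_inc};
        }
        return @positions;
    }
    # A phrase query matches only when its terms occupy consecutive
    # positions.
    sub is_phrase {
        my @positions = @_;
        my $prev;
        for my $p (@positions) {
            return 0 if defined $prev and $p != $prev + 1;
            $prev = $p;
        }
        return 1;
    }
    my @mice = map { +{ text => $_, pos_inc => 1 } } qw(three blind mice);
    $mice[1]{pos_inc} = 1000;                  # the increment on "blind"
    my @positions = assign_positions(@mice);   # (0, 1, 1001)
    say(is_phrase(@positions) ? "phrase match" : "no phrase match");
.Ve
.PP
As written, it prints "no phrase match".  With every increment left at 1, the
positions come out as 0, 1, and 2, and \f(CW\*(C`is_phrase\*(C'\fR returns
true.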
.SH COPYRIGHT
.IX Header "COPYRIGHT"
Copyright 2006\-2010 Marvin Humphrey
.SH "LICENSE, DISCLAIMER, BUGS, etc."
.IX Header "LICENSE, DISCLAIMER, BUGS, etc."
See KinoSearch1 version 1.01.