.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Text::English 3pm"
.TH Text::English 3pm "2021-01-01" "perl v5.32.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Text::English \- Porter's stemming algorithm
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\&    use Text::English;
\&    @stems = Text::English::stem( @words );
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This routine applies the Porter Stemming Algorithm to its parameters,
returning the stemmed words.
It is derived from the C program \*(L"stemmer.c\*(R"
as found in freewais and elsewhere, which contains these notes:
.PP
.Vb 4
\&   Purpose:    Implementation of the Porter stemming algorithm documented
\&               in: Porter, M.F., "An Algorithm For Suffix Stripping,"
\&               Program 14 (3), July 1980, pp. 130\-137.
\&   Provenance: Written by B. Frakes and C. Cox, 1986.
.Ve
.PP
I have re-interpreted areas that use Frakes and Cox's \*(L"WordSize\*(R"
function. My version may misbehave on short words starting with \*(L"y\*(R",
but I can't think of any examples.
.PP
The step numbers correspond to Frakes and Cox, and are probably in
Porter's article (which I've not seen).
Porter's algorithm still has rough spots (e.g. current/currency, \-ings words),
which I've not attempted to cure, although I have added
support for the British \-ise suffix.
.SH "NOTES"
.IX Header "NOTES"
This is version 0.1. I would welcome feedback, especially improvements
to the punctuation-stripping step.
.SH "AUTHOR"
.IX Header "AUTHOR"
Ian Phillipps
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright Public \s-1IP\s0 Exchange Ltd (\s-1PIPEX\s0).
Available for use under the same terms as perl.