.\" Automatically generated by Pod::Man 2.28 (Pod::Simple 3.28) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{ . if \nF \{ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . 
ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "Text::Levenshtein 3pm" .TH Text::Levenshtein 3pm "2014-10-26" "perl v5.20.1" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l
.nh
.SH "NAME"
Text::Levenshtein \- calculate the Levenshtein edit distance between two strings
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use Text::Levenshtein qw(distance);
\&
\& print distance("foo","four");
\& # prints "2"
\&
\& my @words = qw/ four foo bar /;
\& my @distances = distance("foo",@words);
\&
\& print "@distances";
\& # prints "2 0 3"
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module implements the Levenshtein \fIedit distance\fR,
a measure of the difference between two strings.
The distance is the minimum number of single-character substitutions,
deletions or insertions (\*(L"edits\*(R") needed to transform one string
into the other; the same number is needed in either direction.
When two strings have distance 0, they are the same.
.PP
To learn more about the Levenshtein metric,
have a look at the Wikipedia page on Levenshtein distance.
.SS "\fIdistance()\fP"
.IX Subsection "distance()"
The simplest usage takes two strings and returns the edit distance:
.PP
.Vb 2
\& $distance = distance(\*(Aqbrown\*(Aq, \*(Aqgreen\*(Aq);
\& # returns 3, as \*(Aqr\*(Aq and \*(Aqn\*(Aq don\*(Aqt change
.Ve
.PP
Instead of a single second string, you can pass a list of strings.
Each string is compared to the first string passed,
and a list of the edit distances is returned:
.PP
.Vb 3
\& @words = qw/ green trainee brains /;
\& @distances = distance(\*(Aqbrown\*(Aq, @words);
\& # returns (3, 5, 3)
.Ve
.SS "\fIfastdistance()\fP"
.IX Subsection "fastdistance()"
Previous versions of this module provided an alternative implementation,
in the function \f(CW\*(C`fastdistance()\*(C'\fR.
This function is still provided for backwards compatibility,
but both functions now use the same code to calculate the edit distance.
.PP
Unlike \f(CW\*(C`distance()\*(C'\fR, \f(CW\*(C`fastdistance()\*(C'\fR only takes two strings,
and returns the edit distance between them.
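.PP
As a minimal sketch, assuming \f(CW\*(C`fastdistance\*(C'\fR can be imported by name
in the same way as \f(CW\*(C`distance\*(C'\fR, a two-string call might look like this.
Because both functions now share one implementation, the result should match
the brown/green example given for \f(CW\*(C`distance()\*(C'\fR:
.PP
.Vb 5
\& # Assumption: fastdistance is exported on request, like distance.
\& use Text::Levenshtein qw(fastdistance);
\&
\& # Two strings in, a single edit distance out.
\& my $distance = fastdistance(\*(Aqbrown\*(Aq, \*(Aqgreen\*(Aq);
\& print $distance;
\& # should print "3", the same value distance() returns for this pair
.Ve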
.SH "ignore_diacritics"
.IX Header "ignore_diacritics"
Both the \f(CW\*(C`distance()\*(C'\fR and \f(CW\*(C`fastdistance()\*(C'\fR functions
can take a hashref of optional arguments as the final argument.
At the moment the only option is \f(CW\*(C`ignore_diacritics\*(C'\fR.
If this is true, then any diacritics are ignored when calculating edit distance.
For example, \*(L"cafe\*(R" and \*(L"cafe\*'\*(R" normally have an edit distance of 1,
but when diacritics are ignored, the distance will be 0:
.PP
.Vb 2
\& use Text::Levenshtein 0.11 qw/ distance /;
\& $distance = distance($word1, $word2, {ignore_diacritics => 1});
.Ve
.PP
If you turn on this option, then Unicode::Collate will be loaded
and used when comparing characters in the words.
.PP
Early versions of \f(CW\*(C`Text::Levenshtein\*(C'\fR didn't support this option,
so you should require version 0.11 or later, as above.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
There are many different modules on \s-1CPAN\s0 for calculating
the edit distance between two strings.
Here's just a selection.
.PP
Text::LevenshteinXS and Text::Levenshtein::XS are both versions of the
Levenshtein algorithm that require a C compiler,
but will be a lot faster than this module.
.PP
The Damerau-Levenshtein edit distance is like the Levenshtein distance,
but in addition to insertion, deletion and substitution,
it also considers the transposition of two adjacent characters
to be a single edit.
The module Text::Levenshtein::Damerau defaults to using a pure-Perl
implementation, but if you've installed Text::Levenshtein::Damerau::XS
then it will be a lot quicker.
.PP
Text::WagnerFischer is an implementation of the Wagner-Fischer edit distance,
which is similar to the Levenshtein,
but applies different weights to each edit type.
.PP
Text::Brew is an implementation of the Brew edit distance,
which is another algorithm based on edit weights.
.PP
Text::Fuzzy provides a number of operations for partial or fuzzy matching
of text based on edit distance.
Text::Fuzzy::PP is a pure-Perl implementation of the same interface.
.PP
String::Similarity takes two strings and returns a value between
0 (meaning entirely different) and 1 (meaning identical).
It is apparently based on edit distance.
.SH "REPOSITORY"
.IX Header "REPOSITORY"
.SH "AUTHOR"
.IX Header "AUTHOR"
Dree Mistrut originally wrote this module and released it to \s-1CPAN\s0 in 2002.
.PP
Josh Goldberg then took over maintenance and released versions
between 2004 and 2008.
.PP
Neil Bowers (\s-1NEILB\s0 on \s-1CPAN\s0) is now maintaining this module.
Version 0.07 was a complete rewrite,
based on one of the algorithms on the Wikipedia page.
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
This software is copyright (C) 2002\-2004 Dree Mistrut.
Copyright (C) 2004\-2014 Josh Goldberg.
Copyright (C) 2014\- Neil Bowers.
.PP
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.