.\" Automatically generated by Pod::Man 4.10 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "re::engine::RE2 3pm"
.TH re::engine::RE2 3pm "2019-06-06" "perl v5.28.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
re::engine::RE2 \- RE2 regex engine
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 5
\& use re::engine::RE2;
\&
\& if ("Hello, world" =~ /Hello, (world)/) {
\&   print "Greetings, $1!";
\& }
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module replaces Perl's regex engine in a given lexical scope with \s-1RE2.\s0
.PP
\&\s-1RE2\s0 is a primarily \s-1DFA\s0-based regexp engine from Google that is very
fast at matching large amounts of text. However, it does not support lookbehind
and some other Perl regular expression features. See
\&\s-1RE2\s0's website for more information.
.PP
This module implements a fallback to Perl's own regexp engine: if \s-1RE2\s0
is unable to compile a regexp, Perl's engine is used instead. Features not
implemented by \s-1RE2\s0 therefore don't suddenly stop working; they simply
run on Perl's regexp implementation.
.SH "METHODS"
.IX Header "METHODS"
To access extra functionality of \s-1RE2,\s0 methods can be called on a
compiled regular expression (i.e. a \f(CW\*(C`qr//\*(C'\fR).
.IP "\(bu" 4
\&\f(CW\*(C`possible_match_range([length = 10])\*(C'\fR
.Sp
Returns a list of two strings, \f(CW$min\fR and \f(CW$max\fR, that bound the
strings the expression can match: any match sorts at or after \f(CW$min\fR
and at or before \f(CW$max\fR. The optional length (default 10) limits the
length of the returned strings. See \s-1RE2\s0's documentation on
PossibleMatchRange for further details.
.Sp
Example:
.Sp
.Vb 3
\& my ($min, $max) = qr/^(a|b)/\->possible_match_range;
\& is $min, \*(Aqa\*(Aq;
\& is $max, \*(Aqc\*(Aq;
.Ve
.SH "PRAGMA OPTIONS"
.IX Header "PRAGMA OPTIONS"
Options can be set by passing them on the \f(CW\*(C`use\*(C'\fR line. Like the
engine itself, they are lexically (pragma) scoped.
.IP "\(bu" 4
\&\f(CW\*(C`\-max_mem => 1<<24\*(C'\fR
.Sp
Configure \s-1RE2\s0's memory limit, in bytes.
.IP "\(bu" 4
\&\f(CW\*(C`\-strict => 1\*(C'\fR
.Sp
Be strict, i.e.
don't allow regexps that \s-1RE2\s0 cannot compile, rather than falling back
to Perl's regexp engine.
.IP "\(bu" 4
\&\f(CW\*(C`\-longest_match => 1\*(C'\fR
.Sp
Return the longest match rather than the first one found when matching
alternations. For example, with this option set, matching \f(CW"abc"\fR against
\&\f(CW\*(C`(a|abc)\*(C'\fR matches \f(CW"abc"\fR regardless of the order of the
alternatives.
.IP "\(bu" 4
\&\f(CW\*(C`\-never_nl => 1\*(C'\fR
.Sp
Never match a newline (\f(CW"\en"\fR), even if the provided regexp contains it.
.SH "PERFORMANCE"
.IX Header "PERFORMANCE"
Performance is the primary reason for using \s-1RE2,\s0 so here are some
benchmarks. As with any benchmark, take them with a pinch of salt.
.SS "Simple matching"
.IX Subsection "Simple matching"
.Vb 3
\& my $foo = "foo bar baz";
\& $foo =~ /foo/;
\& $foo =~ /foox/;
.Ve
.PP
On this very simple match, \s-1RE2\s0 is actually slower:
.PP
.Vb 3
\&          Rate  re2   re
\& re2  674634/s   \-\- \-76%
\& re  2765739/s 310%   \-\-
.Ve
.SS "\s-1URL\s0 matching"
.IX Subsection "URL matching"
Matching
\&\f(CW\*(C`m{([a\-zA\-Z][a\-zA\-Z0\-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)}\*(C'\fR
against a file of several \s-1KB\s0:
.PP
.Vb 3
\&       Rate    re  re2
\& re  35.2/s    \-\- \-99%
\& re2 2511/s 7037%   \-\-
.Ve
.SS "Many alternatives"
.IX Subsection "Many alternatives"
Matching a string against a regexp with 17,576 alternatives
(\f(CW\*(C`aaa .. zzz\*(C'\fR).
.PP
Perl uses trie matching for this case; \s-1RE2\s0 does something similar by
default.
.PP
.Vb 4
\& $ perl misc/altern.pl
\&         Rate   re  re2
\& re   52631/s   \-\- \-91%
\& re2 554938/s 954%   \-\-
.Ve
.SH "NOTES"
.IX Header "NOTES"
.IP "\(bu" 4
No support for \f(CW\*(C`m//x\*(C'\fR
.Sp
The \f(CW\*(C`/x\*(C'\fR modifier is not supported, simply because \s-1RE2\s0
itself does not support it. If \f(CW\*(C`//x\*(C'\fR is used, the module
automatically falls back to Perl's regexp engine.
.IP "\(bu" 4
\&\*(L"re2/dfa.cc:447: \s-1DFA\s0 out of memory: prog size xxx mem yyy\*(R"
.Sp
If you compile a very large regular expression, you may get this error.
\&\s-1RE2\s0 has an internal limit on memory consumption for the \s-1DFA\s0 state
tables. By default this limit is 8 MiB.
.Sp
If you need to increase it, use the \f(CW\*(C`\-max_mem\*(C'\fR option:
.Sp
.Vb 1
\& use re::engine::RE2 \-max_mem => 8<<23; # 64MiB
.Ve
.IP "\(bu" 4
How do I tell if \s-1RE2\s0 will be used?
.Sp
See if your regexp is matching quickly or slowly ;).
.Sp
Alternatively, normal \s-1OO\s0 concepts apply, and you can examine the object
returned by \f(CW\*(C`qr//\*(C'\fR:
.Sp
.Vb 6
\& use re::engine::RE2;
\&
\& ok qr/foo/\->isa("re::engine::RE2");
\&
\& # Perl Regexp used instead
\& ok not qr/(?<=foo)bar/\->isa("re::engine::RE2");
.Ve
.Sp
If you wish to force \s-1RE2\s0 to be used, use the \f(CW\*(C`\-strict\*(C'\fR option.
.SH "BUGS"
.IX Header "BUGS"
Known issues:
.IP "\(bu" 4
Unicode handling
.Sp
Currently the Unicode handling of re::engine::RE2 does not fully match Perl's
behaviour.
.Sp
The \s-1UTF\-8\s0 flag of the regexp currently determines how the string is
matched. This is wrong and will be fixed at some point.
.IP "\(bu" 4
Final newline matching differs from Perl
.Sp
.Vb 1
\& "\en" =~ /$/
.Ve
.Sp
The above is true in Perl, but false in \s-1RE2.\s0 To work around the issue,
write \f(CW\*(C`\en?\ez\*(C'\fR where you mean Perl's \f(CW\*(C`$\*(C'\fR.
.PP
Please report bugs via \s-1RT\s0 in the normal way. (Or a patch at would be most welcome.)
.SH "AUTHORS"
.IX Header "AUTHORS"
David Leadbeater
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 2010 David Leadbeater.
.PP
Based on re::engine::PCRE:
.PP
Copyright 2007 Ævar Arnfjörð Bjarmason.
.PP
The original version was copyright 2006 Audrey Tang and Yves Orton.
.PP
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
.PP
(However, the bundled copy of \s-1RE2\s0 has a different copyright owner and is
under a BSD-like license; see \fIre2/LICENSE\fR.)