.\" Automatically generated by Pod::Man 4.11 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes 3pm"
.TH Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes 3pm "2020-05-17" "perl v5.30.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes \- Split long regexps into smaller "qr//" chunks.
.SH "AFFILIATION"
.IX Header "AFFILIATION"
This Policy is part of the core Perl::Critic distribution.
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Big regexps are hard to read, perhaps even the hardest part of Perl.
A good practice is to write digestible chunks of regexp and put them
together. This policy flags any regexp that is longer than \f(CW\*(C`N\*(C'\fR
characters, where \f(CW\*(C`N\*(C'\fR is a configurable value that defaults to 60.
If the regexp uses the \f(CW\*(C`x\*(C'\fR flag, then the length is computed after
stripping out any comments and whitespace.
.PP
Unfortunately, descriptive (and therefore longish) variable names can
push a regexp over this limit, so each interpolated variable is counted
as 4 characters no matter how long its name actually is.
.SH "CASE STUDY"
.IX Header "CASE STUDY"
As an example, look at the regexp used to match email addresses in
Email::Valid::Loose (tweaked lightly to wrap for \s-1POD\s0)
.PP
.Vb 8
\&    (?x\-ism:(?:[^(\e040)<>@,;:".\e\e\e[\e]\e000\-\e037\ex80\-\exff]+(?![^(\e040)<>@,;:".\e\e\e[\e]
\&    \e000\-\e037\ex80\-\exff])|"[^\e\e\ex80\-\exff\en\e015"]*(?:\e\e[^\ex80\-\exff][^\e\e\ex80\-\exff\en\e015
\&    "]*)*")(?:(?:[^(\e040)<>@,;:".\e\e\e[\e]\e000\-\e037\ex80\-\exff]+(?![^(\e040)<>@,;:".\e\e\e[
\&    \e]\e000\-\e037\ex80\-\exff])|"[^\e\e\ex80\-\exff\en\e015"]*(?:\e\e[^\ex80\-\exff][^\e\e\ex80\-\exff\en
\&    \e015"]*)*")|\e.)*\e@(?:[^(\e040)<>@,;:".\e\e\e[\e]\e000\-\e037\ex80\-\exff]+(?![^(\e040)<>@,
\&    ;:".\e\e\e[\e]\e000\-\e037\ex80\-\exff])|\e[(?:[^\e\e\ex80\-\exff\en\e015\e[\e]]|\e\e[^\ex80\-\exff])*\e]
\&    )(?:\e.(?:[^(\e040)<>@,;:".\e\e\e[\e]\e000\-\e037\ex80\-\exff]+(?![^(\e040)<>@,;:".\e\e\e[\e]\e000
\&    \-\e037\ex80\-\exff])|\e[(?:[^\e\e\ex80\-\exff\en\e015\e[\e]]|\e\e[^\ex80\-\exff])*\e]))*)
.Ve
.PP
which is constructed from the following code:
.PP
.Vb 10
\&    my $esc         = \*(Aq\e\e\e\e\*(Aq;
\&    my $period      = \*(Aq\e.\*(Aq;
\&    my $space       = \*(Aq\e040\*(Aq;
\&    my $open_br     = \*(Aq\e[\*(Aq;
\&    my $close_br    = \*(Aq\e]\*(Aq;
\&    my $nonASCII    = \*(Aq\ex80\-\exff\*(Aq;
\&    my $ctrl        = \*(Aq\e000\-\e037\*(Aq;
\&    my $cr_list     = \*(Aq\en\e015\*(Aq;
\&    my $qtext       = qq/[^$esc$nonASCII$cr_list\e"]/; # "
\&    my $dtext       = qq/[^$esc$nonASCII$cr_list$open_br$close_br]/;
\&    my $quoted_pair = qq<$esc>.qq<[^$nonASCII]>;
\&    my $atom_char   = qq/[^($space)<>\e@,;:\e".$esc$open_br$close_br$ctrl$nonASCII]/;# "
\&    my $atom        = qq<$atom_char+(?!$atom_char)>;
\&    my $quoted_str  = qq<\e"$qtext*(?:$quoted_pair$qtext*)*\e">; # "
\&    my $word        = qq<(?:$atom|$quoted_str)>;
\&    my $domain_ref  = $atom;
\&    my $domain_lit  = qq<$open_br(?:$dtext|$quoted_pair)*$close_br>;
\&    my $sub_domain  = qq<(?:$domain_ref|$domain_lit)>;
\&    my $domain      = qq<$sub_domain(?:$period$sub_domain)*>;
\&    my $local_part  = qq<$word(?:$word|$period)*>; # This part is modified
\&    $Addr_spec_re   = qr<$local_part\e@$domain>;
.Ve
.PP
If you read the code from bottom to top, it is quite readable. You can
even see the one violation of \s-1RFC 822\s0 that Tatsuhiko Miyagawa
deliberately put into Email::Valid::Loose to allow periods. Look for
the \f(CW\*(C`|\e.\*(C'\fR in the upper regexp to see that same deviation.
.PP
One could certainly argue that the top regexp could be rewritten more
legibly with \f(CW\*(C`m//x\*(C'\fR and comments. But the bottom version is
self-documenting and, for example, doesn't repeat \f(CW\*(C`\ex80\-\exff\*(C'\fR 18 times.
Furthermore, it's much easier to compare the second version against
the source \s-1BNF\s0 grammar in \s-1RFC 822\s0 to judge whether the implementation
is sound even before running tests.
.SH "CONFIGURATION"
.IX Header "CONFIGURATION"
This policy allows regexps up to \f(CW\*(C`N\*(C'\fR characters long, where \f(CW\*(C`N\*(C'\fR
defaults to 60. You can override this default with the
\&\f(CW\*(C`max_characters\*(C'\fR setting.
To do so, add an entry to your \&\fI.perlcriticrc\fR file like this:
.PP
.Vb 2
\&    [RegularExpressions::ProhibitComplexRegexes]
\&    max_characters = 40
.Ve
.SH "CREDITS"
.IX Header "CREDITS"
Initial development of this policy was supported by a grant from the
Perl Foundation.
.SH "AUTHOR"
.IX Header "AUTHOR"
Chris Dolan
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright (c) 2007\-2011 Chris Dolan. Many rights reserved.
.PP
This program is free software; you can redistribute it and/or modify
it under the same terms as Perl itself. The full text of this license
can be found in the \s-1LICENSE\s0 file included with this module.