.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . 
ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "Text::RewriteRules 3pm" .TH Text::RewriteRules 3pm "2012-06-08" "perl v5.14.2" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l
.nh
.SH "NAME"
Text::RewriteRules \- A system to rewrite text using regexp\-based rules
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use Text::RewriteRules;
\&
\&    RULES email
\&    \e.==> DOT 
\&    @==> AT 
\&    ENDRULES
\&
\&    print email(\*(Aqambs@cpan.org\*(Aq); # prints ambs AT cpan DOT org
\&
\&    RULES/m inc
\&    (\ed+)=e=> $1+1
\&    ENDRULES
\&
\&    print inc("I saw 11 cats and 23 dogs"); # prints I saw 12 cats and 24 dogs
.Ve
.SH "ABSTRACT"
.IX Header "ABSTRACT"
This module uses a simplified syntax for regexp-based rules for rewriting
text. You define a set of rules, and the system applies them until no
rule can be applied any more.
.PP
Two variants are provided:
.IP "1." 4
traditional rewrite (\s-1RULES\s0 function):
.Sp
.Vb 2
\&  while a substitution is possible
\&  | apply the first rule that matches
.Ve
.IP "2." 4
cursor-based rewrite (RULES/m function):
.Sp
.Vb 4
\&  add a cursor at the beginning of the string
\&  while the cursor has not reached the end of the string
\&  | apply a substitution just after the cursor and advance the cursor
\&  | or advance the cursor if no rule can be applied
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Many computer science problems can be solved using rewriting rules.
.PP
A rewriting rule consists mainly of two parts: a regexp (\s-1LHS:\s0 Left Hand
Side) that is matched against the text, and the string that replaces the
content matched by the regexp (\s-1RHS:\s0 Right Hand Side).
.PP
Why not use a simple substitution? Because we want to define a set of
rules and apply them again and again, until no \s-1LHS\s0 regexp matches.
.PP
One point of discussion is the syntax used to define such a system. A
brief discussion showed that some users would prefer a function that
receives a hash with the rules, while others prefer some syntactic sugar.
.PP
We took the latter approach: we use \f(CW\*(C`Filter::Simple\*(C'\fR so that a
specific non-Perl syntax can be embedded in the Perl script. This improves
the legibility of large rewriting systems.
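.PP
As a rough illustration of the two rewriting strategies listed in the
\&\s-1ABSTRACT\s0, the following plain-Perl sketch may help. It does not use
Text::RewriteRules and is not the module's implementation: the helper names
\&\f(CW\*(C`rewrite_traditional\*(C'\fR and \f(CW\*(C`rewrite_cursor\*(C'\fR, and the representation
of a rule as a pair of regexp and code reference, are assumptions made only
for this example.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical representation: a rule is [ regexp, code ref ], where the
# code ref receives the first capture and returns the replacement text.

# Traditional rewriting: apply the first rule that matches, start again
# from the first rule, and stop when no rule matches any more.
sub rewrite_traditional {
    my ($text, @rules) = @_;
    my $changed = 1;
    while ($changed) {
        $changed = 0;
        for my $rule (@rules) {
            my ($re, $rhs) = @$rule;
            if ($text =~ s/$re/$rhs->($1)/e) {
                $changed = 1;
                last;
            }
        }
    }
    return $text;
}

# Cursor-based rewriting: $text holds what lies after the cursor and
# $out what lies before it. Output is never matched again.
sub rewrite_cursor {
    my ($text, @rules) = @_;
    my $out = '';
    while (length $text) {
        my $applied = 0;
        for my $rule (@rules) {
            my ($re, $rhs) = @$rule;
            # Accept only a non-empty match that starts at the cursor.
            if ($text =~ /^(?:$re)/ and $+[0] > 0) {
                my $len = $+[0];
                $out .= $rhs->($1);
                $text = substr($text, $len);
                $applied = 1;
                last;
            }
        }
        # No rule applies here: copy one character and advance the cursor.
        $out .= substr($text, 0, 1, '') unless $applied;
    }
    return $out;
}

my @email = ( [ qr/[.]/, sub { ' DOT ' } ], [ qr/@/, sub { ' AT ' } ] );
my @inc   = ( [ qr/([0-9]+)/, sub { $_[0] + 1 } ] );

say rewrite_traditional('ambs@cpan.org', @email);       # ambs AT cpan DOT org
say rewrite_cursor('I saw 11 cats and 23 dogs', @inc);  # I saw 12 cats and 24 dogs
```

.PP
In this sketch, passing the \f(CW\*(C`@inc\*(C'\fR rule to \f(CW\*(C`rewrite_traditional\*(C'\fR would
never terminate, because every result still contains digits. The
cursor-based loop avoids this by never revisiting text it has already
produced.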
.PP
This documentation is divided into two parts. First comes the module
reference: what each construct does, with a brief explanation. A tutorial
follows, which will grow over time and releases.
.SH "SYNTAX REFERENCE"
.IX Header "SYNTAX REFERENCE"
Note: most of the examples are deliberately simplistic, as that is the
easiest way to explain the basic syntax.
.PP
The basic syntax for the rewrite rules is a block, started by the keyword
\&\f(CW\*(C`RULES\*(C'\fR and ended by the keyword \f(CW\*(C`ENDRULES\*(C'\fR. Everything between them is
handled by the module and interpreted as rules or comments.
.PP
The \f(CW\*(C`RULES\*(C'\fR keyword accepts a set of flags (described later) and
requires a name for the rule-set. This name is used to define a function
for that rewriting system.
.PP
.Vb 3
\&   RULES functionname
\&    ...
\&   ENDRULES
.Ve
.PP
The function is defined in the main namespace where the \f(CW\*(C`RULES\*(C'\fR block
appears.
.PP
In this block, each line can be a comment (Perl style), an empty line or
a rule.
.SS "Basic Rule"
.IX Subsection "Basic Rule"
A basic rule is a simple substitution:
.PP
.Vb 3
\&  RULES foobar
\&  foo==>bar
\&  ENDRULES
.Ve
.PP
The arrow \f(CW\*(C`==>\*(C'\fR is used as the delimiter. On its left is the regexp to
match; on its right, the substitution. So, the previous block defines a
\&\f(CW\*(C`foobar\*(C'\fR function that replaces every \f(CW\*(C`foo\*(C'\fR with \f(CW\*(C`bar\*(C'\fR.
.PP
Although this may seem similar to a global substitution, it is not. A
global substitution cannot loop forever. With this module an endless loop
is easy to write, because rules are applied again to their own output: a
rule whose right-hand side is matched by its own left-hand side never
terminates.
.PP
You can use Perl syntax on both the left and right hand sides of the
rule, including \f(CW\*(C`$1...\*(C'\fR.
.SS "Execution Rule"
.IX Subsection "Execution Rule"
Perl substitutions support evaluating the replacement as code, so rewrite
rules support it too.
Here is an example:
.PP
.Vb 4
\&  RULES foo
\&  (\ed+)b=e=>\*(Aqb\*(Aq x $1
\&  (\ed+)a=eval=>\*(Aqa\*(Aq x ($1*2)
\&  ENDRULES
.Ve
.PP
So, any number followed by a \f(CW\*(C`b\*(C'\fR is replaced by that many
\&\f(CW\*(C`b\*(Aqs\*(C'\fR. Any number followed by an \f(CW\*(C`a\*(C'\fR is replaced by twice that many
\&\f(CW\*(C`a\*(Aqs\*(C'\fR.
.PP
Evaluation is requested with an \f(CW\*(C`e\*(C'\fR or \f(CW\*(C`eval\*(C'\fR inside the arrow.
Remember that you can mix all these kinds of rules in the same rewriting
system.
.SS "Conditional Rule"
.IX Subsection "Conditional Rule"
In some cases we want to perform a substitution only if the pattern
matches \fBand\fR a set of conditions (about that pattern or not) is true.
.PP
For that, we use a three-part rule: the usual rule plus a condition,
separated from the rule by \f(CW\*(C`!!\*(C'\fR. Conditions can be attached to both
basic and execution rules.
.PP
.Vb 3
\&  RULES translate
\&  ([[:alpha:]]+)=e=>$dic{$1}!! exists($dic{$1})
\&  ENDRULES
.Ve
.PP
The previous example translates all words that exist in the dictionary.
.SS "Begin Rule"
.IX Subsection "Begin Rule"
Sometimes it is useful to change something in the string before starting
to apply the rules. For that, there is a special rule named
\&\f(CW\*(C`begin\*(C'\fR (or \f(CW\*(C`b\*(C'\fR for short) that has only a \s-1RHS\s0. This \s-1RHS\s0 is Perl
code. Any Perl code. If you want to modify the string, use \f(CW$_\fR.
.PP
.Vb 3
\&  RULES foo
\&  =b=> $_.=" END"
\&  ENDRULES
.Ve
.SS "Last Rule"
.IX Subsection "Last Rule"
Just as you use \f(CW\*(C`last\*(C'\fR in Perl to leave a loop, you can use a
\&\f(CW\*(C`last\*(C'\fR (or \f(CW\*(C`l\*(C'\fR) rule to stop rewriting when a specific pattern
matches.
.PP
Just as the \f(CW\*(C`begin\*(C'\fR rule has only a \s-1RHS\s0, the \f(CW\*(C`last\*(C'\fR rule has only a
\&\s-1LHS:\s0
.PP
.Vb 3
\&  RULES foo
\&  foobar=l=>
\&  ENDRULES
.Ve
.PP
This way, the rules iterate until the string matches \f(CW\*(C`foobar\*(C'\fR.
.PP
You can also supply a condition in a last rule:
.PP
.Vb 3
\&  RULES bar
\&  f(o+)b(a+)r=l=> !! length($1) == 2 * length($2);
\&  ENDRULES
.Ve
.SS "Rules with /x mode"
.IX Subsection "Rules with /x mode"
The regular expression /x mode can be used in the rewrite rules. In this
case:
.IP "1." 4
there must be an empty line between rules
.IP "2." 4
you can insert spaces and line breaks into the regular expression:
.Sp
.Vb 5
\& RULES/x f1
\& (\ed+)
\& (\ed{3})
\& (000)
\& ==>$1 milhao e $2 mil!! $1 == 1
\&
\& ENDRULES
.Ve
.SH "POWER EXPRESSIONS"
.IX Header "POWER EXPRESSIONS"
To facilitate matching complex languages, Text::RewriteRules defines a
set of regular expressions that you can use without defining them
yourself.
.SS "Parenthesis"
.IX Subsection "Parenthesis"
There are three usual kinds of parentheses: standard parentheses,
brackets and curly braces. You can match a balanced string of each kind
using the power expressions \f(CW\*(C`[[:PB:]]\*(C'\fR, \f(CW\*(C`[[:BB:]]\*(C'\fR and
\&\f(CW\*(C`[[:CBB:]]\*(C'\fR, respectively.
.PP
For instance, if you apply this rule:
.PP
.Vb 1
\&  [[:BB:]]==>foo
.Ve
.PP
to this string
.PP
.Vb 1
\&  something [ a [ b] c [d ]] and something more
.Ve
.PP
then you will get
.PP
.Vb 1
\&  something foo and something more
.Ve
.PP
Note that if you apply it to
.PP
.Vb 1
\&  something [[ not ] balanced [ here
.Ve
.PP
then you will get
.PP
.Vb 1
\&  something [foo balanced [ here
.Ve
.SS "\s-1XML\s0 tags"
.IX Subsection "XML tags"
The power expression \f(CW\*(C`[[:XML:]]\*(C'\fR matches an \s-1XML\s0 tag (with or without
child \s-1XML\s0 tags). Note that this expression matches only well-formed
\&\s-1XML\s0 tags.
.PP
As an example, the rule
.PP
.Vb 1
\&  [[:XML:]]==>tag
.Ve
.PP
applied to the string
.PP
.Vb 1
\&  and
.Ve
.PP
will result in
.PP
.Vb 1
\&  and tag
.Ve
.SH "TUTORIAL"
.IX Header "TUTORIAL"
At the moment, this is just a set of commented examples.
.PP
Example 1 \*(-- from numbers to Portuguese words (using traditional rewriting)
.PP
Example 2 \*(-- Naive translator (using cursor-based rewriting)
.SS "Conversion between numbers and words"
.IX Subsection "Conversion between numbers and words"
Yes, you can use Lingua::PT::Nums2Words or similar modules (for other
languages). However, before it existed we needed to write such a
conversion tool.
.PP
Here I present a subset of the rules (for numbers below 1000). The
generated text is in Portuguese, but you should get the idea. I'll try to
create a version for English very soon.
.PP
You can check the full code in the samples directory (file
\&\f(CW\*(C`num2words\*(C'\fR).
.PP
.Vb 1
\&  use Text::RewriteRules;
\&
\&  RULES num2words
\&    100==>cem
\&    1(\ed\ed)==>cento e $1
\&    0(\ed\ed)==>$1
\&    200==>duzentos
\&    300==>trezentos
\&    400==>quatrocentos
\&    500==>quinhentos
\&    600==>seiscentos
\&    700==>setecentos
\&    800==>oitocentos
\&    900==>novecentos
\&    (\ed)(\ed\ed)==>${1}00 e $2
\&
\&    10==>dez
\&    11==>onze
\&    12==>doze
\&    13==>treze
\&    14==>catorze
\&    15==>quinze
\&    16==>dezasseis
\&    17==>dezassete
\&    18==>dezoito
\&    19==>dezanove
\&    20==>vinte
\&    30==>trinta
\&    40==>quarenta
\&    50==>cinquenta
\&    60==>sessenta
\&    70==>setenta
\&    80==>oitenta
\&    90==>noventa
\&    0(\ed)==>$1
\&    (\ed)(\ed)==>${1}0 e $2
\&
\&    1==>um
\&    2==>dois
\&    3==>tre\*^s
\&    4==>quatro
\&    5==>cinco
\&    6==>seis
\&    7==>sete
\&    8==>oito
\&    9==>nove
\&    0$==>zero
\&    0==>
\&    ==>
\&    ,==>,
\&  ENDRULES
\&
\&  num2words(123); # returns "cento e vinte e tre\*^s"
.Ve
.SS "Naive translator (using cursor-based rewriting)"
.IX Subsection "Naive translator (using cursor-based rewriting)"
.Vb 5
\&  use Text::RewriteRules;
\&  %dict=(driver=>"motorista",
\&         the=>"o",
\&         of=>"de",
\&         car=>"carro");
\&
\&  $word=\*(Aq\eb\ew+\eb\*(Aq;
\&
\&  if( b(a("I see the Driver of the car")) eq "(I) (see) o Motorista do carro" )
\&       {print "ok\en"}
\&  else {print "ko\en"}
\&
\&  RULES/m a
\&  ($word)==>$dict{$1}!! defined($dict{$1})
\&  ($word)=e=> ucfirst($dict{lc($1)}) !! defined($dict{lc($1)})
\&  ($word)==>($1)
\&  ENDRULES
\&
\&  RULES/m b
\&  \ebde o\eb==>do
\&  ENDRULES
.Ve
.SH "AUTHOR"
.IX Header "AUTHOR"
Alberto Simo\*~es
.PP
Jose\*' Joa\*~o Almeida
.SH "BUGS"
.IX Header "BUGS"
We know documentation is missing and you all want to use this module. In
fact we are using it a lot, which explains why we don't have the time to
write documentation.
.PP
Please report any bugs or feature requests to
\&\f(CW\*(C`bug\-text\-rewrite@rt.cpan.org\*(C'\fR, or through the web interface. I will
be notified, and then you'll automatically be notified of progress on
your bug as I make changes.
.SH "ACKNOWLEDGEMENTS"
.IX Header "ACKNOWLEDGEMENTS"
Damian Conway for Filter::Simple
.SH "COPYRIGHT & LICENSE"
.IX Header "COPYRIGHT & LICENSE"
Copyright 2004\-2009 Alberto Simo\*~es and Jose\*' Joa\*~o Almeida, All Rights
Reserved.
.PP
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.