.\" Automatically generated by Pod::Man 4.09 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.if !\nF .nr F 0
.if \nF>0 \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
.\}
.\" ========================================================================
.\"
.IX Title "Data::Munge 3pm"
.TH Data::Munge 3pm "2017-11-20" "perl v5.26.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Data::Munge \- various utility functions
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use Data::Munge;
\&
\& my $re = list2re qw/f ba foo bar baz/;
\& # $re = qr/bar|baz|foo|ba|f/;
\&
\& print byval { s/foo/bar/ } $text;
\& # print do { my $tmp = $text; $tmp =~ s/foo/bar/; $tmp };
\&
\& foo(mapval { chomp } @lines);
\& # foo(map { my $tmp = $_; chomp $tmp; $tmp } @lines);
\&
\& print replace(\*(AqApples are round, and apples are juicy.\*(Aq, qr/apples/i, \*(Aqoranges\*(Aq, \*(Aqg\*(Aq);
\& # "oranges are round, and oranges are juicy."
\& print replace(\*(AqJohn Smith\*(Aq, qr/(\ew+)\es+(\ew+)/, \*(Aq$2, $1\*(Aq);
\& # "Smith, John"
\&
\& my $trimmed = trim " a b c "; # "a b c"
\&
\& my $x = \*(Aqbar\*(Aq;
\& if (elem $x, [qw(foo bar baz)]) { ... }
\&
\& my $contents = slurp $fh; # or: slurp *STDIN
\&
\& eval_string(\*(Aqprint "hello world\e\en"\*(Aq); # says hello
\& eval_string(\*(Aqdie\*(Aq); # dies
\& eval_string(\*(Aq{\*(Aq); # throws a syntax error
\&
\& my $fac = rec {
\&   my ($rec, $n) = @_;
\&   $n < 2 ? 1 : $n * $rec\->($n \- 1)
\& };
\& print $fac\->(5); # 120
\&
\& if ("hello, world!" =~ /(\ew+), (\ew+)/) {
\&   my @captured = submatches;
\&   # @captured = ("hello", "world")
\& }
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module defines a few generally useful utility functions. I got tired of
redefining or working around them, so I wrote this module.
.SS "Functions"
.IX Subsection "Functions"
.IP "list2re \s-1LIST\s0" 4
.IX Item "list2re LIST"
Converts a list of strings to a regex that matches any of the strings.
Especially useful in combination with \f(CW\*(C`keys\*(C'\fR. Example:
.Sp
.Vb 2
\& my $re = list2re keys %hash;
\& $str =~ s/($re)/$hash{$1}/g;
.Ve
.Sp
This function takes special care to get several edge cases right:
.RS 4
.IP "\(bu" 4
Empty list: An empty argument list results in a regex that doesn't match
anything.
.IP "\(bu" 4
Empty string: An argument list consisting of a single empty string results in
a regex that matches the empty string (and nothing else).
.IP "\(bu" 4
Prefixes: The input strings are sorted by descending length to ensure longer
matches are tried before shorter matches. Otherwise \f(CW\*(C`list2re(\*(Aqab\*(Aq, \*(Aqabcd\*(Aq)\*(C'\fR
would generate \f(CW\*(C`qr/ab|abcd/\*(C'\fR, which (on its own) can never match \f(CW\*(C`abcd\*(C'\fR
(because \f(CW\*(C`ab\*(C'\fR is tried first, and it always succeeds where \f(CW\*(C`abcd\*(C'\fR could).
.RE
.RS 4
.RE
.IP "byval \s-1BLOCK SCALAR\s0" 4
.IX Item "byval BLOCK SCALAR"
Takes a code block and a value, runs the block with \f(CW$_\fR set to that value,
and returns the final value of \f(CW$_\fR. The global value of \f(CW$_\fR is not affected.
Inside the block, \f(CW$_\fR isn't aliased to the input value either, so modifying \f(CW$_\fR
in the block will not affect the passed-in value. Example:
.Sp
.Vb 3
\& foo(byval { s/!/?/g } $str);
\& # Calls foo() with the value of $str, but all \*(Aq!\*(Aq have been replaced by \*(Aq?\*(Aq.
\& # $str itself is not modified.
.Ve
.Sp
Since perl 5.14 you can also use the \f(CW\*(C`/r\*(C'\fR flag:
.Sp
.Vb 1
\& foo($str =~ s/!/?/gr);
.Ve
.Sp
But \f(CW\*(C`byval\*(C'\fR works on all versions of perl and is not limited to \f(CW\*(C`s///\*(C'\fR.
.IP "mapval \s-1BLOCK LIST\s0" 4
.IX Item "mapval BLOCK LIST"
Works like a combination of \f(CW\*(C`map\*(C'\fR and \f(CW\*(C`byval\*(C'\fR; i.e. it behaves like
\&\f(CW\*(C`map\*(C'\fR, but \f(CW$_\fR is a copy, not aliased to the current element, and the
return value is taken from \f(CW$_\fR again (it ignores the value returned by the
block). Example:
.Sp
.Vb 4
\& my @foo = mapval { chomp } @bar;
\& # @foo contains a copy of @bar where all elements have been chomp\*(Aqd.
\& # This could also be written as chomp(my @foo = @bar); but that\*(Aqs not
\& # always possible.
.Ve
.IP "submatches" 4
.IX Item "submatches"
Returns a list of the strings captured by the last successful pattern match.
Normally you don't need this function because this is exactly what \f(CW\*(C`m//\*(C'\fR
returns in list context. However, \f(CW\*(C`submatches\*(C'\fR also works in other contexts
such as the \s-1RHS\s0 of \f(CW\*(C`s//.../e\*(C'\fR.
.IP "replace \s-1STRING, REGEX, REPLACEMENT, FLAG\s0" 4
.IX Item "replace STRING, REGEX, REPLACEMENT, FLAG"
.PD 0
.IP "replace \s-1STRING, REGEX, REPLACEMENT\s0" 4
.IX Item "replace STRING, REGEX, REPLACEMENT"
.PD
A clone of JavaScript's \f(CW\*(C`String.prototype.replace\*(C'\fR. It works almost the same
as \f(CW\*(C`byval { s/REGEX/REPLACEMENT/FLAG } STRING\*(C'\fR, but with a few important
differences. \s-1REGEX\s0 can be a string or a compiled \f(CW\*(C`qr//\*(C'\fR object. \s-1REPLACEMENT\s0
can be a string or a subroutine reference. If it's a string, it can contain
the following replacement patterns:
.RS 4
.IP "$$" 4
Inserts a '$'.
.IP "$&" 4
Inserts the matched substring.
.IP "$`" 4
Inserts the substring preceding the match.
.IP "$'" 4
Inserts the substring following the match.
.ie n .IP "$N (where N is a digit)" 4
.el .IP "\f(CW$N\fR (where N is a digit)" 4
.IX Item "$N (where N is a digit)"
Inserts the substring matched by the Nth capturing group.
.IP "${N} (where N is one or more digits)" 4
.IX Item "${N} (where N is one or more digits)"
Inserts the substring matched by the Nth capturing group.
.RE
.RS 4
.Sp
Note that these aren't variables; they're character sequences interpreted by
\&\f(CW\*(C`replace\*(C'\fR.
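.Sp
As a small sketch of what this means in practice (the input strings here are
illustrative, not taken from the module itself): writing the replacement in
single quotes keeps Perl from interpolating a sequence such as \f(CW$&\fR on its
own, so the characters reach \f(CW\*(C`replace\*(C'\fR intact and are interpreted there.
.Sp
.Vb 1
\& use Data::Munge;
\&
\& # Single quotes: the characters \*(Aq$\*(Aq and \*(Aq&\*(Aq are passed through unchanged,
\& # and replace substitutes the matched substring for them.
\& print replace(\*(Aqhello world\*(Aq, qr/world/, \*(Aq[$&]\*(Aq), "\en";
.Ve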
.Sp
If \s-1REPLACEMENT\s0 is a subroutine reference, it's called with the following
arguments: First the matched substring (like \f(CW$&\fR above), then the contents
of the capture buffers (as returned by \f(CW\*(C`submatches\*(C'\fR), then the offset where
the pattern matched (like \f(CW\*(C`$\-[0]\*(C'\fR, see \*(L"@\-\*(R" in perlvar), then the \s-1STRING.\s0
The return value will be inserted in place of the matched substring.
.Sp
Normally only the first occurrence of \s-1REGEX\s0 is replaced. If \s-1FLAG\s0 is present,
it must be \f(CW\*(Aqg\*(Aq\fR and causes all occurrences to be replaced.
.RE
.IP "trim \s-1STRING\s0" 4
.IX Item "trim STRING"
Returns \fI\s-1STRING\s0\fR with all leading and trailing whitespace removed. Like
\&\f(CW\*(C`length\*(C'\fR it returns \f(CW\*(C`undef\*(C'\fR if the input is \f(CW\*(C`undef\*(C'\fR.
.IP "elem \s-1SCALAR, ARRAYREF\s0" 4
.IX Item "elem SCALAR, ARRAYREF"
Returns a boolean value telling you whether \fI\s-1SCALAR\s0\fR is an element of
\&\fI\s-1ARRAYREF\s0\fR or not. Two scalars are considered equal if they're both
\&\f(CW\*(C`undef\*(C'\fR, if they're both references to the same thing, or if they're both
not references and \f(CW\*(C`eq\*(C'\fR to each other.
.Sp
This is implemented as a linear search through \fI\s-1ARRAYREF\s0\fR that terminates
early if a match is found (i.e. \f(CW\*(C`elem \*(AqA\*(Aq, [\*(AqA\*(Aq, 1 .. 9999]\*(C'\fR won't even look
at elements \f(CW\*(C`1 .. 9999\*(C'\fR).
.IP "eval_string \s-1STRING\s0" 4
.IX Item "eval_string STRING"
Evals \fI\s-1STRING\s0\fR just like \f(CW\*(C`eval\*(C'\fR but doesn't catch exceptions. Caveat: Unlike
with \f(CW\*(C`eval\*(C'\fR the code runs in an empty lexical scope:
.Sp
.Vb 3
\& my $foo = "Hello, world!\en";
\& eval_string \*(Aqprint $foo\*(Aq;
\& # Dies: Global symbol "$foo" requires explicit package name
.Ve
.Sp
That is, the eval'd code can't see variables from the scope of the
\&\f(CW\*(C`eval_string\*(C'\fR call.
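.Sp
A minimal sketch of one way to hand data to the eval'd code anyway, assuming
the calling code lives in package \f(CW\*(C`main\*(C'\fR (the variable name is
illustrative): package variables are not lexical, so they can still be reached
by their fully qualified name.
.Sp
.Vb 1
\& use Data::Munge;
\&
\& our $foo = "Hello, world!\en";
\& # $main::foo names the package variable directly; no lexical lookup is needed.
\& eval_string \*(Aqprint $main::foo\*(Aq;
.Ve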
.IP "slurp \s-1FILEHANDLE\s0" 4
.IX Item "slurp FILEHANDLE"
Reads and returns all remaining data from \fI\s-1FILEHANDLE\s0\fR as a string, or
\&\f(CW\*(C`undef\*(C'\fR if it hits end-of-file. (Interaction with non-blocking filehandles
is currently not well defined.)
.Sp
\&\f(CW\*(C`slurp $handle\*(C'\fR is equivalent to \f(CW\*(C`do { local $/; scalar readline $handle }\*(C'\fR.
.IP "rec \s-1BLOCK\s0" 4
.IX Item "rec BLOCK"
Creates an anonymous sub as \f(CW\*(C`sub BLOCK\*(C'\fR would, but supplies the called sub
with an extra argument that can be used to recurse:
.Sp
.Vb 6
\& my $code = rec {
\&   my ($rec, $n) = @_;
\&   $rec\->($n \- 1) if $n > 0;
\&   print $n, "\en";
\& };
\& $code\->(4);
.Ve
.Sp
That is, when the sub is called, an implicit first argument is passed in
\&\f(CW$_[0]\fR (all normal arguments are moved one up). This first argument is a
reference to the sub itself. This reference could be used to recurse directly
or to register the sub as a handler in an event system, for example.
.Sp
A note on defining recursive anonymous functions: Doing this right is more
complicated than it may at first appear. The most straightforward solution
using a lexical variable and a closure leaks memory because it creates a
reference cycle. Starting with perl 5.16 there is a \f(CW\*(C`_\|_SUB_\|_\*(C'\fR constant that
is equivalent to \f(CW$rec\fR above, and this is indeed what this module uses (if
available).
.Sp
However, this module works even on older perls by falling back to either weak
references (if available) or a \*(L"fake recursion\*(R" scheme that dynamically
instantiates a new sub for each call instead of creating a cycle. This last
resort is slower than weak references but works everywhere.
.SH "AUTHOR"
.IX Header "AUTHOR"
Lukas Mai, \f(CW\*(C`\*(C'\fR
.SH "COPYRIGHT & LICENSE"
.IX Header "COPYRIGHT & LICENSE"
Copyright 2009\-2011, 2013\-2015 Lukas Mai.
.PP
This program is free software; you can redistribute it and/or modify it
under the terms of either: the \s-1GNU\s0 General Public License as published
by the Free Software Foundation; or the Artistic License.
.PP
See http://dev.perl.org/licenses/ for more information.