.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.42)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "DelimMatch 3pm"
.TH DelimMatch 3pm "2022-10-13" "perl v5.34.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Text::DelimMatch \- Perl extension to find regexp delimited strings with proper nesting
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use Text::DelimMatch;
\&
\& $mc = new Text::DelimMatch $startdelim, $enddelim;
\&
\& $mc\->quote(\*(Aq"\*(Aq);
\& $mc\->escape("\e\e");
\& $mc\->double_escape(\*(Aq"\*(Aq);
\& $mc\->case_sensitive(1);
\&
\& ($prefix, $match, $remainder) = $mc\->match($string);
\& ($prefix, $nextmatch, $remainder) = $mc\->match();
\&
\& $middle = $mc\->strip_delim($match); # returns $match w/o start and end delim
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
These routines allow you to match delimited substrings in a buffer.
The delimiters can be specified with any regular expression, and the
start and end delimiters need not be the same. If the delimited text
is properly nested, entire nested groups are returned.
.PP
In addition, you may specify quoting and escaping characters that
contribute to the recognition of start and end delimiters.
.PP
For example, if you specify the start and end delimiters as '\e(' and
\&'\e)', respectively, and the double quote character as a quoting character,
and the backslash as an escaping character, then the delimited substring
in this buffer is \*(L"(ma(t)c\e)h)\*(R":
.PP
.Vb 1
\& \*(Aqprefix text "(quoted text)" \e(escaped \e" text) (ma(t)c\e)h) postfix text\*(Aq
.Ve
.PP
In order to support this rather complex interface, the matching context
is encapsulated in an object. The object, Text::DelimMatch, has the
following public methods:
.ie n .IP "new $start, $end, $escape, $dblesc, $qs1, $qe1, ... $qsn, $qen" 4
.el .IP "new \f(CW$start\fR, \f(CW$end\fR, \f(CW$escape\fR, \f(CW$dblesc\fR, \f(CW$qs1\fR, \f(CW$qe1\fR, ... \f(CW$qsn\fR, \f(CW$qen\fR" 4
.IX Item "new $start, $end, $escape, $dblesc, $qs1, $qe1, ... $qsn, $qen"
Creates a new object.
All of the arguments are optional, and can be set
with other methods, but they must be passed in the specified order:
start delimiter, end delimiter, escape characters, double escape
characters, and a set of quote characters.
.ie n .IP "match $string" 4
.el .IP "match \f(CW$string\fR" 4
.IX Item "match $string"
In an array context, returns ($pre, \f(CW$match\fR, \f(CW$post\fR) where \f(CW$pre\fR is the text
preceding the first match, \f(CW$match\fR is the matched text (including the
delimiters), and \f(CW$post\fR is the rest of the text in the buffer.
In a scalar context, returns \f(CW$match\fR.
.Sp
If \f(CW$string\fR is not provided on subsequent calls, the \f(CW$post\fR from the
previous match is used, unless keep is false. If keep is false, the
match always fails.
.ie n .IP "strip_delim $string" 4
.el .IP "strip_delim \f(CW$string\fR" 4
.IX Item "strip_delim $string"
Returns \f(CW$string\fR with the start and end delimiters removed.
.ie n .IP "delim $start, $end" 4
.el .IP "delim \f(CW$start\fR, \f(CW$end\fR" 4
.IX Item "delim $start, $end"
Sets the start and end delimiters. Only one set of delimiters can be
in use at any one time.
.Sp
Returns the delimiters in use before this call.
.ie n .IP "quote $startq, $endq" 4
.el .IP "quote \f(CW$startq\fR, \f(CW$endq\fR" 4
.IX Item "quote $startq, $endq"
Specifies the start and end quote characters. Multiple quote character
pairs are supported, so this function is additive. To clear the current
settings, pass no arguments, e.g.,
\&\f(CW$mc\fR\->\fBquote()\fR.
.Sp
If only \f(CW$startq\fR is passed, \f(CW$endq\fR is assumed to be the same.
.Sp
In matching, quotes occur in pairs. In other words, if (\*(L",\*(R") and (',')
are both specified as quote pairs and a string beginning with \*(L" is found,
it is ended only by another \*(R", not by '.
.Sp
Returns the quote hash in use before this call.
.ie n .IP "escape $esc" 4
.el .IP "escape \f(CW$esc\fR" 4
.IX Item "escape $esc"
Specifies a set of escaping characters.
This can only be a string of
characters. \f(CW$esc\fR can be a regexp set or a simple string. If it is
a simple string, it will be translated into the regexp set
\&\*(L"[ quotemeta($esc) ]\*(R".
.Sp
Returns the escape characters in use before this call.
.ie n .IP "double_escape $esc" 4
.el .IP "double_escape \f(CW$esc\fR" 4
.IX Item "double_escape $esc"
Specifies a set of double-escaping characters, i.e., characters that
are considered escaped if they occur in pairs. For example, in some
languages,
.Sp
.Vb 1
\& \*(AqDon\*(Aq\*(Aqt you see?\*(Aq
.Ve
.Sp
defines a string containing a single apostrophe.
.Sp
\&\f(CW$esc\fR can only be a string of characters. \f(CW$esc\fR can be a regexp set or
a simple string. If it is a simple string, it will be translated into
the regexp set \*(L"[ quotemeta($esc) ]\*(R".
.Sp
Returns the double-escaping characters in use before this call.
.ie n .IP "case_sensitive $bool" 4
.el .IP "case_sensitive \f(CW$bool\fR" 4
.IX Item "case_sensitive $bool"
Sets case sensitivity to \f(CW$bool\fR or true if \f(CW$bool\fR is not specified.
.Sp
Returns the case sensitivity in use before this call.
.ie n .IP "keep $bool" 4
.el .IP "keep \f(CW$bool\fR" 4
.IX Item "keep $bool"
Sets keep to \f(CW$bool\fR or true if \f(CW$bool\fR is not specified.
.Sp
Keep, which is true by default, specifies whether or not the matching
context object keeps a local copy of the buffer used in matching.
Keeping a local copy allows repeated matching on the same buffer, but
might be a bad idea if the buffer is a terabyte long. ;\-)
.Sp
Returns the keep setting in use before this call.
.ie n .IP "returndelim $bool" 4
.el .IP "returndelim \f(CW$bool\fR" 4
.IX Item "returndelim $bool"
Sets returndelim to \f(CW$bool\fR or true if \f(CW$bool\fR is not specified.
.Sp
Returndelim, which is true by default, specifies whether or not the
start and end delimiters are returned with the matching string.
.Sp
Returns the returndelim setting in use before this call.
.ie n .IP "error $seterr" 4
.el .IP "error \f(CW$seterr\fR" 4
.IX Item "error $seterr"
Returns the last error that occurred. If \f(CW$seterr\fR is passed, the error is
set to that value. Some common kinds of bad input are detected and an
error condition is raised. If an error condition is raised, all matching
fails until the error is cleared.
.Sp
The most common error is a bad regular expression, for example specifying
the start delimiter as \*(L"(\*(R" instead of \*(L"\e\e(\*(R". Remember, these are regexps!
.IP "pre_matched" 4
.IX Item "pre_matched"
Returns the prefix text from the last match if keep is true. Sets an
error and returns an empty string if keep is false.
.IP "matched" 4
.IX Item "matched"
Returns the matched text from the last match if keep is true. Sets an
error and returns an empty string if keep is false.
.IP "post_matched" 4
.IX Item "post_matched"
Returns the postfix text from the last match if keep is true. Sets an
error and returns an empty string if keep is false.
.ie n .IP "debug $bool" 4
.el .IP "debug \f(CW$bool\fR" 4
.IX Item "debug $bool"
Sets debug to \f(CW$bool\fR or true if \f(CW$bool\fR is not specified.
.Sp
If debug is true, informative and progress messages are printed to
\&\s-1STDOUT\s0 by some methods.
.Sp
Returns the debugging setting in use before this call.
.IP "dump" 4
.IX Item "dump"
For debugging, prints all of the instance variables for a particular
object.
.ie n .IP "slow $bool" 4
.el .IP "slow \f(CW$bool\fR" 4
.IX Item "slow $bool"
For debugging. Some classes of delimited strings can be located with much
faster algorithms than can be used in the most general case. If slow is
true, the slower, general algorithm is always used.
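.PP
To make concrete what returning an entire nested group means, here is a
minimal sketch in plain Perl. It is not the module's implementation and does
not use Text::DelimMatch at all: first_balanced is a hypothetical helper
written only for this illustration. It assumes the delimiters are two distinct
single characters and ignores quoting and escaping entirely.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of Text::DelimMatch): return the text before,
# inside (delimiters included), and after the first balanced group.
# Assumes $open and $close are distinct single characters.
sub first_balanced {
    my ($text, $open, $close) = @_;

    my $start = index($text, $open);
    return if $start < 0;              # no start delimiter at all

    my $depth = 0;
    for my $i ($start .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        if    ($ch eq $open)  { $depth++ }   # entering a (nested) group
        elsif ($ch eq $close) { $depth-- }   # leaving a group

        if ($depth == 0) {
            # The outermost group has just closed at position $i.
            return (
                substr($text, 0, $start),
                substr($text, $start, $i - $start + 1),
                substr($text, $i + 1),
            );
        }
    }
    return;                            # unbalanced: no complete group
}

my ($pre, $match, $post) = first_balanced('pre (ma(t)ch) post', '(', ')');
say "pre=<$pre> match=<$match> post=<$post>";
# prints: pre=<pre > match=<(ma(t)ch)> post=< post>
```

The depth counter is what keeps the inner "(t)" from ending the match early.
The methods documented above additionally handle regexp delimiters, quote
pairs, and escape characters, none of which this sketch attempts.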
.PP
For simplicity, and backward compatibility with the previous (limited release)
incarnation of this module, the following functions are also available
directly:
.ie n .IP "nested_match ($string, $start, $end, $three)" 4
.el .IP "nested_match ($string, \f(CW$start\fR, \f(CW$end\fR, \f(CW$three\fR)" 4
.IX Item "nested_match ($string, $start, $end, $three)"
If \f(CW$three\fR is true, returns ($pre, \f(CW$match\fR, \f(CW$post\fR) in an array context;
otherwise returns (\*(L"$pre$match\*(R", \f(CW$post\fR). In a scalar context, returns
\&\*(L"$pre$match\*(R".
.ie n .IP "skip_nested_match ($string, $start, $end, $three)" 4
.el .IP "skip_nested_match ($string, \f(CW$start\fR, \f(CW$end\fR, \f(CW$three\fR)" 4
.IX Item "skip_nested_match ($string, $start, $end, $three)"
If \f(CW$three\fR is true, returns ($pre, \f(CW$match\fR, \f(CW$post\fR) in an array context;
otherwise returns (\*(L"$pre$match\*(R", \f(CW$post\fR). In a scalar context, returns
\&\f(CW$post\fR.
.SH "EXAMPLES"
.IX Header "EXAMPLES"
.Vb 2
\& $mc = new Text::DelimMatch \*(Aq"\*(Aq;
\& $mc\->match(\*(Aqpre "match" post\*(Aq) == \*(Aq"match"\*(Aq;
\&
\& $mc\->delim("\e\e(", "\e\e)");
\& $mc\->match(\*(Aqpre (match) post\*(Aq) == (\*(Aqpre \*(Aq, \*(Aq(match)\*(Aq, \*(Aq post\*(Aq);
\& $mc\->match(\*(Aqpre (ma(t)ch) post\*(Aq) == (\*(Aqpre \*(Aq, \*(Aq(ma(t)ch)\*(Aq, \*(Aq post\*(Aq);
\&
\& $mc\->quote(\*(Aq"\*(Aq);
\& $mc\->escape("\e\e");
\& $mc\->match(\*(Aqpre (ma")"tch) post\*(Aq) == (\*(Aqpre \*(Aq, \*(Aq(ma")"tch)\*(Aq, \*(Aq post\*(Aq);
\& $mc\->match(\*(Aqpre (ma(t)c\e)h\e") post\*(Aq) == (\*(Aqpre \*(Aq, \*(Aq(ma(t)c\e)h\e")\*(Aq, \*(Aq post\*(Aq);
.Ve
.PP
See also test.pl in the distribution.
.SH "AUTHOR"
.IX Header "AUTHOR"
Norman Walsh, ndw@nwalsh.com
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright (C) 1997\-2002 Norman Walsh.
All rights reserved. This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.
.SH "WARRANTY"
.IX Header "WARRANTY"
\&\s-1THIS PACKAGE IS PROVIDED \*(L"AS IS\*(R" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.\s0
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBperl\fR\|(1).