
visual-regexp(1) General Use Manual visual-regexp(1)

NAME

visual-regexp - graphical front-end to write/debug regular expressions

SYNOPSIS

visual-regexp file

DESCRIPTION

visual-regexp is a program that interactively builds regular expressions in Perl syntax and shows what they match. It is ideal for debugging complicated Perl regular expressions.

It helps you design, debug, or more generally work with Perl regular expressions. Because it is often difficult to write the right regexp on the first try, the tool shows you the effect of your regexp on a sample text of your choice.

DESIGN OF REGEXP

To design a regexp, type the expression in the top text widget and press the 'Go' button. The parts of the sample text that match are highlighted in the sample text widget.

For a quick reference of the regexp syntax, use the menu 'View/Show regexp help'.

You can set some options using the checkboxes (please read the Tcl help to learn the meaning of these options).
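The highlighting done by 'Go' can be approximated outside the tool. The sketch below uses Python's re module, on the assumption that its Perl-style syntax is close enough for illustration; visual-regexp itself evaluates the expression with Tcl's regexp engine, so edge cases may differ. The helper names highlight and group_spans are hypothetical, and the « » markers stand in for the colours of the sample widget.

```python
import re

def highlight(pattern: str, text: str) -> str:
    """Wrap every non-overlapping match of pattern in « » markers."""
    return re.sub(pattern, lambda m: "«" + m.group(0) + "»", text)

def group_spans(pattern: str, text: str) -> list:
    """For each match, list the (start, end) span of every subexpression.

    An unmatched subexpression is reported as (-1, -1).
    """
    return [[m.span(i) for i in range(1, m.re.groups + 1)]
            for m in re.finditer(pattern, text)]

print(highlight(r"\d+", "Two 2, Twelve 12"))   # Two «2», Twelve «12»
print(group_spans(r"(\w+) (\d+)", "One 1"))    # [[(0, 3), (4, 5)]]
```

The per-subexpression spans correspond to the different colours the tool uses for each parenthesised group.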

RECURSIVE DESIGN OF REGEXPS

Sometimes you will need more than one step to extract the information you want from the sample. For example, imagine you want to retrieve information from an HTML table nested inside another HTML table:


<html><body>
<table border=1>
<tr><td>
<table bgcolor="#FFFF00" border=1>
<tr> <td>One</td> <td>1</td> </tr>
<tr> <td>Two</td> <td>2</td> </tr>
</table>
<tr> <td>Foo</td> <td>Bar</td> </tr>
</table>
</body></html>
You cannot use a single global regexp to extract only the two rows "One 1" and "Two 2". Instead, first use a regexp to narrow the processed region: type '<table bg[^>]*?>(.*?)</table>' and press 'Go'. The interesting area is now shown in blue. Press the Match '1' button to extract the blue text (the regexp that produces this text is then printed on the console). Finally, use '<td>(.*?)</td>.*?<td>(.*?)</td>' to get the information you need.
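The same two-step narrowing can be sketched in Python's re module, reusing the sample and both regexps above. One assumption matters here: in Tcl "." also matches a newline by default, while in Python it does not, so re.DOTALL is passed for the first step, whose region spans several lines.

```python
import re

SAMPLE = """<html><body>
<table border=1>
<tr><td>
<table bgcolor="#FFFF00" border=1>
<tr> <td>One</td> <td>1</td> </tr>
<tr> <td>Two</td> <td>2</td> </tr>
</table>
<tr> <td>Foo</td> <td>Bar</td> </tr>
</table>
</body></html>"""

ROW = r'<td>(.*?)</td>.*?<td>(.*?)</td>'

# A single global pass also picks up the row of the outer table.
naive = re.findall(ROW, SAMPLE)

# Step 1: keep only the body of the inner (coloured) table.
inner = re.search(r'<table bg[^>]*?>(.*?)</table>', SAMPLE, re.DOTALL).group(1)

# Step 2: extract the two cells of each row inside that region only.
rows = re.findall(ROW, inner)
print(naive)  # [('One', '1'), ('Two', '2'), ('Foo', 'Bar')]
print(rows)   # [('One', '1'), ('Two', '2')]
```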

OPTIMIZATION OF REGEXPS

When you need to match a list of words, use the menu 'Insert regexp/Make regexp' to build an optimized regexp from the word list.

For example, the list 'aa aab ab ad' is optimized into 'a(ab?|b|d)'.
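One common way to obtain such an optimized pattern is to factor common prefixes with a trie. The sketch below is a hypothetical helper (make_regexp is not part of visual-regexp, and the tool's own algorithm may differ); it uses non-capturing (?:...) groups, so its output differs from the example above only in that respect.

```python
import re

def _node_to_regexp(node: dict) -> str:
    # "" is the end-of-word marker: if present, everything below is optional.
    optional = "" in node
    branches = [re.escape(ch) + _node_to_regexp(child)
                for ch, child in sorted(node.items()) if ch]
    if not branches:
        return ""
    if len(branches) == 1:
        body = branches[0]
        if not optional:
            return body
        # A single plain character takes "?" directly; longer bodies need a group.
        return body + "?" if len(body) == 1 else "(?:" + body + ")?"
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if optional else body

def make_regexp(words: list) -> str:
    """Hypothetical helper: build a prefix-factored regexp from a word list."""
    trie: dict = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # mark the end of a word
    return _node_to_regexp(trie)

pattern = make_regexp(["aa", "aab", "ab", "ad"])
print(pattern)  # a(?:ab?|b|d)
```

Only shared prefixes are factored; shared suffixes are left as separate branches.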

PROCESSING THE SAMPLE TEXT

You can also use visual-regexp to modify a text. Select the menu 'Select mode/Use replace', then design a regexp that matches what you want to change. Enter the substitution to apply in the replace text widget (use \0, \1, \2, ... to refer to the whole match and to the subexpressions; use the color to map each number to its matched subexpression).
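For comparison, the same kind of substitution in Python's re.sub looks like this. Note one difference from the tool's syntax: Python writes the whole match as \g<0> rather than \0, while \1, \2, ... work the same way. The sample strings are made up for illustration.

```python
import re

text = "One 1, Two 2"

# \1 and \2 refer to the first and second parenthesised subexpressions.
swapped = re.sub(r"(\w+) (\d+)", r"\2=\1", text)
print(swapped)    # 1=One, 2=Two

# \g<0> inserts the whole match.
bracketed = re.sub(r"\d+", r"[\g<0>]", text)
print(bracketed)  # One [1], Two [2]
```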

After the substitution, you can save the new text using the 'File/Save ...' menu. You can let the program choose the end-of-line format or force a specific one (Unix, Windows, Mac).
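Forcing an end-of-line format amounts to normalising every line ending and then writing the chosen one. A minimal sketch, assuming "Mac" means the classic CR-only convention; convert_eol and save are hypothetical names, not the tool's code.

```python
import re

# Assumed line-ending conventions for the three choices in the save dialog.
EOL = {"unix": "\n", "windows": "\r\n", "mac": "\r"}

def convert_eol(text: str, style: str) -> str:
    """Normalise CRLF, CR and LF endings to LF, then emit the chosen style."""
    normalised = re.sub(r"\r\n|\r", "\n", text)
    return normalised.replace("\n", EOL[style])

def save(path: str, text: str, style: str) -> None:
    # newline="" stops Python from translating line endings again on write.
    with open(path, "w", newline="") as f:
        f.write(convert_eol(text, style))
```

The alternation tries \r\n before \r, so a Windows line ending is converted as one unit rather than as two.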

KNOWN PROBLEMS

  • Some regexps can consume a lot of CPU time. This seems to be caused by using the -all, -inline and -indices flags together.
  • When a subexpression is not matched (empty match), the last character of the previous match is coloured. This is due to a problem in Tcl (bug submitted to Scriptics).

REGULAR EXPRESSIONS IN PERL

METACHARACTERS

"^"
beginning of string
"$"
end of string
"."
any character except newline
"*"
match 0 or more times
"+"
match 1 or more times
"?"
match 0 or 1 times; after another quantifier: take the shortest match
"|"
alternative
"( )"
grouping; “storing”
"[ ]"
set of characters
"{ }"
repetition modifier
"\"
quote or special
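Each metacharacter above can be exercised with a one-line check. The sketch uses Python's re module, assuming its behaviour matches Perl's for these basic cases; every assert passes as written, and the sample strings are illustrative only.

```python
import re

assert re.search(r"^abc", "abcdef")                  # ^ anchors at the start
assert not re.search(r"^abc", "xabc")
assert re.search(r"abc$", "xxabc")                   # $ anchors at the end
assert re.search(r"a.c", "abc")                      # . is any character...
assert not re.search(r"a.c", "a\nc")                 # ...except newline
assert re.fullmatch(r"ab*", "a")                     # * allows zero b's
assert re.fullmatch(r"ab+", "abb")                   # + needs at least one b
assert re.fullmatch(r"colou?r", "color")             # ? makes the u optional
assert re.fullmatch(r"cat|dog", "dog")               # | separates alternatives
assert re.fullmatch(r"[0-9]{3}", "042")              # [ ] set, { } repetition count
assert re.search(r"1\+1", "1+1=2")                   # \ quotes a metacharacter

m = re.search(r"(\d+)-(\d+)", "pages 10-20")         # ( ) groups and stores
print(m.group(1), m.group(2))                        # 10 20
```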

REPETITION

a*
zero or more a’s
a+
one or more a’s
a?
zero or one a’s (i.e., optional a)
a{m}
exactly m a’s
a{m,}
at least m a’s
a{m,n}
at least m but at most n a’s
a*?, a+?, a??, a{m}?, a{m,}?, a{m,n}?
same as the corresponding repetition above, but the shortest match is taken
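The difference between a normal (greedy) repetition and its shortest-match form is easiest to see on tagged text, which is why the HTML example earlier uses .*? rather than .*. The sketch uses Python's re module, assumed here to follow Perl for these quantifiers.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs as far as it can, from the first < to the last >.
greedy = re.search(r"<.+>", text).group()
# Shortest: .+? stops at the first > that lets the whole pattern succeed.
shortest = re.search(r"<.+?>", text).group()
# Bounded: a{2,3} takes at least two and at most three a's per match.
bounded = re.findall(r"a{2,3}", "a aa aaa aaaa")

print(greedy)    # <b>bold</b> and <i>italic</i>
print(shortest)  # <b>
print(bounded)   # ['aa', 'aaa', 'aaa']
```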

SPECIAL NOTATIONS WITH \


Single characters
\t tab
\n newline
\r return (CR)
\xhh character with hex. code hh
“Zero-width assertions”
\b “word” boundary
\B not a “word” boundary
Matching
\w matches any single character classified as a “word” character (alphanumeric or “_”)
\W matches any non-“word” character
\s matches any whitespace character (space, tab, newline)
\S matches any non-whitespace character
\d matches any digit character, equivalent to [0-9]
\D matches any non-digit character
CHARACTER SETS: SPECIALITIES INSIDE [...]
[characters]
matches any of the characters in the sequence
[x-y]
matches any of the characters from x to y (inclusive) in ASCII code order
[\-]
matches the hyphen character “-”
[\n]
matches the newline; other single-character denotations with \ apply normally, too
[^something]
matches any character except those that [something] denotes; that is, immediately after the leading “[”, the circumflex “^” means “not” applied to all of the rest
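The backslash notations and character sets above can be tried the same way. A sketch in Python's re module on made-up strings; note the assumption that Python's \w and \s are Unicode-aware and \s also covers \r, \f and \v, which goes slightly beyond the list above.

```python
import re

digits = re.findall(r"\d+", "order 66, item 7")            # \d: digits
words = re.findall(r"\w+", "foo_1 12bar8!")                # \w: word characters
fields = re.split(r"\s+", "one\ttwo\nthree  four")         # \s: whitespace
ranges = re.findall(r"[a-c]", "abcdef")                    # [x-y]: a range
hyphens = re.findall(r"[\w\-]+", "net-price: 3-4")         # [\-]: literal hyphen
others = re.findall(r"[^abc]+", "aadefgbbxy")              # [^...]: negated set
print(digits, words, fields, ranges, hyphens, others)

assert re.fullmatch(r"\x41", "A")                          # \xhh: hex code 41 is "A"
assert re.findall(r"abc\b", "abc! abcd") == ["abc"]        # \b: word boundary
assert re.search(r"perl\B", "perlert")                     # \B: no boundary after perl
assert not re.search(r"perl\B", "perl stuff")
```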

EXAMPLES

abc
that exact character sequence, but anywhere in the string
^abc
abc at the beginning of the string
abc$
abc at the end of the string
a|b
either of a and b
^abc|abc$
the string abc at the beginning or at the end of the string
ab{2,4}c
an a followed by two, three or four b’s followed by a c
ab{2,}c
an a followed by at least two b’s followed by a c
ab*c
an a followed by any number (zero or more) of b’s followed by a c
ab+c
an a followed by one or more b’s followed by a c
ab?c
an a followed by an optional b followed by a c; that is, either abc or ac
a.c
an a followed by any single character (not newline) followed by a c
a\.c
a.c exactly
[abc]
any one of a, b and c
[Aa]bc
either of Abc and abc
[abc]+
any (nonempty) string of a’s, b’s and c’s (such as a, abba, acbabcacaa)
[^abc]+
any (nonempty) string which does not contain any of a, b and c (such as 'defg')
\d\d
any two decimal digits, such as 42; same as \d{2}
\w+
a “word”: a nonempty sequence of alphanumeric characters and low lines (underscores), such as foo and 12bar8 and foo_1
100\s*mk
the strings 100 and mk optionally separated by any amount of white space (spaces, tabs, newlines)
abc\b
abc when followed by a word boundary (e.g. in abc! but not in abcd)
perl\B
perl when not followed by a word boundary (e.g. in perlert but not in perl stuff)
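The examples above can be turned into a small table-driven check: each pattern is paired with strings it should find and strings it should not. A sketch using Python's re.search, assuming it agrees with Perl on these patterns; the test strings are illustrative choices, not taken from the tool.

```python
import re

# (pattern, strings it should find, strings it should not find)
EXAMPLES = [
    (r"^abc", ["abcxyz"], ["xabc"]),
    (r"abc$", ["xyzabc"], ["abcx"]),
    (r"a|b", ["a", "b"], ["c"]),
    (r"ab{2,4}c", ["abbc", "abbbbc"], ["abc", "abbbbbc"]),
    (r"ab{2,}c", ["abbbbbbc"], ["abc"]),
    (r"ab*c", ["ac", "abbc"], ["adc"]),
    (r"ab+c", ["abc"], ["ac"]),
    (r"ab?c", ["ac", "abc"], ["abbc"]),
    (r"a.c", ["axc"], ["a\nc"]),
    (r"a\.c", ["a.c"], ["axc"]),
    (r"[Aa]bc", ["Abc", "abc"], ["Xbc"]),
    (r"[^abc]+", ["defg"], ["abba"]),
    (r"\d\d", ["42"], ["4x2"]),
    (r"100\s*mk", ["100mk", "100 \t mk"], ["100 km"]),
    (r"abc\b", ["abc!"], ["abcd"]),
    (r"perl\B", ["perlert"], ["perl stuff"]),
]

def check_examples() -> int:
    """Assert every expectation and return the number of patterns checked."""
    for pattern, hits, misses in EXAMPLES:
        for s in hits:
            assert re.search(pattern, s), (pattern, s)
        for s in misses:
            assert not re.search(pattern, s), (pattern, s)
    return len(EXAMPLES)

print(check_examples(), "patterns behave as described")
```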

REQUIREMENTS

The script version of this program requires Tcl/Tk 8.3.0 or later. The standalone program has no requirements.

SEE ALSO

perlre(1), perlrequick(1)

AUTHOR

visual-regexp was written by Laurent Riesterer <laurent.riesterer@free.fr>.

This manual page was written by Braulio Henrique Marques Souto <braulio@disroot.org> for the Debian project (but may be used by others).

18 October 2022 visual-regexp-3.1