NAME
Regexp::Common::comment -- provide regexes for comments.
SYNOPSIS
    use Regexp::Common qw /comment/;

    while (<>) {
        /$RE{comment}{C}/ and print "Contains a C comment\n";
        /$RE{comment}{C++}/ and print "Contains a C++ comment\n";
        /$RE{comment}{PHP}/ and print "Contains a PHP comment\n";
        /$RE{comment}{Java}/ and print "Contains a Java comment\n";
        /$RE{comment}{Perl}/ and print "Contains a Perl comment\n";
        /$RE{comment}{awk}/ and print "Contains an awk comment\n";
        /$RE{comment}{HTML}/ and print "Contains an HTML comment\n";
    }

    use Regexp::Common qw /comment RE_comment_HTML/;

    while (<>) {
        $_ =~ RE_comment_HTML() and print "Contains an HTML comment\n";
    }
DESCRIPTION
Please consult the manual of Regexp::Common for a general description of how
this interface works.
Do not use this module directly, but load it via Regexp::Common.
This module gives you regular expressions for comments in various languages.
THE LANGUAGES
Below, the comments of each of the languages are described. The patterns are
available as $RE{comment}{LANG}, for each language LANG. Some languages have
variants; how to get the patterns for the variants is described under the
individual languages. Unless mentioned otherwise, "{-keep}"
sets $1, $2, $3 and $4 to the entire comment, the opening marker, the content
of the comment, and the closing marker (for many languages, the latter is a
newline) respectively.
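As a minimal sketch of the default capture layout described above (the sample strings and variable names are made up for illustration, and Regexp::Common is assumed to be installed), the following matches a C comment and a Perl comment with "{-keep}" and prints the captures:

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Illustrative input only; any text containing a C comment would do.
my $code = "int x; /* set x */ x = 1;";

if ($code =~ /$RE{comment}{C}{-keep}/) {
    print "entire comment: [$1]\n";   # /* set x */
    print "opening marker: [$2]\n";   # /*
    print "content:        [$3]\n";   # " set x " (spaces included)
    print "closing marker: [$4]\n";   # */
}

# For a line comment the closing marker is the newline itself, so the
# subject string must contain one for the pattern to match.
my $line = "my \$y = 2; # note\n";

if ($line =~ /$RE{comment}{Perl}{-keep}/) {
    print "content: [$3]\n";          # " note"
    print "closing marker is a newline\n" if $4 eq "\n";
}
```

Languages whose entries say that only $1 is set (for example C++ or Java) do not provide $2, $3 and $4.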
- ABC
- Comments in ABC start with a backslash
("\"), and last till the end of the line. See
<http://homepages.cwi.nl/%7Esteven/abc/>.
- Ada
- Comments in Ada start with "--", and last
till the end of the line.
- Advisor
- Advisor is a language used by the HP product
glance. Comments for this language start with either "#"
or "//", and last till the end of the line.
- Advsys
- Comments for the Advsys language start with
";" and last till the end of the line. See also
<http://www.wurb.com/if/devsys/12>.
- Alan
- Alan comments start with "--", and last
till the end of the line. See also
<http://w1.132.telia.com/~u13207378/alan/manual/alanTOC.html>.
- Algol 60
- Comments in the Algol 60 language start with the
keyword "comment", and end with a ";". See
<http://www.masswerk.at/algol60/report.htm>.
- Algol 68
- In Algol 68, comments are either delimited by
"#", or by one of the keywords "co" or
"comment". The keywords should not be part of another word. See
<http://westein.arb-phys.uni-dortmund.de/~wb/a68s.txt>. With
"{-keep}", only $1 will be set, returning the entire
comment.
- ALPACA
- The ALPACA language has comments starting with
"/*" and ending with "*/".
- awk
- The awk programming language uses comments that
start with "#" and end at the end of the line.
- B
- The B language has comments starting with
"/*" and ending with "*/".
- BASIC
- There are various forms of BASIC around. Currently, we only
support the variant supported by mvEnterprise, whose pattern is
available as $RE{comment}{BASIC}{mvEnterprise}. Comments in this language
start with a "!", a "*" or the keyword
"REM", and last till the end of the line. See
<http://www.rainingdata.com/products/beta/docs/mve/50/ReferenceManual/Basic.pdf>.
- Beatnik
- The esoteric language Beatnik only uses words
consisting of letters. Words are scored according to the rules of
Scrabble. Words scoring less than 5 points, or 18 points or more, are
considered comments (although the compiler might mock you if you score
less than 5 points). Regardless of whether "{-keep}" is used, $1 will be
set to the entire comment. This pattern requires perl
5.8.0 or newer.
- beta-Juliet
- The beta-Juliet programming language has comments
that start with "//" and that continue till the end of the line.
See also <http://www.catseye.mb.ca/esoteric/b-juliet/index.html>.
- Befunge-98
- The esoteric language Befunge-98 uses comments that
start and end with a ";". See
<http://www.catseye.mb.ca/esoteric/befunge/98/spec98.html>.
- BML
- BML, or Better Markup Language, is an HTML
templating language that uses comments starting with "<?c_",
and ending with "c_?>". See
<http://www.livejournal.com/doc/server/bml.index.html>.
- Brainfuck
- The minimal language Brainfuck uses only eight
characters, "<", ">", "[",
"]", "+", "-", "." and
",". Any other characters are considered comments. With
"{-keep}", $1 is set to the entire comment.
- C
- The C language has comments starting with
"/*" and ending with "*/".
- C--
- The C-- language has comments starting with
"/*" and ending with "*/". See
<http://cs.uas.arizona.edu/classes/453/programs/C--Spec.html>.
- C++
- The C++ language has two forms of comments. Comments
that start with "//" and last till the end of the line, and
comments that start with "/*", and end with "*/". If
"{-keep}" is used, only $1 will be set, and set to the entire
comment.
- C#
- The C# language has two forms of comments. Comments
that start with "//" and last till the end of the line, and
comments that start with "/*", and end with "*/". If
"{-keep}" is used, only $1 will be set, and set to the entire
comment. See
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/vclrfcsharpspec_C.asp>.
- Caml
- Comments in Caml start with "(*", end with
"*)", and can be nested. See
<http://www.cs.caltech.edu/courses/cs134/cs134b/book.pdf> and
<http://pauillac.inria.fr/caml/index-eng.html>.
- Cg
- The Cg language has two forms of comments. Comments
that start with "//" and last till the end of the line, and
comments that start with "/*", and end with "*/". If
"{-keep}" is used, only $1 will be set, and set to the entire
comment. See <http://developer.nvidia.com/attach/3722>.
- CLU
- In "CLU", a comment starts with a percent sign
("%"), and ends with the next newline. See
<ftp://ftp.lcs.mit.edu:/pub/pclu/CLU-syntax.ps> and
<http://www.pmg.lcs.mit.edu/CLU.html>.
- COBOL
- Traditionally, comments in COBOL are indicated by an
asterisk in the seventh column. This is what the pattern matches. Modern
compilers may be more lenient, though. See
<http://www.csis.ul.ie/cobol/Course/COBOLIntro.htm>, and
<http://www.csis.ul.ie/cobol/default.htm>. Due to a bug in the
regexp engine of perl 5.6.x, this regexp is only available in version
5.8.0 and up.
- CQL
- Comments in the chess query language (CQL) start
with a semicolon (";") and last till the end of the line. See
<http://www.rbnn.com/cql/>.
- Crystal Report
- The formula editor in Crystal Reports uses comments
that start with "//", and end with the end of the line.
- Dylan
- There are two types of comments in Dylan. They
either start with "//", or are nested comments, delimited with
"/*" and "*/". Under "{-keep}", only $1 will
be set, returning the entire comment. This pattern requires perl
5.6.0 or newer.
- ECMAScript
- The ECMAScript language has two forms of comments.
Comments that start with "//" and last till the end of the line,
and comments that start with "/*", and end with "*/".
If "{-keep}" is used, only $1 will be set, and set to the entire
comment. JavaScript is Netscape's implementation of
ECMAScript. See
<http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>,
and
<http://www.ecma-international.org/publications/standards/Ecma-262.htm>.
- Eiffel
- Eiffel comments start with "--", and last
till the end of the line.
- False
- In False, comments start with "{" and end
with "}". See
<http://wouter.fov120.com/false/false.txt>.
- FPL
- The FPL language has two forms of comments. Comments
that start with "//" and last till the end of the line, and
comments that start with "/*", and end with "*/". If
"{-keep}" is used, only $1 will be set, and set to the entire
comment.
- Forth
- Comments in Forth start with "\", and end with
the end of the line. See also <http://docs.sun.com/sb/doc/806-1377-10>.
- Fortran
- There are two forms of Fortran. There's free form
Fortran, which has comments that start with "!", and end
at the end of the line. The pattern for this is given by $RE{comment}{Fortran}.
Fixed form Fortran, which has been obsoleted, has comments that
start with "C", "c" or "*" in the first
column, or with "!" anywhere but the sixth column. The pattern
for this is given by $RE{comment}{Fortran}{fixed}.
See also <http://www.cray.com/craydoc/manuals/007-3692-005/html-007-3692-005/>.
- Funge-98
- The esoteric language Funge-98 uses comments that
start and end with a ";".
- fvwm2
- Configuration files for fvwm2 have comments starting
with a "#" and lasting the rest of the line.
- Haifu
- Haifu, an esoteric language using haikus, has
comments starting and ending with a ",". See
<http://www.dangermouse.net/esoteric/haifu.html>.
- Haskell
- There are two types of comments in Haskell. They
either start with at least two dashes, or are nested comments, delimited
with "{-" and "-}". Under "{-keep}", only $1
will be set, returning the entire comment. This pattern requires perl
5.6.0 or newer.
- HTML
- In HTML, comments only appear inside a comment
declaration. A comment declaration starts with a "<!",
and ends with a ">". Inside this declaration, we have zero or
more comments. Comments start with "--" and end with
"--", and are optionally followed by whitespace. The pattern
$RE{comment}{HTML} recognizes those comment declarations (and hence more
than a comment). Note that this is not the same as something that starts
with "<!--" and ends with "-->", because the
following will be matched completely:
    <!-- First Comment --
    --> Second Comment <!--
    -- Third Comment -->
Do not be fooled by what your favourite browser thinks is an HTML comment.
If "{-keep}" is used, the following are returned:
- $1
- captures the entire comment declaration.
- $2
- captures the MDO (markup declaration open),
"<!".
- $3
- captures the content between the MDO and the MDC.
- $4
- captures the (last) comment, without the surrounding
dashes.
- $5
- captures the MDC (markup declaration close),
">".
- Hugo
- There are two types of comments in Hugo. They either
start with "!" (which cannot be followed by a "\"), or
are nested comments, delimited with "!\" and "\!".
Under "{-keep}", only $1 will be set, returning the entire
comment. This pattern requires perl 5.6.0 or newer.
- Icon
- Icon has comments that start with "#" and
end at the next newline. See
<http://www.toolsofcomputing.com/IconHandbook/IconHandbook.pdf>,
<http://www.cs.arizona.edu/icon/index.htm>, and
<http://burks.bton.ac.uk/burks/language/icon/index.htm>.
- ILLGOL
- The esoteric language ILLGOL uses comments starting
with NB and lasting till the end of the line. See
<http://www.catseye.mb.ca/esoteric/illgol/index.html>.
- INTERCAL
- Comments in INTERCAL are single line comments. They start
with one of the keywords "NOT" or "N'T", and can
optionally be preceded by the keywords "DO" and
"PLEASE". If both keywords are used, "PLEASE" precedes
"DO". Keywords are separated by whitespace.
- J
- The language J uses comments that start with
"NB.", and that last till the end of the line. See
<http://www.jsoftware.com/books/help/primer/contents.htm>, and
<http://www.jsoftware.com/>.
- Java
- The Java language has two forms of comments.
Comments that start with "//" and last till the end of the line,
and comments that start with "/*", and end with "*/".
If "{-keep}" is used, only $1 will be set, and set to the entire
comment.
- JavaDoc
- The Javadoc documentation syntax is demarcated with a
subset of ordinary Java comments to separate it from code. Comments start
with "/**" and end with "*/". If "{-keep}" is
used, only $1 will be set, and set to the entire comment. See
<http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#format>.
- JavaScript
- The JavaScript language has two forms of comments.
Comments that start with "//" and last till the end of the line,
and comments that start with "/*", and end with "*/".
If "{-keep}" is used, only $1 will be set, and set to the entire
comment. JavaScript is Netscape's implementation of
ECMAScript. See <http://www.mozilla.org/js/language/E262-3.pdf>, and
<http://www.mozilla.org/js/language/>.
- LaTeX
- The documentation language LaTeX uses comments
starting with "%" and ending at the end of the line.
- Lisp
- Comments in Lisp start with a semi-colon
(";") and last till the end of the line.
- LPC
- The LPC language has comments starting with
"/*" and ending with "*/".
- LOGO
- Comments for the language LOGO start with
";", and last till the end of the line.
- lua
- Comments for the lua language start with
"--", and last till the end of the line. See also
<http://www.lua.org/manual/manual.html>.
- M, MUMPS
- In "M" (aka "MUMPS"), comments start
with a semi-colon, and last till the end of a line. The language
specification requires the semi-colon to be preceded by one or more
linestart characters. Those characters default to a space, but
that's configurable. This requirement of preceding the comment with
linestart characters is not tested for. See
<ftp://ftp.intersys.com/pub/openm/ism/ism64docs.zip>,
<http://mtechnology.intersys.com/mproducts/openm/index.html>, and
<http://mcenter.com/mtrc/index.html>.
- m4
- By default, the preprocessor language m4 uses single
line comments, that start with a "#" and continue to the end of
the line, including the newline. The pattern $RE{comment}{m4}
matches such comments. In m4, it is possible to change
the starting token though. See
<http://wolfram.schneider.org/bsd/7thEdManVol2/m4/m4.pdf>,
<http://www.cs.stir.ac.uk/~kjt/research/pdf/expl-m4.pdf>, and
<http://www.gnu.org/software/m4/manual/>.
- Modula-2
- In "Modula-2", comments start with
"(*", and end with "*)". Comments may be nested. See
<http://www.modula2.org/>.
- Modula-3
- In "Modula-3", comments start with
"(*", and end with "*)". Comments may be nested. See
<http://www.m3.org/>.
- mutt
- Configuration files for mutt have comments starting
with a "#" and lasting the rest of the line.
- Nickle
- The Nickle language has one line comments starting
with "#" (like Perl), or multiline comments delimited by
"/*" and "*/" (like C). Under "{-keep}", only
$1 will be set. See also <http://www.nickle.org>.
- Oberon
- Comments in Oberon start with "(*" and end
with "*)". See
<http://www.oberon.ethz.ch/oreport.html>.
- Pascal
- There are many implementations of Pascal. This module
provides patterns for comments of several implementations.
- $RE{comment}{Pascal}
- This is the pattern that recognizes comments according to
the Pascal ISO standard. This standard says that comments start with
either "{", or "(*", and end with "}" or
"*)". This means that "{*)" and "(*}" are
considered to be comments. Many Pascal applications don't allow this. See
<http://www.pascal-central.com/docs/iso10206.txt>.
- $RE{comment}{Pascal}{Alice}
- The Alice Pascal compiler accepts comments that
start with "{" and end with "}". Comments are not
allowed to contain newlines. See
<http://www.templetons.com/brad/alice/language/>.
- $RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free}
and $RE{comment}{Pascal}{GPC}
- The Delphi Pascal, Free Pascal and the Gnu
Pascal Compiler implementations of Pascal all have comments that
either start with "//" and last till the end of the line, are
delimited with "{" and "}" or are delimited with
"(*" and "*)". Patterns for those comments are given
by $RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and
$RE{comment}{Pascal}{GPC} respectively. These patterns only set $1 when
"{-keep}" is used, which will then include the entire comment.
See <http://info.borland.com/techpubs/delphi5/oplg/>,
<http://www.freepascal.org/docs-html/ref/ref.html> and
<http://www.gnu-pascal.de/gpc/>.
- $RE{comment}{Pascal}{Workshop}
- The Workshop Pascal compiler, from SUN Microsystems,
allows comments that are delimited with either "{" and
"}", delimited with "(*" and "*)",
delimited with "/*" and "*/", or starting and ending
with a double quote ("""). When "{-keep}" is
used, only $1 is set, and returns the entire comment.
See <http://docs.sun.com/db/doc/802-5762>.
- PEARL
- Comments in PEARL start with a "!" and
last till the end of the line, or start with "/*" and end with
"*/". With "{-keep}", $1 will be set to the entire
comment.
- PHP
- Comments in PHP start with either "#" or
"//" and last till the end of the line, or are delimited by
"/*" and "*/". With "{-keep}", $1 will be
set to the entire comment.
- PL/B
- In PL/B, comments start with either "." or
";", and end with the next newline. See
<http://www.mmcctech.com/pl-b/plb-0010.htm>.
- PL/I
- The PL/I language has comments starting with
"/*" and ending with "*/".
- PL/SQL
- In PL/SQL, comments either start with "--"
and run till the end of the line, or start with "/*" and end
with "*/".
- Perl
- Perl uses comments that start with a "#",
and continue till the end of the line.
- Portia
- The Portia programming language has comments that
start with "//", and last till the end of the line.
- Python
- Python uses comments that start with a
"#", and continue till the end of the line.
- Q-BAL
- Comments in the Q-BAL language start with
"`" (a backtick), and continue till the end of the line.
- QML
- In "QML", comments start with "#" and
last till the end of the line. See
<http://www.questionmark.com/uk/qml/overview.doc>.
- R
- The statistical language R uses comments that start
with a "#" and end with the following newline. See
<http://www.r-project.org/>.
- REBOL
- Comments for the REBOL language start with
";" and last till the end of the line.
- Ruby
- Comments in Ruby start with "#" and last
till the end of the line.
- Scheme
- Scheme comments start with ";", and last
till the end of the line. See <http://schemers.org/>.
- shell
- Comments in various shells start with a
"#" and end at the end of the line.
- Shelta
- The esoteric language Shelta uses comments that
start and end with a ";". See
<http://www.catseye.mb.ca/esoteric/shelta/index.html>.
- SLIDE
- The SLIDE language has two forms of comments. First
there is the line comment, which starts with a "#" and includes
the rest of the line (just like Perl). Second, there is the multiline,
nested comment, which is delimited by "(*" and "*)".
Under "{-keep}", only $1 is set, and is set to the entire comment. This
pattern needs at least Perl version 5.6.0. See
<http://www.cs.berkeley.edu/~ug/slide/docs/slide/spec/spec_frame_intro.shtml>.
- slrn
- Configuration files for slrn have comments starting
with a "%" and lasting the rest of the line.
- Smalltalk
- Smalltalk uses comments that start and end with a
double quote, """.
- SMITH
- Comments in the SMITH language start with
";", and last till the end of the line.
- Squeak
- In the Smalltalk variant Squeak, comments start and
end with """. Double quotes can appear inside comments by
doubling them.
- SQL
- Standard SQL uses comments starting with two or more
dashes, and ending at the end of the line.
MySQL does not follow the standard. Instead, it allows comments that
start with a "#" or "-- " (that's two dashes and a
space) ending with the following newline, and comments starting with
"/*", and ending with the next ";" or "*/"
that isn't inside single or double quotes. A pattern for this is returned
by $RE{comment}{SQL}{MySQL}. With "{-keep}", only $1 will be
set, and it returns the entire comment.
- Tcl
- In Tcl, comments start with "#" and
continue till the end of the line.
- TeX
- The documentation language TeX uses comments
starting with "%" and ending at the end of the line.
- troff
- The document formatting language troff uses comments
starting with "\"", and continuing till the end of the
line.
- Ubercode
- The Windows programming language Ubercode uses
comments that start with "//" and continue to the end of the
line. See <http://www.ubercode.com>.
- vi
- In configuration files for the editor vi, one can
use comments starting with """, and ending at the end of
the line.
- *W
- In the language *W, comments start with
"||", and end with "!!".
- zonefile
- Comments in DNS zonefiles start with ";",
and continue till the end of the line.
- ZZT-OOP
- The in-game language ZZT-OOP uses comments that
start with a "'" character, and end at the following newline.
See <http://dave2.rocketjump.org/rad/zzthelp/lang.html>.
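To illustrate the HTML entry above, here is a small sketch (assuming Regexp::Common is installed; the variable name is illustrative) that runs $RE{comment}{HTML} with "{-keep}" over the three-comment declaration shown in that entry and prints some of the captures:

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# The comment declaration from the HTML entry: a single declaration
# that holds three comments.
my $html = join "\n",
    "<!-- First Comment --",
    "--> Second Comment <!--",
    "-- Third Comment -->";

if ($html =~ /$RE{comment}{HTML}{-keep}/) {
    # $1 is the entire declaration, which here is the whole string.
    print "matched the whole declaration\n" if $1 eq $html;
    print "MDO:          [$2]\n";   # markup declaration open
    print "last comment: [$4]\n";   # last comment, without the dashes
    print "MDC:          [$5]\n";   # markup declaration close
}
```

Something that merely starts with "<!--" and ends with "-->" is therefore not necessarily what this pattern treats as one unit; the pattern follows the comment declaration syntax instead.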
REFERENCES
- [Go 90]
- Charles F. Goldfarb: The SGML Handbook. Oxford:
Oxford University Press. 1990. ISBN 0-19-853737-9. Ch. 10.3, pp
390-391.
SEE ALSO
Regexp::Common for a general description of how to use this interface.
AUTHOR
Damian Conway (damian@conway.org)
MAINTENANCE
This package is maintained by Abigail (regexp-common@abigail.be).
BUGS AND IRRITATIONS
Bound to be plenty.
For a start, there are many common regexes missing. Send them in to
regexp-common@abigail.be.
LICENSE and COPYRIGHT
This software is Copyright (c) 2001 - 2009, Damian Conway and Abigail.
This module is free software, and may be used under any of the following
licenses:
1) The Perl Artistic License. See the file COPYRIGHT.AL.
2) The Perl Artistic License 2.0. See the file COPYRIGHT.AL2.
3) The BSD Licence. See the file COPYRIGHT.BSD.
4) The MIT Licence. See the file COPYRIGHT.MIT.