NAME¶
Makefile::DOM - Simple DOM parser for Makefiles
VERSION¶
This document describes Makefile::DOM 0.006 released on 28 August 2011.
DESCRIPTION¶
This library can serve as an advanced lexer for (GNU) makefiles. It parses
makefiles as "documents" and the parsing is lossless. The results
are data structures similar to DOM trees. The DOM trees hold every single bit
of the information in the original input files, including white spaces, blank
lines and makefile comments. That means it's possible to reproduce the
original makefiles from the DOM trees. In addition, each node of the DOM trees
is modifiable and so is the whole tree, just like the PPI module used for Perl
source parsing and the HTML::TreeBuilder module used for parsing HTML source.
If you're looking for a true GNU make parser that generates an AST, please see
Makefile::Parser::GmakeDB instead.
The interface of "Makefile::DOM" mimics the API design of PPI. In
fact, I've directly stolen the source code and POD documentation of PPI::Node,
PPI::Element, and PPI::Dumper, with the full permission from the author of
PPI, Adam Kennedy.
"Makefile::DOM" tries to be independent of specific makefile's syntax.
The same set of DOM node types is supposed to get shared by different makefile
DOM generators. For example, MDOM::Document::Gmake parses GNU makefiles and
returns an instance of MDOM::Document, i.e., the root of the DOM tree while
the NMAKE makefile lexer in the future, "MDOM::Document::Nmake",
also returns instances of the MDOM::Document class. Later, I'll also consider
adding support for dmake and bsdmake.
Structure of the DOM¶
Makefile DOM (MDOM) is a structured set of a series of data types. They provide
a flexible document model conformed to the makefile syntax. Below is a
complete list of the 19 MDOM classes in the current implementation where the
indentation indicates the class inheritance relationships.
MDOM::Element
MDOM::Node
MDOM::Unknown
MDOM::Assignment
MDOM::Command
MDOM::Directive
MDOM::Document
MDOM::Document::Gmake
MDOM::Rule
MDOM::Rule::Simple
MDOM::Rule::StaticPattern
MDOM::Token
MDOM::Token::Bare
MDOM::Token::Comment
MDOM::Token::Continuation
MDOM::Token::Interpolation
MDOM::Token::Modifier
MDOM::Token::Separator
MDOM::Token::Whitespace
It's not hard to see that all of the MDOM classes inherit from the MDOM::Element
class. MDOM::Token and MDOM::Node are its direct children. The former
represents a string token which is atomic from the perspective of the lexer
while the latter represents a structured node, which usually has one or more
children, and serves as the container for other DOM::Element objects.
Next we'll show a few examples to demonstrate how to map DOM trees to particular
makefiles.
- Case 1
- Consider the following simple "hello, world" makefile:
all : ; echo "hello, world"
We can use the MDOM::Dumper class provided by Makefile::DOM to dump out the
internal structure of its corresponding MDOM tree:
MDOM::Document::Gmake
MDOM::Rule::Simple
MDOM::Token::Bare 'all'
MDOM::Token::Whitespace ' '
MDOM::Token::Separator ':'
MDOM::Token::Whitespace ' '
MDOM::Command
MDOM::Token::Separator ';'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'echo "hello, world"'
MDOM::Token::Whitespace '\n'
In this example, speparators ":" and ";" are all
instances of the MDOM::Token::Separator class while spaces and new line
characters are all represented as MDOM::Token::Whitespace. The other two
leaf nodes, "all" and "echo "hello, world""
both belong to MDOM::Token::Bare.
It's worth mentioning that, the space characters in the rule command
"echo "hello, world"" were not represented as
MDOM::Token::Whitespace. That's because in makefiles, the spaces in
commands do not make any sense to "make" in syntax; those spaces
are usually sent to shell programs verbatim. Therefore, the DOM parser
does not try to recognize those spaces specifially so as to reduce memory
use and the number of nodes. However, leading spaces and trailing new
lines will still be recognized as MDOM::Token::Whitespace.
On a higher level, it's a MDOM::Rule::Simple instance holding several
"Token" and one MDOM::Command. On the highest level, it's the
root node of the whole DOM tree, i.e., an instance of
MDOM::Document::Gmake.
- Case 2
- Below is a relatively complex example:
a: foo.c bar.h $(baz) # hello!
@echo ...
It's corresponding DOM structure is
MDOM::Document::Gmake
MDOM::Rule::Simple
MDOM::Token::Bare 'a'
MDOM::Token::Separator ':'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'foo.c'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'bar.h'
MDOM::Token::Whitespace '\t'
MDOM::Token::Interpolation '$(baz)'
MDOM::Token::Whitespace ' '
MDOM::Token::Comment '# hello!'
MDOM::Token::Whitespace '\n'
MDOM::Command
MDOM::Token::Separator '\t'
MDOM::Token::Modifier '@'
MDOM::Token::Bare 'echo ...'
MDOM::Token::Whitespace '\n'
Compared to the previous example, here appears several new node types.
The variable interpolation "$(baz)" on the first line of the
original makefile corresponds to a MDOM::Token::Interpolation node in its
MDOM tree. Similarly, the comment "# hello" corresponds to a
MDOM::Token::Comment node.
On the second line, the rule command indented by a tab character is still
represented by a MDOM::Command object. Its first child node (or its first
element) is also an MDOM::Token::Seperator instance corresponding to that
tab. The command modifier "@" follows the "Separator"
immediately, which is of type MDOM::Token::Modifier.
- Case 3
- Now let's study a sample makefile with various global structures:
a: b
foo = bar
# hello!
Here on the top level, there are three language structures: one rule
""a: b"", one assignment statement "foo =
bar", and one comment "# hello!".
Its MDOM tree is shown below:
MDOM::Document::Gmake
MDOM::Rule::Simple
MDOM::Token::Bare 'a'
MDOM::Token::Separator ':'
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'b'
MDOM::Token::Whitespace '\n'
MDOM::Assignment
MDOM::Token::Bare 'foo'
MDOM::Token::Whitespace ' '
MDOM::Token::Separator '='
MDOM::Token::Whitespace ' '
MDOM::Token::Bare 'bar'
MDOM::Token::Whitespace '\n'
MDOM::Token::Whitespace '\t'
MDOM::Token::Comment '# hello!'
MDOM::Token::Whitespace '\n'
We can see that below the root node MDOM::Document::Gmake, there are
MDOM::Rule::Simple, MDOM::Assignment, and MDOM::Comment three elements, as
well as two MDOM::Token::Whitespace objects.
It can be observed from the examples above that the MDOM representation for the
makefile's lexical elements is rather loose. It only provides very limited
structural representation instead of making a bad guess.
OPERATIONS FOR MDOM TREES¶
Generating an MDOM tree from a GNU makefile only requires two lines of Perl
code:
use MDOM::Document::Gmake;
my $dom = MDOM::Document::Gmake->new('Makefile');
If the makefile source code being parsed is already stored in a Perl variable,
say, $var, then we can construct an MDOM via the following code:
my $dom = MDOM::Document::Gmake->new(\$var);
Now $dom becomes the reference to the root of the MDOM tree and its type is now
MDOM::Document::Gmake, which is also an instance of the MDOM::Node class.
Just as mentioned above, "MDOM::Node" is the container for other
MDOM::Element instances. So we can retrieve some element node's value via its
"child" method:
$node = $dom->child(3);
# or $node = $dom->elements(0);
And we may also use the "elements" method to obtain the values of all
the nodes:
@elems = $dom->elements;
For every MDOM node, its corresponding makefile source can be generated by
invoking its "content" method.
BUGS AND TODO¶
The current implemenation of the MDOM::Document::Gmake lexer is based on a
hand-written state machie. Although the efficiency of the engine is not bad,
the code is rather complicated and messy, which hurts both extensibility and
maintanabilty. So it's expected to rewrite the parser using some grammatical
tools like the Perl 6 regex engine Pugs::Compiler::Rule or a yacc-style one
like Parse::Yapp.
SOURCE REPOSITORY¶
You can always get the latest source code of this module from its GitHub
repository:
http://github.com/agentzh/makefile-dom-pm
<
http://github.com/agentzh/makefile-dom-pm>
If you want a commit bit, please let me know.
AUTHOR¶
Zhang "agentzh" Yichun (XXX) <agentzh@gmail.com>
COPYRIGHT¶
Copyright 2006-2011 by Zhang "agentzh" Yichun (XXX).
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.
SEE ALSO¶
MDOM::Document, MDOM::Document::Gmake, PPI, Makefile::Parser::GmakeDB,
makesimple.