MP4H(1) | HTML Tools | MP4H(1) |
NAME¶
mp4h - Macro Processor for HTML Documents
VERSION¶
This documentation describes mp4h version 1.3.1.
INTRODUCTION¶
The mp4h software is a macro-processor specifically designed to deal with HTML documents. It allows powerful programming constructs, with a syntax familiar to HTML authors.
This software is based on Meta-HTML "<URL:http://www.metahtml.org/>", written by Brian J. Fox. Even though both syntaxes look similar, the source code is completely different. Indeed, a subset of Meta-HTML was used as a part of a more complex program, WML (Website Meta Language "<URL:http://www.thewml.org/>"), written by Ralf S. Engelschall, which I have maintained since January 1999. For licensing reasons, it was hard to hack Meta-HTML, so I decided to write my own macro-processor.
Instead of rewriting it from scratch, I preferred using another macro-processor engine. I chose GNU m4 "<URL:http://www.gnu.org/software/m4/>", written by Rene Seindal, because of its numerous advantages: this software is stable, robust and very well documented. This version of mp4h is derived from GNU m4 version 1.4n, which is a development version.
The mp4h software is not an HTML editor; its sole goal is to provide an easy way to define one's own macros inside HTML documents. There is no plan to add functionality to automagically produce valid HTML documents. If you want to clean up your code or validate it, simply use a post-processor like tidy "<URL:http://www.w3.org/People/Raggett/tidy/>".
COMMAND LINE OPTIONS¶
Optional arguments are enclosed within square brackets. All option synonyms have a similar syntax, so when a long option accepts an argument, the short option does too. The call syntax is
mp4h [options] [filename [filename] ...]
Options are described below. If no filename is specified, or if its name is "-", then characters are read from standard input.
Operation modes¶
- --help display a help message and exit
- --version output mp4h version information and exit
- -E --fatal-warnings stop execution after first warning
- -Q --quiet --silent suppress some warnings for builtins
- -S --safety-level="NUMBER" disable risky functions; 0 means no filtering, 1 disables "execute", and 2 disables "execute" as well as all filesystem-related functions: "file-exists", "real-path", "get-file-properties", "directory-contents" and "include".
Preprocessor features¶
- -I --include="DIRECTORY" search this directory for includes and packages
- -D --define="NAME[=VALUE]" set variable NAME to VALUE, or empty
- -U --undefine="COMMAND" delete builtin COMMAND
- -s --synclines generate `#line NO "FILE"' lines
Parser features¶
- -c --caseless="NUMBER" set case sensitivity according to the bits of "NUMBER". A null bit means the symbol is case sensitive, and bits are defined as follows: bit 0 for tags, bit 1 for variables and bit 2 for entities. Default value is 3, i.e. only entities are case sensitive.
- -e --encoding="NAME" specify document encoding. Valid options are `8bit' (default) or `utf8'.
- -X --expansion="NUMBER" set parser behaviour according to the bits of "NUMBER"
- 1 do not parse unknown tags
- 2 unknown tags are assumed being simple
- 4 trailing star in tag name does not make this tag simple
- 8 an unmatched end tag closes all previous unmatched begin tags
- 16 interpret backslashes as printf
- 32 remove trailing slash in tag attributes
- 64 do not remove trailing star in tag name
- 128 do not remove leading star in tag name
- 256 do not add a space before trailing slash in tag attributes
- 1024 suppress warnings about bad nested tags
- 2048 suppress warnings about missing trailing slash
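Because "--expansion" takes the sum of the values listed above, a combined number can be hard to read back. The following Python sketch is only an illustration and is not part of mp4h: the names EXPANSION_FLAGS, encode_expansion and decode_expansion are hypothetical, and the descriptions are copied from the list above.

```python
# Illustrative helper, not part of mp4h: the table mirrors the flag list
# documented for the -X / --expansion option; all names here are hypothetical.
EXPANSION_FLAGS = {
    1: "do not parse unknown tags",
    2: "unknown tags are assumed being simple",
    4: "trailing star in tag name do not make this tag simple",
    8: "an unmatched end tag closes all previous unmatched begin tags",
    16: "interpret backslashes as printf",
    32: "remove trailing slash in tag attributes",
    64: "do not remove trailing star in tag name",
    128: "do not remove leading star in tag name",
    256: "do not add a space before trailing slash in tag attributes",
    1024: "suppress warnings about bad nested tags",
    2048: "suppress warnings about missing trailing slash",
}


def encode_expansion(*bits):
    """Combine individual flag values into the single number given to -X."""
    unknown = [b for b in bits if b not in EXPANSION_FLAGS]
    if unknown:
        raise ValueError(f"not a documented flag value: {unknown}")
    total = 0
    for b in bits:
        total |= b
    return total


def decode_expansion(value):
    """List the descriptions of every documented flag set in value."""
    return [desc for bit, desc in sorted(EXPANSION_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    number = encode_expansion(1024, 2048)
    print(number)
    for description in decode_expansion(number):
        print("-", description)
```

The same bitwise reading applies to "--caseless", where bits 0, 1 and 2 stand for tags, variables and entities.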
Limits control¶
- -H --hashsize="PRIME" set symbol lookup hash table size (default 509)
- -L --nesting-limit="NUMBER" change artificial nesting limit (default 250)
Debugging¶
- -d --debug="FLAGS" set debug level (no FLAGS implies `aeq')
- -t --trace="NAME" trace NAME when it will be defined
- -l --arglength="NUMBER" restrict macro tracing size
- -o --error-output="FILE" redirect debug and trace output
Flags are any of:
- t trace for all macro calls, not only those enabled with "debugging-on"
- a show actual arguments
- e show expansion
- c show before collect, after collect and after call
- x add a unique macro call id, useful with c flag
- f say current input file name
- l say current input line number
- p show results of path searches
- m show results of module operations
- i show changes in input files
- V shorthand for all of the above flags
DESCRIPTION¶
The mp4h software is a macro-processor, which means that keywords are replaced by other text. This chapter describes all primitives. As mp4h has been specially designed for HTML documents, its syntax is very similar to HTML, with tags and attributes. One important feature has no equivalent in HTML: comments until end of line. All text following three semicolons is discarded until end of line, like
;;; This is a comment
Function Macros¶
Note: All examples in this documentation are processed through mp4h with expansion flags set to zero (see the description of the "--expansion" option), which is why simple tags contain a trailing slash. But mp4h can output plain HTML files with other expansion flags.
The definition of new tags is the most common task provided by mp4h. As with HTML, macro names are case insensitive, unless the "-c" option is used to change this default behaviour. In this documentation, only lowercase letters are used. There are two kinds of tags: simple and complex. A simple tag has the following form:
<name [attributes] />
whereas a complex tag looks like:
<name [attributes]> body </name>
Since version 0.9.1, mp4h knows XHTML syntax too, so your input file may conform to HTML or XHTML syntax. In this manual, we adopt the latter, which is why simple tags have a trailing slash in attributes. If you want to produce HTML files with this input file, you may either choose an adequate "--expansion" flag or use a post-processor like tidy "<URL:http://www.w3.org/People/Raggett/tidy/>".
When a simple tag is defined by mp4h, it can be parsed even if the trailing slash is omitted, because mp4h knows that this tag is simple. But it is good practice to always append a trailing slash to simple tags.
In the macro descriptions below, an "S" marks a simple tag, and a "V" indicates that attributes are read verbatim (without expansion); see the chapter on macro expansion for further details.
- •
- define-tag
<define-tag foo>bar</define-tag> <foo/>
Output:
bar
Even though whitespace usually has little effect in HTML, it is important to note that
<define-tag foo>bar</define-tag>
and
<define-tag foo>
bar
</define-tag>
are not equivalent: the latter form contains two newlines that are not present in the former.
- "whitespace=delete"
- Some spaces are suppressed in replacement text, in particular any leading or trailing spaces, and newlines not enclosed within angle brackets.
- "endtag=required"
- Define a complex tag
<define-tag foo>bar</define-tag> <foo/>
bar
<define-tag bar endtag=required>;;; body is: %body</define-tag> <bar>Here it is</bar>
body is: Here it is
- "attributes=verbatim"
- By default attributes are expanded before text is replaced.
If this attribute is used, attributes are inserted into replacement text
without expansion.
<define-tag foo>quux</define-tag> <define-tag bar attributes=verbatim endtag=required> Body: %Ubody Attributes: %Uattributes </define-tag> <bar txt="<foo/>">Here we go</bar>
Body: Here we go Attributes: txt=<foo/>
- •
- provide-tag
- •
- let "S"
<define-tag foo>one</define-tag> <let bar=foo /> <define-tag foo>two</define-tag> <foo/><bar/>Output:
twoone
- •
- undef "S"
<define-tag foo>one</define-tag> <undef foo /> <foo/>Output:
<foo />
- •
- set-hook
<let foo=add /> <set-hook foo position=before> Before</set-hook> <set-hook foo position=after> After</set-hook> <foo 1 2 3 4 />Output:
Before10 After
- •
- get-hook "S"
Text inserted with position=before:<get-hook foo position=before />! Text inserted with position=after:<get-hook foo position=after />!Output:
Text inserted with position=before: Before! Text inserted with position=after: After!
- •
- attributes-quote "S"
<define-tag foo>;;; %attributes <img<attributes-quote %attributes />/> </define-tag> <foo id="logo" src="logo.gif" name="Logo" alt="Our logo" /> <foo/>Output:
id=logo src=logo.gif name=Logo alt=Our logo <img id="logo" src="logo.gif" name="Logo" alt="Our logo"/> <img/>
- •
- attributes-extract "S"
<define-tag img whitespace=delete> <img* <attributes-extract name,src,alt %attributes /> /> </define-tag> <img id="logo" src="logo.gif" name="Logo" alt="Our logo" />Output:
<img src=logo.gif name=Logo alt=Our logo />
- •
- attributes-remove "S"
<define-tag img whitespace=delete> <img* <attributes-quote <attributes-remove name,src,alt %attributes />/> /> </define-tag> <img id="logo" src="logo.gif" name="Logo" alt="Our logo" />Output:
<img id="logo" />
Note: The two previous functions are special because, unlike all other macros, their expansion does not form a group. This is necessary to parse the resulting list of attributes. In those two functions, names of attributes may be regular expressions. The main goal of these primitives is to help write macros that accept any kind of attributes without having to declare them. A canonical example is
Source:
<define-tag href whitespace=delete> <preserve url name /> <set-var <attributes-extract url,name %attributes />/> <a <attributes-quote <attributes-remove url,name %attributes />/> href="<get-var url />"><get-var name /></a> <restore url name /> </define-tag> <href class=web url="http://www.foo.com" name="Welcome" />Output:
<a class="web" href="http://www.foo.com">Welcome</a>
But now we want to add an image attribute. So we may write
Source:
<define-tag href whitespace=delete> <preserve url name image /> <set-var <attributes-extract url,name,image %attributes />/> <a <attributes-quote <attributes-remove url,name,image %attributes />/> href="<get-var url />"> <if <get-var image /> <img <attributes-quote <attributes-remove url,name,image %attributes />/> src="<get-var image />" alt="<get-var name />" border=0 /> <get-var name /> /> </a> <restore url name image /> </define-tag> <href class=web url="http://www.foo.com" name="Welcome" image="foo.png"/>Output:
<a class="web" href="http://www.foo.com"><img class="web" src="foo.png" alt="Welcome" border=0 /></a>
We need a mechanism to tell mp4h that some attributes refer to specific HTML tags. A solution is to prefix each attribute with a tag name, e.g.
Source:
<define-tag href whitespace=delete> <preserve url name image /> <set-var <attributes-extract url,name,image %attributes />/> <a <attributes-quote <attributes-extract a:.* %attributes />/> href="<get-var url />"> <if <get-var image /> <img <attributes-quote <attributes-extract img:.* %attributes />/> src="<get-var image />" alt="<get-var name />" /> <get-var name /> /> </a> <restore url name image /> </define-tag> <href a:class=web img:id=logo img:border=1 url="http://www.foo.com" name="Welcome" image="foo.png" />Output:
<a a:class="web" href="http://www.foo.com"><img img:id="logo" img:border="1" src="foo.png" alt="Welcome" /></a>
This example shows that regular expressions may be used within attribute names, but it is still incomplete, because we want to remove the prefix from the attributes. One solution is to use "subst-in-string", but there is a more elegant one:
Source:
<define-tag href whitespace=delete> <preserve url name image /> <set-var <attributes-extract url,name,image %attributes />/> <a <attributes-quote <attributes-extract :a:(.*) %attributes />/> href="<get-var url />"> <if <get-var image /> <img <attributes-quote <attributes-extract :img:(.*) %attributes />/> src="<get-var image />" alt="<get-var name />" /> <get-var name /> /> </a> <restore url name image /> </define-tag> <href :a:class=web :img:id=logo :img:border=1 url="http://www.foo.com" name="Welcome" image="foo.png" />Output:
<a class="web" href="http://www.foo.com"><img id="logo" border="1" src="foo.png" alt="Welcome" /></a>
When there are subexpressions within regular expressions, they are printed instead of the whole expression. Note also that I put a colon before the prefix in order not to confuse these attributes with XML namespaces.
Entities¶
Entities are macros in the same way as tags, but they do not take any arguments. Whereas tags are normally used to mark up text, entities contain already marked up text. Also note that unlike tags, entities are by default case sensitive. An entity has the following form:
&entity;
- •
- define-entity
<define-entity foo>bar</define-entity> &foo;Output:
bar
Variables¶
Variables are a special case of simple tags, because they do not accept attributes. In fact their use is different, because variables contain text whereas macros act like operators. A nice feature of variables is that they can be manipulated as arrays. Indeed, variables can be considered as newline-separated lists, which allows powerful manipulation functions, as we will see below.
- •
- set-var "S"
- •
- set-var-verbatim "S""V"
- •
- set-var-x
- •
- get-var "S"
<set-var version="0.10.1" /> This is version <get-var version /> <set-var-x name="osversion">Operating sytem is "<include command="uname" /><include command="uname -r" />"</set-var-x> <get-var osversion />Output:
This is version 0.10.1Operating sytem is
"Linux
2.6.32-5-amd64
" Source:
<set-var foo="0
1
2
3" />
<get-var foo[2] foo[0] foo />
Output:
200
1
2
3
- •
- get-var-once "S""V"
<define-tag foo>0.10.1</define-tag> <set-var version="<foo/>" />;;; Here is version <get-var version /> <set-var-verbatim version="<foo/>" />;;; Here is version <get-var version /> <set-var-verbatim version="<foo/>" />;;; Here is version <get-var-once version />Output:
Here is version 0.10.1 Here is version 0.10.1 Here is version <foo/>
- •
- preserve "S"
- •
- restore "S"
<define-tag foo whitespace=delete> <preserve src name text /> <set-var %attributes /> Inside: src=<get-var src /> name=<get-var name /> text=<get-var text /> <restore src name text /> </define-tag> <set-var src=foo.png text="Hello, World!" /> Before: src=<get-var src /> name=<get-var name /> text=<get-var text /> <foo src=bar name=quux /> After: src=<get-var src /> name=<get-var name /> text=<get-var text />Output:
Before: src=foo.png name= text=Hello, World! Inside: src=bar name=quux text= After: src=foo.png name= text=Hello, World!
- •
- unset-var "S"
- •
- var-exists "S"
- •
- increment "S"
- "by=value"
- Change increment amount.
<set-var i=10 /> <get-var i /> <increment i /><get-var i /> <increment i by="-3" /><get-var i />Output:
10 11 8
- •
- decrement "S"
- "by=value"
- Change decrement amount.
<set-var i=10 /> <get-var i /> <decrement i /><get-var i /> <decrement i by="3" /><get-var i />Output:
10 9 6
- •
- copy-var "S"
<set-var i=10 /> <copy-var i j /> <get-var j />Output:
10
- •
- defvar "S"
<unset-var title /> <defvar title "Title" /><get-var title /> <defvar title "New title" /><get-var title />Output:
Title Title
- •
- symbol-info "S"
<set-var x="0\n1\n2\n3\n4" /> <define-tag foo>bar</define-tag> <define-tag bar endtag=required>quux</define-tag> <symbol-info x /> <symbol-info symbol-info /> <symbol-info define-tag /> <symbol-info foo /> <symbol-info bar />Output:
STRING 5 PRIM TAG PRIM COMPLEX USER TAG USER COMPLEX
String Functions¶
- •
- string-length "S"
<set-var foo="0 1 2 3" />;;; <string-length <get-var foo /> /> <set-var foo="0 1 2 3" />;;; <set-var l=<string-length <get-var foo /> /> />;;; <get-var l />Output:
7 7
- •
- downcase "S"
<downcase "Does it work?" />Output:
does it work?
- •
- upcase "S"
<upcase "Does it work?" />Output:
DOES IT WORK?
- •
- capitalize "S"
<capitalize "Does it work?" />Output:
Does It Work?
- •
- substring "S"
<set-var foo="abcdefghijk" /> <substring <get-var foo /> 4 /> <substring <get-var foo /> 4 6 />Output:
efghijk ef
- •
- string-eq "S"
1:<string-eq "aAbBcC" "aabbcc" /> 2:<string-eq "aAbBcC" "aAbBcC" />Output:
1: 2:true
- "caseless=true"
- Comparison is case insensitive.
1:<string-eq "aAbBcC" "aabbcc" caseless=true /> 2:<string-eq "aAbBcC" "aAbBcC" caseless=true />Output:
1:true 2:true
- •
- string-neq "S"
1:<string-neq "aAbBcC" "aabbcc" /> 2:<string-neq "aAbBcC" "aAbBcC" />Output:
1:true 2:
- "caseless=true"
- Comparison is case insensitive.
1:<string-neq "aAbBcC" "aabbcc" caseless=true /> 2:<string-neq "aAbBcC" "aAbBcC" caseless=true />Output:
1: 2:
- •
- string-compare "S"
1:<string-compare "aAbBcC" "aabbcc" /> 2:<string-compare "aAbBcC" "aAbBcC" />Output:
1:less 2:equal
- "caseless=true"
- Comparison is case insensitive.
1:<string-compare "aAbBcC" "aabbcc" caseless=true />Output:
1:equal
- •
- char-offsets "S"
- "caseless=true"
- Comparison is case insensitive.
1:<char-offsets "abcdAbCdaBcD" a /> 2:<char-offsets "abcdAbCdaBcD" a caseless=true />Output:
1:0 8 2:0 4 8
- •
- printf "S"
1:<printf "foo %s bar %s" baz 10 /> 2:<printf "foo %2$s bar %1$s" baz 10 />Output:
1:foo baz bar 10 2:foo 10 bar baz
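The two "printf" calls above differ only in how arguments are selected: "%s" consumes arguments in order, while "%N$s" picks the N-th argument, counting from 1. The Python sketch below models just these two forms as they appear in the example. It is not mp4h's implementation; the function name printf_model is hypothetical, and raising an error for a missing argument is an assumption of the sketch.

```python
import re


def printf_model(fmt, *args):
    """Model of the two forms shown in the example: %s and %N$s (N is 1-based)."""
    position = 0  # next argument consumed by a plain %s

    def substitute(match):
        nonlocal position
        if match.group(1) is not None:
            index = int(match.group(1)) - 1  # explicit %N$s form
        else:
            index = position                 # sequential %s form
            position += 1
        if not 0 <= index < len(args):
            # Assumption of this sketch: a missing argument is an error.
            raise IndexError(f"no argument for {match.group(0)!r}")
        return str(args[index])

    return re.sub(r"%(?:(\d+)\$)?s", substitute, fmt)


if __name__ == "__main__":
    print(printf_model("foo %s bar %s", "baz", 10))
    print(printf_model("foo %2$s bar %1$s", "baz", 10))
```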
Regular Expressions¶
Regular expression support is provided by the PCRE (Perl Compatible Regular Expressions) library package, which is open source software, copyright by the University of Cambridge. This is a very nice piece of software; the latest versions are available at "<URL:ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/>".
Before version 1.0.6, POSIX regular expressions were implemented. For this reason, the following macros recognize two attributes, "caseless=true" and "singleline=true|false". But Perl allows much better control over regular expressions with so-called modifiers, which are passed to the new "reflags" attribute. It may contain one or more modifiers:
- i Matching is case insensitive
- m Treat string as multiple lines. When set, a "^" matches any beginning of line, and "$" any end of line. By default, they match begin and end of string.
- s Treat string as single line. A dot (".") may also match a newline, whereas it does not by default.
- x Allow formatted regular expression, that means whitespaces, newlines and comments are removed from regular expression before processing.
- •
- subst-in-string "S"
<set-var foo="abcdefghijk" /> <subst-in-string <get-var foo /> "[c-e]" /> <subst-in-string <get-var foo /> "([c-e])" "\\1 " />Output:
abfghijk abc d e fghijk
Source:
<set-var foo="abcdefghijk\nabcdefghijk\nabcdefghijk" /> <subst-in-string <get-var foo /> ".$" "" /> <subst-in-string <get-var foo /> ".$" "" singleline=false /> <subst-in-string <get-var foo /> " ([a-c]) | [0-9] " ":\\1:" reflags=x />Output:
abcdefghijk abcdefghijk abcdefghij abcdefghij abcdefghij abcdefghij :a::b::c:defghijk :a::b::c:defghijk :a::b::c:defghijk
- •
- subst-in-var "S"
- •
- match "S"
- "action=report"
- Prints "true" if string contains regexp.
- "action=extract"
- Prints the expression matching regexp in string.
- "action=delete"
- Prints the string without the expression matching regexp in string.
- "action=startpos"
- Prints the position of the first character of the expression matching regexp in string. If there is no match, returns "-1".
- "action=endpos"
- Prints the position just after the last character of the expression matching regexp in string. If there is no match, returns "-1".
- "action=length"
- Prints the length of the expression matching regexp in string.
1:<match "abcdefghijk" "[c-e]+" /> 2:<match "abcdefghijk" "[c-e]+" action=extract /> 3:<match "abcdefghijk" "[c-e]+" action=delete /> 4:<match "abcdefghijk" "[c-e]+" action=startpos /> 5:<match "abcdefghijk" "[c-e]+" action=endpos /> 6:<match "abcdefghijk" "[c-e]+" action=length />Output:
1:true 2:cde 3:abfghijk 4:2 5:5 6:3
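The six actions of "match" can be read as different views of a single regular expression search. The Python sketch below reproduces the example above with Python's re module standing in for PCRE. It is a model only: the name match_model is hypothetical, and the results returned for extract, delete and length when nothing matches are assumptions, since the text above only specifies "-1" for startpos and endpos.

```python
import re


def match_model(string, regexp, action="report"):
    """Model of <match string regexp action=... /> using Python's re module."""
    found = re.search(regexp, string)
    if action == "report":
        return "true" if found else ""
    if action == "extract":
        return found.group(0) if found else ""  # no-match result is assumed
    if action == "delete":
        if not found:
            return string                       # no-match result is assumed
        return string[:found.start()] + string[found.end():]
    if action == "startpos":
        return found.start() if found else -1
    if action == "endpos":
        return found.end() if found else -1     # position after the match
    if action == "length":
        return len(found.group(0)) if found else 0  # no-match result is assumed
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    for name in ("report", "extract", "delete", "startpos", "endpos", "length"):
        print(name, match_model("abcdefghijk", "[c-e]+", name))
```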
Arrays¶
With mp4h one can easily deal with string arrays. Variables can be treated as a single value or as a newline separated list of strings. Thus after defining
<set-var digits="0
1
2
3" />
one can view its content or one of these values:
Source:
<get-var digits /> <get-var digits[2] />Output:
0 1 2 3 2
- •
- array-size "S"
<array-size digits />Output:
4
- •
- array-push "S"
<array-push digits "10\n11\n12" /> <get-var digits />Output:
0 1 2 3 10 11 12
- •
- array-pop "S"
- •
- array-topvalue "S"
<array-topvalue digits />Output:
12
- •
- array-add-unique "S"
<array-add-unique digits 2 /> <get-var digits />Output:
0 1 2 3 10 11 12
- "caseless=true"
- Comparison is case insensitive.
- •
- array-concat "S"
<set-var foo="foo" /> <set-var bar="bar" /> <array-concat foo bar /><get-var foo />Output:
foo bar
- •
- array-member "S"
<array-member digits 11 />Output:
5
- "caseless=true"
- Comparison is case insensitive.
- •
- array-shift "S"
<array-shift digits 2 /> Now: <get-var digits /> <array-shift digits -4 /> And: <get-var digits />Output:
Now:0
1
2
3
10
11
12 And: 2
3
10
11
12
- "start=start"
- Change origin of shifts (default is 0).
<array-shift digits -2 start=2 /><get-var digits />
2 3 12
- •
- sort "S"
<sort digits /><get-var digits />Output:
12 2 3
- "caseless=true"
- Comparison is case insensitive.
- "numeric=true"
- Sort lines numerically
<sort digits numeric=true /><get-var digits />
2 3 12
- "sortorder=reverse"
- Reverse sort order
<sort digits numeric=true sortorder=reverse />;;; <get-var digits />
12 3 2
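The three "sort" results above ("12 2 3", then "2 3 12", then "12 3 2") follow from the difference between comparing lines as strings and comparing them as numbers. The Python sketch below models that difference on a newline separated value. The function name sort_lines and its keyword arguments are hypothetical stand-ins for the attributes described above, and giving numeric precedence over caseless is an assumption of the sketch.

```python
def sort_lines(value, numeric=False, caseless=False, reverse=False):
    """Sort a newline separated list the way the sort examples suggest."""
    items = value.split("\n")
    if numeric:
        key = float        # compare as numbers: 2 < 3 < 12
    elif caseless:
        key = str.lower    # compare as strings, ignoring case
    else:
        key = None         # compare as plain strings: "12" < "2" < "3"
    return "\n".join(sorted(items, key=key, reverse=reverse))


if __name__ == "__main__":
    digits = "2\n3\n12"
    print(sort_lines(digits).split("\n"))
    print(sort_lines(digits, numeric=True).split("\n"))
    print(sort_lines(digits, numeric=True, reverse=True).split("\n"))
```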
Numerical operators¶
These operators perform basic arithmetic operations. When all operands are integers the result is an integer too; otherwise it is a float. These operators are self-explanatory.
- •
- add "S"
- •
- substract "S"
- •
- multiply "S"
- •
- divide "S"
- •
- min "S"
- •
- max "S"
<add 1 2 3 4 5 6 /> <add 1 2 3 4 5 6. />Output:
21 21.000000
Source:
<define-tag factorial whitespace=delete> <ifeq %0 1 1 <multiply %0 "<factorial <substract %0 1 /> />" /> /> </define-tag> <factorial 6 />Output:
720
- •
- modulo "S"
<modulo 345 7 />Output:
2
These functions compare two numbers and return "true" when the comparison is true. If one argument is not a number, the comparison is false.
- •
- gt "S"
- •
- lt "S"
- •
- eq "S"
- •
- neq "S"
Relational operators¶
- •
- not "S"
- •
- and "S"
- •
- or "S"
Flow functions¶
- •
- group "S""V"
<define-tag text1> Text on 3 lines without whitespace=delete </define-tag> <define-tag text2 whitespace=delete> Text on 3 lines with whitespace=delete </define-tag> <define-tag text3 whitespace=delete> <group "Text on 3 lines with whitespace=delete" /> </define-tag> <text1/> <text2/> <text3/>Output:
Text on 3 lines without whitespace=deleteText on3 lines withwhitespace=delete
Text on
3 lines with
whitespace=delete Note that newlines are suppressed in "text2" and result is certainly unwanted.
- •
- compound
- "separator=string"
- By default arguments are placed side by side. This attribute defines a separator inserted between arguments.
- •
- disjoin "S"
- •
- noexpand "S""V"
- •
- expand "S"
<subst-in-string "=LT=define-tag foo>bar=LT=/define-tag>" "=LT=" "<" /> <foo/> <subst-in-string "=LT=define-tag foo>quux=LT=/define-tag>" "=LT=" "<noexpand "<" />" /> <foo/>Output:
bar <define-tag foo>quux</define-tag> bar
- •
- if "S""V"
<define-tag test whitespace=delete> <if %0 "yes" "no" /> </define-tag> <test "string" /> <test "" />Output:
yes no
- •
- ifeq "S""V"
- •
- ifneq "S""V"
- •
- when
- •
- while "V"
<set-var i=10 /> <while <gt <get-var i /> 0 />>;;; <get-var i /> <decrement i />;;; </while>Output:
10 9 8 7 6 5 4 3 2 1
- •
- foreach
<set-var x="1\n2\n3\n4\n5\n6" /> <foreach i x><get-var i /> </foreach>Output:
1 2 3 4 5 6
- "start=start"
- Skips first indexes.
<set-var x="1\n2\n3\n4\n5\n6" /> <foreach i x start=3><get-var i /> </foreach>
4 5 6
- "end=end"
- Stops after index has reached that value.
<set-var x="1\n2\n3\n4\n5\n6" /> <foreach i x end=3><get-var i /> </foreach>
1 2 3
- "step=step"
- Change index increment (default is 1). If step is negative,
array is treated in reverse order.
<set-var x="1\n2\n3\n4\n5\n6" /> <foreach i x step=2><get-var i /> </foreach> <foreach i x step=-2><get-var i /> </foreach>
1 3 5 6 4 2
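The "start", "end" and "step" attributes of "foreach" behave much like a slice over the newline separated array. The Python sketch below reproduces the four example results above. It is a model rather than mp4h's code: the name foreach_values is hypothetical, and the way the attributes combine (for instance "start" together with a negative "step") is an assumption, since the examples only show each attribute on its own.

```python
def foreach_values(value, start=None, end=None, step=1):
    """Return the elements a foreach loop would visit, per the examples above."""
    if step == 0:
        raise ValueError("step must be non-zero")
    items = value.split("\n")
    window = items[start:end]      # start skips first indexes, end stops before index end
    if step < 0:
        window = window[::-1]      # a negative step walks the array in reverse
    return window[::abs(step)]


if __name__ == "__main__":
    x = "1\n2\n3\n4\n5\n6"
    print(foreach_values(x))
    print(foreach_values(x, start=3))
    print(foreach_values(x, end=3))
    print(foreach_values(x, step=2), foreach_values(x, step=-2))
```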
- •
- var-case "S""V"
<set-var i=0 /> <define-tag test> <var-case x=1 <group <increment i /> x<get-var i /> /> x=2 <group <decrement i /> x<get-var i /> /> y=1 <group <increment i /> y<get-var i /> /> y=2 <group <decrement i /> y<get-var i /> /> /> </define-tag> <set-var x=1 y=2 /><test/> <set-var x=0 y=2 /><test/>Output:
x1y0 y-1
- •
- break "S"
<set-var i=10 /> <while <gt <get-var i /> 0 />>;;; <get-var i /> <decrement i />;;; <ifeq <get-var i /> 5 <break/> />;;; </while>Output:
10 9 8 7 6
- •
- return "S"
- "up=number"
- This attribute determines how many levels have to be exited. By default only one level is skipped. With a null value, all current macros are exited from. A negative value does the same, and stops processing the current file.
- •
- warning "S"
- •
- exit "S"
- "message=string"
- Prints a message to the standard error.
- "status=rc"
- Selects the code returned by the program (-1 by default).
- •
- at-end-of-file
File functions¶
- •
- directory-contents "S"
<directory-contents . matching=".*\\.mp4h$" />Output:
mp4h.mp4h
- •
- real-path "S"
<real-path pathname=<__file__/> />Output:
/tmp/buildd/mp4h-1.3.1/doc/mp4h.mp4h
- •
- file-exists "S"
- •
- get-file-properties "S"
<get-file-properties <__file__/> />Output:
68628 FILE 1324148499 1324148499 1324148519 pbuilder pbuilder
- •
- include "S"
- "file=filename"
- The given file is read and inserted into the input stream.
This attribute cannot be combined with the command attribute.
- "command=command-line"
- The given command line is executed on the operating system,
and the output of it is inserted in the input stream. This attribute
cannot be combined with the file attribute.
- "alt=action"
- If file is not found, this alternate action is handled. If this attribute is not set and file is not found, then an error is raised. This attribute has no effect when the command attribute is specified.
- "verbatim=true"
- File content is included without expansion. This is similar to using the m4 undivert macro with a filename as argument.
<include command="uname -a" />Output:
Linux snidget 2.6.32-5-amd64 #1 SMP Mon Oct 3 03:59:20 UTC 2011 x86_64 GNU/Linux
- •
- use "S"
- •
- comment
- •
- set-eol-comment "S"
- •
- set-quotes "S"
- "display=visible"
- Delimiters are also written into output.
Diversion functions¶
Diversions are a way of temporarily saving output. The output of mp4h can at any time be diverted to a temporary file, and be reinserted into the output stream, undiverted, again at a later time.
Numbered diversions are counted from 0 upwards, diversion number 0 being the normal output stream. The number of simultaneous diversions is limited mainly by the memory used to describe them, because mp4h tries to keep diversions in memory. However, there is a limit to the overall memory usable by all diversions taken altogether. When this maximum is about to be exceeded, a temporary file is opened to receive the contents of the biggest diversion still in memory, freeing this memory for other diversions. So, it is theoretically possible that the number of diversions be limited by the number of available file descriptors.
- •
- divert "S"
<divert divnum="-1"/> This is sent nowhere... <divert/> This is output.Output:
This is sent nowhere...This is output.
- •
- undivert "S"
<divert divnum="1"/> This text is diverted. <divert/> This text is not diverted. <undivert divnum="1"/>Output:
This text is diverted.This text is not diverted.
- •
- divnum "S"
Initial <divnum/> <divert divnum="1"/> Diversion one: <divnum/> <divert divnum="2"/> Diversion two: <divnum/> <divert/>Output:
Initial 0Diversion one: 1 Diversion two: 2
Debugging functions¶
When constructs become complex, they can be hard to debug. The functions listed below are very useful when you cannot figure out what is wrong. These functions are not perfect yet and must be improved in future releases.
- •
- function-def "S"
<function-def example />Output:
<set-var-verbatim verb-body=%ubody /><subst-in-var verb-body "<" "<" /> <subst-in-var verb-body ">" ">" /><subst-in-var verb-body "^\n*" "" /><subst-in-var verb-body "^" " " reflags=m /><set-var body=%body /><subst-in-var body "<three-colon/>[^;\n]*\n[ \t]*" "" /><subst-in-var body "<three-colon/>$" "" reflags=m /><subst-in-var body "^\n*" "" /><subst-in-var body "^" " " reflags=m /><group "Source:<get-var-once verb-body /> Output: <get-var-once body /> " />
- •
- debugmode "S"
- •
- debugfile "S"
- •
- debugging-on "S"
- •
- debugging-off "S"
Miscellaneous¶
- •
- __file__ "S"
- •
- __line__ "S"
This is <__file__/>, line <__line__/>.Output:
This is ./mp4h.mp4h, line 2201.
If you look closely at the source code you will see that this number is wrong. Indeed, the line number reported is that of the end of the entire block containing this instruction.
- •
- __version__ "S"
- •
- dnl "S"
<dnl/>This is a comment
foo
<dnl/>This is a comment
bar
Output:
foo bar
- •
- date "S"
- "time"
- An epoch time specification.
- "format"
- A format specification as used with the strftime(3) C library routine.
<date/> <set-var info=<get-file-properties <__file__/> /> /> <date <get-var info[2] /> /> <date time="<get-var info[2] />" format="%Y-%m-%d %H:%M:%S" />Output:
Sat Dec 17 19:01:59 2011
Sat Dec 17 19:01:39 2011
2011-12-17 19:01:39
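The "format" attribute uses strftime(3) conversions, so the last line of the example can be checked outside mp4h. The Python sketch below formats the epoch value 1324148499, which is the third field of the "get-file-properties" output shown earlier, with the same format string. The function name format_epoch is hypothetical, and the sketch formats in UTC as an assumption so that the result does not depend on the local time zone; the manual does not say which time zone mp4h uses.

```python
import time


def format_epoch(epoch, fmt="%Y-%m-%d %H:%M:%S"):
    """Format an epoch time with a strftime(3) style format, in UTC."""
    return time.strftime(fmt, time.gmtime(epoch))


if __name__ == "__main__":
    # Same epoch and format as the third <date ... /> call in the example.
    print(format_epoch(1324148499))
```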
- •
- timer "S"
The number of clock ticks since the beginning of generation of this documentation by &mp4h; is: <timer/>Output:
The number of clock ticks since the beginning of generation of this documentation by mp4h is: user 3 sys 0
- •
- mp4h-l10n "S"
- •
- mp4h-output-radix "S"
<add 1.2 3.4 /> <mp4h-output-radix 2 /> <add 1.2 3.4 />Output:
4.600000
4.60
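Taken together with the rule stated under "Numerical operators" (an integer result when all operands are integers, a float otherwise), the example above suggests that "mp4h-output-radix" sets the number of digits printed after the decimal point. The Python sketch below models "add" under that reading. The name add_model and the radix keyword are hypothetical, and treating any operand that is not a plain integer literal as a float is an assumption of the sketch.

```python
import re


def add_model(*operands, radix=6):
    """Model of <add .../>: integer result for integer operands, else a float."""
    if all(re.fullmatch(r"[+-]?\d+", op) for op in operands):
        return str(sum(int(op) for op in operands))
    total = sum(float(op) for op in operands)
    return f"{total:.{radix}f}"    # radix = digits after the decimal point


if __name__ == "__main__":
    print(add_model("1", "2", "3", "4", "5", "6"))
    print(add_model("1", "2", "3", "4", "5", "6."))
    print(add_model("1.2", "3.4"))
    print(add_model("1.2", "3.4", radix=2))
```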
EXTERNAL PACKAGES¶
It is possible to include external files with the "include" command. Files are first searched for in the current directory, then in directories specified on the command line with the "-I" option, next in directories listed in the "MP4HLIB" environment variable (it used to be "MP4HPATH" for versions prior to 1.3), and last under the compile-time location ("/usr/local/lib/mp4h/1.3.1:/usr/local/share/mp4h" by default).
Another way to include packages is with the "use" command. There are two differences between "use" and "include": first, the package name has no suffix; and more importantly, a package cannot be loaded more than once.
MACRO EXPANSION¶
This part describes the internal mechanism of macro expansion. It must be as precise and exhaustive as possible, so contact me "<URL:mailto:barbier@linuxfr.org>" if you have any suggestions.
Basics¶
Let us begin with some examples:
Source:
<define-tag foo> This is a simple tag </define-tag> <define-tag bar endtag=required> This is a complex tag </define-tag> <foo/> <bar>Body function</bar>
Output:
This is a simple tag This is a complex tag
User defined macros may have attributes like HTML tags. To handle these attributes in replacement text, the following conventions have been adopted (mostly derived from Meta-HTML):
- •
- Sequence %name is replaced by the command name.
- •
- Attributes are numbered from 0. In replacement text, %0 is
replaced by the first argument, %1 by the 2nd, etc. As there is no limitation
on the number of arguments, %20 is the 21st argument and not the third
followed by the digit 0.
<define-tag href> <a href="%0">%1</a> </define-tag> <href http://www.gimp.org "The Gimp" />
<a href="http://www.gimp.org">The Gimp</a>
- •
- Sequence "%#" prints number of attributes.
- •
- Sequence "%%" is replaced by "%", which
is useful in nested definitions.
<define-tag outer>;;; outer, # attributes: %# <define-tag inner1>;;; inner1, # attributes: %#;;; </define-tag>;;; <define-tag inner2>;;; inner2, # attributes: %%#;;; </define-tag>;;; <inner1 %attributes and some others /> <inner2 %attributes and some others /> </define-tag> <outer list attributes />
outer, # attributes: 2 inner1, # attributes: 2 inner2, # attributes: 5
- Sequence %attributes is replaced by the space-separated list of attributes.
Source:
<define-tag mail1>
<set-var %attributes />
<get-var name />
<get-var mail />
</define-tag>
<set-var name="" mail="" />
<mail1 name="Dr. Foo" mail="hello@foo.com" />
Output:
Dr. Foo
hello@foo.com
- Sequence %body is replaced by the body of a complex macro.
Source:
<define-tag mail2 endtag=required whitespace=delete>
<set-var %attributes />
<a href="mailto:<get-var mail />">%body</a>
</define-tag>
<mail2 mail="hello@foo.com">
<img src="photo.png" alt="Dr. Foo" border=0 />
</mail2>
Output:
<a href="mailto:hello@foo.com">
<img src="photo.png" alt="Dr. Foo" border=0 />
</a>
- The two forms above accept modifiers. When %Aattributes or %Abody is used, a newline-separated list of attributes is printed.
Source:
<define-tag show-attributes whitespace=delete>
<set-var list="%Aattributes" i=0 />
<foreach attr list>
<group "%<get-var i />: <get-var attr />" />
<increment i />
</foreach>
</define-tag>
<show-attributes name="Dr. Foo" mail="hello@foo.com" />
Output:
%0: name=Dr. Foo%1: mail=hello@foo.com
- Another alternate form is obtained by replacing "A" by "U", in which case text is replaced but will not be expanded. This makes sense only when the macro has been defined with "attributes=verbatim"; otherwise attributes are expanded before replacement.
Source:
<define-tag show1>
Before expansion: %Uattributes
After expansion: %attributes
</define-tag>
<define-tag show2 attributes=verbatim>
Before expansion: %Uattributes
After expansion: %attributes
</define-tag>
<define-tag bar>and here %attributes</define-tag>
<show1 <bar we go /> />
<show2 <bar we go /> />
Output:
Before expansion: and here we go
After expansion: and here we go
Before expansion: <bar we go />
After expansion: and here we go
- Modifiers "A" and "U" can be combined.
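The replacement conventions listed above can be modelled outside mp4h. The following Python sketch is only an illustration: the function name substitute and its parameters are made up for this example, it ignores the "A" and "U" modifiers, and it assumes that an index past the last attribute yields an empty string, which this manual does not state.

```python
import re

# Hypothetical model of the %-sequences described above; it is not mp4h's
# own code and it ignores the A and U modifiers.
_SEQUENCE = re.compile(r"%(%|#|[0-9]+|name|attributes|body)")


def substitute(text, name, attributes, body=""):
    """Perform one replacement pass over the text of a macro definition."""

    def replace(match):
        key = match.group(1)
        if key == "%":
            return "%"                    # %% yields a literal %
        if key == "#":
            return str(len(attributes))   # %# is the number of attributes
        if key == "name":
            return name
        if key == "attributes":
            return " ".join(attributes)   # space-separated list
        if key == "body":
            return body
        index = int(key)                  # all digits are read: %20 is index 20
        # Assumption: an index past the last attribute yields an empty string.
        return attributes[index] if index < len(attributes) else ""

    return _SEQUENCE.sub(replace, text)


if __name__ == "__main__":
    print(substitute('<a href="%0">%1</a>', "href",
                     ["http://www.gimp.org", "The Gimp"]))
    # The inner2 case: the outer pass turns %%# into %#, and only the
    # second pass (the call to inner2) replaces it by a count.
    inner = substitute("%%#", "outer", ["list", "attributes"])
    print(substitute(inner, "inner2",
                     ["list", "attributes", "and", "some", "others"]))
```

Running two passes over "%%#" reproduces the inner2 line of the nested example: the first pass leaves "%#", and the second pass counts the five attributes given to inner2.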
Attributes¶
Attributes are separated by spaces, tabs or newlines, and each attribute must be a valid mp4h entity. For instance, with the definitions above, "<bar>" cannot be an attribute since it must be closed by "</bar>". But this is valid:
<foo <foo/> />
or even
<foo <foo name=src url=ici /> />
In these examples, the "foo" tag has only one argument.
Under certain circumstances it is necessary to group multiple statements into a single one. This can be done with double quotes or with the "group" primitive, e.g.
<foo "This is the 1st attribute" <group and the second /> />
Note: Unlike in HTML, single quotes cannot replace double quotes for this purpose.
If double quotes appear in an argument, they must be escaped with a backslash "\".
Source:
<set-var text="Text with double quotes \" inside" />
<get-var text />
Output:
Text with double quotes " inside
Macro evaluation¶
Macros are characterized by
- name
- container status (simple or complex)
- whether attributes are expanded or not
- function type (primitive or user-defined macro)
- for primitives, the address of the corresponding code in memory; for user-defined macros, the replacement text
Consider the following definition:
<define-tag text-tt endtag=required whitespace=delete>
<tt>%body</tt>
</define-tag>
This definition has a major drawback:
Source:
<text-tt>This is an <text-tt>example</text-tt></text-tt>
Output:
<tt>This is an <tt>example</tt></tt>
We would like the inner tags to be removed. A first idea is to use an auxiliary variable to know whether we are still inside such an environment:
<set-var _text:tt=0 />
<define-tag text-tt endtag=required whitespace=delete>
<increment _text:tt />
<ifeq <get-var _text:tt /> 1 "<tt*>" />
%body
<ifeq <get-var _text:tt /> 1 "</tt*>" />
<decrement _text:tt />
</define-tag>
(The presence of asterisks in HTML tags is explained in the next section.)
Source:
<text-tt>This is an <text-tt>example</text-tt></text-tt>
Output:
<tt>This is an example</tt>
But if we use simple tags, as in the example below, our definition does not seem to work. This is because attributes are expanded before they are put into the replacement text.
Source:
<define-tag opt><text-tt>%attributes</text-tt></define-tag>
<opt "This is an <opt example />" />
Output:
<tt>This is an <tt>example</tt></tt>
To prevent this problem, we have to forbid attribute expansion with "attributes=verbatim":
Source:
<define-tag opt attributes=verbatim>;;;
<text-tt>%attributes</text-tt>;;;
</define-tag>
<opt "This is an <opt example />" />
Output:
<tt>This is an example</tt>
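The difference between the two "opt" definitions comes down to when the inner text is evaluated relative to the counter. The Python sketch below models this under stated assumptions: the class TextTt and its method wrap are hypothetical names, the depth field stands in for the _text:tt variable, and passing the body as a callable stands in for text that is expanded only after the counter has been incremented.

```python
class TextTt:
    """Hypothetical stand-in for the text-tt macro and its _text:tt counter."""

    def __init__(self):
        self.depth = 0  # plays the role of the _text:tt variable

    def wrap(self, body):
        """Wrap body() in <tt>...</tt>, but only at the outermost level.

        body is a callable so that it is evaluated after the counter has
        been incremented, as %body is in the mp4h definition.
        """
        self.depth += 1
        outermost = self.depth == 1
        text = body()
        self.depth -= 1
        return f"<tt>{text}</tt>" if outermost else text


if __name__ == "__main__":
    tt = TextTt()
    # Inner text evaluated inside the outer call: inner tags are dropped.
    print(tt.wrap(lambda: "This is an " + tt.wrap(lambda: "example")))
    # Inner text evaluated before the outer call, like an attribute that is
    # expanded before replacement: the inner tags survive.
    early = tt.wrap(lambda: "example")
    print(tt.wrap(lambda: "This is an " + early))
```

The first call corresponds to the nested complex tags and to "attributes=verbatim"; the second corresponds to the default behaviour, where the attribute is already expanded (with the counter back at zero) when the outer macro runs.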
Expansion flags¶
When you want to embed a server-side scripting language in your pages, you face some weird problems, as in
<a href=<%= $url %>>Hello</a>
The question is: how does mp4h know that this input has some extra delimiters? The answer is that mp4h should not try to handle special delimiters, because it cannot handle all of them (there are ASP, ePerl, PHP, ... and some of them are customizable). Now, remember that mp4h is a macro-processor, not an XML parser. So we must focus on macros, and format our input file so that it can be parsed without any problem. The previous example may be written
<a href="<%= $url %>">Hello</a>
because the quotes prevent the inner right angle bracket from closing the "a" tag.
Another common problem arises when we need to print a begin tag or an end tag alone. For instance, it is very desirable to define one's own headers and footers with
<define-tag header><html*>
<head>
... put here some information ....
</head>
<body* bgcolor="#ffffff" text="#000000">
</define-tag>
<define-tag footer>
</body*>
</html*>
</define-tag>
Asterisks mark these tags as pseudo-simple tags, which means that they are complex HTML tags but are used as simple tags within mp4h, because the tags would not be well nested otherwise. This asterisk is called the ``trailing star''; it appears at the end of the tag name.
Sometimes HTML tags are not parsable, as in this JavaScript code:
...
document.write('<*img src="foo.gif"');
if (text) document.write(' alt="'+text+'"');
document.write('>');
...
The ``leading star'' is an asterisk between the left angle bracket and the tag name, which prevents this tag from being parsed.
That said, we can now understand what the "--expansion" flag is for. It controls how expansion is performed by mp4h. It is followed by an integer, which is a bit sum of the following values:
- 1 do not parse unknown tags.
- When set, HTML tags are not parsed. When unset, HTML tags are parsed, i.e. their attributes and/or body are collected.
- 2 unknown tags are assumed being simple.
- When set, HTML tags are simple by default. When unset, HTML tags are complex by default, unless their attributes contain a trailing slash or a trailing star appears just after the tag name (see below).
- 4 trailing star in tag name does not make this tag simple.
- When set, trailing star in tag name has no special effect. When unset, it causes an HTML tag to be simple.
- 8 an unmatched end tag closes all previous unmatched begin tags.
- When set, all missing closing tags are automatically inserted. When unset, an unmatched end tag is discarded and interpreted as normal text, so processing goes on until the matching end tag is found.
- 16 interpret backslashes as printf.
- When set, backslashes before non special characters are removed. When unset, they are preserved.
- 32 remove trailing slash in tag attributes.
- When set, a trailing slash in tag attributes is removed on output. When unset, it is preserved.
- 64 do not remove trailing star in tag name.
- When set, a trailing star after the tag name is preserved on output. When unset, it is removed.
- 128 do not remove leading star in tag name.
- When set, a leading star before the tag name is preserved on output. When unset, it is removed.
- 256 do not add a space before trailing slash in tag attributes
- By default, a space is inserted before trailing slash in tag attributes. When set, this space is not prepended.
- 1024 suppress warnings about bad nested tags.
- When set, warnings about bad nested tags are not displayed. When unset, they are printed on standard error.
- 2048 suppress warnings about missing trailing slash.
- When set, warnings about missing trailing slash are not displayed. When unset, they are printed on standard error.
Run
mp4h -h
to find the default value. The current default matches HTML syntax, and it will tend to zero when XHTML syntax becomes more familiar.
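Since the "--expansion" value is a bit sum, it can be convenient to compute it or decode it with a small helper. The Python sketch below uses the numeric values from the list above; the member names and the function describe are invented for this illustration and are not identifiers known to mp4h.

```python
from enum import IntFlag


class Expansion(IntFlag):
    """Bit values listed above; the member names are made up for this sketch."""
    NO_PARSE_UNKNOWN = 1
    UNKNOWN_SIMPLE = 2
    STAR_NOT_SIMPLE = 4
    AUTO_CLOSE = 8
    BACKSLASH_PRINTF = 16
    REMOVE_TRAILING_SLASH = 32
    KEEP_TRAILING_STAR = 64
    KEEP_LEADING_STAR = 128
    NO_SPACE_BEFORE_SLASH = 256
    # 512 does not appear in the list above, so it is left out here.
    NO_NESTING_WARNINGS = 1024
    NO_SLASH_WARNINGS = 2048


def describe(value):
    """Return the names of the listed bits that are set in an --expansion value."""
    return [flag.name for flag in Expansion if value & flag]


if __name__ == "__main__":
    # Treat unknown tags as simple and silence nesting warnings.
    value = Expansion.UNKNOWN_SIMPLE | Expansion.NO_NESTING_WARNINGS
    print(int(value))          # the integer to give to --expansion
    print(describe(int(value)))
```

Decoding the default value reported by mp4h with describe shows which of the behaviours above are active in a given installation.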
AUTHOR¶
Denis Barbier "<URL:mailto:barbier@linuxfr.org>"
Mp4h has its own homepage "<URL:http://mp4h.tuxfamily.org/>".
THANKS¶
Sincere thanks to Brian J. Fox for writing Meta-HTML and Rene Seindal for maintaining this wonderful macro parser called GNU m4.
2011-12-17 | HTML Tools |