TclXML(3tcl) | TclXML Package Commands | TclXML(3tcl) |
See the file "LICENSE" for information on
usage and redistribution of this file, and for a DISCLAIMER OF ALL WARRANTIES.
package require parserclass
::xml::parserclass option ?arg arg ...?
::xml::parser ?name? ?-option value ...?
parser option arg
The parser may also perform other functions, such as normalisation, validation
and/or entity expansion. Generally, these functions are under the control of
configuration options. Whether these functions can be performed at all depends
on the parser implementation.
The TclXML package provides a generic interface for use by a Tcl application,
along with a low-level interface for use by a parser implementation. Each
implementation provides a class of XML parser, and these register themselves
using the ::xml::parserclass create command. One of the registered
parser classes will be the default parser class.
Loading the package with the generic package require xml command allows
the package to automatically determine the default parser class. To select a
particular parser class as the default, load that class's package directly,
e.g. package require xml::libxml2. In all cases, all available parser classes
are registered with the TclXML package; the difference is simply in which one
becomes the default.
The parser scans an XML document's syntactical structure, evaluating callback
scripts for each feature found. At the very least the parser will normalise
the document and check the document for well-formedness. If the document is
not well-formed then the script given by the -errorcommand option is evaluated.
Some parser classes may perform additional functions, such as validation.
Additional features provided by the various parser classes are described in
the section Parser Classes.
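As a minimal sketch of how an application might collect well-formedness errors, the following registers an -errorcommand callback that records each error it is given. The names RecordError, checkDocument and ::xmlErrors are hypothetical. The callback signature (errorcode followed by errormsg) follows the -errorcommand description in this manual page. Wrapping the parse call in catch is an assumption, made in case the parser class also raises a Tcl error.

```tcl
# Errors seen so far, as a list of {errorcode errormsg} pairs.
set ::xmlErrors {}

# Hypothetical -errorcommand callback: the parser appends the error code
# and a human-readable message to the registered script.
proc RecordError {errorcode errormsg} {
    lappend ::xmlErrors [list $errorcode $errormsg]
}

# Hypothetical helper: parse a document and return the recorded errors.
proc checkDocument {xml} {
    package require xml
    set ::xmlErrors {}
    set parser [::xml::parser -errorcommand RecordError]
    # Assumption: parse may also raise a Tcl error, so it is caught here.
    catch {$parser parse $xml}
    return $::xmlErrors
}
```

An empty list returned by checkDocument would suggest that no errors were reported for the document.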
Parsing is performed synchronously: the parse command blocks until the entire
document has been parsed. Parsing may be terminated by an application callback;
see the section Callback Return Codes. Incremental parsing is also supported by
using the -final configuration option.
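Incremental parsing could be wrapped in a small helper along the following lines. This is only a sketch: feedChunks is a hypothetical name, and parser is assumed to be the name of a parser command previously created with ::xml::parser. It uses only the configure and parse operations described in this manual page. Note that the libxml2 parser class is documented as not supporting -final 0.

```tcl
# Hypothetical helper: feed a document to a parser in several chunks.
# parser is the name of a parser command created with ::xml::parser;
# chunks is a Tcl list of strings that together form the document.
proc feedChunks {parser chunks} {
    set last [expr {[llength $chunks] - 1}]
    # More data will follow each chunk except the last one.
    $parser configure -final 0
    set i 0
    foreach chunk $chunks {
        if {$i == $last} {
            # Tell the parser that this chunk completes the document.
            $parser configure -final 1
        }
        $parser parse $chunk
        incr i
    }
}
```

A call such as feedChunks $parser [list $data1 $data2 $finaldata] would then parse the three pieces in order as a single document.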
-baseurl is deprecated in favour of -baseuri.
See also -startdoctypedeclcommand and -enddoctypedeclcommand.
Additional information about the element takes the form of configuration
options. Possible options are:
The return result of the callback script determines the action of the parser.
Note that these codes are interpreted in a different manner to other
callbacks.
This is useful to either override the normal loading of an entity's data, or to
implement new or alternative URI schemes. As an example, the script below sets
an external entity handler that intercepts "tcl:" URIs and evaluates
them as inline Tcl scripts:
package require xml

proc External {name baseuri uri id} {
    switch -glob -- $uri {
        tcl:* {
            regexp {^tcl:(.*)$} $uri discard script
            return [uplevel #0 $script]
        }
        default {
            return -code continue {}
        }
    }
}

set parser [xml::parser -externalentitycommand External]
$parser parse {<!DOCTYPE example [
  <!ENTITY example SYSTEM "tcl:set%20example%20HelloWorld">
]>
<example>
  &example;
</example>
}

puts $example
This script will print "HelloWorld" to stdout.
This is useful to interpose on the loading of external entities without
interfering with the loading of entities.
set parser [::xml::parser -final 0]
$parser parse $data1
$parser parse $data2
$parser configure -final 1
$parser parse $finaldata
package require xml

proc cdata {data args} {
    puts -nonewline $data
}

set parser [::xml::parser -characterdatacommand cdata]
$parser parse [read stdin]
This script counts the number of elements in an XML document read from stdin.
package require xml

proc EStart {varName name attlist args} {
    upvar #0 $varName var
    incr var
}

set count 0
set parser [::xml::parser -elementstartcommand [list EStart count]]
$parser parse [read stdin]
puts "The XML document contains $count elements"
See the description of the -externalentitycommand for further details.
This parser implementation aims to implement XML v1.0 and supports XML
Namespaces.
Generally the parser produces XML Infoset information items. That is, it gives
the application a slightly higher-level view than the raw XML syntax. For
example, it does not report CDATA Sections.
TclXML/tcl is not able to handle character encodings other than UTF-8.
When the package is loaded the variable ::xml::libxml2::libxml2version is
set to the version number of the libxml2 library being used.
On MS Windows, it is necessary to load the generic XML package first, and then
the TclXML/libxml2 package. For example,
package require xml
package require xml::libxml2
TclXML/libxml2 manages the document object as a Tcl object. See the
-keep option for further information.
* -reportempty has no effect. libxml2 does not report empty element
syntax.
* Incremental (push) parsing, i.e. -final 0, is not supported.
* TclXML/libxml2 does not provide (DTD) validation, (WXS) schema validation or
Relax NG validation, although the libxml2 library does provide those
functions. These functions are provided by the TclDOM/libxml2 package, but
only in an a posteriori fashion (i.e. only after the document has been
parsed).
* libxml2 supports XML Namespaces. The use of XML Namespaces can be queried, but
the declaration of an XML Namespace is not reported.
NAME¶
TclXML - XML parser support for Tcl
SYNOPSIS¶
package require xml
DESCRIPTION¶
TclXML provides event-based parsing of XML documents. The application may register callback scripts for certain document features, and when the parser encounters those features while parsing the document the callback is evaluated.
COMMANDS¶
::xml::parserclass¶
The ::xml::parserclass command is used to manage XML parser classes.
Command Options¶
The following command options may be used:
- create
- create name ?-createcommand script? ?-createentityparsercommand script? ?-parsecommand script? ?-configurecommand script? ?-getcommand script? ?-deletecommand script?
- destroy
- destroy name
- info
- info names default
::xml::parser¶
The ::xml::parser command creates an XML parser object. The return value of the command is the name of the newly created parser.
Configuration Options¶
The ::xml::parser command accepts the following configuration options:
- -attlistdeclcommand
- -attlistdeclcommand script
-
name
Element type name
-
attrname
Attribute name being declared
-
type
Attribute type
-
default
Attribute default, such as #IMPLIED
-
value
Default attribute value. Empty string if none given.
- -baseuri -baseurl
- -baseuri URI
- -characterdatacommand
- -characterdatacommand script
-
data
Character data in the document
- -commentcommand
- -commentcommand script
-
data
Comment data
- -defaultcommand
- -defaultcommand script
-
data
Document data
- -defaultexpandinternalentities
- -defaultexpandinternalentities boolean
- -doctypecommand
- -doctypecommand script
-
name
The name of the document element
-
public
Public identifier for the external DTD subset
-
system
System identifier for the external DTD subset. Usually a URI.
-
dtd
The internal DTD subset
- -elementdeclcommand
- -elementdeclcommand script
-
name
The element type name
-
model
Content model specification
- -elementendcommand
- -elementendcommand script
-
name
The element type name that has ended
-
args
Additional information about this element
-
-empty
boolean
The empty element syntax was used for this element
-
-namespace
uri
The element is in the XML namespace associated with the given URI
- -elementstartcommand
- -elementstartcommand script
-
name
The element type name that has started
-
attlist
A Tcl list containing the attributes for this element. The list of attributes is formatted as pairs of attribute names and their values.
-
args
Additional information about this element
-
-empty
boolean
The empty element syntax was used for this element
-
-namespace
uri
The element is in the XML namespace associated with the given URI
-
-namespacedecls
list
The start tag included one or more XML Namespace declarations. list is a Tcl list giving the namespaces declared. The list is formatted as pairs of values, the first value is the namespace URI and the second value is the prefix used for the namespace in this document. A default XML namespace declaration will have an empty string for the prefix.
- -encoding
- -encoding value
- -endcdatasectioncommand
- -endcdatasectioncommand script
- -enddoctypedeclcommand
- -enddoctypedeclcommand script
- -entitydeclcommand
- -entitydeclcommand script
-
name
The name of the entity being declared
-
args
Additional information about the entity declaration. An internal entity shall have a single argument, the replacement text. An external parsed entity shall have two additional arguments, the public and system identifiers of the external resource. An external unparsed entity shall have three additional arguments, the public and system identifiers followed by the notation name.
- -entityreferencecommand
- -entityreferencecommand script
-
name
The name of the entity being referenced
- -errorcommand
- -errorcommand script
-
errorcode
A single word description of the error, intended for use by an application
-
errormsg
A human-readable description of the error
- -externalentitycommand
- -externalentitycommand script
-
name
The Tcl command name of the current parser
-
baseuri
An absolute URI for the current entity which is to be used to resolve relative URIs
-
uri
The system identifier of the external entity, usually a URI
-
id
The public identifier of the external entity. If no public identifier was given in the entity declaration then id will be an empty string.
- TCL_OK
-
- TCL_CONTINUE
-
- TCL_BREAK
-
- TCL_ERROR
-
- -final
- -final boolean
- -ignorewhitespace
- -ignorewhitespace boolean
- -notationdeclcommand
- -notationdeclcommand script
-
name
The name of the notation
-
uri
An external identifier for the notation, usually a URI.
- -notstandalonecommand
- -notstandalonecommand script
- -paramentityparsing
- -paramentityparsing boolean
- -parameterentitydeclcommand
- -parameterentitydeclcommand script
-
name
The name of the parameter entity
-
args
For an internal parameter entity there is only one additional argument, the replacement text. For external parameter entities there are two additional arguments, the system and public identifiers respectively.
- -parser
- -parser name
- -processinginstructioncommand
- -processinginstructioncommand script
-
target
The name of the processing instruction target
-
data
Remaining data from the processing instruction
- -reportempty
- -reportempty boolean
- -startcdatasectioncommand
- -startcdatasectioncommand script
- -startdoctypedeclcommand
- -startdoctypedeclcommand script
- -unknownencodingcommand
- -unknownencodingcommand script
- -unparsedentitydeclcommand
- -unparsedentitydeclcommand script
-
system
The system identifier of the external entity, usually a URI
-
public
The public identifier of the external entity
-
notation
The name of the notation for the external entity
- -validate
- -validate boolean
- -warningcommand
- -warningcommand script
-
warningcode
A single word description of the warning, intended for use by an application
-
warningmsg
A human-readable description of the warning
- -xmldeclcommand
- -xmldeclcommand script
-
version
The version number of the XML specification to which this document purports to conform
-
encoding
The character encoding of the document
-
standalone
A boolean declaring whether the document is standalone
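The args passed to the -elementstartcommand callback are described above as option/value pairs (-empty, -namespace, -namespacedecls). A callback might unpack them roughly as follows. This is a sketch only: describeElement, EStart and describeDocument are hypothetical names, and the default values used for options that the parser does not supply are assumptions.

```tcl
# Hypothetical helper: build a one-line description of a start tag from
# the arguments that the parser appends to -elementstartcommand.
proc describeElement {name attlist args} {
    # args is a list of option/value pairs; supply assumed defaults for
    # any option that the parser did not pass.
    set opts [dict merge {-empty 0 -namespace {}} $args]
    set text $name
    if {[dict get $opts -namespace] ne ""} {
        append text " in namespace [dict get $opts -namespace]"
    }
    if {[dict get $opts -empty]} {
        append text " (empty element syntax)"
    }
    return $text
}

# Hypothetical callback that prints the description of each start tag.
proc EStart {name attlist args} {
    puts [describeElement $name $attlist {*}$args]
}

# Hypothetical helper: parse a document with the callback above.
proc describeDocument {xml} {
    package require xml
    set parser [::xml::parser -elementstartcommand EStart]
    $parser parse $xml
}
```

Merging the arguments over a dictionary of defaults means the callback keeps working when a parser class passes only some of the options, or none at all.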
Parser Command¶
The ::xml::parser command creates a new Tcl command with the same name as the parser. This command may be used to invoke various operations on the parser object. It has the following general form:
name option arg
option and the arg determine the exact behaviour of the command. The following commands are possible for parser objects:
- cget
- cget -option
- configure
- configure -option value
- entityparser
- entityparser option value
- free
- free name
- get
- get name args
- parse
- parse xml args
- reset
- reset
CALLBACK RETURN CODES¶
Every callback script evaluated by a parser may return a return code other than TCL_OK. Return codes are interpreted as follows:
- break Suppresses invocation of all further callback scripts. The parse method returns the TCL_OK return code.
- continue Suppresses invocation of further callback scripts until the current element has finished.
- error Suppresses invocation of all further callback scripts. The parse method also returns the TCL_ERROR return code.
- default Any other return code suppresses invocation of all further callback scripts. The parse method returns the same return code.
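To illustrate the break code, a callback can stop further callback processing once it has found what it is looking for. The sketch below assumes a hypothetical document in which the application only cares whether a title element occurs. FindTitle, hasTitle and ::found are hypothetical names.

```tcl
# Hypothetical -elementstartcommand callback: note the first "title"
# element and then suppress all further callbacks.
proc FindTitle {name attlist args} {
    if {$name eq "title"} {
        set ::found 1
        # break: no further callbacks are invoked; parse still returns TCL_OK.
        return -code break
    }
    return
}

# Hypothetical helper: report whether the document contains a title element.
proc hasTitle {xml} {
    package require xml
    set ::found 0
    set parser [::xml::parser -elementstartcommand FindTitle]
    $parser parse $xml
    return $::found
}
```

Returning with -code continue instead would suppress callbacks only until the current element has finished, as described in the list above.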
ERROR MESSAGES¶
If an error or warning condition is detected then an error message is returned. These messages are structured as a Tcl list, as described below:
{domain level code node line message int1 int2 string1 string2 string3}
- domain
-
- level
-
- code
-
- node
-
- line
-
- message
-
- int1
-
- int2
-
- string1
-
- string2
-
- string3
-
APPLICATION EXAMPLES¶
This script outputs the character data of an XML document read from stdin.
SAFE XML¶
TclXML/Tcl and TclXML/libxml2 may be used in a Safe Tcl interpreter. When a document is parsed in a Safe Tcl interpreter, any attempt by the XML document to load an external entity is handled by the -externalentitycommand callback. This callback is evaluated in the context of the safe interpreter and therefore is subject to the security policy in force for that interpreter. The default entity loader will not be invoked, even if the callback script returns a TCL_CONTINUE code.
PARSER CLASSES¶
This section will discuss how a parser class is implemented.
Tcl Parser Class¶
The pure-Tcl parser class requires no compilation - it is a collection of Tcl scripts. This parser implementation is non-validating, i.e. it can only check a document for well-formedness. However, by enabling the -validate option it will read the document's DTD and resolve external entities. This parser class is referred to as TclXML/tcl.
libxml2 Parser Class¶
The libxml2 parser class provides a Tcl interface to the libxml2 XML parser library. This parser class is referred to as TclXML/libxml2.
get Method¶
TclXML/libxml2 provides the following arguments to the get method:
- document
-
Additional Options¶
- -keep
- -keep normal | implicit
- -retainpath
- -retainpath xpath
- -retainpathns
- -retainpathns prefix ns ...
Limitations¶
The libxml2 parser class has the following limitations:
KEYWORDS¶
3.2 | TclXML |