.\" Automatically generated by Pod::Man 2.22 (Pod::Simple 3.07)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "MKDoc::XML::TreeBuilder 3pm"
.TH MKDoc::XML::TreeBuilder 3pm "2004-10-06" "perl v5.10.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
MKDoc::XML::TreeBuilder \- Builds a parsed tree from XML data
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  my @top_nodes = MKDoc::XML::TreeBuilder\->process_data ($some_xml);
.Ve
.SH "SUMMARY"
.IX Header "SUMMARY"
MKDoc::XML::TreeBuilder uses MKDoc::XML::Tokenizer to turn \s-1XML\s0 data
into a parsed tree.
Basically it smells like an \s-1XML\s0 parser, looks like an
\&\s-1XML\s0 parser, and overlaps an awful lot with \s-1XML\s0 parsers.
.PP
But it's not an \s-1XML\s0 parser.
.PP
\&\s-1XML\s0 parsers are required to die if the \s-1XML\s0 data is not well
formed. MKDoc::XML::TreeBuilder doesn't give a rip: it'll parse whatever as
long as it's good enough for it to parse.
.PP
\&\s-1XML\s0 parsers expand entities. MKDoc::XML::TreeBuilder doesn't. At least
not yet.
.PP
\&\s-1XML\s0 parsers generally support namespaces. MKDoc::XML::TreeBuilder
doesn't \- and probably won't.
.SH "DISCLAIMER"
.IX Header "DISCLAIMER"
\&\fBThis module does low-level \s-1XML\s0 manipulation. It will somehow parse
even broken \s-1XML\s0 and try to do something with it. Do not use it unless
you know what you're doing.\fR
.SH "API"
.IX Header "API"
.ie n .SS "my @top_nodes = MKDoc::XML::TreeBuilder\->process_data ($some_xml);"
.el .SS "my \f(CW@top_nodes\fP = MKDoc::XML::TreeBuilder\->process_data ($some_xml);"
.IX Subsection "my @top_nodes = MKDoc::XML::TreeBuilder->process_data ($some_xml);"
Returns all the top nodes of the \f(CW$some_xml\fR parsed tree.
.PP
Although the \s-1XML\s0 spec says that there can be only one top element in
an \s-1XML\s0 file, you have to take two things into account:
.PP
1. Pseudo-elements such as \s-1XML\s0 declarations, processing instructions,
and comments.
.PP
2. MKDoc::XML::TreeBuilder is not an \s-1XML\s0 parser; it's not its job to
care about the \s-1XML\s0 specification, so having multiple top elements is
just fine.
.ie n .SS "my @top_nodes = MKDoc::XML::TreeBuilder\->process_file ('/some/file.xml');"
.el .SS "my \f(CW@top_nodes\fP = MKDoc::XML::TreeBuilder\->process_file ('/some/file.xml');"
.IX Subsection "my @top_nodes = MKDoc::XML::TreeBuilder->process_file ('/some/file.xml');"
Same as MKDoc::XML::TreeBuilder\->process_data ($some_xml), except that it
reads \f(CW$some_xml\fR from '/some/file.xml'.
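.PP
A minimal usage sketch of the in-memory call described above, kept
deliberately small. The sample string in \f(CW$some_xml\fR is invented for
illustration, and no particular node count is claimed for it:
.PP
.Vb 3
\&    use strict;
\&    use warnings;
\&    use MKDoc::XML::TreeBuilder;
\&
\&    # Hypothetical input: a comment followed by two top\-level elements.
\&    my $some_xml = \*(Aq<!\-\- note \-\-><p>Hello</p><p>World</p>\*(Aq;
\&
\&    my @top_nodes = MKDoc::XML::TreeBuilder\->process_data ($some_xml);
\&    print scalar (@top_nodes), " top\-level node(s)\en";
.Ve
.PP
Because several top nodes can come back, the result is assigned to an array
rather than to a single scalar.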
.SH "Returned parsed tree \- data structure"
.IX Header "Returned parsed tree - data structure"
I have tried to make MKDoc::XML::TreeBuilder look very much like
HTML::TreeBuilder. So most of this section is stolen and slightly adapted
from the HTML::Element man page.
.PP
\&\s-1START\s0 \s-1PLAGIARISM\s0 \s-1HERE\s0
.PP
It may occur to you to wonder what exactly a \*(L"tree\*(R" is, and how
it's represented in memory. Consider this \s-1HTML\s0 document:
.PP
.Vb 9
\&  <html lang=\*(Aqen\-US\*(Aq>
\&    <head>
\&      <title>Stuff</title>
\&      <meta name=\*(Aqauthor\*(Aq content=\*(AqJojo\*(Aq>
\&    </head>
\&    <body>
\&     <h1>I like potatoes!</h1>
\&    </body>
\&  </html>
.Ve
.PP
Building a syntax tree out of it makes a tree-structure in memory
that could be diagrammed as:
.PP
.Vb 11
\&                     html (lang=\*(Aqen\-US\*(Aq)
\&                      / \e
\&                    /     \e
\&                  /         \e
\&                head        body
\&               /\e               \e
\&             /    \e               \e
\&           /        \e               \e
\&         title     meta              h1
\&          |       (name=\*(Aqauthor\*(Aq,     |
\&       "Stuff"    content=\*(AqJojo\*(Aq)    "I like potatoes"
.Ve
.PP
This is the traditional way to diagram a tree, with the \*(L"root\*(R" at the
top, and it's this kind of diagram that people have in mind when they say,
for example, that \*(L"the meta element is under the head element instead of
under the body element\*(R". (The same is also said with
\&\*(L"inside\*(R" instead of \*(L"under\*(R" \*(-- the use of \*(L"inside\*(R" makes
more sense when you're looking at the \s-1HTML\s0 source.)
.PP
Another way to represent the above tree is with indenting:
.PP
.Vb 8
\&  html (attributes: lang=\*(Aqen\-US\*(Aq)
\&    head
\&      title
\&        "Stuff"
\&      meta (attributes: name=\*(Aqauthor\*(Aq content=\*(AqJojo\*(Aq)
\&    body
\&      h1
\&        "I like potatoes"
.Ve
.PP
Incidentally, diagramming with indenting works much better for very
large trees, and is easier for a program to generate. The \f(CW$tree\fR\->dump
method uses indentation just that way.
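.PP
As a rough illustration of why indented output is easy for a program to
generate, here is a sketch of a recursive printer. It is not taken from
HTML::Element or from this module: the hand-built \f(CW$tree\fR and the
\&\f(CW\*(C`dump_node\*(C'\fR helper are hypothetical, and the sketch only
assumes that an element is a hash with \f(CW\*(C`_tag\*(C'\fR and
\&\f(CW\*(C`_content\*(C'\fR keys and that a text segment is a plain string.
.PP
.Vb 2
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical hand\-built tree; text segments are plain strings.
\&    my $tree = {
\&        _tag     => \*(Aqhtml\*(Aq,
\&        lang     => \*(Aqen\-US\*(Aq,
\&        _content => [
\&            { _tag => \*(Aqhead\*(Aq,
\&              _content => [ { _tag => \*(Aqtitle\*(Aq, _content => [\*(AqStuff\*(Aq] } ] },
\&            { _tag => \*(Aqbody\*(Aq,
\&              _content => [ { _tag => \*(Aqh1\*(Aq, _content => [\*(AqI like potatoes\*(Aq] } ] },
\&        ],
\&    };
\&
\&    # Print one node per line, indenting two spaces per level of depth.
\&    sub dump_node {
\&        my ($node, $depth) = @_;
\&        my $pad = \*(Aq  \*(Aq x $depth;
\&        if (ref $node) {
\&            print $pad, $node\->{_tag}, "\en";
\&            dump_node ($_, $depth + 1) for @{ $node\->{_content} || [] };
\&        }
\&        else {
\&            print $pad, \*(Aq"\*(Aq, $node, \*(Aq"\*(Aq, "\en";
\&        }
\&    }
\&
\&    dump_node ($tree, 0);
.Ve
.PP
The only state carried through the recursion is the depth, which is why this
style scales to large trees more comfortably than a drawn diagram.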
.PP
However you diagram the tree, it's stored the same in memory \*(-- it's a
network of objects, each of which has attributes like so:
.PP
.Vb 4
\&  element #1:  _tag: \*(Aqhtml\*(Aq
\&               _parent: none
\&               _content: [element #2, element #5]
\&               lang: \*(Aqen\-US\*(Aq
\&
\&  element #2:  _tag: \*(Aqhead\*(Aq
\&               _parent: element #1
\&               _content: [element #3, element #4]
\&
\&  element #3:  _tag: \*(Aqtitle\*(Aq
\&               _parent: element #2
\&               _content: [text segment "Stuff"]
\&
\&  element #4:  _tag: \*(Aqmeta\*(Aq
\&               _parent: element #2
\&               _content: none
\&               name: author
\&               content: Jojo
\&
\&  element #5:  _tag: \*(Aqbody\*(Aq
\&               _parent: element #1
\&               _content: [element #6]
\&
\&  element #6:  _tag: \*(Aqh1\*(Aq
\&               _parent: element #5
\&               _content: [text segment "I like potatoes"]
.Ve
.PP
The \*(L"treeness\*(R" of the tree-structure that these elements comprise is
not an aspect of any particular object, but is emergent from the
relatedness attributes (_parent and _content) of these element-objects and
from how you use them to get from element to element.
.PP
\&\s-1STOP\s0 \s-1PLAGIARISM\s0 \s-1HERE\s0
.PP
This is pretty much the kind of data structure MKDoc::XML::TreeBuilder
returns. More information on the different node types is available in
MKDoc::XML::Token.
.SH "NOTES"
.IX Header "NOTES"
Did I mention that MKDoc::XML::TreeBuilder is \s-1NOT\s0 an \s-1XML\s0 parser?
.SH "AUTHOR"
.IX Header "AUTHOR"
Copyright 2003 \- MKDoc Holdings Ltd.
.PP
Author: Jean-Michel Hiver
.PP
This module is free software and is distributed under the same license as
Perl itself. Use it at your own risk.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
MKDoc::XML::Token, MKDoc::XML::Tokenizer