.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.42)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "CompactTree 3pm"
.TH CompactTree 3pm "2022-06-28" "perl v5.34.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
XML::CompactTree \- builder of compact tree structures from XML documents
.SH "VERSION"
.IX Header "VERSION"
Version 0.03
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\&    use XML::CompactTree;
\&    use XML::LibXML::Reader;
\&
\&    my $reader = XML::LibXML::Reader\->new(location => $url);
\&    ...
\&    my $tree = XML::CompactTree::readSubtreeToPerl($reader);
\&    ...
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module provides functions that use XML::LibXML::Reader to parse
an \s-1XML\s0 document into a parse tree formed of nested arrays (and hashes).
.PP
It aims to do so quickly while preserving all relevant
information from the \s-1XML\s0 (including namespaces, document order, mixed
content, etc.). It sacrifices user-friendliness for speed.
.PP
\&\s-1IMPORTANT:\s0 There is an even more efficient \s-1XS\s0 implementation of this
module called XML::CompactTree::XS with 100% equivalent functionality.
.SH "PURPOSE"
.IX Header "PURPOSE"
I wrote this module because I noticed that repeated calls to methods
implemented in C (\s-1XS\s0) were very expensive in Perl.
.PP
Therefore, traversing a large \s-1DOM\s0 tree using XML::LibXML or iterating
over an \s-1XML\s0 stream using XML::LibXML::Reader was much slower than
traversing similarly large and structured native Perl data
structures.
.PP
This module allows the user to build a document parse tree consisting
of native Perl data structures (arrays and optionally hashes) using
XML::LibXML::Reader with a minimal number of \s-1XS\s0 calls.
.PP
(Note that XML::CompactTree::XS, a 100% equivalent of this module,
manages the same with just one \s-1XS\s0 call.)
.PP
It does not provide full \s-1DOM\s0 navigation, but it attempts to provide
the maximum amount of information.  Its memory footprint should be
somewhat smaller than that of a corresponding XML::LibXML \s-1DOM\s0 tree.
.SH "EXPORT"
.IX Header "EXPORT"
By default, the following constants are exported (\f(CW\*(C`:flags\*(C'\fR export
tag) to be used as flags for the tree builder:
.PP
.Vb 11
\&   XCT_IGNORE_WS
\&   XCT_IGNORE_SIGNIFICANT_WS
\&   XCT_IGNORE_PROCESSING_INSTRUCTIONS
\&   XCT_IGNORE_COMMENTS
\&   XCT_USE_QNAMES           /* not yet implemented */
\&   XCT_KEEP_NS_DECLS
\&   XCT_TEXT_AS_STRING       /* not yet implemented */
\&   XCT_ATTRIBUTE_ARRAY
\&   XCT_PRESERVE_PARENT      /* not yet implemented */
\&   XCT_MERGE_TEXT_NODES     /* not yet implemented */
\&   XCT_DOCUMENT_ROOT
.Ve
.SH "FUNCTIONS"
.IX Header "FUNCTIONS"
.ie n .SS "readSubtreeToPerl( $reader, $flags, \emy %ns )"
.el .SS "readSubtreeToPerl( \f(CW$reader\fP, \f(CW$flags\fP, \emy \f(CW%ns\fP )"
.IX Subsection "readSubtreeToPerl( $reader, $flags, my %ns )"
Uses the given XML::LibXML::Reader parser object to parse the subtree at
the current reader position and returns a tree formed of nested arrays
(see \*(L"\s-1OUTPUT FORMAT\*(R"\s0).
.IP "reader" 4
.IX Item "reader"
An XML::LibXML::Reader object to use as the reader. While building the
tree, the reader advances past the subtree, so that afterwards it is
positioned on the next node at the current or a higher level.
.IP "flags" 4
.IX Item "flags"
An integer composed of one-bit flags (see the constants in the \s-1EXPORT\s0 section).
Use bitwise or (|) to combine individual flags.
.Sp
The following flags are \s-1NOT\s0 implemented yet:
.Sp
.Vb 1
\&   XCT_USE_QNAMES, XCT_TEXT_AS_STRING, XCT_PRESERVE_PARENT, XCT_MERGE_TEXT_NODES
.Ve
.IP "ns" 4
.IX Item "ns"
You may pass a reference to an empty hash, which will be populated with a
map from namespace URIs to namespace indexes; it can be used to decode
the namespace indexes in the resulting data structure (see \s-1OUTPUT
FORMAT\s0 and \s-1NAMESPACES\s0).
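.PP
The following sketch shows the \f(CW$flags\fR and \f(CW\*(C`ns\*(C'\fR arguments working
together. It assumes that XML::LibXML (which provides XML::LibXML::Reader) is
installed and that the flag constants are exported by default as described in
\&\s-1EXPORT\s0; the sample document and the variable names are made up for
illustration.
.PP
.Vb 18
\&  use strict;
\&  use warnings;
\&  use XML::CompactTree;      # exports the XCT_* flags by default
\&  use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants
\&
\&  # Hypothetical sample document: one default namespace and a comment.
\&  my $xml = \*(Aq<doc xmlns="urn:example:a"><!\-\- note \-\-><p id="x">hi</p></doc>\*(Aq;
\&  my $reader = XML::LibXML::Reader\->new(string => $xml);
\&
\&  # Combine individual flags with bitwise OR.
\&  my $flags = XCT_DOCUMENT_ROOT | XCT_IGNORE_COMMENTS | XCT_ATTRIBUTE_ARRAY;
\&
\&  my %ns;   # populated as namespace_uri => namespace_index
\&  my $tree = XML::CompactTree::readSubtreeToPerl( $reader, $flags, \e%ns );
\&
\&  # Invert the map so that namespace indexes can be decoded.
\&  my @ns_uri;
\&  $ns_uri[ $ns{$_} ] = $_ for keys %ns;
.Ve
.PP
In this sketch \f(CW$tree\->[2][0]\fR should be the \f(CW\*(C`doc\*(C'\fR element,
\&\f(CW$ns_uri[ $tree\->[2][0][2] ]\fR should decode its namespace index back to
\&\f(CW\*(C`urn:example:a\*(C'\fR, and the comment should be absent because
\&\s-1XCT_IGNORE_COMMENTS\s0 was set.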
.ie n .SS "readLevelToPerl( $reader, $flags, $ns )"
.el .SS "readLevelToPerl( \f(CW$reader\fP, \f(CW$flags\fP, \f(CW$ns\fP )"
.IX Subsection "readLevelToPerl( $reader, $flags, $ns )"
Like \f(CW\*(C`readSubtreeToPerl\*(C'\fR, but reads the subtree
at the current reader position and all its following siblings.
It returns a reference to an array with one representation per subtree,
in the format described in \*(L"\s-1OUTPUT FORMAT\*(R"\s0.
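.PP
For \f(CW\*(C`readLevelToPerl\*(C'\fR the reader must first be positioned on the first
node of the level to be read. The sketch below does that with
XML::LibXML::Reader's \f(CW\*(C`nextElement\*(C'\fR method; the sample document is
hypothetical, the flags are left at 0, and XML::LibXML is assumed to be installed.
.PP
.Vb 18
\&  use strict;
\&  use warnings;
\&  use XML::CompactTree;
\&  use XML::LibXML::Reader;
\&
\&  # Hypothetical sample document: three sibling <item> elements.
\&  my $reader = XML::LibXML::Reader\->new(
\&    string => \*(Aq<list><item>a</item><item>b</item><item>c</item></list>\*(Aq );
\&
\&  # Position the reader on the first <item> (nextElement is an
\&  # XML::LibXML::Reader method).
\&  $reader\->nextElement(\*(Aqitem\*(Aq);
\&
\&  # Read that <item> together with all of its following siblings.
\&  my $items = XML::CompactTree::readLevelToPerl( $reader, 0, {} );
\&  for my $item (@$items) {
\&    # [1] = local name, [4] = children (no line numbers), [4][0][1] = text
\&    print $item\->[1], ": ", $item\->[4][0][1], "\en";
\&  }
.Ve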
.SH "OUTPUT FORMAT"
.IX Header "OUTPUT FORMAT"
The result of parsing a subtree is a reference to a Perl array, \f(CW$node\fR,
which contains a node type followed by node data. How the remaining
positions in \f(CW$node\fR are interpreted depends on the node type, as
described below:
.SS "Any Node"
.IX Subsection "Any Node"
.IP "\(bu" 5
\&\f(CW$node\fR\->[0] is an integer representing the node type. Use the
XML::LibXML::Reader node-type constants, e.g. \s-1XML_READER_TYPE_ELEMENT\s0
for an element node, \s-1XML_READER_TYPE_TEXT\s0 for a text node, etc.
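.PP
Because every node begins with its type, code that consumes the tree usually
dispatches on \f(CW$node\->[0]\fR. The sketch below dumps a small hand-written
tree in the documented format. To run without XML::LibXML it defines local
constants with the values of libxml2's \f(CW\*(C`xmlReaderTypes\*(C'\fR enumeration (in
real code, use the XML_READER_TYPE_* constants from XML::LibXML::Reader), and it
assumes the tree was built without line numbers, so element children are in
slot 4. The function name \f(CW\*(C`dump_tree\*(C'\fR is hypothetical.
.PP
.Vb 38
\&  use strict;
\&  use warnings;
\&
\&  # Local copies of libxml2\*(Aqs xmlReaderTypes values (normally imported
\&  # from XML::LibXML::Reader as XML_READER_TYPE_*).
\&  use constant {
\&    ELEMENT => 1, TEXT => 3, PI => 7, COMMENT => 8,
\&    DOCUMENT => 9, FRAGMENT => 11,
\&  };
\&
\&  # Return one line per node, indented by depth.
\&  sub dump_tree {
\&    my ($node, $depth) = @_;
\&    $depth ||= 0;
\&    my $pad  = \*(Aq  \*(Aq x $depth;
\&    my $type = $node\->[0];
\&    if ($type == DOCUMENT || $type == FRAGMENT) {
\&      # document (fragment): [ type, encoding, children ]
\&      return ("${pad}document",
\&              map { dump_tree($_, $depth + 1) } @{ $node\->[2] || [] });
\&    }
\&    if ($type == ELEMENT) {
\&      # element: [ type, local name, ns index, attributes, children ]
\&      return ("${pad}element $node\->[1] (ns $node\->[2])",
\&              map { dump_tree($_, $depth + 1) } @{ $node\->[4] || [] });
\&    }
\&    if ($type == PI) {
\&      return "${pad}pi $node\->[1]";   # [ type, name, value ]
\&    }
\&    # text, CDATA, comment and white\-space nodes: [ type, value ]
\&    return "${pad}node type $type: $node\->[1]";
\&  }
\&
\&  # Hand\-written tree for <a><b>hi</b><!\-\-c\-\-></a> (no namespaces).
\&  my $tree = [ DOCUMENT, \*(AqUTF\-8\*(Aq,
\&    [ [ ELEMENT, \*(Aqa\*(Aq, 0, undef,
\&        [ [ ELEMENT, \*(Aqb\*(Aq, 0, undef, [ [ TEXT, \*(Aqhi\*(Aq ] ] ],
\&          [ COMMENT, \*(Aqc\*(Aq ] ] ] ] ];
\&
\&  print "$_\en" for dump_tree($tree);
.Ve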
.SS "Document or Document Fragment Nodes"
.IX Subsection "Document or Document Fragment Nodes"
.IP "\(bu" 5
\&\f(CW$node\fR\->[1] contains the document encoding
.IP "\(bu" 5
\&\f(CW$node\fR\->[2] is an array reference containing similar representations of
all the child nodes of the document (fragment).
.PP
Note: XML::LibXML::Reader does not report a document node by default, which
means that calling readSubtreeToPerl on a reader object in its initial
state only parses the first node in the document (which can be the
root element, but also a comment or a processing instruction). Use the
\&\s-1XCT_DOCUMENT_ROOT\s0 flag to force the creation of a document node in that case.
.SS "Element nodes"
.IX Subsection "Element nodes"
.IP "\(bu" 5
\&\f(CW$node\fR\->[1] is the local name (\s-1UTF\-8\s0 encoded character string)
.IP "\(bu" 5
\&\f(CW$node\fR\->[2] is the namespace index (see \s-1NAMESPACES\s0 below)
.IP "\(bu" 5
\&\f(CW$node\fR\->[3] is undef if the element has no attributes. Otherwise, if the
\&\s-1XCT_ATTRIBUTE_ARRAY\s0 flag was used, \f(CW$node\fR\->[3] is an array reference of
the form \f(CW\*(C`[ name1, value1, name2, value2, ...]\*(C'\fR of attribute names and
corresponding values. If the \s-1XCT_ATTRIBUTE_ARRAY\s0 flag was not used,
\&\f(CW$node\fR\->[3] is a hash reference mapping attribute names to the
corresponding attribute values: \f(CW\*(C`{ name1=>value1, name2=>value2, ...}\*(C'\fR.
.Sp
The flag \s-1XCT_KEEP_NS_DECLS\s0 controls whether namespace declarations
(xmlns=... or xmlns:prefix=...) are included along with normal
attributes or not.
.Sp
Note: there is no support for namespaced attributes yet, but the
attribute names are stored as QNames, so one can always use
\&\s-1XCT_KEEP_NS_DECLS\s0 to keep track of namespace prefix declarations and
do the resolving manually. Support for namespaced attributes is
planned.
.IP "\(bu" 5
If the \s-1XCT_LINE_NUMBERS\s0 flag was used, \f(CW$node\fR\->[4] contains the line number
of the element and \f(CW$node\fR\->[5] contains an array reference holding
similar representations of the child nodes of the current node.
.IP "\(bu" 5
If the \s-1XCT_LINE_NUMBERS\s0 flag was \s-1NOT\s0 used, \f(CW$node\fR\->[4] contains an array
reference of similar representations of the child nodes of the current
node.
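.PP
Because slot 3 may hold undef, a hash, or a flat array, and the child list moves
from slot 4 to slot 5 when line numbers are recorded, code that must handle
trees built with different flags can wrap element access in small helpers.
This is only a sketch: the helper names and the \f(CW$with_lines\fR argument are
hypothetical, and the caller is assumed to know which flags the tree was built
with.
.PP
.Vb 27
\&  use strict;
\&  use warnings;
\&
\&  # Look up attribute $name on an element node, whether attributes were
\&  # stored as a hash (default) or as a flat array (XCT_ATTRIBUTE_ARRAY).
\&  sub element_attr {
\&    my ($node, $name) = @_;
\&    my $attrs = $node\->[3];
\&    return undef unless defined $attrs;            # no attributes
\&    return $attrs\->{$name} if ref $attrs eq \*(AqHASH\*(Aq;
\&    for (my $i = 0; $i < @$attrs; $i += 2) {      # [ name1, value1, ... ]
\&      return $attrs\->[$i + 1] if $attrs\->[$i] eq $name;
\&    }
\&    return undef;
\&  }
\&
\&  # Children are in slot 5 if line numbers were recorded, else in slot 4.
\&  sub element_children {
\&    my ($node, $with_lines) = @_;
\&    return @{ $node\->[ $with_lines ? 5 : 4 ] || [] };
\&  }
\&
\&  # Hypothetical <img src="a.png" alt="A"/> in both attribute forms
\&  # (1 is the value of XML_READER_TYPE_ELEMENT).
\&  my $as_hash  = [ 1, \*(Aqimg\*(Aq, 0, { src => \*(Aqa.png\*(Aq, alt => \*(AqA\*(Aq }, [] ];
\&  my $as_array = [ 1, \*(Aqimg\*(Aq, 0, [ \*(Aqsrc\*(Aq, \*(Aqa.png\*(Aq, \*(Aqalt\*(Aq, \*(AqA\*(Aq ], [] ];
\&  print element_attr($as_hash, \*(Aqalt\*(Aq), " ", element_attr($as_array, \*(Aqalt\*(Aq), "\en";
.Ve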
.SS "Text, \s-1CDATA,\s0 Comment and White-Space Nodes"
.IX Subsection "Text, CDATA, Comment and White-Space Nodes"
.IP "\(bu" 5
\&\f(CW$node\fR\->[1] contains the node value (\s-1UTF\-8\s0 encoded character string)
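.PP
A frequent task is collecting the character data below an element. The sketch
below concatenates the values of text and \s-1CDATA\s0 nodes in document order and
ignores comments and other node types. It uses local constants with libxml2's
reader node-type values instead of importing XML::LibXML::Reader, it assumes no
line numbers were recorded, and the name \f(CW\*(C`text_content\*(C'\fR is hypothetical.
.PP
.Vb 26
\&  use strict;
\&  use warnings;
\&
\&  # Local copies of libxml2 reader node\-type values (XML_READER_TYPE_*).
\&  use constant { ELEMENT => 1, TEXT => 3, CDATA => 4, COMMENT => 8,
\&                 DOCUMENT => 9, FRAGMENT => 11 };
\&
\&  # Concatenate the text and CDATA values below $node in document order.
\&  sub text_content {
\&    my ($node) = @_;
\&    my $type = $node\->[0];
\&    return $node\->[1] if $type == TEXT || $type == CDATA;
\&    my $kids = $type == ELEMENT ? $node\->[4]             # no line numbers
\&             : ($type == DOCUMENT || $type == FRAGMENT) ? $node\->[2]
\&             : undef;                    # comments, PIs, white space, ...
\&    return join \*(Aq\*(Aq, map { text_content($_) } @{ $kids || [] };
\&  }
\&
\&  # Hand\-written tree for <p>Hello <b>big</b><![CDATA[ world]]><!\-\-x\-\-></p>
\&  my $p = [ ELEMENT, \*(Aqp\*(Aq, 0, undef,
\&    [ [ TEXT, \*(AqHello \*(Aq ],
\&      [ ELEMENT, \*(Aqb\*(Aq, 0, undef, [ [ TEXT, \*(Aqbig\*(Aq ] ] ],
\&      [ CDATA, \*(Aq world\*(Aq ],
\&      [ COMMENT, \*(Aqx\*(Aq ] ] ];
\&
\&  print text_content($p), "\en";   # prints "Hello big world"
.Ve
.PP
White-space nodes are skipped by this sketch as written; if they carry meaning
in your documents, handle them alongside text nodes.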
.SS "Unparsed Entity, Processing-Instruction, and Notation Nodes"
.IX Subsection "Unparsed Entity, Processing-Instruction, and Notation Nodes"
.IP "\(bu" 5
\&\f(CW$node\fR\->[1] contains the local name (there is no support for
namespaces on these types of nodes yet)
.IP "\(bu" 5
\&\f(CW$node\fR\->[2] contains the node value
.SS "Skipping Less-Significant Nodes"
.IX Subsection "Skipping Less-Significant Nodes"
White-space (non-significant or significant), processing-instruction
and comment nodes can be skipped completely by using the following
flags:
.PP
.Vb 4
\&   XCT_IGNORE_WS
\&   XCT_IGNORE_SIGNIFICANT_WS
\&   XCT_IGNORE_PROCESSING_INSTRUCTIONS
\&   XCT_IGNORE_COMMENTS
.Ve
.SH "NAMESPACES"
.IX Header "NAMESPACES"
Namespaces of element nodes are stored in the element node as an
integer. 0 always represents nodes without a namespace; all other
namespaces are assigned unique numbers in increasing order as they
appear. You can pass a reference to an empty hash to the parsing functions
to obtain the mapping from namespace URIs to these numbers.
.SS "Example"
.IX Subsection "Example"
.Vb 2
\&  use strict;
\&  use warnings;
\&  use XML::CompactTree;
\&  use XML::LibXML::Reader;
\&
\&  my $reader = XML::LibXML::Reader\->new(location => $ARGV[0]);
\&  my %ns;
\&  my $data = XML::CompactTree::readSubtreeToPerl( $reader, XCT_DOCUMENT_ROOT, \e%ns );
\&
\&  # invert the namespace_uri => index map (index 0, no namespace, stays undef)
\&  my @ns_map;
\&  $ns_map[$ns{$_}] = $_ for keys %ns;
\&
\&  my @nodes = ($data);
\&  while (@nodes) {
\&    my $node = shift @nodes;
\&    my $type = $node\->[0];
\&    if ($type == XML_READER_TYPE_ELEMENT) {
\&      my $uri = $ns_map[$node\->[2]] // \*(Aq\*(Aq;
\&      print "element $node\->[1] is from ns $node\->[2] \*(Aq$uri\*(Aq\en";
\&      push @nodes, @{ $node\->[4] || [] }; # queue children
\&    } elsif ($type == XML_READER_TYPE_DOCUMENT) {
\&      push @nodes, @{ $node\->[2] || [] }; # queue children
\&    }
\&  }
.Ve
.SH "PLANNED FEATURES"
.IX Header "PLANNED FEATURES"
Planned flags:
.PP
.Vb 4
\&   XCT_USE_QNAMES \- use QNames instead of local names for all nodes
\&   XCT_TEXT_AS_STRING \- put text nodes into the tree as plain scalars
\&   XCT_PRESERVE_PARENT \- add a slot with a weak reference to the parent node
\&   XCT_MERGE_TEXT_NODES \- merge adjacent text/cdata nodes together
.Ve
.PP
Features: allow blessing the array refs to default or user-specified
classes; the default classes would provide a very small subset of \s-1DOM\s0
methods to retrieve node information, manipulate the tree, and
possibly serialize the parse tree back to \s-1XML.\s0
.SH "AUTHOR"
.IX Header "AUTHOR"
Petr Pajas, \f(CW\*(C`<pajas@matfyz.cz>\*(C'\fR
.SH "BUGS"
.IX Header "BUGS"
Please report any bugs or feature requests to
\&\f(CW\*(C`bug\-xml\-compacttree\-xs@rt.cpan.org\*(C'\fR, or through the web interface at
<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML\-CompactTree\-XS>.
I will be notified, and then you'll automatically be notified of progress on
your bug as I make changes.
.SH "COPYRIGHT & LICENSE"
.IX Header "COPYRIGHT & LICENSE"
Copyright 2008\-2009 Petr Pajas, All Rights Reserved.
.PP
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
.Vb 1
\&  XML::CompactTree::XS
\&
\&  XML::LibXML::Reader
.Ve