NAME¶
XML::Handler::YAWriter - Yet another Perl SAX XML Writer
SYNOPSIS¶
use XML::Handler::YAWriter;
my $ya = new XML::Handler::YAWriter( %options );
my $perlsax = new XML::Parser::PerlSAX( 'Handler' => $ya );
DESCRIPTION¶
YAWriter implements Yet Another XML::Handler::Writer. The reasons for this one
are that I needed a flexible escaping technique, and want some kind of pretty
printing. If an instance of YAWriter is created without any options, the
default behavior is to produce an array of strings containing the XML in :
@{$ya->{Strings}}
Options¶
Options are given in the usual 'key' => 'value' idiom.
- Output IO::File
- This option tells YAWriter to use an already open file for
output, instead of using $ya->{Strings} to store the array of strings.
It should be noted that the only thing the object needs to implement is
the print method. So anything can be used to receive a stream of strings
from YAWriter.
- AsFile string
- This option will cause start_document to open named file
and end_document to close it. Use the literal dash "-" if you
want to print on standard output.
- AsPipe string
- This option will cause start_document to open a pipe and
end_document to close it. The pipe is a normal shell command. Secure shell
comes handy but has a 2GB limit on most systems.
- AsArray boolean
- This option will force storage of the XML in
$ya->{Strings}, even if the Output option is given.
- AsString boolean
- This option will cause end_document to return the complete
XML document in a single string. Most SAX drivers return the value of
end_document as a result of their parse method. As this may not work with
some combinations of SAX drivers and filters, a join of $ya->{Strings}
in the controlling method is preferred.
- Encoding string
- This will change the default encoding from UTF-8 to
anything you like. You should ensure that given data are already in this
encoding or provide an Escape hash, to tell YAWriter about the
recoding.
- Escape hash
- The Escape hash defines substitutions that have to be done
to any string, with the exception of the processing_instruction and
doctype_decl methods, where I think that escaping of target and data would
cause more trouble than necessary.
The default value for Escape is
$XML::Handler::YAWriter::escape = {
'&' => '&',
'<' => '<',
'>' => '>',
'"' => '"',
'--' => '--'
};
YAWriter will use an evaluated sub to make the recoding based on a given
Escape hash reasonably fast. Future versions may use XS to improve this
performance bottleneck.
- Pretty hash
- Hash of string => boolean tuples, to define kind of
prettyprinting. Default to undef. Possible string values:
- AddHiddenNewline boolean
- Add hidden newline before ">"
- AddHiddenAttrTab boolean
- Add hidden tabulation for attributes
- CatchEmptyElement boolean
- Catch empty Elements, apply "/>"
compression
- CatchWhiteSpace boolean
- Catch whitespace with comments
- CompactAttrIndent
- Places Attributes on the same line as the Element
- IsSGML boolean
- This option will cause start_document,
processing_instruction and doctype_decl to appear as SGML. The SGML is
still well-formed of course, if your SAX events are well-formed.
- NoComments boolean
- Supress Comments
- NoDTD boolean
- Supress DTD
- NoPI boolean
- Supress Processing Instructions
- NoProlog boolean
- Supress <?xml ... ?> Prolog
- NoWhiteSpace boolean
- Supress WhiteSpace to clean documents from prior pretty
printing.
- PrettyWhiteIndent boolean
- Add visible indent before any eventstring
- PrettyWhiteNewline boolean
- Add visible newlines before any eventstring
- SAX1 boolean (not yet implemented)
- Output only SAX1 compliant eventstrings
Notes:¶
Correct handling of start_document and end_document is required!
The YAWriter Object initialises its structures during start_document and does
its cleanup during end_document. If you forget to call start_document, any
other method will break during the run. Most likely place is the encode
method, trying to eval undef as a subroutine. If you forget to call
end_document, you should not use a single instance of YAWriter more than once.
For small documents AsArray may be the fastest method and AsString the easiest
one to receive the output of YAWriter. But AsString and AsArray may run out of
memory with infinite SAX streams. The only method XML::Handler::Writer calls
on a given Output object is the print method. So it's easy to use a self
written Output object to improve streaming.
A single instance of XML::Handler::YAWriter is able to produce more than one
file in a single run. Be sure to provide a fresh IO::File as Output before you
call start_document and close this File after calling end_document. Or provide
a filename in AsFile, so start_document and end_document can open and close
its own filehandle.
Automatic recoding between 8bit and 16bit does not work in any Perl correctly !
I have Perl-5.00563 at home and here I can specify "use utf8;" in the
right places to make recoding work. But I dislike saying "use
5.00555;" because many systems run 5.00503.
If you use some 8bit character set internally and want use national characters,
either state your character as Encoding to be ISO-8859-1, or provide an Escape
hash similar to the following :
$ya->{'Escape'} = {
'&' => '&',
'<' => '<',
'>' => '>',
'"' => '"',
'--' => '--'
'oe' => 'ö'
'ae' => 'ä'
'ue' => 'ü'
'Oe' => 'Ö'
'Ae' => 'Ä'
'Ue' => 'Ü'
'ss' => 'ß'
};
You may abuse YAWriter to clean whitespace from XML documents. Take a look at
test.pl, doing just that with an XML::Edifact message, without querying the
DTD. This may work in 99% of the cases where you want to get rid of ignorable
whitespace caused by the various forms of pretty printing.
my $ya = new XML::Handler::YAWriter(
'Output' => new IO::File ( ">-" );
'Pretty' => {
'NoWhiteSpace'=>1,
'NoComments'=>1,
'AddHiddenNewline'=>1,
'AddHiddenAttrTab'=>1,
} );
XML::Handler::Writer implements any method XML::Parser::PerlSAX wants. This
extends the Java SAX1.0 specification. I have in mind using
Pretty=>SAX1=>1 to disable this feature, if abusing YAWriter for a SAX
proxy.
AUTHOR¶
Michael Koehne, Kraehe@Copyleft.De
Thanks¶
"Derksen, Eduard (Enno), CSCIO" <enno@att.com> helped me with
the Escape hash and gave quite a lot of useful comments.
SEE ALSO¶
perl and XML::Parser::PerlSAX