.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.
\}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "XML::SAX::ByRecord 3pm"
.TH XML::SAX::ByRecord 3pm "2020-12-29" "perl v5.32.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
XML::SAX::ByRecord \- Record oriented processing of (data) documents
.SH "VERSION"
.IX Header "VERSION"
version 0.46
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use XML::SAX::Machines qw( ByRecord ) ;
\&
\&    my $m = ByRecord(
\&        "My::RecordFilter1",
\&        "My::RecordFilter2",
\&        ...
\&        {
\&            Handler => $h, ## optional
\&        }
\&    );
\&
\&    $m\->parse_uri( "foo.xml" );
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
XML::SAX::ByRecord is a \s-1SAX\s0 machine that treats a document as a series
of records.  Everything before and after the records is emitted as-is,
while each record is excerpted into its own mini-document and run, one at
a time, through the filter pipeline contained in ByRecord.
.PP
The output is a document with exactly the same content before, after, and
between the records as the input document, but with each record run
through the filter pipeline.  So if a document has 10 records in it, the
per-record filter pipeline will see 10 sets of ( start_document, body of
record, end_document ) events.  An example is below.
.PP
This has several use cases:
.IP "\(bu" 4
Big, record oriented documents
.Sp
Big documents can be treated a record at a time with various \s-1DOM\s0 oriented
processors like XML::Filter::XSLT.
.IP "\(bu" 4
Streaming \s-1XML\s0
.Sp
Small sections of an \s-1XML\s0 stream can be run through a document processor
without holding up the stream.
.IP "\(bu" 4
Record oriented style sheets / processors
.Sp
Sometimes it is simply easier to write a style sheet or \s-1SAX\s0 filter that
applies to a single record at a time, rather than one that has to run
through a series of records.
.SS "Topology"
.IX Subsection "Topology"
Here's how the innards look:
.PP
.Vb 12
\&   +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
\&   |                  An XML::SAX::ByRecord                    |
\&   |    Intake                                                 |
\&   |   +\-\-\-\-\-\-\-\-\-\-+    +\-\-\-\-\-\-\-\-\-+         +\-\-\-\-\-\-\-\-+  Exhaust |
\& \-\-+\-\->| Splitter |\-\-\->| Stage_1 |\-\->...\-\->| Merger |\-\-\-\-\-\-\-\-\-\-+\-\-\-\-\->
\&   |   +\-\-\-\-\-\-\-\-\-\-+    +\-\-\-\-\-\-\-\-\-+         +\-\-\-\-\-\-\-\-+          |
\&   |               \e                            ^              |
\&   |                \e                           |              |
\&   |                 +\-\-\-\-\-\-\-\-\-\->\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+              |
\&   |                   Events not in any records               |
\&   |                                                           |
\&   +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-+
.Ve
.PP
The \f(CW\*(C`Splitter\*(C'\fR is an XML::Filter::DocSplitter by default, and the
\&\f(CW\*(C`Merger\*(C'\fR is an XML::Filter::Merger by default.  The line that
bypasses the \*(L"Stage_1 ...\*(R" filter pipeline carries all events that do
not occur in a record.  All events that occur in a record pass through the
filter pipeline.
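.PP
The routing in the diagram can be sketched in plain Perl.  This is only an
illustration under stated assumptions, not the module's implementation: the
event list, the \f(CW\*(C`stage\*(C'\fR function, and the rule that a record is any
element directly under the root are hypothetical stand-ins for the real
Splitter, Stage_1, and Merger.
.PP
.Vb 12
\&    use strict;
\&    use warnings;
\&
\&    # Toy event stream (hypothetical): each event is [ type, payload ].
\&    my @events = (
\&        [ start_element => "root" ],
\&        [ characters    => "a" ],
\&        [ start_element => "rec" ],
\&        [ characters    => "b" ],
\&        [ end_element   => "rec" ],
\&        [ characters    => "c" ],
\&        [ end_element   => "root" ],
\&    );
\&
\&    # Hypothetical stage: uppercases character data, passes the rest through.
\&    sub stage {
\&        my ( $type, $payload ) = @_;
\&        return [ $type, $type eq "characters" ? uc($payload) : $payload ];
\&    }
\&
\&    my ( @seen_by_stage, @merged );
\&    my $depth = 0;
\&
\&    for my $event (@events) {
\&        my ( $type, $payload ) = @$event;
\&
\&        if ( $type eq "start_element" ) {
\&            $depth++;
\&            # Assumption: a record is an element directly under the root.
\&            push @seen_by_stage, "start_document" if $depth == 2;
\&        }
\&
\&        if ( $depth >= 2 ) {
\&            # Inside a record: route through the stage, then merge.
\&            push @seen_by_stage, "$type: $payload";
\&            push @merged, stage( $type, $payload );
\&        }
\&        else {
\&            # Outside any record: bypass the stage entirely.
\&            push @merged, $event;
\&        }
\&
\&        if ( $type eq "end_element" ) {
\&            push @seen_by_stage, "end_document" if $depth == 2;
\&            $depth\-\-;
\&        }
\&    }
\&
\&    print join( " ", map { $_\->[1] } grep { $_\->[0] eq "characters" } @merged ), "\en";
\&    print "$_\en" for @seen_by_stage;
.Ve
.PP
Run as a script, this sketch first prints the merged character data, "a B c",
and then the five events the stage saw for the single record, which begin
with start_document and end with end_document.  Only the text inside the
record is uppercased; the text around it takes the bypass unchanged.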
.SS "Example"
.IX Subsection "Example"
Here's a quick little filter to uppercase text content:
.PP
.Vb 1
\&    package My::Filter::Uc;
\&
\&    use vars qw( @ISA );
\&    @ISA = qw( XML::SAX::Base );
\&
\&    use XML::SAX::Base;
\&
\&    sub characters {
\&        my $self = shift;
\&        my ( $data ) = @_;
\&        $data\->{Data} = uc $data\->{Data};
\&        $self\->SUPER::characters( @_ );
\&    }
.Ve
.PP
And here's a little machine that uses it:
.PP
.Vb 4
\&    $m = Pipeline(
\&        ByRecord( "My::Filter::Uc" ),
\&        \e$out,
\&    );
.Ve
.PP
When fed a document like:
.PP
.Vb 5
\&    a
\&    b c
\&    d e
\&    f g
\&
.Ve
.PP
the output looks like:
.PP
.Vb 5
\&    a
\&    B c
\&    D e
\&    F g
\&
.Ve
.PP
and My::Filter::Uc receives three sets of events:
.PP
.Vb 5
\&    start_document
\&    start_element:
\&    characters: \*(Aqb\*(Aq
\&    end_element:
\&    end_document
\&
\&    start_document
\&    start_element:
\&    characters: \*(Aqd\*(Aq
\&    end_element:
\&    end_document
\&
\&    start_document
\&    start_element:
\&    characters: \*(Aqf\*(Aq
\&    end_element:
\&    end_document
.Ve
.SH "METHODS"
.IX Header "METHODS"
.IP "new" 4
.IX Item "new"
.Vb 1
\&    my $d = XML::SAX::ByRecord\->new( @channels, \e%options );
.Ve
.Sp
Longhand for calling the ByRecord function exported by XML::SAX::Machines.
.SH "CREDIT"
.IX Header "CREDIT"
Proposed by Matt Sergeant, with advice from Kip Hampton and Robin Berjon.
.SH "Writing an aggregator."
.IX Header "Writing an aggregator."
To be written.  In short, an aggregator needs to provide
\&\f(CW\*(C`start_manifold_processing\*(C'\fR and \f(CW\*(C`end_manifold_processing\*(C'\fR.
See XML::Filter::Merger and its source code for a starting point.
.SH "AUTHORS"
.IX Header "AUTHORS"
.IP "\(bu" 4
Barry Slaymaker
.IP "\(bu" 4
Chris Prather
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
This software is copyright (c) 2013 by Barry Slaymaker.
.PP
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.