.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . 
\} .\} .rr rF .\" ======================================================================== .\" .IX Title "HTTP::Proxy::BodyFilter::htmlparser 3pm" .TH HTTP::Proxy::BodyFilter::htmlparser 3pm "2022-12-04" "perl v5.36.0" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" HTTP::Proxy::BodyFilter::htmlparser \- Filter using HTML::Parser .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 1 \& use HTTP::Proxy::BodyFilter::htmlparser; \& \& # $parser is an HTML::Parser object \& $proxy\->push_filter( \& mime => \*(Aqtext/html\*(Aq, \& response => HTTP::Proxy::BodyFilter::htmlparser\->new( $parser ), \& ); .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" HTTP::Proxy::BodyFilter::htmlparser lets you create a filter based on the HTML::Parser object of your choice. .PP This filter takes an HTML::Parser object as an argument to its constructor. The filter is either read-only or read-write. A read-only filter cannot change the data on the fly. If you request a read-write filter, you'll have to rewrite the response body completely. .PP With a read-write filter, you \fBmust\fR recreate the whole body data. This is mainly because HTML::Parser has its own buffering system, and there is no easy way to correlate the data that triggered an HTML::Parser event with its original position in the chunk sent by the origin server. See below for details. .PP Note that a simple filter that modifies the \s-1HTML\s0 text (not the tags) can be created more easily with HTTP::Proxy::BodyFilter::htmltext. 
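.PP
As an illustration of the read-only mode described above, here is a minimal sketch of a parser that only observes the page: it records link targets and leaves the body untouched. The \f(CW@links\fR array and the sample markup are hypothetical, and the parser is fed by hand as a stand-in for the proxy, so the sketch needs only HTML::Parser to run.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical collector: remembers every href seen in an <a> start tag.
my @links;

my $parser = HTML::Parser->new( api_version => 3 );
$parser->handler(
    start => sub {
        my ( $tagname, $attr ) = @_;
        push @links, $attr->{href}
            if $tagname eq 'a' && defined $attr->{href};
    },
    'tagname,attr'
);

# With a running proxy, the parser would be wrapped as in the SYNOPSIS.
# No rw argument is given, so the filter is read-only:
#
#   $proxy->push_filter(
#       mime     => 'text/html',
#       response => HTTP::Proxy::BodyFilter::htmlparser->new($parser),
#   );

# Stand-in for the proxy (an assumption of this sketch): feed the parser
# two chunks by hand, then signal the end of the document.
$parser->parse('<p><a href="http://example.com/">one</a>');
$parser->parse('<a href="/two">two</a></p>');
$parser->eof;

print "$_\n" for @links;
```

Because the handler never touches the body, nothing has to be rebuilt; the parser is used purely for its side effects.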
.SS "Creating a HTML::Parser that rewrites pages" .IX Subsection "Creating a HTML::Parser that rewrites pages" A read-write filter is declared by passing \f(CW\*(C`rw => 1\*(C'\fR to the constructor: .PP .Vb 1 \& HTTP::Proxy::BodyFilter::htmlparser\->new( $parser, rw => 1 ); .Ve .PP To be able to modify the body of a message, a filter created with HTTP::Proxy::BodyFilter::htmlparser must rewrite it completely. The HTML::Parser object can update a special attribute named \f(CW\*(C`output\*(C'\fR. To do so, the HTML::Parser handler will have to request the \f(CW\*(C`self\*(C'\fR attribute (that is to say, require access to the parser itself) and update its \f(CW\*(C`output\*(C'\fR key. .PP The following attributes are added to the HTML::Parser object by this filter: .IP "output" 4 .IX Item "output" A string that will hold the data sent back by the proxy. .Sp This string will be used as a replacement for the body data only if the filter is read-write, that is to say, if it was initialised with \&\f(CW\*(C`rw => 1\*(C'\fR. .Sp Data should always be \fBappended\fR to \f(CW\*(C`$parser\->{output}\*(C'\fR. .IP "message" 4 .IX Item "message" A reference to the HTTP::Message that triggered the filter. .IP "protocol" 4 .IX Item "protocol" A reference to the HTTP::Protocol object. .SH "METHODS" .IX Header "METHODS" This filter defines three methods, called automatically: .IP "\fBfilter()\fR" 4 .IX Item "filter()" The \f(CW\*(C`filter()\*(C'\fR method handles all the interactions with the HTML::Parser object. .IP "\fBinit()\fR" 4 .IX Item "init()" Initialise the filter with the HTML::Parser object passed to the constructor. .IP "\fBwill_modify()\fR" 4 .IX Item "will_modify()" This method returns a boolean value that indicates to the system if it will modify the data passing through. The value is actually the value of the \f(CW\*(C`rw\*(C'\fR parameter passed to the constructor. 
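.PP
To make the read-write mode and the \f(CW\*(C`output\*(C'\fR attribute more concrete, the following is a minimal sketch of a parser that rebuilds the whole body while dropping HTML comments. The handlers request \f(CW\*(C`self\*(C'\fR so that they can append to \f(CW\*(C`$self\->{output}\*(C'\fR. The sample markup is hypothetical, and the last lines imitate by hand what the filter is assumed to do for each response (reset \f(CW\*(C`output\*(C'\fR, parse, read the result), so the sketch runs with HTML::Parser alone.

```perl
use strict;
use warnings;
use HTML::Parser;

my $parser = HTML::Parser->new( api_version => 3 );

# Default handler: append the original source text of every event to the
# output attribute. Requesting 'self' gives access to the parser object.
$parser->handler(
    default => sub {
        my ( $self, $text ) = @_;
        $self->{output} .= $text if defined $text;
    },
    'self,text'
);

# Comments get their own handler, which appends nothing: they are dropped.
$parser->handler( comment => sub { }, 'self' );

# With a running proxy, the parser would be installed as a read-write
# filter, for which will_modify() returns a true value:
#
#   $proxy->push_filter(
#       mime     => 'text/html',
#       response => HTTP::Proxy::BodyFilter::htmlparser->new(
#           $parser, rw => 1
#       ),
#   );

# Stand-in for the filter (an assumption of this sketch): reset the
# output, feed one chunk, finish the document, and read the new body.
$parser->{output} = '';
$parser->parse('<p>Hello<!-- secret --> world</p>');
$parser->eof;

print $parser->{output}, "\n";
```

Every event other than a comment must be copied explicitly, because in read-write mode anything that is not appended to \f(CW\*(C`output\*(C'\fR is lost from the response body.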
.SH "SEE ALSO" .IX Header "SEE ALSO" HTTP::Proxy, HTTP::Proxy::BodyFilter, HTTP::Proxy::BodyFilter::htmltext. .SH "AUTHOR" .IX Header "AUTHOR" Philippe \*(L"BooK\*(R" Bruhat, . .SH "COPYRIGHT" .IX Header "COPYRIGHT" Copyright 2003\-2015, Philippe Bruhat. .SH "LICENSE" .IX Header "LICENSE" This module is free software; you can redistribute it or modify it under the same terms as Perl itself.