NAME
HTML::Copy - copy an HTML file without breaking links.
VERSION
Version 1.31
SYNOPSIS
use HTML::Copy;
HTML::Copy->htmlcopy($source_path, $destination_path);
# or
$p = HTML::Copy->new($source_path);
$p->copy_to($destination_path);
# or
open my $in, "<", $source_path or die "Cannot open $source_path: $!";
$p = HTML::Copy->new($in);
$p->source_path($source_path); # can be omitted
# when $source_path is in cwd.
$p->destination_path($destination_path); # can be omitted
# when $destination_path is in cwd.
open my $out, ">", $destination_path or die "Cannot open $destination_path: $!";
$p->copy_to($out);
DESCRIPTION
This module copies an HTML file without breaking the links in the file. It
is a subclass of HTML::Parser.
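To illustrate what "without breaking links" means, here is a minimal sketch of how a relative link has to change when a file moves to another directory. It uses only the core File::Spec::Unix and File::Basename modules; the helper name relocate_link and the example paths are hypothetical, and this is not necessarily how HTML::Copy performs the conversion internally.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec::Unix;

# Hypothetical helper: re-express a relative link found in $source_path
# so that it still points at the same target when the HTML file is
# written to $destination_path. Both paths are assumed to be absolute.
sub relocate_link {
    my ($link, $source_path, $destination_path) = @_;
    # Resolve the link against the directory of the source file.
    my $target = File::Spec::Unix->rel2abs($link, dirname($source_path));
    # Express the target relative to the directory of the destination file.
    return File::Spec::Unix->abs2rel($target, dirname($destination_path));
}

my $new_link = relocate_link(
    'images/logo.png',
    '/site/docs/index.html',
    '/site/archive/2024/index.html',
);
print "$new_link\n";
```

With these example paths the link becomes ../../docs/images/logo.png, which still reaches the original image from the new location.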
REQUIRED MODULES
- HTML::Parser
CLASS METHODS
htmlcopy
HTML::Copy->htmlcopy($source_path, $destination_path);
Parse the contents of $source_path, rewrite its links, and write the result to
$destination_path.
parse_file
$html_text = HTML::Copy->parse_file($source_path,
$destination_path);
Parse the contents of $source_path and rewrite its links as if the file were
copied to $destination_path. $destination_path itself is not created; only the
modified HTML is returned. The encoding of the returned string is converted to
utf8.
CONSTRUCTOR METHODS
new
$p = HTML::Copy->new($source);
Create an instance of this module for the given HTML source.
The argument $source can be a file path or a file handle. When a file handle is
passed, you may need to indicate the file path of that handle with the
"source_path" method. If the call to "source_path" is omitted,
the file handle is assumed to be located in the current working
directory.
INSTANCE METHODS
copy_to
$p->copy_to($destination);
Parse the contents of the $source given to the "new" method, rewrite its links,
and write the result to $destination.
The argument $destination can be a file path or a file handle. When $destination
is a file handle, you may need to indicate the location of the file handle with
the "destination_path" method, which must be called before "copy_to". If the
call to "destination_path" is omitted, the file handle is assumed to be located
in the current working directory.
parse_to
$html_text = $p->parse_to($destination_path);
Parse the contents of the source given to the "new" method, rewrite its links,
and return the HTML as it would be written to $destination_path. Unlike
"copy_to", $destination_path is not created; only the modified HTML is
returned. The encoding of the returned string is converted to utf8.
ACCESSOR METHODS
source_path
$p->source_path;
$p->source_path($path);
Get or set the source location. Usually the source location is specified with
the "new" method. When a file handle is passed to "new" and
the location of the file handle is not the current working directory, you need
to use this method.
destination_path
$p->destination_path;
$p->destination_path($path);
Get or set the destination location. Usually the destination location is
specified with the "copy_to" method. When a file handle is passed to
"copy_to" and the location of the file handle is not the current
working directory, you need to call this method before "copy_to".
encoding
$p->encoding;
Get the encoding of the source HTML.
io_layer
$p->io_layer;
$p->io_layer(':utf8');
Get or set the PerlIO layer used to read the source path and to write the
destination path. Usually it is determined automatically from the charset tag
of $source_path. If no charset is specified, the Encode::Guess module is used.
encode_suspects
@suspects = $p->encode_suspects;
$p->encode_suspects(qw/shiftjis euc-jp/);
Add suspects used to guess the text encoding of the source HTML. If
the source HTML has a charset tag, adding suspects is not required.
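The role of a suspect list can be seen with Encode::Guess on its own. The following is a minimal sketch, assuming a short EUC-JP encoded byte string stands in for an HTML source without a charset tag; the sample text and variable names are illustrative, and how HTML::Copy passes its suspects to Encode::Guess internally is not shown here.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;

# Sample input: the Japanese word "nihongo" encoded as EUC-JP bytes.
my $text  = "\x{65e5}\x{672c}\x{8a9e}";
my $bytes = encode('euc-jp', $text);

# Without a suspect, Encode::Guess only tries its defaults
# (ascii, utf8 and BOM-marked UTF-16/32), so euc-jp is added here.
my $decoder = guess_encoding($bytes, 'euc-jp');

# guess_encoding returns an encoding object on success
# and a plain error string on failure.
my $decoded;
if (ref $decoder) {
    $decoded = $decoder->decode($bytes);
    print "guessed: ", $decoder->name, "\n";
}
else {
    print "could not guess: $decoder\n";
}
```

Listing several similar encodings as suspects can make the guess ambiguous for short inputs, so it is safer to add only the encodings you actually expect.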
source_html
$p->source_html;
Obtain the contents of the source HTML.
NOTE
Cleaned-up paths should be given to HTML::Copy and its instances. For example,
a verbose path like '/aa/bb/../cc' may cause links to be converted incorrectly.
This is a limitation of the URI module's rel method. Cwd::realpath is useful
for cleaning up paths.
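A minimal sketch of cleaning up such a path with Cwd::realpath before handing it to HTML::Copy is shown below. The temporary directory layout is only there to make the example runnable; the directory names mirror the '/aa/bb/../cc' example above and are otherwise arbitrary.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway tree containing aa/bb and aa/cc.
my $root = tempdir(CLEANUP => 1);
make_path(
    File::Spec->catdir($root, 'aa', 'bb'),
    File::Spec->catdir($root, 'aa', 'cc'),
);

# A verbose path that walks into bb and back out again.
my $verbose_path = File::Spec->catdir($root, 'aa', 'bb', File::Spec->updir, 'cc');

# realpath resolves the '..' component (and any symbolic links).
my $clean_path = realpath($verbose_path);

print "verbose: $verbose_path\n";
print "clean:   $clean_path\n";
```

Note that realpath also resolves symbolic links, so the cleaned path may differ from the input in more than the removed '..' components.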
AUTHOR
Tetsuro KURITA <tkurita@mac.com>