NAME
HTML::HTML5::Sanity - make HTML5 DOM trees less insane
SYNOPSIS
use HTML::HTML5::Parser;
use HTML::HTML5::Sanity;
my $parser = HTML::HTML5::Parser->new;
my $html5_dom = $parser->parse_file('http://example.com/');
my $sane_dom = fix_document($html5_dom);
DESCRIPTION
The Document Object Model (DOM) generated by HTML::HTML5::Parser meets the
requirements of the HTML5 spec, but will probably catch a lot of people by
surprise.
The main oddity is that elements and attributes which appear to be namespaced
are not actually placed in a namespace. For example, the following element:
<div xml:lang="fr">...</div>
looks like it should be parsed as having an attribute "lang" in the XML
namespace. Not so. It is really parsed as having an attribute named
"xml:lang" in the null namespace.
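To make the distinction concrete, here is a minimal sketch in plain Perl of what resolving such a name involves: splitting the qualified name at the colon and looking the prefix up in a prefix-to-URI map. The helper resolve_qname and the %namespaces map are hypothetical names used for illustration only; they are not part of this module's API and do not claim to reflect how it is implemented.

```perl
use strict;
use warnings;

# Hypothetical prefix map, for illustration only.
my %namespaces = (
    xml   => 'http://www.w3.org/XML/1998/namespace',
    xlink => 'http://www.w3.org/1999/xlink',
);

# Split a qualified name such as "xml:lang" into (namespace URI, local name).
# Names with no prefix, or with an unknown prefix, stay in the null
# namespace (undef) and keep their full name.
sub resolve_qname {
    my ($qname, $ns) = @_;
    if ($qname =~ /^([^:]+):([^:]+)$/ and exists $ns->{$1}) {
        return ($ns->{$1}, $2);
    }
    return (undef, $qname);
}

my ($uri, $local) = resolve_qname('xml:lang', \%namespaces);
printf "xml:lang resolves to {%s}%s\n", $uri // '', $local;
```

In these terms, an HTML5 parser leaves "xml:lang" in the second branch (null namespace, full name kept), whereas most XML tooling expects the result of the first branch.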
- "fix_document($document)"
-
$sane_dom = fix_document($html5_dom);
Returns a modified copy of the DOM, leaving the original DOM unmodified.
- "fix_element($element_node, $new_document_node, \%namespaces)"
- Don't use this. Not exported.
- "fix_attribute($attribute_node, $new_element_node, \%namespaces)"
- Don't use this. Not exported.
- $HTML::HTML5::Sanity::FIX_LANG_ATTRIBUTES
-
$HTML::HTML5::Sanity::FIX_LANG_ATTRIBUTES = 2;
$sane_dom = fix_document($html5_dom);
If set to 1 (the default), the package will check the values of @lang and
@xml:lang, and remove any such attribute whose value is invalid. If set to 2,
it will also attempt to canonicalise the value (e.g. 'EN_GB' will be
converted to 'en-GB'). If set to 0, the values of language attributes are
not checked.
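The following is a rough sketch, in plain Perl, of the kind of checking and canonicalisation described above. The helper canonicalise_lang is hypothetical and is not the module's own code; it assumes that "_" is treated as a subtag separator and applies only simplified casing conventions (lower-case language, title-case four-letter script, upper-case two-letter region), so the module's actual rules may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns a tidied language
# tag, or undef when the value does not look like a language tag at all.
sub canonicalise_lang {
    my ($value) = @_;
    return unless defined $value;

    (my $tag = $value) =~ tr/_/-/;    # assumption: '_' acts as a separator

    # Loose shape check: an alphabetic primary subtag, then any number of
    # alphanumeric subtags, each 1 to 8 characters long.
    return unless $tag =~ /^[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*$/;

    my ($primary, @rest) = split /-/, $tag;
    my @out = (lc $primary);
    for my $sub (@rest) {
        if    ($sub =~ /^[A-Za-z]{2}$/) { push @out, uc $sub }          # region
        elsif ($sub =~ /^[A-Za-z]{4}$/) { push @out, ucfirst lc $sub }  # script
        else                            { push @out, lc $sub }
    }
    return join '-', @out;
}

for my $value ('EN_GB', 'zh-hant-tw', 'not a language') {
    my $canon = canonicalise_lang($value);
    printf "%-16s => %s\n", $value,
        defined $canon ? $canon : '(invalid)';
}
```

Under this sketch, a setting of 1 would correspond to dropping the attribute when the helper returns undef, and a setting of 2 to also replacing the value with the returned string.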
BUGS
Please report any bugs to <http://rt.cpan.org/>.
SEE ALSO
HTML::HTML5::Parser, XML::LibXML, Task::HTML5.
AUTHOR
Toby Inkster <tobyink@cpan.org>.
COPYRIGHT AND LICENSE
Copyright (C) 2009-2013 by Toby Inkster
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.
DISCLAIMER OF WARRANTIES
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.