NAME¶
HTML::HTML5::Sanity - make HTML5 DOM trees less insane
SYNOPSIS¶
use HTML::HTML5::Parser;
use HTML::HTML5::Sanity;
my $parser = HTML::HTML5::Parser->new;
my $html5_dom = $parser->parse_file('http://example.com/');
my $sane_dom = fix_document($html5_dom);
DESCRIPTION¶
The Document Object Model (DOM) generated by HTML::HTML5::Parser meets the
requirements of the HTML5 spec, but is likely to surprise anyone who expects
it to behave like a DOM built by an XML parser.
The main oddity is that elements and attributes which appear to be namespaced
are not actually namespaced. For example, the following element:
<div xml:lang="fr">...</div>
This looks as though it should be parsed as having an attribute "lang" in
the XML namespace. It is not: it is parsed as having the attribute
"xml:lang" in the null namespace.
- "fix_document($document)"
-
$sane_dom = fix_document($html5_dom);
Returns a modified copy of the DOM, leaving the original DOM
unmodified.
- "fix_element($element_node, $new_document_node,
\%namespaces)"
- Don't use this. Not exported.
- "fix_attribute($attribute_node, $new_element_node,
\%namespaces)"
- Don't use this. Not exported.
- $HTML::HTML5::Sanity::FIX_LANG_ATTRIBUTES
-
$HTML::HTML5::Sanity::FIX_LANG_ATTRIBUTES = 2;
$sane_dom = fix_document($html5_dom);
If set to 1 (the default), the package checks the values of @lang and
@xml:lang, and removes any such attribute whose value is invalid. If set
to 2, it also attempts to canonicalise the value (e.g. 'EN_GB' will be
converted to 'en-GB'). If set to 0, the values of language attributes are
not checked.
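The three settings can be compared side by side. The following is a minimal sketch under stated assumptions: the sample markup is hypothetical, fix_document is called by its fully qualified name, and "local" is used so that the setting is restored after each pass. Because fix_document leaves its input unmodified, the same parsed DOM is reused for every pass.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use HTML::HTML5::Sanity;

# Hypothetical sample markup with a non-canonical language tag.
my $html = '<!DOCTYPE html><title>t</title><p lang="EN_GB">Hello</p>';
my $html5_dom = HTML::HTML5::Parser->new->parse_string($html);

for my $level (0, 1, 2) {
    # "local" restores the previous value when this block exits.
    local $HTML::HTML5::Sanity::FIX_LANG_ATTRIBUTES = $level;

    my $sane_dom = HTML::HTML5::Sanity::fix_document($html5_dom);
    my ($p)      = $sane_dom->getElementsByTagName('p');
    my $lang     = $p->getAttribute('lang');

    printf "level %d: lang=%s\n", $level,
        defined $lang ? $lang : '(absent)';
}
```

Scoping the change with "local" avoids silently altering the behaviour of other code in the same process that also calls fix_document.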
BUGS¶
Please report any bugs to <http://rt.cpan.org/>.
SEE ALSO¶
HTML::HTML5::Parser, XML::LibXML, Task::HTML5.
AUTHOR¶
Toby Inkster <tobyink@cpan.org>.
COPYRIGHT AND LICENSE¶
Copyright (C) 2009-2011 by Toby Inkster
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.