
Mojo::DOM(3pm) User Contributed Perl Documentation Mojo::DOM(3pm)

NAME

Mojo::DOM - Minimalistic HTML5/XML DOM parser with CSS3 selectors

SYNOPSIS

  use Mojo::DOM;
  # Parse
  my $dom = Mojo::DOM->new('<div><p id="a">A</p><p id="b">B</p></div>');
  # Find
  my $b = $dom->at('#b');
  say $b->text;
  # Walk
  say $dom->div->p->[0]->text;
  say $dom->div->children('p')->first->{id};
  # Iterate
  $dom->find('p[id]')->each(sub { say shift->{id} });
  # Loop
  for my $e ($dom->find('p[id]')->each) {
    say $e->text;
  }
  # Modify
  $dom->div->p->[1]->append('<p id="c">C</p>');
  # Render
  say $dom;

DESCRIPTION

Mojo::DOM is a minimalistic and relaxed HTML5/XML DOM parser with CSS3 selector support. It will even try to interpret broken XML, so you should not use it for validation.

CASE SENSITIVITY

Mojo::DOM defaults to HTML5 semantics, which means all tags and attributes are lowercased and selectors need to be lowercase as well.
  my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
  say $dom->at('p')->text;
  say $dom->p->{id};
If XML processing instructions are found, the parser will automatically switch into XML mode and everything becomes case sensitive.
  my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
  say $dom->at('P')->text;
  say $dom->P->{ID};
XML detection can also be disabled with the "xml" method.
  # Force XML semantics
  $dom->xml(1);
  # Force HTML5 semantics
  $dom->xml(0);

METHODS

Mojo::DOM implements the following methods.

"new"

  my $dom = Mojo::DOM->new;
  my $dom = Mojo::DOM->new('<foo bar="baz">test</foo>');
Construct a new Mojo::DOM object.

"all_text"

  my $trimmed   = $dom->all_text;
  my $untrimmed = $dom->all_text(0);
Extract all text content from the DOM structure, including the text of child elements. Smart whitespace trimming is enabled by default.
  # "foo bar baz"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text;
  # "foo\nbarbaz\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text(0);

"append"

  $dom = $dom->append('<p>Hi!</p>');
Append after this element, as its next sibling.
  # "<div><h1>A</h1><h2>B</h2></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->append('<h2>B</h2>');

"append_content"

  $dom = $dom->append_content('<p>Hi!</p>');
Append to the end of this element's content.
  # "<div><h1>AB</h1></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->append_content('B');

"at"

  my $result = $dom->at('html title');
Find a single element with CSS3 selectors. All selectors from Mojo::DOM::CSS are supported.
  # Find first element with "svg" namespace definition
  my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};

"attrs"

  my $attrs = $dom->attrs;
  my $foo   = $dom->attrs('foo');
  $dom      = $dom->attrs({foo => 'bar'});
  $dom      = $dom->attrs(foo => 'bar');
Element attributes.
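The four call forms above can be sketched as follows. This is a minimal example, assuming the Mojolicious release this page documents; the markup and variable names are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Illustrative markup, not taken from the original documentation
my $dom = Mojo::DOM->new('<div><p id="a" class="x">A</p></div>');
my $p   = $dom->at('p');

# Read a single attribute by name
my $id = $p->attrs('id');
say $id;    # a

# Both setter forms return the element, so the calls can be chained
$p->attrs(class => 'y')->attrs({title => 'First'});

# Without arguments, "attrs" returns the attribute hash reference
my $attrs = $p->attrs;
say join ', ', map {"$_=$attrs->{$_}"} sort keys %$attrs;
# class=y, id=a, title=First
```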

"charset"

  my $charset = $dom->charset;
  $dom        = $dom->charset('UTF-8');
Alias for "charset" in Mojo::DOM::HTML.

"children"

  my $collection = $dom->children;
  my $collection = $dom->children('div');
Return a Mojo::Collection object containing the children of this element, similar to "find".
  # Show type of random child element
  say $dom->children->shuffle->first->type;

"content_xml"

  my $xml = $dom->content_xml;
Render content of this element to XML.
  # "<b>test</b>"
  $dom->parse('<div><b>test</b></div>')->div->content_xml;

"find"

  my $collection = $dom->find('html title');
Find elements with CSS3 selectors and return a Mojo::Collection object. All selectors from Mojo::DOM::CSS are supported.
  # Find a specific element and extract information
  my $id = $dom->find('div')->[23]{id};
  # Extract information from multiple elements
  my @headers = $dom->find('h1, h2, h3')->map(sub { shift->text })->each;

"namespace"

  my $namespace = $dom->namespace;
Find element namespace.
  # Find namespace for an element with namespace prefix
  my $namespace = $dom->at('svg > svg\:circle')->namespace;
  # Find namespace for an element that may or may not have a namespace prefix
  my $namespace = $dom->at('svg > circle')->namespace;

"parent"

  my $parent = $dom->parent;
Parent of element.
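A short sketch of stepping from an element to its parent; the markup and the "outer" and "inner" ids are hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Illustrative markup with one element nested inside another
my $dom = Mojo::DOM->new('<div id="outer"><p id="inner">A</p></div>');

# Start at the inner element and step up one level
my $p         = $dom->at('#inner');
my $parent_id = $p->parent->{id};
say $parent_id;    # outer
```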

"parse"

  $dom = $dom->parse('<foo bar="baz">test</foo>');
Alias for "parse" in Mojo::DOM::HTML.
  # Parse UTF-8 encoded XML
  my $dom = Mojo::DOM->new->charset('UTF-8')->xml(1)->parse($xml);

"prepend"

  $dom = $dom->prepend('<p>Hi!</p>');
Prepend before this element, as its previous sibling.
  # "<div><h1>A</h1><h2>B</h2></div>"
  $dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend('<h1>A</h1>');

"prepend_content"

  $dom = $dom->prepend_content('<p>Hi!</p>');
Prepend to the beginning of this element's content.
  # "<div><h2>AB</h2></div>"
  $dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend_content('A');

"replace"

  $dom = $dom->replace('<div>test</div>');
Replace this element.
  # "<div><h2>B</h2></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>');

"replace_content"

  $dom = $dom->replace_content('test');
Replace element content.
  # "<div><h1>B</h1></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->replace_content('B');

"root"

  my $root = $dom->root;
Find root node.
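As a rough illustration, "root" can be used to get back from a nested element to the whole document. The markup below is made up for the example.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Illustrative markup
my $dom = Mojo::DOM->new('<div><p id="a">A</p></div>');

# From a nested element, climb back to the top of the tree
my $p        = $dom->at('#a');
my $root     = $p->root;
my $rendered = "$root";    # stringification renders the whole document
say $rendered;             # <div><p id="a">A</p></div>
```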

"text"

  my $trimmed   = $dom->text;
  my $untrimmed = $dom->text(0);
Extract text content from this element only, not including child elements. Smart whitespace trimming is enabled by default.
  # "foo baz"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->text;
  # "foo\nbaz\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->text(0);

"text_after"

  my $trimmed   = $dom->text_after;
  my $untrimmed = $dom->text_after(0);
Extract text content immediately following this element. Smart whitespace trimming is enabled by default.
  # "baz"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_after;
  # "baz\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_after(0);

"text_before"

  my $trimmed   = $dom->text_before;
  my $untrimmed = $dom->text_before(0);
Extract text content immediately preceding this element. Smart whitespace trimming is enabled by default.
  # "foo"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_before;
  # "foo\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_before(0);

"to_xml"

  my $xml = $dom->to_xml;
Render this element and its content to XML.
  # "<div><b>test</b></div>"
  $dom->parse('<div><b>test</b></div>')->div->to_xml;

"tree"

  my $tree = $dom->tree;
  $dom     = $dom->tree(['root', [qw(text lalala)]]);
Alias for "tree" in Mojo::DOM::HTML.

"type"

  my $type = $dom->type;
  $dom     = $dom->type('div');
Element type.
  # List types of child elements
  $dom->children->each(sub { say $_->type });

"xml"

  my $xml = $dom->xml;
  $dom    = $dom->xml(1);
Alias for "xml" in Mojo::DOM::HTML.

CHILD ELEMENTS

In addition to the methods above, many child elements are also automatically available as object methods. These return a Mojo::DOM object if exactly one child of that type exists, or a Mojo::Collection object if there are several.
  say $dom->p->text;
  say $dom->div->[23]->text;
  $dom->div->each(sub { say $_->text });
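The difference between the two return types can be checked with "ref". This sketch assumes the release documented on this page, where child elements are exposed as methods; the markup is illustrative.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Illustrative markup: one "h1" child and two "p" children
my $dom = Mojo::DOM->new('<div><h1>T</h1><p>A</p><p>B</p></div>');
my $div = $dom->div;

# A single matching child comes back as an element
my $h1 = $div->h1;
say ref $h1;    # Mojo::DOM

# Several matching children come back as a collection
my $ps = $div->p;
say ref $ps;    # Mojo::Collection

# Collections can be indexed like array references
my $second = $ps->[1]->text;
say $second;    # B
```

Code that cannot know the number of children in advance may be easier to write with "children", which always returns a collection.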

ELEMENT ATTRIBUTES

Direct hash reference access to element attributes is also possible.
  say $dom->{foo};
  say $dom->div->{id};

SEE ALSO

Mojolicious, Mojolicious::Guides, <http://mojolicio.us>.
2012-09-05 perl v5.14.2