RSSLite(3pm)

User Contributed Perl Documentation

NAME

XML::RSSLite - lightweight, "relaxed" RSS (and XML-ish) parser

SYNOPSIS

  use XML::RSSLite;
  . . .
  parseRSS(\%result, \$content);
  print "=== Channel ===\n",
        "Title: $result{'title'}\n",
        "Desc:  $result{'description'}\n",
        "Link:  $result{'link'}\n\n";
  foreach my $item (@{$result{'item'}}) {
    print "  --- Item ---\n",
          "  Title: $item->{'title'}\n",
          "  Desc:  $item->{'description'}\n",
          "  Link:  $item->{'link'}\n\n";
  }

DESCRIPTION

This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a simple hash structure, and "aliases" certain tags so that when done, you can count on having the minimal data necessary for re-constructing a valid RSS file. This means you get the basic title, description, and link for a channel and its items.

This module extracts more usable links by parsing "scriptingNews" and "weblog" formats in addition to RDF & RSS. It also "sanitizes" the output for best results. The munging includes:

Remove HTML tags to leave plain text

Remove characters other than 0-9~!@#$%^&*()-+=a-zA-Z[];',.:"<>?\s

Remove leading whitespace from URIs

Use <url> tags when <link> is empty

Use misplaced URLs in <title> when <link> is empty

Extract links from <a href=...> if required

Limit links to ftp and http(s)

Join relative item URLs (beginning with / or #) to the site base
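
A few of the steps above can be sketched in plain Perl. This is only an illustration of the kind of munging described, not the module's actual code: the helper names strip_tags, join_to_base and allowed_scheme are hypothetical, and how the module itself determines the "site base" is an assumption here.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; none of these are part of
# XML::RSSLite's API.

# Drop anything that looks like a tag, leaving plain text.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Remove leading whitespace, then join a relative URL (one starting
# with / or #) to the site base. Other URLs are returned as they are.
sub join_to_base {
    my ($base, $url) = @_;
    $url =~ s/^\s+//;
    return $url unless $url =~ m{^[/#]};
    (my $root = $base) =~ s{/+$}{};    # avoid a doubled slash
    return $root . $url;
}

# Accept only ftp and http(s) links.
sub allowed_scheme {
    my ($url) = @_;
    return $url =~ m{^(?:ftp|https?)://}i ? 1 : 0;
}

print strip_tags('<b>Hello</b>, <i>world</i>'), "\n";
print join_to_base('http://example.com/', '  /news/1.html'), "\n";
print allowed_scheme('mailto:someone@example.com') ? "kept" : "dropped", "\n";
```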

EXPORT

parseRSS($outHashRef, $inScalarRef): $inScalarRef is a reference to a scalar containing the document to be parsed; its contents will effectively be destroyed by parsing. $outHashRef is a reference to the hash in which to store the parsed content.
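
Because the input scalar is consumed, a caller that still needs the original document should keep a copy before parsing. The following is a minimal sketch with a made-up inline feed; wrapping the call in eval assumes that parseRSS may die on input it cannot recognize, which this page does not state.

```perl
use strict;
use warnings;
use XML::RSSLite;

# A tiny inline feed; a real program would read this from a file or URL.
my $content = <<'RSS';
<rss version="0.91">
<channel>
<title>Example Channel</title>
<description>An example feed</description>
<link>http://example.com/</link>
<item>
<title>First item</title>
<description>Item text</description>
<link>http://example.com/1.html</link>
</item>
</channel>
</rss>
RSS

my $pristine = $content;    # keep a copy: parseRSS consumes its input
my %result;

# Assumption: parsing may die on unrecognizable input, so guard the call.
eval { parseRSS(\%result, \$content); 1 }
    or die "Could not parse feed: $@";

print "Channel: $result{'title'}\n";
print "  Item: $_->{'title'}\n" for @{ $result{'item'} || [] };
```

After the call, use $pristine rather than $content for anything that needs the unparsed document.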

EXPORTABLE

parseXML(\%parsedTree, \$parseThis, 'topTag', $comments);

parsedTree - required: Reference to hash to store the parsed document within.

parseThis - required: Reference to scalar containing the document to parse.

topTag - optional: Tag to treat as the root node. Leaving this undefined is not recommended.

comments - optional

false will remove comments from parseThis

true will not remove comments from parseThis

an array reference counts as true; comments are stored in the referenced array
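
A call that collects comments might look like the sketch below. The sample document and the tag name catalog are made up for illustration, and importing parseXML by name assumes it is offered for export as the heading above suggests. The layout of the resulting hash is not described on this page, so the example only lists its top-level keys.

```perl
use strict;
use warnings;
use XML::RSSLite qw(parseXML);

# Made-up XML-ish document containing two comments.
my $doc = <<'XML';
<!-- generated by hand -->
<catalog>
<entry><name>alpha</name></entry>
<!-- second entry omitted -->
</catalog>
XML

my (%tree, @comments);

# An array reference as the fourth argument counts as true, and the
# comments found in the document are stored in that array.
parseXML(\%tree, \$doc, 'catalog', \@comments);

print "Top-level keys: ", join(', ', sort keys %tree), "\n";
print "Comment: $_\n" for @comments;
```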

CAVEATS

This is not a conforming parser. It does not handle the following:

•  <foo bar=">">

•  <foo><bar> <bar></bar> <bar></bar> </bar></foo>

•  <![CDATA[ ]]>

•  PI (processing instructions)

It is non-validating; without a DTD, the following cannot be properly addressed:

entities

namespaces: This may or may not be arriving in some future release.
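
To see why an attribute value containing ">" is troublesome for a regular-expression approach, consider the sketch below. The pattern is a deliberately naive stand-in, not the expression this module uses: it treats the first ">" after the tag name as the end of the tag, so the attribute is cut short and the rest of the tag leaks into the content.

```perl
use strict;
use warnings;

my $doc = '<foo bar=">">text</foo>';

# Naive assumption: a tag ends at the first '>' after its name.
my ($attrs, $rest) = $doc =~ /<foo([^>]*)>(.*)/s;

# The attribute text stops inside the quoted value, and the closing
# quote plus the real end of the tag end up in the remainder.
print "attributes seen: [$attrs]\n";
print "remainder:       [$rest]\n";
```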

AUTHOR

Jerrad Pierce <jpierce@cpan.org>.

Scott Thomason <scott@thomasons.org>

LICENSE

Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

2010-10-25

perl v5.10.1

Source file:	XML::RSSLite.3pm.en.gz (from libxml-rsslite-perl 0.15+dfsg-2)
Source last updated:	2010-10-25T14:54:12Z
Converted to HTML:	2017-06-07T16:59:51Z
