.\" Man page generated from reStructuredText.
.
.TH "PODCASTPARSER" "1" "Jul 06, 2020" "0.6.5" "podcastparser"
.SH NAME
podcastparser \- podcastparser Documentation
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.sp
\fIpodcastparser\fP is a simple and fast podcast feed parser library in Python.
The two primary users of the library are the \fI\%gPodder Podcast Client\fP and
the \fI\%gpodder.net web service\fP\&.
.sp
The following feed types are supported:
.INDENT 0.0
.IP \(bu 2
Really Simple Syndication (\fI\%RSS 2.0\fP)
.IP \(bu 2
Atom Syndication Format (\fI\%RFC 4287\fP)
.UNINDENT
.sp
The following specifications are supported:
.INDENT 0.0
.IP \(bu 2
\fI\%Paged Feeds\fP (\fI\%RFC 5005\fP)
.IP \(bu 2
\fI\%Podlove Simple Chapters\fP
.UNINDENT
.sp
These formats only specify the possible markup elements and attributes. We
recommend that you also read the \fI\%Podcast Feed Best Practice\fP guide if you
want to optimize your feeds for best display in podcast clients.
.sp
Where times and durations are used, the values are expected to be formatted
either as seconds or as \fI\%RFC 2326\fP Normal Play Time (NPT).
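.sp
As a rough illustration of the two accepted forms, the following sketch converts
a plain seconds value or an hh:mm:ss NPT value into a number of seconds. The
helper name npt_to_seconds is hypothetical and is not part of
\fIpodcastparser\fP; it is a minimal sketch that covers only these two forms,
not the parsing code the library itself uses.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def npt_to_seconds(value):
    """Convert "SS[.fff]" or "HH:MM:SS[.fff]" to seconds (hypothetical helper)."""
    parts = value.strip().split(":")
    if len(parts) == 1:
        # Plain seconds, optionally with a fractional part.
        return float(parts[0])
    if len(parts) == 3:
        # NPT hh:mm:ss form; only the seconds field may carry a fraction.
        hours, minutes = int(parts[0]), int(parts[1])
        seconds = float(parts[2])
        if minutes > 59 or seconds >= 60:
            raise ValueError("minutes and seconds must be below 60: %r" % value)
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError("not a seconds or hh:mm:ss value: %r" % value)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Under these assumptions, the values "90" and "0:01:30" both describe the same
duration of ninety seconds.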
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import pprint
import urllib.request

import podcastparser

feedurl = \(aqhttp://example.com/feed.xml\(aq

# Python 3: urlopen() lives in urllib.request
parsed = podcastparser.parse(feedurl, urllib.request.urlopen(feedurl))

# parsed is a dict
pprint.pprint(parsed)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For both RSS and Atom feeds, only a subset of elements (those that are relevant
to podcast client applications) is parsed. This section describes which elements
and attributes are parsed and how the contents are interpreted/used.
.SH RSS
.INDENT 0.0
.TP
\fBrss@xml:base\fP
Base URL for all relative links in the RSS file.
.TP
\fBrss/channel\fP
Podcast.
.TP
\fBrss/channel/title\fP
Podcast title (whitespace is squashed).
.TP
\fBrss/channel/link\fP
Podcast website.
.TP
\fBrss/channel/description\fP
Podcast description (whitespace is squashed).
.TP
\fBrss/channel/image/url\fP
Podcast cover art.
.TP
\fBrss/channel/itunes:image\fP
Podcast cover art (alternative).
.TP
\fBrss/channel/atom:link@rel=payment\fP
Podcast payment URL (e.g. Flattr).
.TP
\fBrss/channel/item\fP
Episode.
.TP
\fBrss/channel/item/guid\fP
Episode unique identifier (GUID), mandatory.
.TP
\fBrss/channel/item/title\fP
Episode title (whitespace is squashed).
.TP
\fBrss/channel/item/link\fP
Episode website.
.TP
\fBrss/channel/item/description\fP
Episode description. If it contains HTML, it’s returned as description_html.
Otherwise it’s returned as description (whitespace is squashed).
See Mozilla’s article \fIWhy RSS Content Module is Popular\fP.
.TP
\fBrss/channel/item/itunes:summary\fP
Episode description (whitespace is squashed).
.TP
\fBrss/channel/item/itunes:subtitle\fP
Episode subtitle / one\-line description (whitespace is squashed).
.TP
\fBrss/channel/item/content:encoded\fP
Episode description in HTML. Best source for description_html.
.TP
\fBrss/channel/item/itunes:duration\fP
Episode duration.
.TP
\fBrss/channel/item/pubDate\fP
Episode publication date.
.TP
\fBrss/channel/item/atom:link@rel=payment\fP
Episode payment URL (e.g.
Flattr).
.TP
\fBrss/channel/item/atom:link@rel=enclosure\fP
File download URL (@href), size (@length) and mime type (@type).
.TP
\fBrss/channel/item/itunes:image\fP
Episode art URL.
.TP
\fBrss/channel/item/media:content\fP
File download URL (@url), size (@fileSize) and mime type (@type).
.TP
\fBrss/channel/item/enclosure\fP
File download URL (@url), size (@length) and mime type (@type).
.TP
\fBrss/channel/item/psc:chapters\fP
Podlove Simple Chapters, version 1.1 and 1.2.
.TP
\fBrss/channel/item/psc:chapters/psc:chapter\fP
Chapter entry (@start, @title, @href and @image).
.UNINDENT
.SH ATOM
.sp
For Atom feeds, \fIpodcastparser\fP will handle the following elements and
attributes:
.INDENT 0.0
.TP
\fBatom:feed\fP
Podcast.
.TP
\fBatom:feed/atom:title\fP
Podcast title (whitespace is squashed).
.TP
\fBatom:feed/atom:subtitle\fP
Podcast description (whitespace is squashed).
.TP
\fBatom:feed/atom:icon\fP
Podcast cover art.
.TP
\fBatom:feed/atom:link@href\fP
Podcast website.
.TP
\fBatom:feed/atom:entry\fP
Episode.
.TP
\fBatom:feed/atom:entry/atom:id\fP
Episode unique identifier (GUID), mandatory.
.TP
\fBatom:feed/atom:entry/atom:title\fP
Episode title (whitespace is squashed).
.TP
\fBatom:feed/atom:entry/atom:link@rel=enclosure\fP
File download URL (@href), size (@length) and mime type (@type).
.TP
\fBatom:feed/atom:entry/atom:link@rel=(self|alternate)\fP
Episode website.
.TP
\fBatom:feed/atom:entry/atom:link@rel=payment\fP
Episode payment URL (e.g. Flattr).
.TP
\fBatom:feed/atom:entry/atom:content\fP
Episode description (in HTML or plaintext).
.TP
\fBatom:feed/atom:entry/atom:published\fP
Episode publication date.
.TP
\fBatom:feed/atom:entry/psc:chapters\fP
Podlove Simple Chapters, version 1.1 and 1.2.
.TP
\fBatom:feed/atom:entry/psc:chapters/psc:chapter\fP
Chapter entry (@start, @title, @href and @image).
.UNINDENT
.sp
Simplified, fast RSS parser
.INDENT 0.0
.TP
.B exception podcastparser.FeedParseError(msg, exception, locator)
Exception raised when asked to parse an invalid feed.
.sp
This exception allows users of this library to catch exceptions
without having to import the XML parsing library themselves.
.UNINDENT
.INDENT 0.0
.TP
.B class podcastparser.PodcastHandler(url, max_episodes)
.INDENT 7.0
.TP
.B characters(chars)
Receive notification of character data.
.sp
The Parser will call this method to report each chunk of
character data. SAX parsers may return all contiguous
character data in a single chunk, or they may split it into
several chunks; however, all of the characters in any single
event must come from the same external entity so that the
Locator provides useful information.
.UNINDENT
.INDENT 7.0
.TP
.B endElement(name)
Signals the end of an element in non\-namespace mode.
.sp
The name parameter contains the name of the element type, just
as with the startElement event.
.UNINDENT
.INDENT 7.0
.TP
.B startElement(name, attrs)
Signals the start of an element in non\-namespace mode.
.sp
The name parameter contains the raw XML 1.0 name of the
element type as a string and the attrs parameter holds an
instance of the Attributes class containing the attributes of
the element.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B class podcastparser.RSSItemDescription
RSS 2.0 almost encourages putting HTML content in item/description,
but content:encoded is the better source of HTML content, and
itunes:summary is known to contain the short textual description of the item.
A heuristic is therefore used to attribute text to either description or
description_html, without overriding existing values.
.UNINDENT
.INDENT 0.0
.TP
.B podcastparser.file_basename_no_extension(filename)
Returns filename without extension
.sp
.nf
.ft C
>>> file_basename_no_extension(\(aq/home/me/file.txt\(aq)
\(aqfile\(aq
.ft P
.fi
.sp
.nf
.ft C
>>> file_basename_no_extension(\(aqfile\(aq)
\(aqfile\(aq
.ft P
.fi
.UNINDENT
.INDENT 0.0
.TP
.B podcastparser.is_html(text)
Heuristically tell if text is HTML
.sp
By looking for an open tag (more or less:) >>> is_html(‘