NAME¶
tv_extractinfo_en - read English-language listings and extract info from
programme descriptions.
SYNOPSIS¶
tv_extractinfo_en [--help] [--output FILE] [FILE...]
DESCRIPTION¶
Read XMLTV data and attempt to extract information from English-language
programme descriptions, putting it into machine-readable form. For example the
human-readable text '(repeat)' in a programme description might be replaced by
the XML element <previously-shown>.
--output FILE write to FILE rather than standard output
This tool also attempts to split multipart programmes into their constituents,
by looking for a description that seems to contain lots of times and titles.
But this depends on the description following one particular style and is
useful only for some listings sources (Ananova).
If some text is marked with the 'lang' attribute as being some language other
than English ('en'), it is ignored.
SEE ALSO¶
xmltv(5).
AUTHOR¶
Ed Avis, ed@membled.com
BUGS¶
Trying to parse human-readable text is always error-prone, more so with the
simple regexp-based approach used here. But because TV listing descriptions
usually conform to one of a few set styles, tv_extractinfo_en does reasonably
well. It is fairly conservative, trying to avoid false positives (extracting
'information' which isn't really there) even though this means some false
negatives (failing to extract information and leaving it in the human-readable
text).
However, the leftover bits of text after extracting information may not form a
meaningful English sentence, or the punctuation may be wrong.
On the two listings sources currently supported by the XMLTV package, this
program does a reasonably good job. But it has not been tested with every
source of anglophone TV listings.