.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "TV_EXTRACTINFO_EN 1p"
.TH TV_EXTRACTINFO_EN 1p 2024-02-24 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
tv_extractinfo_en \- read English\-language listings and extract info from programme descriptions
.SH SYNOPSIS
.IX Header "SYNOPSIS"
tv_extractinfo_en [\-\-help] [\-\-output FILE] [FILE...]
.SH DESCRIPTION
.IX Header "DESCRIPTION"
Read XMLTV data and attempt to extract information from
English-language programme descriptions, putting it into
machine-readable form.  For example, the human-readable text '(repeat)'
in a programme description might be replaced by the XML element
<previously\-shown>.
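.PP
As an illustration only of the '(repeat)' case (the channel id, times
and title are invented, and the exact text left behind in the
description is an assumption rather than captured output), a programme
such as
.PP
.Vb 4
\&    <programme start="20240224200000 +0000" channel="example.tv">
\&      <title lang="en">Example Show</title>
\&      <desc lang="en">A quiz for two teams. (repeat)</desc>
\&    </programme>
.Ve
.PP
might come out along these lines, with the marker removed from the
description and recorded as an empty element instead:
.PP
.Vb 5
\&    <programme start="20240224200000 +0000" channel="example.tv">
\&      <title lang="en">Example Show</title>
\&      <desc lang="en">A quiz for two teams.</desc>
\&      <previously\-shown/>
\&    </programme>
.Ve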
.PP
\&\fB\-\-output FILE\fR: write to FILE rather than standard output.
.PP
This tool also attempts to split multipart programmes into their
constituents by looking for a description that appears to contain many
times and titles.  This depends on the description following one
particular style, so it is useful only for some listings sources
(Ananova).
.PP
Text whose 'lang' attribute marks it as a language other than English
('en') is ignored.
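.PP
As a sketch of what this means in practice, in the invented fragment
below only the first description would be examined; the second is
marked as French and would not be.  The element and attribute names
follow the XMLTV format, but the text is made up for illustration.
.PP
.Vb 2
\&    <desc lang="en">Comedy drama. (repeat)</desc>
\&    <desc lang="fr">Drame. (rediffusion)</desc>
.Ve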
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBxmltv\fR\|(5).
.SH AUTHOR
.IX Header "AUTHOR"
Ed Avis, ed@membled.com
.SH BUGS
.IX Header "BUGS"
Trying to parse human-readable text is always error-prone, more so
with the simple regexp-based approach used here.  But because TV
listing descriptions usually conform to one of a few set styles,
tv_extractinfo_en does reasonably well.  It is fairly conservative: it
tries to avoid false positives (extracting 'information' that isn't
really there), even at the cost of some false negatives (failing to
extract information and leaving it in the human-readable text).
.PP
However, the text left over after information has been extracted may
not form a meaningful English sentence, and its punctuation may be
wrong.
.PP
On the two listings sources currently supported by the XMLTV package,
this program does a reasonably good job.  But it has not been tested
with every source of anglophone TV listings.