.de d \" begin display
.sp
.in +4
.nf
..
.de e \" end display
.in -4
.fi
.sp
..
.TH "HXEXTRACT" "1" "10 Jul 2011" "7.x" "HTML-XML-utils"
.SH NAME
hxextract \- extract selected elements from an HTML or XML file
.SH SYNOPSIS
.B hxextract
.RB "[\| " \-h
.RB "| " \-? " \|]"
.RB "[\| " \-x " \|]"
.RB "[\| " \-s
.IR text " \|]"
.RB "[\| " \-e
.IR text " \|]"
.RB "[\| " \-b
.IR base " \|]"
.I element-or-class
.RB "[\| " \-c
.IR "configfile" " | "
.IR file\-or\-URL " \|]"
.SH DESCRIPTION
.B hxextract
outputs all elements with a certain name and/or class.
.PP
Input must be well-formed, since no HTML heuristics are applied.
.SH OPTIONS
The following options are supported:
.TP 10
.B \-x
Use XML format conventions.
.TP 10
.BI \-s " text"
Insert
.I text
at the start of the output.
.TP 10
.BI \-e " text"
Insert
.I text
at the end of the output.
.TP 10
.BI \-b " base"
URL base
.TP 10
.BI \-c " configfile"
Read @chapter lines from
.I configfile
(lines must be of the form "@chapter filename") and extract elements
from each of those files.
.TP 10
.BR \-h ", " \-?
Print command usage.
.SH OPERANDS
The following operands are supported:
.TP 10
.I element-or-class
The name of an element to extract (e.g., "H2"), or the name of a class
preceded by "." (e.g., ".example") or a combination of both (e.g.,
"H2.example").
.TP
.I file-or-URL
A file name or a URL. To read from standard input, use "-".
.SH ENVIRONMENT
To use a proxy to retrieve remote files, set the environment variables
.B http_proxy
and
.BR ftp_proxy "."
E.g.,
.B http_proxy="http://localhost:8080/"
.SH BUGS
.LP
Remote files (specified with a URL) are currently only supported for
HTTP. Password-protected files or files that depend on HTTP "cookies"
are not handled. (You can use tools such as
.BR curl (1)
or
.BR wget (1)
to retrieve such files.)
.SH "SEE ALSO"
.BR hxselect (1)
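.SH EXAMPLES
The following invocations are illustrative sketches that use only the
options and operands documented in this page. All file names
(notes.html, page.html, chapters.cfg, ch1.html, ch2.html) are
hypothetical.
.PP
Extract every "li" element of class "note" and wrap the result in a
list by supplying start and end text:
.sp
.in +4
.nf
hxextract \-s '<ul>' \-e '</ul>' li.note notes.html
.in -4
.fi
.sp
Because the input must be well-formed, ordinary HTML may need to be
cleaned up first. One possible approach, assuming
.BR hxnormalize (1)
from the same package is installed, is to pipe its output to
.B hxextract
and read from standard input with "-":
.sp
.in +4
.nf
hxnormalize \-x page.html | hxextract \-x h2.example \-
.in -4
.fi
.sp
To extract from several files, list them in a configuration file, one
"@chapter filename" line per file. A configuration file named
chapters.cfg might contain:
.sp
.in +4
.nf
@chapter ch1.html
@chapter ch2.html
.in -4
.fi
.sp
and would be used as:
.sp
.in +4
.nf
hxextract .example \-c chapters.cfg
.in -4
.fi
.sp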