NAME
hxunent - replace HTML predefined character entities by UTF-8
SYNOPSIS
hxunent [ -b ] [ -f ] [ file ]
DESCRIPTION
The hxunent command reads the file (or standard input) and copies
it to standard output with &-entities replaced by their equivalent
character (encoded as UTF-8). E.g., &quot; is replaced by " and &lt; is
replaced by <.
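
To illustrate the kind of substitution described above, here is a minimal Python sketch. It is only an analogue built on the standard library's html.unescape, not hxunent's own implementation, and it may differ from hxunent in details (for example, html.unescape also accepts some entities without a trailing semicolon). The function name decode_entities is hypothetical.

```python
import html


def decode_entities(text: str) -> bytes:
    """Replace character entities by the characters they stand for
    and return the result encoded as UTF-8."""
    return html.unescape(text).encode("utf-8")


sample = "&quot;Fish &lt; chips&quot; at the caf&eacute;"
print(decode_entities(sample).decode("utf-8"))
```

The accented character in the result occupies two bytes in the output, because the result is encoded as UTF-8.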
OPTIONS
The following options are supported:
- -b
- The five built-in entities of XML (&lt; &gt; &quot; &apos;
&amp;) are not replaced but copied unchanged. This is necessary if the
output has to be valid XML or SGML.
- -f
- This option changes how unknown entities and lone ampersands are handled.
Normally they are copied unchanged, but this option tries to
"fix" them by replacing the ampersand by &amp;. Such stray
ampersands are often the result of copying and pasting URLs into a
document, in which case this option does fix them and makes the document valid.
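
The effect of the two options can be sketched in Python as follows. This is a rough model under stated assumptions, not the program's actual code: the function unent and its parameters keep_builtin (standing in for -b) and fix (standing in for -f) are hypothetical names, entity names are looked up in Python's html.entities.html5 table, and numeric character references are simply copied unchanged to keep the sketch short.

```python
import re
from html.entities import html5

# The five built-in XML entities that -b leaves alone.
BUILTIN = {"lt", "gt", "quot", "apos", "amp"}

# Either a complete reference (&name; or &#...;) or a lone ampersand.
REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);|&")


def unent(text, keep_builtin=False, fix=False):
    def replace(match):
        body = match.group(1)
        if body is None:
            # Lone ampersand: copied unchanged, or escaped when fixing.
            return "&amp;" if fix else "&"
        if body.startswith("#"):
            # Simplification of this sketch: numeric references are untouched.
            return match.group(0)
        if keep_builtin and body in BUILTIN:
            return match.group(0)
        char = html5.get(body + ";")
        if char is not None:
            return char
        # Unknown entity: copied unchanged, or its ampersand escaped when fixing.
        return "&amp;" + body + ";" if fix else match.group(0)

    return REFERENCE.sub(replace, text)


line = "&lt;a href=&quot;x?a=1&b=2&quot;&gt; caf&eacute; &bogus;"
print(unent(line))
print(unent(line, keep_builtin=True))
print(unent(line, keep_builtin=True, fix=True))
```

In the last call the stray ampersand in the URL and the unknown entity both have their ampersand escaped, while the built-in entities stay as they are, which is the combination needed when the output must remain valid XML.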
DIAGNOSTICS
The program's exit value is 0 if all went well, otherwise:
- 1
- The input couldn't be read (file not found, file not readable...)
- 2
- Wrong command line arguments.
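
A calling script can branch on these exit values. The following Python sketch shows one way to do so, assuming hxunent is installed and on the PATH. The helper names explain_exit and run_hxunent and the file name no-such-file.html are hypothetical, and the messages are paraphrases of the list above rather than output of the program.

```python
import shutil
import subprocess

# Meanings of the exit values as documented in this manual page.
EXIT_MEANINGS = {
    0: "success",
    1: "input could not be read",
    2: "wrong command line arguments",
}


def explain_exit(code):
    """Translate an exit value into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit value {code}")


def run_hxunent(path, *options):
    """Run hxunent on a file and return (stdout bytes, status text)."""
    result = subprocess.run(["hxunent", *options, path], capture_output=True)
    return result.stdout, explain_exit(result.returncode)


# Only attempt a real run when the program is actually available.
if shutil.which("hxunent"):
    _, status = run_hxunent("no-such-file.html")
    print(status)
```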
SEE ALSO
asc2xml(1),
xml2asc(1),
UTF-8 (RFC 2279, since obsoleted by RFC 3629)
BUGS
The program assumes entities are as defined by HTML. It doesn't read a
document's DTD to find the actual definitions in use in a document. With
-f, it will even remove all entities that are not HTML entities.