.TH "HXUNENT" "1" "10 Jul 2011" "7.x" "HTML-XML-utils"
.de d \" begin display
.sp
.in +4
.nf
.ft CR
.CDS
..
.de e \" end display
.CDE
.in -4
.fi
.ft R
.sp
..
.SH NAME
hxunent \- replace HTML predefined character entities by UTF-8
.SH SYNOPSIS
.B hxunent
.RB "[\| " \-b " \|]"
.RB "[\| " \-f " \|]"
.RI "[\| " file " \|]"
.SH DESCRIPTION
.LP
The
.B hxunent
command reads the
.I file
(or standard input) and copies it to standard output with &-entities
replaced by their equivalent characters (encoded as UTF-8). E.g.,
&quot; is replaced by " and &lt; is replaced by <.
.SH OPTIONS
The following options are supported:
.TP 10
.B \-b
The five builtin entities of XML (&lt; &gt; &quot; &apos; &amp;)
are not replaced but copied unchanged. This is necessary if the output
has to be valid XML or SGML.
.TP
.B \-f
This option changes how unknown entities and lone ampersands are
handled. Normally they are copied unchanged, but with this option
.B hxunent
tries to "fix" them by replacing each such ampersand with &amp;.
Such stray ampersands are often the result of pasting URLs into a
document, and in that case this option does fix them and makes the
document valid.
.SH "DIAGNOSTICS"
The program's exit value is 0 if all went well, otherwise:
.TP 10
.B 1
The input could not be read (file not found, file not readable, etc.).
.TP
.B 2
Invalid command line arguments.
.SH "SEE ALSO"
.BR asc2xml (1),
.BR xml2asc (1),
.BR UTF-8 " (RFC 2279)"
.SH BUGS
.LP
The program assumes entities are as defined by HTML. It does not read
a document's DTD to find the entity definitions actually in use. With
.BR \-f ,
it will even break every entity that is not an HTML entity, by
replacing its ampersand with &amp;.
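.SH NOTES
.LP
The kind of substitution that
.B hxunent
performs can be sketched in a few lines of C. This is a minimal
illustration, not the program's own source: the functions
.I unent
and
.IR utf8_encode ,
the eight-entry table and the
.I keep_builtin
flag (which mimics
.BR \-b )
are hypothetical. Each &name; reference found in the table is replaced
by its code point encoded as one to four UTF-8 bytes; unknown entities
and lone ampersands are copied unchanged, which is what
.B hxunent
does when
.B \-f
is not given.
.d
#include <stdio.h>
#include <string.h>

/* Hypothetical, tiny entity table: name, Unicode code point, and
   whether it is one of the five builtin XML entities. */
struct entity { const char *name; unsigned long cp; int builtin; };
static const struct entity table[] = {
  {"quot", 34, 1}, {"amp", 38, 1}, {"apos", 39, 1},
  {"lt", 60, 1}, {"gt", 62, 1}, {"nbsp", 160, 0},
  {"eacute", 233, 0}, {"euro", 8364, 0}
};

/* Encode code point cp (assumed valid, at most 0x10FFFF) as UTF-8
   into out and return the number of bytes written. */
static int utf8_encode(unsigned long cp, char *out)
{
  if (cp < 0x80) { out[0] = (char)cp; return 1; }
  if (cp < 0x800) {
    out[0] = (char)(0xC0 | (cp >> 6));
    out[1] = (char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (char)(0xF0 | (cp >> 18));
  out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (char)(0x80 | (cp & 0x3F));
  return 4;
}

/* Copy in to out, replacing each known &name; reference by its UTF-8
   encoding. With keep_builtin set (like hxunent -b) the five builtin
   XML entities are left alone. Unknown names and lone ampersands are
   copied unchanged. Every replacement in this table is shorter than
   the reference it replaces, so out needs at most strlen(in) + 1
   bytes. */
static void unent(const char *in, char *out, int keep_builtin)
{
  size_t n = sizeof table / sizeof table[0];

  while (*in) {
    int done = 0;

    if (*in == '&') {
      size_t i, len = strcspn(in + 1, ";");
      if (in[1 + len] == ';') {
        for (i = 0; i < n && !done; i++) {
          if (strlen(table[i].name) == len
              && strncmp(in + 1, table[i].name, len) == 0
              && !(keep_builtin && table[i].builtin)) {
            out += utf8_encode(table[i].cp, out);
            in += len + 2;   /* skip the ampersand, name and semicolon */
            done = 1;
          }
        }
      }
    }
    if (!done) *out++ = *in++;
  }
  *out = 0;
}

int main(void)
{
  char buf[128];

  unent("&quot;a &lt; b&quot; &amp; caf&eacute;", buf, 0);
  puts(buf);   /* "a < b" & caf, then e acute as the bytes C3 A9 */

  unent("&lt;p&gt; &unknown; AT&T", buf, 1);
  puts(buf);   /* &lt;p&gt; &unknown; AT&T */
  return 0;
}
.e
A linear search is enough for a table this small; a table with all the
HTML entities would be better served by a sorted array and
.BR bsearch (3)
or a hash table.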