.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "FIX_LATIN 1p"
.TH FIX_LATIN 1p 2024-03-05 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
fix_latin \- filters a data stream that is predominantly utf8 and 'fixes' any latin (i.e. non\-ASCII 8 bit) characters
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 1
\&  fix_latin [options] <input_file >output_file
\&
\&  Options:
\&
\&   \-\-use\-xs  \*(Aqauto\*(Aq | \*(Aqalways\*(Aq | \*(Aqnever\*(Aq
\&   \-\-version list version number
\&   \-\-help    detailed help message
.Ve
.SH DESCRIPTION
.IX Header "DESCRIPTION"
The script acts as a filter, taking source data which may contain a mix of
ASCII, UTF8, ISO8859\-1 and CP1252 characters, and producing output that is
entirely ASCII/UTF8.
.PP
Multi-byte UTF8 characters will be passed through unchanged (although
over-long UTF8 byte sequences will be converted to the shortest normal form).
Single byte characters will be converted as follows:
.PP
.Vb 3
\&  0x00 \- 0x7F   ASCII \- passed through unchanged
\&  0x80 \- 0x9F   Converted to UTF8 using CP1252 mappings
\&  0xA0 \- 0xFF   Converted to UTF8 using Latin\-1 mappings
.Ve
.SH OPTIONS
.IX Header "OPTIONS"
.IP "\fB\-\-use\-xs 'auto' | 'always' | 'never'\fR" 4
.IX Item "--use-xs 'auto' | 'always' | 'never'"
Override the default ('auto') behaviour of trying to use the XS module and
falling back to the pure-Perl version if it is not available.  Set to 'never'
to always use the pure-Perl version, or to 'always' to always use XS and die
if it is not available.
.IP "\fB\-\-version\fR (alias \-v)" 4
.IX Item "--version (alias -v)"
Display the version numbers of the underlying Encoding::FixLatin and XS modules.
.IP "\fB\-\-help\fR (alias \-?)" 4
.IX Item "--help (alias -?)"
Display this documentation.
.SH EXAMPLES
.IX Header "EXAMPLES"
This script was originally written to assist in converting a Postgres database
from SQL-ASCII encoding to UNICODE UTF8 encoding.  The following examples
illustrate its use in that context.
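.PP
Before turning to the Postgres commands, it may help to see the single byte
conversion rules from the DESCRIPTION section spelled out in code.  The
following Python sketch is an illustration only and is not how
Encoding::FixLatin is implemented.  The function name 'fix_latin_sketch' is
hypothetical.  The sketch assumes that any well-formed multi-byte UTF8
sequence is genuine UTF8, it does not rewrite over-long sequences (they are
treated as stray single bytes), and it falls back to the Latin\-1 mapping for
the five bytes that CP1252 leaves undefined, which may differ from what the
real script does:
.PP
.Vb 1
```python
def fix_latin_sketch(data: bytes) -> bytes:
    """Return UTF-8 bytes, converting stray single bytes (illustrative only)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # 0x00 - 0x7F: ASCII, passed through unchanged
            out.append(byte)
            i += 1
            continue
        # Look for one well-formed multi-byte UTF-8 character (2 to 4 bytes).
        for length in (2, 3, 4):
            chunk = data[i:i + length]
            try:
                chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            out += chunk  # valid UTF-8, passed through unchanged
            i += len(chunk)
            break
        else:
            if byte < 0xA0:
                # 0x80 - 0x9F: CP1252 mapping
                try:
                    char = bytes([byte]).decode("cp1252")
                except UnicodeDecodeError:
                    # Assumption of this sketch: bytes undefined in CP1252
                    # fall back to the Latin-1 code point.
                    char = chr(byte)
            else:
                # 0xA0 - 0xFF: Latin-1 maps each byte to the same code point
                char = chr(byte)
            out += char.encode("utf-8")
            i += 1
    return bytes(out)


# A Latin-1 e-acute (0xE9) and a CP1252 euro sign (0x80) inside ASCII text.
sample = b"caf" + bytes([0xE9]) + b" costs " + bytes([0x80]) + b"2"
print(fix_latin_sketch(sample).decode("utf-8"))
```
.Ve
.PP
Running the sketch prints the sample text with the stray 0xE9 and 0x80 bytes
replaced by the UTF8 encodings of e-acute and the euro sign.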
.PP
If you have a SQL format dump file that you would normally restore by piping
into 'psql', you can simply filter the dump file through this script:
.PP
.Vb 1
\&  fix_latin < dump_file | psql \-d database
.Ve
.PP
If you have a compressed dump file that you would normally restore using
\&'pg_restore', you can omit the '\-d' option on pg_restore and pipe the
resulting SQL through this script and into psql:
.PP
.Vb 1
\&  pg_restore \-O dump_file | fix_latin | psql \-d database
.Ve
.PP
To take a look at non-ASCII lines in the dump file:
.PP
.Vb 1
\&  perl \-ne \*(Aq/^COPY (\eS+)/ and $t = $1; print "$t:$_" if /[^\ex00\-\ex7F]/\*(Aq dump_file
.Ve
.SH "SEE ALSO"
.IX Header "SEE ALSO"
This script is implemented using the Encoding::FixLatin Perl module.  For more
details see the module documentation with the command:
.PP
.Vb 1
\&  perldoc Encoding::FixLatin
.Ve
.PP
In particular you should read the 'LIMITATIONS' section to understand the
circumstances under which data corruption might occur.
.SH "COPYRIGHT & LICENSE"
.IX Header "COPYRIGHT & LICENSE"
Copyright 2009\-2014 Grant McLean
.PP
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.