.\" Automatically generated by Pod::Man 4.09 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.if !\nF .nr F 0
.if \nF>0 \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    if !\nF==2 \{\
.        nr % 0
.        nr F 2
.    \}
.\}
.\" ========================================================================
.\"
.IX Title "FIX_LATIN 1p"
.TH FIX_LATIN 1p "2017-09-06" "perl v5.26.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
fix_latin \- filters a data stream that is predominantly UTF8 and 'fixes' any Latin (i.e. non\-ASCII 8\-bit) characters
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  fix_latin [options] <input_file >output_file
\&
\&  Options:
\&
\&   \-\-use\-xs   \*(Aqauto\*(Aq | \*(Aqalways\*(Aq | \*(Aqnever\*(Aq
\&   \-\-version  list version number
\&   \-\-help     detailed help message
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The script acts as a filter, taking source data which may contain a mix of
\&\s-1ASCII, UTF8, ISO8859\-1\s0 and \s-1CP1252\s0 characters, and producing output
that is all \s-1ASCII/UTF8.\s0
.PP
Multi-byte \s-1UTF8\s0 characters are passed through unchanged (although over-long
\&\s-1UTF8\s0 byte sequences are converted to the shortest normal form).
Single-byte characters are converted as follows:
.PP
.Vb 3
\&   0x00 \- 0x7F   ASCII \- passed through unchanged
\&   0x80 \- 0x9F   Converted to UTF8 using CP1252 mappings
\&   0xA0 \- 0xFF   Converted to UTF8 using Latin\-1 mappings
.Ve
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\fB\-\-use\-xs 'auto' | 'always' | 'never'\fR" 4
.IX Item "--use-xs 'auto' | 'always' | 'never'"
Override the default ('auto') behaviour, which tries to use the \s-1XS\s0 module and
falls back to the pure-Perl version if \s-1XS\s0 is not available.  Set to 'never' to
always use the pure-Perl version, or to 'always' to require \s-1XS\s0 and die if it
is not available.
.IP "\fB\-\-version\fR (alias \-v)" 4
.IX Item "--version (alias -v)"
Display the version numbers of the underlying Encoding::FixLatin and \s-1XS\s0 modules.
.IP "\fB\-\-help\fR (alias \-?)" 4
.IX Item "--help (alias -?)"
Display this documentation.
.SH "EXAMPLES"
.IX Header "EXAMPLES"
This script was originally written to assist in converting a Postgres database
from SQL-ASCII encoding to \s-1UNICODE UTF8\s0 encoding.  The following examples
illustrate its use in that context.
.PP
If you have a \s-1SQL\s0 format dump file that you would normally restore by piping
into 'psql', you can simply filter the dump file through this script:
.PP
.Vb 1
\&  fix_latin < dump_file | psql \-d database
.Ve
.PP
If you have a compressed dump file that you would normally restore using
\&'pg_restore', you can omit the '\-d' option on pg_restore and pipe the resulting
\&\s-1SQL\s0 through this script and into psql:
.PP
.Vb 1
\&  pg_restore \-O dump_file | fix_latin | psql \-d database
.Ve
.PP
To take a look at non-ASCII lines in the dump file:
.PP
.Vb 1
\&  perl \-ne \*(Aq/^COPY (\eS+)/ and $t = $1; print "$t:$_" if /[^\ex00\-\ex7F]/\*(Aq dump_file
.Ve
.SH "SEE ALSO"
.IX Header "SEE ALSO"
This script is implemented using the Encoding::FixLatin Perl module.  For more
details see the module documentation with the command:
.PP
.Vb 1
\&  perldoc Encoding::FixLatin
.Ve
.PP
In particular you should read the '\s-1LIMITATIONS\s0' section to understand the
circumstances under which data corruption might occur.
.SH "COPYRIGHT & LICENSE"
.IX Header "COPYRIGHT & LICENSE"
Copyright 2009\-2014 Grant McLean
.PP
This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.