.\" Automatically generated by Pod::Man 4.10 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "PO4A-GETTEXTIZE 1p"
.TH PO4A-GETTEXTIZE 1p "2020-08-19" "Po4a Tools" "Po4a Tools"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
po4a\-gettextize \- convert an original file (and its translation) to a PO file
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
\&\fBpo4a\-gettextize\fR \fB\-f\fR \fIfmt\fR \fB\-m\fR \fImaster.doc\fR [\fB\-l\fR \fI\s-1XX\s0.doc\fR] \fB\-p\fR \fI\s-1XX\s0.po\fR
.PP
(\fI\s-1XX\s0.po\fR is the output, all others are inputs)
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
po4a (\s-1PO\s0 for anything) eases the maintenance of documentation translation
using the classical gettext tools. The main feature of po4a is that it
decouples the translation of content from its document structure. Please refer
to the page
\&\fBpo4a\fR\|(7) for a gentle introduction to this project.
.PP
The \fBpo4a\-gettextize\fR script is in charge of converting documentation files into
\&\s-1PO\s0 files. You only need it to set up your translation project with po4a,
never afterward.
.PP
If you start from scratch, \fBpo4a\-gettextize\fR will extract the translatable
strings from the documentation and write a \s-1POT\s0 file. If you provide a
previously existing translated file with the \fB\-l\fR flag, \fBpo4a\-gettextize\fR
will try to use the translations that it contains in the produced \s-1PO\s0 file.
This process remains tedious and manual, as explained in the section
\&'Converting a manual translation to po4a' below.
.PP
If the master document has non-ASCII characters, the newly generated \s-1PO\s0 file
will be in \s-1UTF\-8.\s0 Otherwise (if the master document is completely in
\s-1ASCII\s0), the generated
\&\s-1PO\s0 will use the encoding of the translated input document, or \s-1UTF\-8\s0 if
no translated document is provided.
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\fB\-f\fR, \fB\-\-format\fR" 4
.IX Item "-f, --format"
Format of the documentation you want to handle. Use the \fB\-\-help\-format\fR
option to see the list of available formats.
.IP "\fB\-m\fR, \fB\-\-master\fR" 4
.IX Item "-m, --master"
File containing the master document to translate. You can use this option
multiple times if you want to gettextize multiple documents.
.IP "\fB\-M\fR, \fB\-\-master\-charset\fR" 4
.IX Item "-M, --master-charset"
Charset of the file containing the document to translate.
.IP "\fB\-l\fR, \fB\-\-localized\fR" 4
.IX Item "-l, --localized"
File containing the localized (translated) document. If you provided multiple
master files, you may wish to provide multiple localized files by using this
option more than once.
.IP "\fB\-L\fR, \fB\-\-localized\-charset\fR" 4
.IX Item "-L, --localized-charset"
Charset of the file containing the localized document.
.IP "\fB\-p\fR, \fB\-\-po\fR" 4
.IX Item "-p, --po"
File where the message catalog should be written. If not given, the message
catalog will be written to the standard output.
.IP "\fB\-o\fR, \fB\-\-option\fR" 4
.IX Item "-o, --option"
Extra option(s) to pass to the format plugin. See the documentation of each
plugin for more information about the valid options and their meanings. For
example, you could pass '\-o tablecells' to the AsciiDoc parser, while the
text parser would accept '\-o tabs=split'.
.IP "\fB\-h\fR, \fB\-\-help\fR" 4
.IX Item "-h, --help"
Show a short help message.
.IP "\fB\-\-help\-format\fR" 4
.IX Item "--help-format"
List the documentation formats understood by po4a.
.IP "\fB\-V\fR, \fB\-\-version\fR" 4
.IX Item "-V, --version"
Display the version of the script and exit.
.IP "\fB\-v\fR, \fB\-\-verbose\fR" 4
.IX Item "-v, --verbose"
Increase the verbosity of the program.
.IP "\fB\-d\fR, \fB\-\-debug\fR" 4
.IX Item "-d, --debug"
Output some debugging information.
.IP "\fB\-\-msgid\-bugs\-address\fR \fIemail@address\fR" 4
.IX Item "--msgid-bugs-address email@address"
Set the report address for msgid bugs. By default, the created \s-1POT\s0 files
have no Report-Msgid-Bugs-To fields.
.IP "\fB\-\-copyright\-holder\fR \fIstring\fR" 4
.IX Item "--copyright-holder string"
Set the copyright holder in the \s-1POT\s0 header. The default value is
\&\*(L"Free Software Foundation, Inc.\*(R"
.IP "\fB\-\-package\-name\fR \fIstring\fR" 4
.IX Item "--package-name string"
Set the package name for the \s-1POT\s0 header. The default is \*(L"\s-1PACKAGE\*(R".\s0
.IP "\fB\-\-package\-version\fR \fIstring\fR" 4
.IX Item "--package-version string"
Set the package version for the \s-1POT\s0 header. The default is \*(L"\s-1VERSION\*(R".\s0
.SS "Converting a manual translation to po4a"
.IX Subsection "Converting a manual translation to po4a"
\&\fBpo4a\-gettextize\fR will try to extract the content of any provided
translation file, and use this content as msgstr in the produced \s-1PO\s0 file.
Be warned that this process is very fragile: the Nth string of the translated
file is supposed to be the translation of the Nth string in the original. This
will naturally not work unless both files share exactly the same structure.
.PP
Internally, each po4a parser reports the syntactical type of each extracted
string. This is how desynchronizations are detected during the gettextization.
For example, if the files have the following structure, it is very unlikely
that the 4th string in the translation (of type 'chapter') is the translation
of the 4th string in the original (of type 'paragraph'). It is more likely that
a new paragraph was added to the original, or that two original paragraphs were
merged together in the translation.
.PP
.Vb 1
\&  Original         Translation
\&
\&  chapter          chapter
\&    paragraph        paragraph
\&    paragraph        paragraph
\&    paragraph      chapter
\&  chapter            paragraph
\&    paragraph        paragraph
.Ve
.PP
\&\fBpo4a\-gettextize\fR will verbosely diagnose any detected structure
desynchronization. When this happens, you should manually edit the files (this
probably requires that you have some knowledge of the target language).
You must add fake paragraphs or remove some content in one of the documents (or
both) to fix the reported disparities, until the structures of both documents
match perfectly. Some tricks are given in the next section.
.PP
Even when the document is successfully processed, undetected disparities and
silent errors are still possible. That is why any translation associated
automatically by po4a\-gettextize is marked as \fIfuzzy\fR to require a manual
inspection by humans. One has to check that each retrieved msgstr is actually
the translation of the associated msgid, and not the string before or after.
.PP
As you can see, the key here is to have the exact same structure in the
translated document and in the original one. The best approach is to run the
gettextization on the exact version of \fImaster.doc\fR that was used for the
translation, and only update the \s-1PO\s0 file against the latest master file
once the gettextization has succeeded.
.PP
If you are lucky enough to have a perfect match in the file structures,
building a correct \s-1PO\s0 file is a matter of seconds. Otherwise, you will soon
understand why this process has such an ugly name :) But remember that this
grunt work is the price to pay to get the comfort of po4a afterward. Once
converted, the synchronization between master documents and translations will
always be fully automatic.
.PP
Even when things go wrong, gettextization often remains faster than translating
everything again. I was able to gettextize the existing French translation of
the whole Perl documentation in one day, even though the structures of many
documents were desynchronized. That was more than two megabytes of original
text (2 million characters): restarting the translation from scratch would
have required several months of work.
.SS "Hints and tricks for the gettextization process"
.IX Subsection "Hints and tricks for the gettextization process"
The gettextization stops as soon as a desynchronization is detected.
In theory, it should probably be possible to resynchronize the gettextization
later in the documents using e.g. the same algorithm as the \fBdiff\fR\|(1)
utility. But manual intervention would still be required to match the elements
that couldn't be automatically matched, which explains why automatic
resynchronization is not implemented (yet?).
.PP
When this happens, the whole game comes down to the alignment of these damn
files' structures again through manual edits. \fBpo4a\-gettextize\fR is rather
verbose about what went wrong when it happens. It reports the strings that
don't match, their positions in the text, and the type of each of them.
Moreover, the
\&\s-1PO\s0 file generated so far is dumped as \fIgettextization.failed.po\fR for
further inspection.
.PP
Here are some other tricks to help you in this tedious process:
.IP "\(bu" 4
Remove all extra content of the translations, such as the section giving
credits to the translators. You can add them back in po4a afterward, using an
addendum (see \fBpo4a\fR\|(7)).
.IP "\(bu" 4
If you need to edit the files to align their structures, you should prefer
editing the translation if possible. Indeed, if the changes to the original are
too intrusive, the old and new versions will not be matched during the \s-1PO\s0
update, and the corresponding translation will be discarded anyway. But do not
hesitate to also edit the original document if required: the important thing is
to get a first \s-1PO\s0 file to start with.
.IP "\(bu" 4
Do not hesitate to kill any original content that would not exist in the
translated version. This content will be automatically reintroduced afterward,
when synchronizing the \s-1PO\s0 file with the document.
.IP "\(bu" 4
You should probably inform the original author of any structural change in the
translation that seems justified. Issues in the original document should be
reported to the author. Fixing them in your translation only fixes them for a
part of the community.
Plus, it is impossible to do so when using po4a ;)
.IP "\(bu" 4
Sometimes, the paragraph contents do match, but not their types. Fixing this is
rather format-dependent. In \s-1POD\s0 and man, it often comes from the fact that
one of them contains a line beginning with whitespace while the other does not.
In those formats, such paragraphs cannot be wrapped and thus become a different
type. Just remove the space and you are fine. It may also be a typo in the tag
name in \s-1XML.\s0
.Sp
Likewise, two paragraphs may get merged together in \s-1POD\s0 when the separating
line contains some spaces, or when there is no empty line between the
\fB=item\fR line and the content of the item.
.IP "\(bu" 4
Sometimes, the desynchronization message seems odd because the translation is
attached to the wrong original paragraph. This is a sign of an undetected issue
earlier in the process. Search for the actual desynchronization point by
inspecting \fIgettextization.failed.po\fR, and fix the problem where it really
is.
.IP "\(bu" 4
In some unfortunate settings, you will get the feeling that po4a ate some parts
of the text, either the original or the translation.
\fIgettextization.failed.po\fR indicates that both files matched as expected up
to paragraph N. But then, an (unsuccessful) attempt is made to match paragraph
N+1 in the original file not with paragraph N+1 in the translation as it
should, but with paragraph N+2. It is just as if paragraph N+1, which you can
see in the document, had simply disappeared from the file during the process.
.Sp
This unfortunate situation happens when the same paragraph is repeated in the
document. In that case, no new entry is created in the \s-1PO\s0 file, but a new
reference is added to the existing one instead.
.Sp
So, the previous situation occurs when two similar but different paragraphs are
translated in the exact same way. This will apparently remove a paragraph of
the translation.
To fix the problem, it is sufficient to slightly alter one of the translations
in the document. Alternatively, you can kill the second paragraph in the
original document.
.Sp
Conversely, if the same paragraph appearing twice in the original document is
not translated in the exact same way at both locations, you will get the
feeling that one paragraph of the original document just vanished. Just copy
the best translation over the other one in the translated document to fix the
problem.
.IP "\(bu" 4
As a final note, do not be too surprised if the first synchronization of your
\s-1PO\s0 file takes a long time. This is because most of the msgids of the
\s-1PO\s0 file resulting from the gettextization do not exactly match any element
of the \s-1POT\s0 file built from the recent master files. This forces gettext to
search for the closest one using a costly string proximity algorithm.
.Sp
For example, the first \fBpo4a\-updatepo\fR of the Perl documentation's French
translation (5.5 \s-1MB PO\s0 file) took about 48 hours (yes, two days) while the
subsequent ones only take about a dozen seconds.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBpo4a\fR\|(1),
\&\fBpo4a\-normalize\fR\|(1),
\&\fBpo4a\-translate\fR\|(1),
\&\fBpo4a\-updatepo\fR\|(1),
\&\fBpo4a\fR\|(7).
.SH "AUTHORS"
.IX Header "AUTHORS"
.Vb 3
\& Denis Barbier
\& Nicolas François
\& Martin Quinson (mquinson#debian.org)
.Ve
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright 2002\-2020 by \s-1SPI,\s0 inc.
.PP
This program is free software; you may redistribute it and/or modify it
under the terms of \s-1GPL\s0 (see the \s-1COPYING\s0 file).