.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "PO4A-GETTEXTIZE 1p"
.TH PO4A-GETTEXTIZE 1p "2023-01-03" "Po4a Tools" "Po4a Tools"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
po4a\-gettextize \- convert an original file (and its translation) to a PO file
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
\&\fBpo4a\-gettextize\fR \fB\-f\fR \fIfmt\fR \fB\-m\fR \fImaster.doc\fR \fB\-l\fR \fI\s-1XX\s0.doc\fR \fB\-p\fR \fI\s-1XX\s0.po\fR
.PP
(\fI\s-1XX\s0.po\fR is the output, all others are inputs)
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
po4a (\s-1PO\s0 for anything) eases the maintenance of documentation translation using
the classical gettext tools. The main feature of po4a is that it decouples the
translation of content from its document structure. Please refer to the page
\&\fBpo4a\fR\|(7) for a gentle introduction to this project.
.PP
The \fBpo4a\-gettextize\fR script helps you convert your previously existing
translations into a po4a\-based workflow. This is only to be done once to
salvage an existing translation while converting to po4a, not on a regular basis
after the conversion of your project. This tedious process is explained in
detail in the section 'Converting a manual translation to po4a' below.
.PP
You must provide both a master file (e.g., the source in English) and an existing
translated file (e.g., a previous translation attempt without po4a). If you provide
more than one master or translation file, they will be used in sequence, but it
may be easier to gettextize each page or chapter separately and then use
\&\fBmsgmerge\fR to merge all produced \s-1PO\s0 files. As you wish.
.PP
If the master document has non-ASCII characters, the newly generated \s-1PO\s0 file will be in
\&\s-1UTF\-8.\s0 If the master document is completely in \s-1ASCII,\s0 the generated
\&\s-1PO\s0 will use the encoding of the translated input document.
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\fB\-f\fR, \fB\-\-format\fR" 4
.IX Item "-f, --format"
Format of the documentation you want to handle.
Use the \fB\-\-help\-format\fR
option to see the list of available formats.
.IP "\fB\-m\fR, \fB\-\-master\fR" 4
.IX Item "-m, --master"
File containing the master document to translate. You can use this option
multiple times if you want to gettextize multiple documents.
.IP "\fB\-M\fR, \fB\-\-master\-charset\fR" 4
.IX Item "-M, --master-charset"
Charset of the file containing the document to translate.
.IP "\fB\-l\fR, \fB\-\-localized\fR" 4
.IX Item "-l, --localized"
File containing the localized (translated) document. If you provided multiple
master files, you may wish to provide multiple localized files by using this
option more than once.
.IP "\fB\-L\fR, \fB\-\-localized\-charset\fR" 4
.IX Item "-L, --localized-charset"
Charset of the file containing the localized document.
.IP "\fB\-p\fR, \fB\-\-po\fR" 4
.IX Item "-p, --po"
File where the message catalog should be written. If not given, the message
catalog will be written to the standard output.
.IP "\fB\-o\fR, \fB\-\-option\fR" 4
.IX Item "-o, --option"
Extra option(s) to pass to the format plugin. See the documentation of each
plugin for more information about the valid options and their meanings. For
example, you could pass '\-o tablecells' to the AsciiDoc parser, while the
text parser would accept '\-o tabs=split'.
.IP "\fB\-h\fR, \fB\-\-help\fR" 4
.IX Item "-h, --help"
Show a short help message.
.IP "\fB\-\-help\-format\fR" 4
.IX Item "--help-format"
List the documentation formats understood by po4a.
.IP "\fB\-k\fR, \fB\-\-keep\-temps\fR" 4
.IX Item "-k, --keep-temps"
Keep the temporary master and localized \s-1POT\s0 files built before merging.
This can be useful to understand why these files get desynchronized, leading
to gettextization problems.
.IP "\fB\-V\fR, \fB\-\-version\fR" 4
.IX Item "-V, --version"
Display the version of the script and exit.
.IP "\fB\-v\fR, \fB\-\-verbose\fR" 4
.IX Item "-v, --verbose"
Increase the verbosity of the program.
.IP "\fB\-d\fR, \fB\-\-debug\fR" 4
.IX Item "-d, --debug"
Output some debugging information.
.IP "\fB\-\-msgid\-bugs\-address\fR \fIemail@address\fR" 4
.IX Item "--msgid-bugs-address email@address"
Set the report address for msgid bugs. By default, the created \s-1POT\s0 files
have no Report-Msgid-Bugs-To fields.
.IP "\fB\-\-copyright\-holder\fR \fIstring\fR" 4
.IX Item "--copyright-holder string"
Set the copyright holder in the \s-1POT\s0 header. The default value is
\&\*(L"Free Software Foundation, Inc.\*(R"
.IP "\fB\-\-package\-name\fR \fIstring\fR" 4
.IX Item "--package-name string"
Set the package name for the \s-1POT\s0 header. The default is \*(L"\s-1PACKAGE\*(R".\s0
.IP "\fB\-\-package\-version\fR \fIstring\fR" 4
.IX Item "--package-version string"
Set the package version for the \s-1POT\s0 header. The default is \*(L"\s-1VERSION\*(R".\s0
.SS "Converting a manual translation to po4a"
.IX Subsection "Converting a manual translation to po4a"
\&\fBpo4a\-gettextize\fR synchronizes the master and localized files to extract their
content into a \s-1PO\s0 file. The content of the master file gives the \fBmsgid\fR while
the content of the localized file gives the \fBmsgstr\fR. This process is somewhat
fragile: the Nth string of the translated file is supposed to be the translation
of the Nth string in the original.
.PP
Gettextization works best if you manage to retrieve the exact version of the
original document that was used for translation. Even so, you may need to fiddle
with both master and localized files to align their structure if it was changed
by the original translator, so working on copies of the files is advised.
.PP
Internally, each po4a parser reports the syntactical type of each extracted
string. This is how desynchronizations are detected during gettextization. In
the example depicted below, it is very unlikely that the 4th string in the
translation (of type 'chapter') is the translation of the 4th string in the
original (of type 'paragraph').
It is more likely that a new paragraph was
added to the original, or that two original paragraphs were merged together in
the translation.
.PP
.Vb 1
\&    Original         Translation
\&
\&  chapter            chapter
\&    paragraph          paragraph
\&    paragraph          paragraph
\&    paragraph        chapter
\&  chapter              paragraph
\&    paragraph          paragraph
.Ve
.PP
\&\fBpo4a\-gettextize\fR will verbosely diagnose any structure desynchronization. When
this happens, you should manually edit the files to add fake paragraphs or
remove some content here and there until the structure of both files actually
matches. Some tricks are given below to salvage as much of the existing
translation as possible while doing so.
.PP
If you are lucky enough to have a perfect match in the file structures out of
the box, building a correct \s-1PO\s0 file is a matter of seconds. Otherwise, you will
soon understand why this process has such an ugly name :) Even so,
gettextization often remains faster than translating everything again. I
gettextized the French translation of the whole Perl documentation in one day
despite the \fImany\fR synchronization issues. Given the amount of text (2 MB of
original text), restarting the translation without first salvaging the old
translations would have required several months of work. In addition, this grunt
work is the price to pay to get the comfort of po4a. Once converted, the
synchronization between master documents and translations will always be fully
automatic.
.PP
After a successful gettextization, the produced documents should be manually
checked for undetected disparities and silent errors, as explained below.
.PP
\fIHints and tricks for the gettextization process\fR
.IX Subsection "Hints and tricks for the gettextization process"
.PP
The gettextization stops as soon as a desynchronization is detected. When this
happens, you need to edit the files as much as needed to realign their
structures. \fBpo4a\-gettextize\fR is rather verbose when things go wrong.
It reports the strings that don't match, their positions in the text, and the
type of each of them. Moreover, the \s-1PO\s0 file generated so far is dumped as
\&\fIgettextization.failed.po\fR for further inspection.
.PP
Here are some tricks to help you in this tedious process and ensure that you
salvage as much of the previous translation as possible:
.IP "\(bu" 4
Remove all extra content of the translations, such as the section giving credits
to the translators. Such content should be added separately to \fBpo4a\fR as
addenda (see \&\fBpo4a\fR\|(7)).
.IP "\(bu" 4
When editing the files to align their structures, prefer editing the translation
if possible. Indeed, if the changes to the original are too intrusive, the old
and new versions will not be matched during the first po4a run after
gettextization (see below). Any unmatched translation will be dumped anyway.
That being said, you still want to edit the original document if it's too hard
to get the gettextization to proceed otherwise, even if it means that one
paragraph of the translation is dumped. The important thing is to get a first
\&\s-1PO\s0 file to start with.
.IP "\(bu" 4
Do not hesitate to remove any original content that does not exist in the
translated version. This content will be automatically reintroduced afterward,
when synchronizing the \s-1PO\s0 file with the document.
.IP "\(bu" 4
You should probably inform the original author of any structural change in the
translation that seems justified. Issues in the original document should be
reported to the author. Fixing them in your translation only fixes them for a
part of the community. Plus, it is impossible to do so when using po4a ;) But
you probably want to wait until the end of the conversion to \fBpo4a\fR before
changing the original files.
.IP "\(bu" 4
Sometimes, the contents of the paragraphs match, but their types do not. Fixing
it is rather format-dependent.
In \s-1POD\s0 and man, it often comes from the fact
that one of them contains a line beginning with whitespace while the other does
not. In those formats, such paragraphs cannot be wrapped and thus get a
different type. Just remove the space and you are fine. It may also be a typo in
the tag name in \s-1XML.\s0
.Sp
Likewise, two paragraphs may get merged together in \s-1POD\s0 when the separating line
contains some spaces, or when there is no empty line between the \fB=item\fR line
and the content of the item.
.IP "\(bu" 4
Sometimes, the desynchronization message seems odd because the translation is
attached to the wrong original paragraph. It is the sign of an undetected issue
earlier in the process. Search for the actual desynchronization point by
inspecting the file \fIgettextization.failed.po\fR that was produced, and fix the
problem where it really is.
.IP "\(bu" 4
Other issues may come from duplicated strings in either the original or the
translation. Duplicated strings are merged in \s-1PO\s0 files, with two references. This
is a difficulty for the gettextization algorithm, which is a simple one-to-one
pairing between the \fBmsgid\fRs of both the master and the localized files. It is
however believed that recent versions of po4a deal properly with duplicated
strings, so you should report any remaining issue that you may encounter.
.SS "Reviewing files produced by \fBpo4a\-gettextize\fP"
.IX Subsection "Reviewing files produced by po4a-gettextize"
Any file produced by \fBpo4a\-gettextize\fR should be manually reviewed, even when the
script terminates successfully. You should skim over the \s-1PO\s0 file, ensuring that
the \fBmsgid\fR and \fBmsgstr\fR actually match. It is not necessary to ensure that the
translation is perfectly correct yet, as all entries are marked as fuzzy
translations anyway. You only need to check for obvious matching issues because
badly matched translations will be dumped in subsequent steps while you want to
salvage them.
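.PP
As a rough aid for this review, and assuming the \s-1GNU\s0 gettext tools are
installed, \fBmsgfmt\fR can summarize the state of the salvaged catalog. The file
name \fI\s-1XX\s0.po\fR is only the placeholder used in the synopsis. Right after
gettextization, the summary would be expected to list the entries as fuzzy
translations. The command does not check that each \fBmsgid\fR and \fBmsgstr\fR
pair matches, so it complements reading the entries rather than replacing it.
.PP
.Vb 1
\&  msgfmt \-\-statistics \-o /dev/null XX.po
.Ve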
.PP
Fortunately, this step does not require mastering the target languages, as you
only want to recognize similar elements in each \fBmsgid\fR and its corresponding
\&\fBmsgstr\fR. As a speaker of French, English, and some German myself, I can do
this for all European languages at least, even if I cannot say one word of most
of these languages. I sometimes manage to detect matching issues in non-Latin
languages by looking at string length, phrase structures (does the number of
question marks match?) and other clues, but I prefer when someone else can
review those languages.
.PP
If you detect a mismatch, edit the original and translation files as if
\&\fBpo4a\-gettextize\fR reported an error, and try again. Once you have a decent \s-1PO\s0
file for your previous translation, back it up until you get po4a working
correctly.
.SS "Running \fBpo4a\fP for the first time"
.IX Subsection "Running po4a for the first time"
The easiest way to set up po4a is to write a \fBpo4a.conf\fR configuration file, and
use the integrated po4a program (\fBpo4a\-updatepo\fR and \fBpo4a\-translate\fR are
deprecated). Please check the \*(L"\s-1CONFIGURATION FILE\*(R"\s0 section of the
\&\fBpo4a\fR\|(1) documentation for more details.
.PP
When \fBpo4a\fR runs for the first time, the current version of the master documents
will be used to update the \s-1PO\s0 files containing the old translations that you
salvaged through gettextization. This can take quite a long time, because many
of the \fBmsgid\fRs from the gettextization do not exactly match the elements of
the \s-1POT\s0 file built from the recent master files. This forces gettext to search
for the closest one using a costly string proximity algorithm. For example, the
first run over the Perl documentation's French translation (5.5 \&\s-1MB PO\s0 file)
took about 48 hours (yes, two days) while the subsequent ones only take seconds.
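.PP
A minimal configuration for this first run might look like the sketch below. The
language code, the \fIpod\fR format, and every path are hypothetical
placeholders chosen for illustration; the \fBpo4a\fR\|(1) page remains the
reference for the configuration syntax.
.PP
.Vb 3
\&  [po4a_langs] fr
\&  [po4a_paths] po/master.pot $lang:po/$lang.po
\&  [type: pod] master.pod $lang:translated/$lang/master.pod
.Ve
.PP
Under these assumptions, the salvaged \s-1PO\s0 file would first be copied to
\&\fIpo/fr.po\fR, and the first run would then be started with:
.PP
.Vb 1
\&  po4a po4a.conf
.Ve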
.SS "Moving your translations to production"
.IX Subsection "Moving your translations to production"
After this first run, the \s-1PO\s0 files are ready to be reviewed by translators. All
entries were marked as fuzzy in the \s-1PO\s0 file by \fBpo4a\-gettextize\fR, forcing
their careful review before use. Translators should take each entry to verify
that the salvaged translation actually matches the current original text, update
the translation as needed, and remove the fuzzy markers.
.PP
Once enough fuzzy markers are removed, \fBpo4a\fR will start generating the
translation files on disk, and you're ready to move your translation workflow to
production. Some projects find it useful to rely on Weblate to coordinate
between translators and maintainers, but that's beyond \fBpo4a\fR's scope.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBpo4a\fR\|(1),
\&\fBpo4a\-normalize\fR\|(1),
\&\fBpo4a\-translate\fR\|(1),
\&\fBpo4a\-updatepo\fR\|(1),
\&\fBpo4a\fR\|(7).
.SH "AUTHORS"
.IX Header "AUTHORS"
.Vb 3
\& Denis Barbier
\& Nicolas François
\& Martin Quinson (mquinson#debian.org)
.Ve
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright 2002\-2022 by \s-1SPI,\s0 inc.
.PP
This program is free software; you may redistribute it and/or modify it under
the terms of \s-1GPL\s0 (see the \s-1COPYING\s0 file).