
FreeBSD Manual Pages


PO4A(7)				  Po4a Tools			       PO4A(7)

       po4a - framework to translate documentation and other materials

       The po4a (PO for anything) project aims to ease translations (and,
       more interestingly, the maintenance of translations) by using gettext
       tools in areas where they were not expected, such as documentation.

Table of contents
       This document is organized as follows:

       1 Why should I use po4a? What is it good for?
           This introductory chapter explains the motivation of the project
           and its philosophy. You should read it first if you are in the
           process of evaluating po4a for your own translations.

       2 How to use po4a?
           This chapter is a sort of reference manual, trying to answer the
           users' questions and to give you a better understanding of the
           whole process. It introduces how to do things with po4a and serves
           as an introduction to the documentation of the specific tools.

           HOWTO begin a new translation?
           HOWTO change the translation back to a documentation file?
           HOWTO update a po4a translation?
           HOWTO convert a pre-existing translation to po4a?
           HOWTO add extra text to translations (like translator's name)?
           HOWTO do all this in one program invocation?
           HOWTO customize po4a?
       3 How does it work?
           This chapter gives you a brief overview of the po4a internals, so
           that you may feel more confident to help us maintain and improve
           it. It may also help you understand why it does not do what you
           expected, and how to solve your problems.

       4 FAQ
           This chapter groups the Frequently Asked Questions. In fact, most
           of the questions so far can be phrased this way: "Why is it
           designed this way, and not that one?" If you think po4a isn't the
           right answer to documentation translation, you should consider
           reading this section. If it does not answer your question, please
           contact us on the <> mailing list. We love feedback.

       5 Specific notes about modules
           This chapter presents the specifics of each module, from the point
           of view of both the translator and the original author. Read this
           to learn the syntax you will encounter when translating documents
           handled by this module, or the rules you should follow in your
           original document to make translators' lives easier.

           Actually, this section is not really part of this document.
           Instead, it is placed in each module's documentation. This helps
           ensure that the information is up to date by keeping the
           documentation and the code together.

Why should I use po4a? What is it good for?
       I like the idea of open-source software, making it possible for
       everybody to access software and its source code. But being French, I'm
       well aware that the licensing is not the only restriction to the
       openness of software: untranslated free software is useless for
       non-English speakers, and we still have some work to do to make it
       available to really everybody out there.

       The perception of this situation by open-source actors has improved
       dramatically recently. We, as translators, won the first battle and
       convinced everybody of the importance of translations. But
       unfortunately, that was the easy part. Now, we have to do the job and
       actually translate all this stuff.

       Actually, open-source programs themselves benefit from a rather decent
       level of translation, thanks to the wonderful gettext tool suite. It is
       able to extract the strings to translate from the program, present a
       uniform format to translators, and then use the result of their work
       at run time to display translated messages to the user.

       But the situation is rather different when it comes to documentation.
       Too often, the translated documentation is not visible enough (not
       distributed as a part of the program), only partial, or not up to date.
       This last situation is by far the worst possible one. An outdated
       translation can turn out to be worse than no translation at all for
       users, because it describes old program behavior that no longer
       applies.

   The problem to solve
       Translating documentation is not very difficult in itself. Texts are
       far longer than the messages of a program and thus take longer to
       translate, but no technical skill is really needed to do so. The
       difficult part comes when you have to maintain your work. Detecting
       which parts changed and need to be updated is very difficult,
       error-prone and highly unpleasant. I guess that this explains why so
       much of the translated documentation out there is outdated.

   The po4a answers
       So, the whole point of po4a is to make documentation translation
       maintainable. The idea is to apply the gettext methodology to this new
       field. As in gettext, texts are extracted from their original
       locations in order to be presented in a uniform format to the
       translators. The classical gettext tools help them update their work
       when a new release of the original comes out. But unlike the
       classical gettext model, the translations are then re-injected into
       the structure of the original document so that they can be processed
       and distributed just like the English version.

       Thanks to this, discovering which parts of the document were changed
       and need an update becomes very easy. Another good point is that the
       tools will do almost all the work when the structure of the original
       document gets fundamentally reorganized and when some chapters are
       moved around, merged or split. By extracting the text to translate from
       the document structure, po4a also keeps you away from the complexity of
       text formatting and reduces your chances of producing a broken document
       (even if it does not completely prevent you from doing so).

       Please also see the FAQ below in	this document for a more complete list
       of the advantages and disadvantages of this approach.

   Supported formats
       Currently, this approach has been successfully implemented for several
       kinds of text formats:

       nroff

       The good old manual pages' format, used by so many programs out there.
       po4a support is very welcome here since this format is somewhat
       difficult to use and not really friendly to newbies. The
       Locale::Po4a::Man(3pm) module also supports the mdoc format, used by
       the BSD man pages (they are also quite common on Linux).

       pod

       This is the Perl Online Documentation format. The language and
       extensions themselves are documented that way, as well as most of the
       existing Perl scripts. It makes it easy to keep the documentation close
       to the actual code by embedding them both in the same file. It makes
       the programmer's life easier, but unfortunately, not the translator's.

       sgml

       Even if somewhat superseded by XML nowadays, this format is still used
       rather often for documents which are more than a few screens long. It
       allows you to make complete books. Updating the translation of such long
       documents can turn out to be a real nightmare, and diff often proves
       useless when the original text was re-indented after an update.
       Fortunately, po4a can help you in that process.

       Currently, only the DebianDoc and DocBook DTDs are supported, but adding
       support for a new one is really easy. It is even possible to use po4a on
       an unknown SGML DTD without changing the code, by providing the needed
       information on the command line. See Locale::Po4a::Sgml(3pm) for more
       details.

       TeX / LaTeX

       The LaTeX format is a major documentation format used in the Free
       Software world and for publications. The Locale::Po4a::LaTeX(3pm)
       module was tested with the Python documentation, a book and some

       texinfo

       All the GNU documentation is written in this format (that's even one of
       the requirements to become an official GNU project). Support for
       Locale::Po4a::Texinfo(3pm) in po4a is still in its early stages. Please
       report bugs and feature requests.

       xml

       The XML format is a base	format for many	documentation formats.

       Currently, the DocBook DTD is supported by po4a.	See
       Locale::Po4a::Docbook(3pm) for details.

       others

       Po4a can also handle some rarer or more specialized formats, such as
       the documentation of compilation options for the 2.4.x kernels or the
       diagrams produced by the dia tool. Adding a new one is often very easy,
       and the main task is to come up with a parser for your target format.
       See Locale::Po4a::TransTractor(3pm) for more information about this.

   Unsupported formats
       Unfortunately, po4a still lacks support for several documentation
       formats.

       There is a whole bunch of other formats we would like to support in
       po4a, and not only documentation ones. Indeed, we aim to plug all the
       "market holes" left by the classical gettext tools. This encompasses
       package descriptions (deb and rpm), package installation script
       questions, package changelogs, and all the specialized file formats
       used by programs, such as game scenarios or wine resource files.

How to use po4a?
       This chapter is a sort of reference manual, trying to answer the users'
       questions and to give you a better understanding of the whole process.
       It introduces how to do things with po4a and serves as an introduction
       to the documentation of the specific tools.

   Graphical overview
       The following schema gives an overview of the process of translating
       documentation using po4a. Do not be put off by its apparent complexity;
       it comes from the fact that the whole process is represented here. Once
       you have converted your project to po4a, only the right part of the
       graphic is relevant.

       Note that master.doc is taken as	an example for the documentation to be
       translated and translation.doc is the corresponding translated text.
       The suffix could	be .pod, .xml, or .sgml	depending on its format. Each
       part of the picture will	be detailed in the next	sections.

                                         master.doc
                                              |
                                              V
            +<-----<----+<-----<-----<--------+------->-------->-------+
            :           |                     |                        :
       {translation}    |         { update of master.doc }             :
            :           |                     |                        :
          XX.doc        |                     V                        V
       (optional)       |                 master.doc ->-------->------>+
            :           |                   (new)                      |
            V           V                     |                        |
         [po4a-gettextize]   doc.XX.po--->+   |                        |
                 |            (old)       |   |                        |
                 |              ^         V   V                        |
                 |              |     [po4a-updatepo]                  |
                 V              |           |                          V
          translation.pot       ^           V                          |
                 |              |         doc.XX.po                    |
                 |              |         (fuzzy)                      |
          { translation }       |           |                          |
                 |              ^           V                          V
                 |              |     {manual editing}                 |
                 |              |           |                          |
                 V              |           V                          V
             doc.XX.po --->---->+<---<---- doc.XX.po   addendum     master.doc
             (initial)                   (up-to-date) (optional)   (up-to-date)
                 :                          |            |             |
                 :                          V            |             |
                 +----->----->----->------> +            |             |
                                            |            |             |
                                            V            V             V
                                            +------>-----+------<------+
                                                         |
                                                         V
                                                 [po4a-translate]
                                                         |
                                                         V
                                                      XX.doc
                                                   (up-to-date)

       On the left part, the conversion of a translation not using po4a to
       this system is shown. On the top of the right part, the action of the
       original author is depicted (updating the documentation). The middle
       of the right part is where the automatic actions of po4a are depicted.
       The new material is extracted and compared against the existing
       translation. Parts which didn't change are found, and the previous
       translation is used. Parts which were partially modified are also
       connected to the previous translation, but with a specific marker
       indicating that the translation must be updated. The bottom of the
       figure shows how a formatted document is built.

       Actually, as a translator, the only manual operation you	have to	do is
       the part	marked {manual editing}. Yeah, I'm sorry, but po4a helps you
       translate.  It does not translate anything for you...

   HOWTO begin a new translation?
       This section presents the steps required to begin a new translation
       with po4a. The refinements involved in converting an existing project
       to this system are detailed in the relevant section.

       To begin a new translation using po4a, you have to do the following:

       - Extract the text which has to be translated from the original
         <master.doc> document into a new translation template
         <translation.pot> file (the gettext format). For that, use the
         po4a-gettextize program this way:

	   $ po4a-gettextize -f	<format> -m <master.doc> -p <translation.pot>

	 <format> is naturally the format used in the master.doc document. As
	 expected, the output goes into	translation.pot.  Please refer to
	 po4a-gettextize(1) for	more details about the existing	options.

       - Actually translate what should be translated. For that, you have to
         rename the POT file, for example to doc.XX.po (where XX is the ISO 639
         code of the language you are translating to, e.g. fr for French), and
         edit the resulting file. It is often a good idea not to name the file
         XX.po, to avoid confusion with the translation of the program
         messages, but this is your call. Don't forget to update the PO file
         headers; they are important.

	 The actual translation	can be done using the Emacs' or	Vi's PO	mode,
	 Lokalize (KDE based), Gtranslator (GNOME based) or whichever program
	 you prefer to use for them (e.g. Virtaal).

         If you wish to learn more about this, you definitely need to refer
         to the gettext documentation, available in the gettext-doc package.
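       To make the extraction step concrete, here is a minimal Python sketch
       of what "extracting text into a POT file" means. It is not how
       po4a-gettextize works internally (po4a is written in Perl, and each
       module understands its own format); it assumes a plain-text master
       document whose translatable units are blank-line-separated paragraphs,
       and the function names are hypothetical. A real POT file also starts
       with a header entry (msgid "") that is omitted here.

```python
def split_paragraphs(text):
    """Return (line_number, paragraph) pairs for blank-line-separated blocks."""
    paragraphs = []
    current, start = [], None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            if start is None:
                start = lineno          # remember where the paragraph begins
            current.append(line.strip())
        elif current:
            paragraphs.append((start, " ".join(current)))
            current, start = [], None
    if current:                         # last paragraph without trailing blank line
        paragraphs.append((start, " ".join(current)))
    return paragraphs


def po_quote(s):
    """Escape a string for a double-quoted PO literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def make_pot(master_name, text):
    """Build POT text: one entry per paragraph, with an empty msgstr."""
    entries = []
    for lineno, para in split_paragraphs(text):
        entries.append(f"#: {master_name}:{lineno}\nmsgid {po_quote(para)}\nmsgstr \"\"\n")
    return "\n".join(entries)


master = """Welcome to dummy.

It prints "hello"
and exits.
"""
print(make_pot("master.doc", master))
```

       The "#:" reference comments record where each string came from, which
       is what lets the PO file point translators back to the original.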

   HOWTO change	the translation	back to	a documentation	file?
       Once you're done with the translation, you want to get the translated
       documentation and distribute it to users along with the original one.
       For that, use the po4a-translate(1) program like this (where XX is the
       language code):

	 $ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l	<XX.doc>

       As before, <format> is the format used in the master.doc	document. But
       this time, the PO file provided with the	-p flag	is part	of the input.
       This is your translation. The output goes into XX.doc.

       Please refer to po4a-translate(1) for more details.
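       The core idea of this step can be sketched in a few lines of Python:
       each original paragraph is looked up in the PO catalog and replaced by
       its translation, falling back to the original text when no usable
       translation exists. This is an illustration only; it assumes the PO
       file has already been parsed into a hypothetical
       {msgid: (msgstr, is_fuzzy)} dict, whereas real po4a modules re-inject
       translations while walking the actual structure of the format.

```python
def translate_paragraphs(paragraphs, catalog):
    """Return (translated paragraphs, number of paragraphs left untranslated).

    catalog maps msgid -> (msgstr, is_fuzzy). Missing, empty or fuzzy
    entries are not used; the original paragraph is kept instead.
    """
    out, missing = [], 0
    for para in paragraphs:
        msgstr, fuzzy = catalog.get(para, ("", False))
        if msgstr and not fuzzy:
            out.append(msgstr)
        else:
            out.append(para)  # fall back to the original text
            missing += 1
    return out, missing


catalog = {
    "Welcome to dummy.": ("Bienvenue dans dummy.", False),
    "It exits.": ("Il quitte.", True),  # still fuzzy, so not used
}
doc, missing = translate_paragraphs(["Welcome to dummy.", "It exits.", "New text."], catalog)
print(doc, missing)
```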

   HOWTO update	a po4a translation?
       To update your translation when the original master.doc file has
       changed, use the po4a-updatepo(1) program like this:

	 $ po4a-updatepo -f <format> -m	<new_master.doc> -p <old_doc.XX.po>

       (Please refer to	po4a-updatepo(1) for more details)

       Naturally, new paragraphs in the document won't get magically
       translated in the PO file by this operation, and you'll need to
       update the PO file manually. Likewise, you may have to rework the
       translation of paragraphs which were modified a bit. To make sure you
       won't miss any of them, they are marked as "fuzzy" during the process,
       and you have to remove this marker before the translation can be used
       by po4a-translate. As for the initial translation, the best is to use
       your favorite PO editor here.

       Once your PO file is up-to-date again, without any untranslated or
       fuzzy string left, you can generate a translated	documentation file, as
       explained in the	previous section.
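       The merge performed during an update can be pictured with the
       following Python sketch. The actual merge is done by the gettext tools
       with their own matching heuristics; this sketch swaps in Python's
       difflib.get_close_matches as a stand-in similarity measure, and the
       0.8 cutoff is an arbitrary assumption. Unchanged paragraphs keep their
       translation, slightly modified ones reuse the closest old translation
       marked fuzzy, and new ones stay untranslated.

```python
import difflib


def update_catalog(new_msgids, old_catalog, cutoff=0.8):
    """Merge an old {msgid: msgstr} catalog into a new list of msgids.

    Returns {msgid: (msgstr, is_fuzzy)}.
    """
    old_ids = list(old_catalog)
    merged = {}
    for msgid in new_msgids:
        if msgid in old_catalog:                      # unchanged paragraph
            merged[msgid] = (old_catalog[msgid], False)
            continue
        # Compare against every old msgid: this is the costly part.
        close = difflib.get_close_matches(msgid, old_ids, n=1, cutoff=cutoff)
        if close:                                     # modified paragraph
            merged[msgid] = (old_catalog[close[0]], True)
        else:                                         # brand new paragraph
            merged[msgid] = ("", False)
    return merged


old = {
    "Welcome to dummy.": "Bienvenue dans dummy.",
    "It prints hello and exits.": "Il affiche hello et quitte.",
}
new = ["Welcome to dummy.", "It prints hello and then exits.", "See also foo(1)."]
for msgid, (msgstr, fuzzy) in update_catalog(new, old).items():
    print("fuzzy" if fuzzy else "     ", repr(msgid), "->", repr(msgstr))
```

       In this sketch, every msgid without an exact match is compared with
       all old msgids, so the work grows quickly when almost nothing matches
       exactly, as happens on the first update after a gettextization.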

   HOWTO convert a pre-existing	translation to po4a?
       Often, you were happily translating the document manually until a
       major reorganization of the original master.doc document happened.
       Then, after some unpleasant tries with diff or similar tools, you want
       to convert to po4a. But of course, you don't want to lose your
       existing translation in the process. Don't worry, this case is also
       handled by po4a tools and is called gettextization.

       The key here is to have the same structure in the translated document
       and in the original one, so that the tools can match the content
       accordingly.

       If you are lucky (i.e., if the structures of both documents perfectly
       match), it will work seamlessly and you will be set in a few seconds.
       Otherwise, you may understand why this process has such an ugly name,
       and you'd better be prepared for some grunt work here. In any case,
       remember that it is the price to pay to get the comfort of po4a
       afterward. And the good point is that you have to do so only once.

       I cannot emphasize this too much. In order to ease the process, it is
       important that you find the exact version which was used to do the
       translation. The best situation is when you noted down the VCS revision
       used for the translation and did not modify it during the translation
       process, so that you can use it.

       It won't	work well when you use the updated original text with the old
       translation. It remains possible, but is	harder and really should be
       avoided if possible. In fact, I guess that if you fail to find the
       original	text again, the	best solution is to find someone to do the
       gettextization for you (but, please, not	me ;).

       Maybe I'm too dramatic here. Even when things go wrong, it remains far
       faster than translating everything again. I was able to gettextize the
       existing French translation of the Perl documentation in one day, even
       though things did go wrong. That was more than two megabytes of text,
       and a new translation would have taken months or more.

       Let me explain the basics of the procedure first, and I will come back
       to hints for achieving it when the process goes wrong. To ease
       comprehension, let's use the above example once again.

       Once you	have the old master.doc	again which matches with the
       translation XX.doc, the gettextization can be done directly to the PO
       file doc.XX.po without manual translation of translation.pot file:

	$ po4a-gettextize -f <format> -m <old_master.doc> -l <XX.doc> -p <doc.XX.po>

       When you're lucky, that's it. You converted your old translation to
       po4a and can begin with the updating task right away. Just follow the
       procedure explained a few sections ago to synchronize your PO file with
       the newest original document, and update the translation accordingly.

       Please note that even when things seem to work properly, there is still
       room for errors in this process. The point is that po4a is unable to
       understand the text to make sure that the translation matches the
       original. That's why all strings are marked as "fuzzy" in the process.
       You should check each of them carefully before removing those markers.
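       The principle can be illustrated with a minimal Python sketch,
       assuming both documents have already been parsed into hypothetical
       lists of (paragraph type, text) pairs: paragraphs are paired by
       position, a count or type mismatch is reported instead of guessed
       around, and every resulting entry is marked fuzzy because nothing
       checks that the texts mean the same thing. This is not
       po4a-gettextize's actual algorithm or error format.

```python
def gettextize(original, translation):
    """Pair (type, text) paragraphs by position into {msgid: (msgstr, fuzzy)}."""
    if len(original) != len(translation):
        raise ValueError(
            f"structure mismatch: {len(original)} original vs "
            f"{len(translation)} translated paragraphs")
    catalog = {}
    for i, ((kind_o, text_o), (kind_t, text_t)) in enumerate(zip(original, translation)):
        if kind_o != kind_t:
            raise ValueError(f"paragraph {i}: type {kind_o!r} vs {kind_t!r}")
        # A repeated original paragraph keeps only its last translation here.
        catalog[text_o] = (text_t, True)  # always fuzzy: the meaning is never checked
    return catalog


orig = [("title", "NAME"), ("para", "dummy - a dummy program")]
tran = [("title", "NOM"), ("para", "dummy - un programme factice")]
print(gettextize(orig, tran))
```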

       Often the document structures don't match exactly, preventing
       po4a-gettextize from doing its job properly. At that point, the whole
       game is about editing the files to get their damn structures matching.

       It may help to read the section Gettextization: how does it work?
       below. Understanding the internal process will help you make this
       work. The good point is that po4a-gettextize is rather verbose about
       what went wrong when it happens. First, it pinpoints where in the
       documents the structures' discrepancies are. You will learn the strings
       that don't match, their positions in the text, and the type of each of
       them. Moreover, the PO file generated so far will be dumped to
       gettextization.failed.po.

       -   Remove all extra parts of the translations, such as the section in
           which you give the translator's name and thank everyone who
           contributed to the translation. Addenda, which are described in the
           next section, will allow you to re-add them afterward.

       -   Do not hesitate to edit both	the original and the translation. The
	   most	important thing	is to get the PO file. You will	be able	to
	   update it afterward.	That being said, editing the translation
	   should be preferred when both are possible since it makes things
	   easier when the gettextization is done.

       -   If needed, kill the parts of the original that happen not to be
           translated. When synchronizing the PO with the document afterward,
           they will come back by themselves.

       -   If you changed the structure	a bit (to merge	two paragraphs,	or
	   split another one), undo those changes. If there are	issues in the
	   original, you should	inform the original author. Fixing them	in
	   your	translation only fixes them for	a part of the community. And
	   moreover, it's impossible when using	po4a ;)

       -   Sometimes, the paragraph content does match, but their types don't.
           Fixing it is rather format-dependent. In POD and man, it often
           comes from the fact that one of the two contains a line beginning
           with a white space where the other doesn't. In those formats, such
           a paragraph cannot be wrapped and thus becomes a different type.
           Just remove the space and you are fine. It may also be a typo in
           the tag name.

	   Likewise, two paragraphs may	get merged together in POD when	the
	   separating line contains some spaces, or when there is no empty
	   line	between	the =item line and the content of the item.

       -   Sometimes, there is a desynchronization between the files, and the
           translation is attached to the wrong original paragraph. It is a
           sign that the real problem is earlier in the files. Check
           gettextization.failed.po to see where the desynchronization begins,
           and fix it there.

       -   Sometimes, you get the strong feeling that po4a ate some parts of
           the text, either the original or the translation.
           gettextization.failed.po indicates that both of them were matching
           nicely, and then the gettextization fails because it tried to
           match one paragraph with the one after (or before) the right one,
           as if the right one disappeared. Curse po4a as I did when it first
           happened to me. Generously.

           This unfortunate situation happens when the same paragraph is
           repeated over the document. In that case, no new entry is created
           in the PO file, but a new reference is added to the existing one.

	   So, when the	same paragraph appears twice in	the original but both
	   are not translated in the exact same	way each time, you will	get
	   the feeling that a paragraph	of the original	disappeared. Just kill
	   the new translation.	If you prefer to kill the first	translation
	   instead when	the second one was actually better, remove the second
	   one from where it is	and put	the first one in the place of the
	   second one.

           Conversely, if two similar but different paragraphs were
           translated in the exact same way, you will get the feeling that a
           paragraph of the translation disappeared. A solution is to add a
           stupid string to the original paragraph (such as "I'm different").
           Don't be afraid, those things will disappear during the
           synchronization, and when the added text is short enough, gettext
           will match your translation to the existing text (marking it as
           fuzzy, but you don't really care since all strings are fuzzy after
           gettextization).
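       Why a repeated paragraph seems to vanish can be shown with a small
       Python sketch: identical paragraphs collapse into a single catalog
       entry with several references, so two differently translated
       occurrences compete for one msgstr. The helper names are hypothetical,
       and the sketch assumes the two documents are already aligned
       paragraph by paragraph.

```python
def collect_entries(paragraphs):
    """Group identical paragraphs: {msgid: [positions where it occurs]}."""
    entries = {}
    for pos, para in enumerate(paragraphs):
        entries.setdefault(para, []).append(pos)
    return entries


def conflicting_duplicates(original, translation):
    """Return msgids whose repeated occurrences were translated differently."""
    conflicts = {}
    for msgid, positions in collect_entries(original).items():
        variants = {translation[p] for p in positions}
        if len(variants) > 1:
            conflicts[msgid] = sorted(variants)
    return conflicts


orig = ["Intro.", "Back up first.", "Body.", "Back up first."]
tran = ["Introduction.", "Sauvegardez d'abord.", "Corps.", "Faites une sauvegarde."]
print(len(orig), "paragraphs but", len(collect_entries(orig)), "PO entries")
print(conflicting_duplicates(orig, tran))
```

       A msgid reported here is the first case described above: pick one
       translation and use it for every occurrence before gettextizing.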

       Hopefully, those tips will help you make your gettextization work and
       obtain your precious PO file. You are now ready to synchronize your
       file and begin your translation. Please note that on large texts, the
       first synchronization may take a long time.

       For example, the first po4a-updatepo of the Perl documentation's French
       translation (5.5 MB PO file) took about two full days on a 1 GHz G5
       computer. Yes, 48 hours. But the subsequent ones only take a dozen
       seconds on my old laptop. This is because the first time, most of the
       msgids of the PO file don't match any of the POT file ones. This forces
       gettext to search for the closest one using a costly string proximity
       algorithm.

   HOWTO add extra text	to translations	(like translator's name)?
       Because of the gettext approach, doing this becomes more difficult in
       po4a than it was when simply editing a new file alongside the original
       one. But it remains possible, thanks to the so-called addenda.

       It may help comprehension to consider addenda as a sort of patch
       applied to the localized document after processing. They are rather
       different from usual patches (they have only one line of context,
       which can embed a Perl regular expression, and they can only add new
       text without removing any), but the functionality is the same.

       Their goal is to allow the translator to add extra content to the
       document which is not translated from the original document. The most
       common usage is to add a section about the translation itself, listing
       contributors and explaining how to report bugs against the translation.

       An addendum must be provided as a separate file. The first line
       constitutes a header indicating where in the produced document it
       should be placed. The rest of the addendum file will be added verbatim
       at the determined position of the resulting document.

       The header has a pretty rigid syntax: it must begin with the string
       PO4A-HEADER:, followed by a semicolon-separated (;) list of key=value
       fields. White spaces ARE important. Note that you cannot use the
       semicolon character (;) in a value, and that quoting it doesn't help.
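       The header rules can be summarized as a small Python parser. This is
       a sketch, not po4a's own parser: it assumes that whitespace before a
       key is ignored while each value is kept exactly as written, and it
       applies the mandatory-field rules listed below (position and mode are
       required, and mode=after needs a boundary).

```python
def parse_addendum_header(line):
    """Split a PO4A-HEADER line into {key: value} and check required fields."""
    prefix = "PO4A-HEADER:"
    if not line.startswith(prefix):
        raise ValueError("an addendum must start with a PO4A-HEADER: line")
    fields = {}
    for item in line[len(prefix):].rstrip("\n").split(";"):  # ';' cannot be escaped
        if not item.strip():
            continue  # skip empty fields, e.g. after a trailing ';'
        key, sep, value = item.lstrip().partition("=")  # value keeps its spaces
        if not sep:
            raise ValueError(f"field without '=': {item!r}")
        fields[key] = value
    for required in ("position", "mode"):
        if required not in fields:
            raise ValueError(f"missing mandatory field {required!r}")
    if fields["mode"] not in ("before", "after"):
        raise ValueError("mode must be 'before' or 'after'")
    if fields["mode"] == "after" and not any(
            k in fields for k in ("beginboundary", "endboundary")):
        raise ValueError("mode=after needs a beginboundary or endboundary")
    return fields


print(parse_addendum_header(
    "PO4A-HEADER: mode=after; position=About this document; endboundary=</section>"))
```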

       Again, it sounds	scary, but the examples	given below should help	you to
       find how	to write the header line you need. To illustrate the
       discussion, assume we want to add a section called "About this
       translation" after the "About this document" one.

       Here are	the possible header keys:

       position (mandatory)
           a regexp. The addendum will be placed near the line matching this
           regexp. Note that we're speaking about the translated document
           here, not the original. If more than one line matches this
           expression (or none does), the addition will fail. It is indeed
           better to report an error than to insert the addendum at the wrong
           location.

           This line is called the position point in the following. The point
           where the addendum is added is called the insertion point. Those two
           points are close to each other, but not equal. For example, if
           you want to insert a new section, it is easier to put the position
           point on the title of the preceding section and tell po4a where
           the section ends (remember that the position point is given by a
           regexp which should match a unique line).

           The location of the insertion point with regard to the position
           point is controlled by the mode, beginboundary and endboundary
           fields, as explained below.

	   In our case,	we would have:

		position=<title>About this document</title>

       mode (mandatory)
	   It can be either the	string before or after,	specifying the
	   position of the addendum, relative to the position point.

	   Since we want the new section to be placed below the	one we are
	   matching, we	have:

                mode=after

       beginboundary (used only when mode=after, and mandatory in that case)
       endboundary (idem)
           a regexp matching the end of the section after which the addendum
           goes.

	   When	mode=after, the	insertion point	is after the position point,
	   but not directly after! It is placed	at the end of the section
	   beginning at	the position point, i.e., after	or before the line
	   matched by the ???boundary argument,	depending on whether you used
	   beginboundary or endboundary.

	   In our case,	we can choose to indicate the end of the section we
	   match by adding:


                endboundary=</section>

	   or to indicate the beginning	of the next section by indicating:

                beginboundary=<section>

           In both cases, our addendum will be placed after the </section> and
           before the <section>. The first one is better since it will work
           even if the document gets reorganized.

           Both forms exist because documentation formats are different. In
           some of them, there is a way to mark the end of a section (just
           like the </section> we just used), while others don't explicitly
           mark the end of a section (like man). In the former case, you want
           to make a boundary matching the end of a section, so that the
           insertion point comes after it. In the latter case, you want to
           make a boundary matching the beginning of the next section, so that
           the insertion point comes just before it.

       This can seem obscure, but hopefully, the next examples will enlighten
       you.

       To sum up the example we used so far, in order to add a section called
       "About this translation" after the "About this document" one in an SGML
       document, you can use either of those header lines:
	  PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
	  PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>

	If you want to add something after the following nroff section:

	 you should put	a position matching this line, and a beginboundary
	 matching the beginning	of the next section (i.e., ^\.SH). The
	 addendum will then be added after the position	point and immediately
	 before	the first line matching	the beginboundary. That	is to say:


       If you want to add something into a section (like after "Copyright Big
       Dude") instead of adding a whole section, give a position matching this
       line, and give a beginboundary matching any line.
          PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^

       If you want to add something at the end of the document, give a
       position matching any line of your document (but only one line: po4a
       won't proceed if it's not unique), and give an endboundary matching
       nothing. Don't use simple strings like "EOF" here; prefer strings that
       are unlikely to appear in your document.
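       The placement rules described so far can be condensed into a Python
       sketch that computes where an addendum would go in a list of lines.
       It is an approximation for illustration, not po4a's code: the
       function name is hypothetical, and treating a boundary that never
       matches as "end of document" is an assumption extended from the
       endboundary case above.

```python
import re


def insertion_index(lines, header):
    """Return the index in `lines` before which the addendum text goes.

    lines:  the translated document, split into lines.
    header: fields of a PO4A-HEADER line, e.g. {"mode": "after", ...}.
    """
    pos_re = re.compile(header["position"])
    hits = [i for i, line in enumerate(lines) if pos_re.search(line)]
    if len(hits) != 1:
        # Refuse to guess: zero or several matches is an error.
        raise ValueError(f"position matched {len(hits)} lines, expected 1")
    point = hits[0]
    if header["mode"] == "before":
        return point  # insert right before the position point
    if "beginboundary" in header:
        bound, offset = re.compile(header["beginboundary"]), 0  # before the boundary line
    else:
        bound, offset = re.compile(header["endboundary"]), 1    # after the boundary line
    for i in range(point + 1, len(lines)):  # search only after the position point
        if bound.search(lines[i]):
            return i + offset
    # Assumption: a boundary that never matches means "end of document".
    return len(lines)


doc = ["<section>", "<title>About this document</title>", "<para>Text.</para>",
       "</section>", "<section>", "<title>Next</title>", "</section>"]
for key, value in (("endboundary", "</section>"), ("beginboundary", "<section>")):
    print(key, insertion_index(doc, {"mode": "after",
                                     "position": "About this document", key: value}))
```

       With the SGML example, both boundary forms select the same spot,
       between the </section> and the following <section>.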

       In any case, remember that these are regexps. For example, if you want
       to match the end of an nroff section ending with the line

         .fi

       don't use .fi as	endboundary, because it	will match with	"the[ fi]le",
       which is	obviously not what you expect. The correct endboundary in that
       case is:	^\.fi$.
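       The difference is easy to check with any regular-expression engine.
       This short sketch uses Python's re module, whose syntax for these
       simple patterns (the "." wildcard, the ^ and $ anchors, and an escaped
       \.) behaves like the Perl regular expressions po4a uses; the sample
       text line is made up.

```python
import re

line_in_text = "Save the file before quitting."
nroff_end = ".fi"

# Unescaped "." matches any character, so ".fi" also hits " fi" in "the file".
loose = re.compile(r".fi")
# Escaped dot, anchored at both ends: only a line consisting of ".fi" matches.
strict = re.compile(r"^\.fi$")

print(bool(loose.search(line_in_text)), bool(strict.search(line_in_text)))  # True False
print(bool(loose.search(nroff_end)), bool(strict.search(nroff_end)))        # True True
```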

       If the addendum doesn't go where you expected, try passing the -vv
       argument to the tools, so that they explain what they do while
       placing the addendum.

       More detailed example

       Original	document (POD formatted):

	|=head1	NAME
	|dummy - a dummy program
	|=head1	AUTHOR

       Then, the following addendum will ensure	that a section (in French)
       about the translator is added at	the end	of the file. (in French,
       "TRADUCTEUR" means "TRANSLATOR",	and "moi" means	"me")


       In order to put your addendum before the AUTHOR section, use the
       following header:


       This works because the next line matching the beginboundary /^=head1/
       after the section "NAME" (translated to "NOM" in French) is the one
       declaring the authors. So, the addendum will be put between both
       sections.

   HOWTO do all	this in	one program invocation?
       The use of po4a proved to be a bit error-prone for users, since you
       have to call two different programs in the right order (po4a-updatepo
       and then po4a-translate), each of them needing more than 3 arguments.
       Moreover, it was difficult with this system to use only one PO file for
       all your documents when more than one format was used.

       The po4a(1) program was designed	to solve those difficulties. Once your
       project is converted to the system, you write a simple configuration
       file explaining where your translation files are	(PO and	POT), where
       the original documents are, their formats and where their translations
       should be placed.

       Then, calling po4a(1) on this file ensures that the PO files are
       synchronized against the original document, and that the translated
       documents are generated properly. Of course, you will want to call this
       program twice: once before editing the PO files to update them, and
       once afterward to get a completely updated translated document. But
       you only need to remember one command line.

   HOWTO customize po4a?
       po4a modules have options (specified with the -o	option)	that can be
       used to change the module behavior.

       It is also possible to customize a module, or to use new, derivative
       or modified modules, by putting a module in lib/Locale/Po4a/ and
       adding lib to the paths specified by the PERLLIB or PERL5LIB
       environment variable. For example:

	  PERLLIB=$PWD/lib po4a	--previous po4a/po4a.cfg

       Note: the actual	name of	the lib	directory is not important.

How does it work?
       This chapter gives you a brief overview of the po4a internals, so that
       you may feel more confident to help us maintain and improve it. It
       may also help you understand why it does not do what you expected,
       and how to solve your problems.

   What's the big picture here?
       The po4a	architecture is	object oriented	(in Perl. Isn't	that neat?).
       The common ancestor to all parser classes is called TransTractor. This
       strange name comes from the fact	that it	is at the same time in charge
       of translating documents and extracting strings.

       More formally, it takes a document to translate plus a PO file
       containing the translations to use as input while producing two
       separate outputs: another PO file (resulting from the extraction of
       translatable strings from the input document), and a translated
       document (with the same structure as the input one, but with all
       translatable strings replaced with content of the input PO). Here is a
       graphical representation	of this:

	  Input	document --\				 /---> Output document
			    \	   TransTractor::	/	(translated)
			     +-->--   parse()  --------+
			    /				\
	  Input	PO --------/				 \---> Output PO

       This little bone is the core of the whole po4a architecture. If you
       omit the input PO and the output document, you get po4a-gettextize. If
       you provide both inputs and disregard the output PO, you get
       po4a-translate.
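       The data flow above can be mimicked with a toy function. This is a
       sketch under loud assumptions: transtract() is a hypothetical name, a
       "document" is reduced to a list of paragraphs and a PO file to a hash
       of msgid => msgstr. The real classes are described in
       Locale::Po4a::TransTractor(3pm).

```perl
use strict;
use warnings;

# Toy model of the TransTractor data flow. NOT the real
# Locale::Po4a::TransTractor API: a document is a list of paragraphs and
# a PO file is a hash mapping msgid => msgstr (hypothetical simplification).
sub transtract {
    my ($doc_in, $po_in) = @_;
    $po_in //= {};                        # no input PO: nothing gets translated
    my (@doc_out, @po_out);
    for my $para (@$doc_in) {
        push @po_out, $para;              # extraction, for the output PO
        my $tr = $po_in->{$para};
        push @doc_out, (defined $tr && length $tr) ? $tr : $para;  # translation
    }
    return (\@doc_out, \@po_out);
}

my @doc = ("Hello.", "Goodbye.");
my %po  = ("Hello." => "Bonjour.");

# Full run: both outputs are used.
my ($translated, $extracted) = transtract(\@doc, \%po);
print "@$translated\n";                   # prints: Bonjour. Goodbye.

# po4a-gettextize-like: no input PO, only the output PO is kept.
my (undef, $pot) = transtract(\@doc);

# po4a-translate-like: input PO given, the output PO is disregarded.
my ($doc_only) = transtract(\@doc, \%po);
```

       Each tool simply ignores the input or output it does not need; the
       traversal of the document stays the same.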

       TransTractor::parse() is	a virtual function implemented by each module.
       Here is a little	example	to show	you how	it works. It parses a list of
       paragraphs, each	of them	beginning with <p>.

	 1 sub parse { my $self = shift;   # the parser object
	 2   PARAGRAPH: while (1) {
	 3     my ($paragraph,$pararef,$line,$lref)=("","","","");
	 4     my $first=1;
	 5     while ((($line,$lref)=$self->shiftline()) && defined($line)) {
	 6       if ($line =~ m/<p>/ && !$first--) {   # 2nd <p>: next paragraph
	 7         $self->unshiftline($line,$lref);
	 8
	 9         $paragraph =~ s/^<p>//s;
	10         $self->pushline("<p>".$self->translate($paragraph,$pararef));
	11
	12         next PARAGRAPH;
	13       } else {
	14         $paragraph .= $line;
	15         $pararef = $lref unless(length($pararef));
	16       }
	17     }
	18     $paragraph =~ s/^<p>//s;   # end of input: flush the last paragraph
	19     $self->pushline("<p>".$self->translate($paragraph,$pararef)) if length($paragraph);
	20     return;
	21   }
	22 }

       On line 6, we encounter <p> for the second time. That's the signal of
       the next paragraph. We should thus put the line we just obtained back
       into the original document (line 7) and push the paragraph built so
       far into the outputs. After removing its leading <p> on line 9, we
       push the concatenation of this tag with the translation of the rest
       of the paragraph (line 10).

       This translate()	function is very cool. It pushes its argument into the
       output PO file (extraction) and returns its translation as found	in the
       input PO	file (translation). Since it's used as part of the argument of
       pushline(), this translation lands in the output document.
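       To make the two jobs of translate() concrete, here is a hypothetical
       stand-alone version. %po_in, @po_out and po_quote() are invented for
       illustration, the PO output is simplified, and the second argument is
       assumed to be a "file:line" reference like $pararef in the parse()
       example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: %po_in plays the input PO, @po_out collects the
# output PO. The real translate() is a Locale::Po4a::TransTractor method.
my %po_in = ("Some text.\n" => "Du texte.\n");
my @po_out;

sub translate {
    my ($string, $ref) = @_;
    push @po_out, [ $ref, $string ];           # extraction
    my $tr = $po_in{$string};
    return defined $tr ? $tr : $string;        # translation (or the original)
}

# Escape a string for a (simplified) PO msgid line.
sub po_quote {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;
    $s =~ s/"/\\"/g;
    $s =~ s/\n/\\n/g;
    return qq{"$s"};
}

# As in parse(): the tag stays as-is, the text is translated.
my $line = "<p>" . translate("Some text.\n", "page.html:3");
print $line;

# What the extraction side recorded, in PO-like syntax.
for my $entry (@po_out) {
    my ($ref, $msgid) = @$entry;
    print "#: $ref\nmsgid ", po_quote($msgid), "\nmsgstr \"\"\n";
}
```

       Falling back to the original string when no translation exists keeps
       the output document complete even when the PO file is not.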

       Isn't that cool?	It is possible to build	a complete po4a	module in less
       than 20 lines when the format is	simple enough...

       You can learn more about	this in	Locale::Po4a::TransTractor(3pm).

   Gettextization: how does it work?
       The idea	here is	to take	the original document and its translation, and
       to say that the Nth extracted string from the translation is the
       translation of the Nth extracted string from the original. For this to
       work, both files must share exactly the same structure. For example, if
       the files have the following structure, it is very unlikely that	the
       4th string in translation (of type 'chapter') is	the translation	of the
       4th string in original (of type 'paragraph').

	   Original	    Translation

	 chapter	    chapter
	   paragraph	      paragraph
	   paragraph	      paragraph
	   paragraph	    chapter
	 chapter	      paragraph
	   paragraph	      paragraph

       For that, po4a parsers are used on both the original and	the
       translation files to extract PO files, and then a third PO file is
       built from them taking strings from the second as translation of
       strings from the	first. In order	to check that the strings we put
       together are actually translations of each other, document parsers in
       po4a should record the syntactic type of each extracted string (all
       existing ones do so, and yours should too). This information is then
       used to make sure that both documents have the same structure. In the
       previous example, it would allow us to detect that string 4 is a
       paragraph in one case and a chapter title in the other, and to report
       the problem.
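       The pairing and checking steps can be sketched as follows. This is not
       po4a's implementation: gettextize() is a hypothetical helper, and each
       parser's output is assumed to be a list of [type, text] pairs. The
       data reproduces the structures of the table above, with placeholder
       texts.

```perl
use strict;
use warnings;

# Hypothetical sketch: pair the Nth original string with the Nth
# translated string, checking that their syntactic types agree.
sub gettextize {
    my ($orig, $trans) = @_;
    die sprintf("original has %d strings, translation has %d\n",
                scalar @$orig, scalar @$trans)
        unless @$orig == @$trans;
    my @po;
    for my $i (0 .. $#$orig) {
        my ($otype, $otext) = @{ $orig->[$i] };
        my ($ttype, $ttext) = @{ $trans->[$i] };
        # Refuse to resynchronize: a type mismatch aborts the whole run.
        die sprintf("string %d: %s in original but %s in translation\n",
                    $i + 1, $otype, $ttype)
            unless $otype eq $ttype;
        # Guessed pairs are always fuzzy, so a translator must review them.
        push @po, { msgid => $otext, msgstr => $ttext, fuzzy => 1 };
    }
    return \@po;
}

my @orig  = map { [ $_, "orig $_" ] }
            qw(chapter paragraph paragraph paragraph chapter paragraph);
my @trans = map { [ $_, "trans $_" ] }
            qw(chapter paragraph paragraph chapter paragraph paragraph);

my $po = eval { gettextize(\@orig, \@trans) };
print defined $po ? "ok\n" : "error: $@";
# prints: error: string 4: paragraph in original but chapter in translation
```

       The sketch stops at the first mismatch instead of guessing, and every
       pair it does produce is flagged fuzzy.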

       In theory, it would be possible to detect the problem, and
       resynchronize the files afterward (just like diff does). But what we
       should do with the few strings before the desynchronization is not
       clear, and it would sometimes produce bad results. That's why the
       current implementation doesn't try to resynchronize anything and fails
       verbosely when something goes wrong, requiring manual modification of
       the files to fix the problem.

       Even with these precautions, things can go wrong	very easily here.
       That's why all translations guessed this	way are	marked fuzzy to	make
       sure that the translator	reviews	and checks them.

   Addendum: How does it work?
       Well, that's pretty easy	here. The translated document is not written
       directly	to disk, but kept in memory until all the addenda are applied.
       The algorithms involved here are	rather straightforward.	We look	for a
       line matching the position regexp, and insert the addendum before it if
       we're in	mode=before. If	not, we	search for the next line matching the
       boundary	and insert the addendum	after this line	if it's	an endboundary
       or before this line if it's a beginboundary.
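       A sketch of these placement rules in Perl follows; it is not po4a's
       actual code. place_addendum() and its arguments are hypothetical, and
       appending at the end of the file when no boundary matches is an
       assumption. The sample data is modeled on the POD example of the
       addendum HOWTO, with the French section names.

```perl
use strict;
use warnings;

# Hypothetical sketch of addendum placement. $doc and $add are array refs
# of lines; %h holds the header fields (mode, position, and either
# beginboundary or endboundary) as regexp strings.
sub place_addendum {
    my ($doc, $add, %h) = @_;
    my @hits = grep { $doc->[$_] =~ /$h{position}/ } 0 .. $#$doc;
    die "position must match exactly one line\n" unless @hits == 1;
    my $pos = $hits[0];
    my $at;                                   # insert before this index
    if ($h{mode} eq 'before') {
        $at = $pos;
    } else {
        my $is_end = defined $h{endboundary};
        my $re     = $is_end ? $h{endboundary} : $h{beginboundary};
        # Search for the next boundary line after the position line.
        my ($bound) = grep { $doc->[$_] =~ /$re/ } $pos + 1 .. $#$doc;
        if (!defined $bound) {
            $at = scalar @$doc;               # assumption: no match => end of file
        } else {
            $at = $is_end ? $bound + 1 : $bound;  # after end-, before beginboundary
        }
    }
    my @out = @$doc;
    splice @out, $at, 0, @$add;
    return \@out;
}

my @doc = ("=head1 NOM", "dummy - a dummy program", "=head1 AUTEUR", "...");
my $res = place_addendum(\@doc, ["=head1 TRADUCTEUR", "moi"],
    mode => 'after', position => '^=head1 NOM', beginboundary => '^=head1');
print "$_\n" for @$res;
```

       Here the beginboundary search starts after the "=head1 NOM" line,
       finds "=head1 AUTEUR", and inserts the addendum just before it.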

FAQ
       This chapter groups the Frequently Asked Questions. In fact, most of
       the questions for now could be formulated this way: "Why is it designed
       this way, and not that one?" If you think po4a isn't the	right answer
       to documentation	translation, you should	consider reading this section.
       If it does not answer your question, please contact us on the
       <> mailing list. We love feedback.

   Why translate each paragraph separately?
       Yes, in po4a, each paragraph is translated separately (in fact, each
       module decides this, but all existing modules do so, and yours should
       too). There are two main advantages to this approach:

       o When the technical parts of the document are hidden from the scene,
	 the translator can't mess with them. The fewer markers we present to
	 the translator, the fewer errors they can make.

       o Cutting the document helps in isolating the changes to	the original
	 document. When	the original is	modified, finding what parts of	the
	 translation need to be	updated	is eased by this process.

       Even with these advantages, some	people don't like the idea of
       translating each paragraph separately. Here are some of the answers I
       can give to their fears:

       o This approach has proven successful in the KDE project and allowed
	 people there to produce the biggest corpus of translated and
	 up-to-date documentation I know of.

       o The translators can still use the context to translate, since the
	 strings in the PO file are in the same order as in the original
	 document. Translating sequentially is thus rather comparable whether
	 you use po4a or not.  And in any case,	the best way to	get the
	 context remains to convert the	document to a printable	format since
	 the text formatting ones are not really readable, IMHO.

       o This approach is the one used by professional translators. I agree
	 that they have somewhat different goals than open-source translators.
	 Maintenance, for example, is often less critical to them since the
	 content rarely changes.

   Why not split at the sentence level (or smaller)?
       Professional translator tools sometimes split the document at the
       sentence	level in order to maximize the reusability of previous
       translations and	speed up their process.	 The problem is	that the same
       sentence	may have several translations, depending on the	context.

       Paragraphs are by definition longer than sentences. This will hopefully
       ensure that the same paragraph appearing in two documents has the same
       meaning (and translation), regardless of the context in each case.

       Splitting into parts smaller than the sentence would be very bad. It
       would be a bit long to explain why here, but interested readers can
       refer to the Locale::Maketext::TPJ13(3pm) man page (which comes with
       the Perl documentation), for example. In short, each language has its
       specific syntactic rules, and there is no way to build sentences by
       aggregating parts of sentences that works for all existing languages
       (or even for 5 of the 10 most spoken ones, or even fewer).

   Why not put the original as a comment along with the translation (or the
       other way around)?
       At first glance, gettext doesn't seem adapted to all kinds of
       translations. For example, it didn't seem adapted to debconf, the
       interface all Debian packages use for their interaction with the	user
       during installation. In that case, the texts to translate were pretty
       short (a	dozen lines for	each package), and it was difficult to put the
       translation in a	specialized file since it has to be available before
       the package installation.

       That's why the debconf developer	decided	to implement another solution,
       where translations are placed in the same file as the original. This
       is rather appealing. One would even want to do this for XML, for
       example. It would look like this:

	 <title	lang="en">My title</title>
	 <title	lang="fr">Mon titre</title>

	  <text	lang="en">My text.</text>
	  <text	lang="fr">Mon texte.</text>

       But it was so problematic that a	PO-based approach is now used. Only
       the original can	be edited in the file, and the translations must take
       place in	PO files extracted from	the master template (and placed	back
       at package compilation time). The old system was	deprecated because of
       several issues:

       o   maintenance problems

	   If several translators provide a patch at the same time, it gets
	   hard	to merge them together.

	   How will you	detect changes to the original,	which need to be
	   applied to the translations?	In order to use	diff, you have to note
	   which version of the	original you translated. I.e., you need	a PO
	   file	in your	file ;)

       o   encoding problems

	   This	solution is viable when	only European languages	are involved,
	   but the introduction of Korean, Russian and/or Arabic really
	   complicates the picture. UTF could be a solution, but there are
	   still some problems with it.

	   Moreover, such problems are hard to detect (i.e., only Korean
	   readers will	detect that the	encoding of Korean is broken [because
	   of the Russian translator])

       gettext solves all those	problems together.

   But gettext wasn't designed for that	use!
       That's true, but until now nobody has come up with a better solution.
       The only known alternative is manual translation, with all the
       maintenance problems it entails.

   What	about the other	translation tools for documentation using gettext?
       As far as I know, there are only	two of them:

	   This	is the tool developed by KDE people to handle DocBook XML.
	   AFAIK, it was the first program to extract strings to translate
	   from	documentation to PO files, and inject them back	after

	   It can only handle XML, and only a particular DTD. I'm quite
	   unhappy with	the handling of	lists, which end in one	big msgid.
	   When the list becomes big, the chunk becomes harder to swallow.

	   This program, written by Denis Barbier, is a sort of precursor of
	   the po4a SGML module, which more or less deprecates it. As the name
	   says, it handles only the DebianDoc DTD, which is more or less a
	   deprecated DTD.

       The main	advantages of po4a over	them are the ease of extra content
       addition	(which is even worse there) and	the ability to achieve

   Educating developers	about translation
       When you try to translate documentation or programs, you face three
       kinds of problems: linguistic (not everybody speaks two languages),
       technical (that's why po4a exists) and relational/human. Not all
       developers understand the necessity of translating stuff. Even when
       well-intentioned, they may not know how to ease the work of
       translators. To help with that, po4a comes with a lot of documentation
       which can be referred to.

       Another important point is that each translated file begins with a
       short comment indicating what the file is and how to use it. This
       should help the poor developers flooded with tons of files in
       different languages they hardly speak, and help them deal correctly
       with them.

       In the po4a project, translated documents are not source	files anymore.
       Since SGML files are usually source files, it's easy to mistake a
       translated one for a source file.
       That's why all files present this header:

	|	*****************************************************
	|	*****************************************************
	| This file was	generated by po4a-translate(1).	Do not store it	(in VCS,
	| for example),	but store the PO file used as source file by po4a-translate.
	| In fact, consider this as a binary, and the PO file as a regular source file:
	| If the PO gets lost, keeping this translation	up-to-date will	be harder ;)

       Likewise, gettext's regular PO files only need to be copied to the po/
       directory. But this is not the case for the ones manipulated by po4a.
       The major risk here is that a developer erases the existing translation
       of his program with the translation of his documentation. (Both of them
       can't be stored in the same PO file, because the program needs to
       install its translation as an mo file while the documentation only uses
       its translation at compile time.) That's why the PO files produced by
       the po-debiandoc	module contain the following header:

	#    - you do not need to manually edit	POT or PO files.
	#    - this file contains the translation of your debconf templates.
	#      Do not replace the translation of your program with this	!!
	#	 (or your translators will get very upset)
	#    If	you are	not familiar with the PO format, gettext documentation
	#     is worth reading,	especially sections dedicated to this format.
	#    For example, run:
	#	  info -n '(gettext)PO Files'
	#	  info -n '(gettext)Header Entry'
	#    Some information specific to po-debconf are available at
	#	     /usr/share/doc/po-debconf/README-trans
	#	  or

   SUMMARY of the advantages of	the gettext based approach
       o The translations are not stored along with the	original, which	makes
	 it possible to	detect if translations become out of date.

       o The translations are stored in	separate files from each other,	which
	 prevents translators of different languages from interfering, both
	 when submitting their patch and at the	file encoding level.

       o It is based internally	on gettext (but	po4a offers a very simple
	 interface so that you don't need to understand	the internals to use
	 it). That way, we don't have to reinvent the wheel, and because
	 of their wide use, we can assume that these tools are more or less
	 bug free.

       o Nothing changes for the end-user (besides the fact that translations
	 will hopefully be better maintained). The resulting documentation file
	 distributed is	exactly	the same.

       o No need for translators to learn a new file syntax: their favorite
	 PO file editor	(like Emacs' PO	mode, Lokalize or Gtranslator) will
	 work just fine.

       o gettext offers	a simple way to	get statistics about what is done,
	 what should be reviewed and updated, and what is still to do. Some
	 examples can be found at these addresses:


       But everything isn't green, and this approach also has some
       disadvantages we	have to	deal with.

       o Addenda are... strange at first glance.

       o You can't adapt the translated	text to	your preferences, like
	 splitting a paragraph here, and joining two other ones	there. But in
	 some sense, if	there is an issue with the original, it	should be
	 reported as a bug anyway.

       o Even with an easy interface, it remains a new tool people have to
	 learn.

	 One of my dreams would be to somehow integrate po4a into Gtranslator
	 or Lokalize. When an SGML file is opened, the strings would be
	 automatically extracted. When it's saved, a translated SGML file could
	 be written to disk. If we manage to write an MS Word (TM) module (or at
	 least RTF), professional translators may even use it.

AUTHORS
	Denis Barbier <barbier,>
	Martin Quinson (

Po4a Tools			  2021-02-28			       PO4A(7)
