GSoC 2010 – week summary: July 5th – July 11th

This week I was working on the message translation feature. I implemented XLIFF file format support. XLIFF is a standardized XML format for storing translated messages (labels). There are a few translation tools available that support this format (some are still in beta, though), and there are also converters from and to Gettext’s widely known PO file format.

Gettext natively supports plural forms, while XLIFF does not prescribe a single way of storing them (it can be done in many ways), so I took this into account when implementing the XLIFF parser. There is a specification defining how the conversion from .po to .xlf should be done (including plurals), and the parser conforms to it. I think it’s the best way of storing plural forms in XLIFF files, as it makes it possible to convert .xlf to .po, edit the file in a .po editor (like the very good Poedit), and then convert it back to .xlf without any data loss. The Translate Toolkit project provides good converters.

However, the way Gettext defines plural forms differs from the method used in CLDR. In .po files, you write a single C language expression in the header which evaluates to an integer, and this number is the index of the plural form. In CLDR, there is a separate expression for every form.

Example from Gettext’s .po file (for Polish language):

"Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

And the message is defined like this:

msgid "Source singular"
msgid_plural "Source plural"
msgstr[0] "Translated singular"
msgstr[1] "Translated plural 1"
msgstr[2] "Translated plural 2"
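
To make the index selection concrete, the C expression from the header can be transcribed by hand. This is only an illustrative sketch in Python (the framework itself is not written in Python); the function name `polish_plural_index` and the `msgstr` list are made up for the example.

```python
def polish_plural_index(n: int) -> int:
    """Hand transcription of the Polish Plural-Forms expression shown above."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# The msgstr entries from the example, addressed by plural index.
msgstr = ["Translated singular", "Translated plural 1", "Translated plural 2"]

for n in (1, 2, 5, 12, 22):
    print(n, msgstr[polish_plural_index(n)])
```

The value returned by the expression is used directly as the index into the msgstr[...] entries.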

When you convert this to an .xlf file, it looks as follows (unimportant tags were stripped):

<?xml version='1.0' encoding='utf-8'?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
  <file original="foo.po" source-language="en-US" datatype="po">
    <body>
      <group restype="x-gettext-plurals" id="1" xml:space="preserve">
        <trans-unit id="1[0]">
          <source>Source singular</source>
          <target>Translated singular</target>
        </trans-unit>
        <trans-unit id="2[1]">
          <source>Source plural</source>
          <target>Translated plural 1</target>
        </trans-unit>
        <trans-unit id="3[2]">
          <source>Source plural</source>
          <target>Translated plural 2</target>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
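
Reading such a group back is mostly a matter of taking the plural index out of the `[n]` suffix of each trans-unit id. The snippet below is a minimal sketch of that idea using Python's standard library, not the actual XliffParser; the function name `read_plural_groups` and the shape of the returned dictionary are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}

# Condensed copy of the sample file above (XML declaration omitted).
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
  <file original="foo.po" source-language="en-US" datatype="po">
    <body>
      <group restype="x-gettext-plurals" id="1" xml:space="preserve">
        <trans-unit id="1[0]"><source>Source singular</source><target>Translated singular</target></trans-unit>
        <trans-unit id="2[1]"><source>Source plural</source><target>Translated plural 1</target></trans-unit>
        <trans-unit id="3[2]"><source>Source plural</source><target>Translated plural 2</target></trans-unit>
      </group>
    </body>
  </file>
</xliff>"""


def read_plural_groups(xml_text: str) -> dict:
    """Return {singular source: [target of form 0, target of form 1, ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for group in root.iterfind(".//x:group[@restype='x-gettext-plurals']", XLIFF_NS):
        forms = {}
        for unit in group.iterfind("x:trans-unit", XLIFF_NS):
            # The plural index is the number in square brackets at the end of the id.
            match = re.search(r"\[(\d+)\]$", unit.get("id", ""))
            if match is None:
                continue
            source = unit.findtext("x:source", namespaces=XLIFF_NS)
            target = unit.findtext("x:target", namespaces=XLIFF_NS)
            forms[int(match.group(1))] = (source, target)
        if 0 in forms:
            # Key the group by the singular source, as Gettext does with msgid.
            result[forms[0][0]] = [forms[i][1] for i in sorted(forms)]
    return result


print(read_plural_groups(SAMPLE))
```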

In CLDR, there are no indices but keys: “zero”, “one”, “two”, “few”, “many”, and “other”. A particular language uses only a subset of these keys. I simplified this by mapping the keys to indices for each language. For example, if a language has the “one”, “few”, and “many” forms (like Polish), they are mapped to 0, 1, and 2, so the correct form is chosen from the XLIFF file and automatic .po – .xlf conversion is possible.
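
A rough sketch of this mapping, again in Python for illustration only. The ordering in `CLDR_KEY_ORDER`, the helper names, and the simplified Polish rules are assumptions for the example, not the framework's actual code or the full CLDR rule set.

```python
# Assumed fixed order in which CLDR plural keys are turned into indices.
CLDR_KEY_ORDER = ("zero", "one", "two", "few", "many", "other")


def index_map(keys_used) -> dict:
    """Map the keys a language actually uses to consecutive indices."""
    ordered = [key for key in CLDR_KEY_ORDER if key in keys_used]
    return {key: index for index, key in enumerate(ordered)}


def polish_cldr_key(n: int) -> str:
    """CLDR style: one separate rule per form (simplified, integers only)."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


POLISH_INDICES = index_map({"one", "few", "many"})


def polish_plural_index_from_cldr(n: int) -> int:
    """Index of the XLIFF trans-unit (or msgstr entry) to use for n."""
    return POLISH_INDICES[polish_cldr_key(n)]


print(POLISH_INDICES)
print([polish_plural_index_from_cldr(n) for n in (1, 2, 5, 12, 22)])
```

For integer values, these rules select the same index as the Gettext expression for Polish shown earlier, which is what makes the round trip between the two formats work.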

The Translator class (which is not done yet) will use implementations of TranslationProviderInterface to translate strings. So far I have implemented only one concrete translation provider (the XliffTranslationProvider), but it will be easy to add other providers to support different file formats (or even other storage methods).
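
Since the Translator does not exist yet, the following is only a guess at how the pieces could fit together, written in Python for illustration. Apart from the names Translator and TranslationProviderInterface, everything here is hypothetical: the method names, the `DictTranslationProvider` stand-in, and the fallback to the source string.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class TranslationProviderInterface(ABC):
    """Anything that can look up a translated label (hypothetical signature)."""

    @abstractmethod
    def get_translation(self, source: str, plural_index: int = 0) -> Optional[str]:
        ...


class DictTranslationProvider(TranslationProviderInterface):
    """Stand-in provider backed by a dictionary instead of an XLIFF file."""

    def __init__(self, messages: Dict[str, List[str]]):
        self._messages = messages

    def get_translation(self, source: str, plural_index: int = 0) -> Optional[str]:
        forms = self._messages.get(source)
        if forms is None or plural_index >= len(forms):
            return None
        return forms[plural_index]


class Translator:
    """Delegates lookups to a provider; returns the source if nothing is found."""

    def __init__(self, provider: TranslationProviderInterface):
        self._provider = provider

    def translate(self, source: str, plural_index: int = 0) -> str:
        translated = self._provider.get_translation(source, plural_index)
        return translated if translated is not None else source


provider = DictTranslationProvider(
    {"Source singular": ["Translated singular", "Translated plural 1", "Translated plural 2"]}
)
translator = Translator(provider)
print(translator.translate("Source singular", 1))
print(translator.translate("Unknown label"))
```

Because the Translator only depends on the interface, a provider for another file format could be swapped in without touching it.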

When implementing the XLIFF parser, I noticed that the class hierarchy could be improved. The CldrModel written a few weeks earlier was in fact an abstraction of an XML file, just like the XliffModel which I created this week. So I created an abstract class, AbstractXmlModel, which is a common base for these two classes. They also both use parsers to convert a CLDR or XLIFF file to an internal representation, so there is now an AbstractXmlParser, which CldrParser and XliffParser extend. I think it reduced the redundant code quite a lot.
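
The shape of that hierarchy can be sketched as follows. The class names come from the paragraph above, but the Python rendering, the method names (`parse`, `build`, `get`), and the very simplified internal representations are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class AbstractXmlParser(ABC):
    """Shared part: read the XML; subclasses build the internal representation."""

    def parse(self, xml_text: str) -> dict:
        return self.build(ET.fromstring(xml_text))

    @abstractmethod
    def build(self, root: ET.Element) -> dict:
        ...


class XliffParser(AbstractXmlParser):
    def build(self, root: ET.Element) -> dict:
        # Simplified: map each trans-unit id to its target text.
        return {unit.get("id"): unit.findtext("target") for unit in root.iter("trans-unit")}


class CldrParser(AbstractXmlParser):
    def build(self, root: ET.Element) -> dict:
        # Simplified: map each leaf element name to its text.
        return {el.tag: (el.text or "").strip() for el in root.iter() if len(el) == 0}


class AbstractXmlModel:
    """Shared part: parse once with the given parser, then serve lookups."""

    def __init__(self, parser: AbstractXmlParser, xml_text: str):
        self._data = parser.parse(xml_text)

    def get(self, key: str):
        return self._data.get(key)


class XliffModel(AbstractXmlModel):
    def __init__(self, xml_text: str):
        super().__init__(XliffParser(), xml_text)


class CldrModel(AbstractXmlModel):
    def __init__(self, xml_text: str):
        super().__init__(CldrParser(), xml_text)


xliff = XliffModel('<xliff><trans-unit id="a"><target>T</target></trans-unit></xliff>')
cldr = CldrModel("<ldml><numbers><decimal>,</decimal></numbers></ldml>")
print(xliff.get("a"), cldr.get("decimal"))
```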

Then I came up with the “brilliant” idea of taking it even further: making HierarchicalCldrModel extend CldrModel, and changing the readers (DatesReader, PluralsReader, NumbersReader) into models extending HierarchicalCldrModel. When I was almost done, I realized that it complicated things and made the hierarchy vague, so I reverted it all ;-).

This week’s changes were committed in r4827.

Next week I will implement the Translator class and a Fluid ViewHelper for the message translation feature. The next important features after that will be input parsing and validators. So far, I’m on track with my proposal’s timeline ;-).


  • Sebastian Kurfürst
    Thanks for your great work on the Localization Framework, Karol!

    Best Regards from Dresden,
    Sebastian Kurfürst
  • I'm glad you like it! I hope it will be useful for TYPO3 community. :-)