Grammar as a leaky abstraction

Internationalisation, or i18n for short, is the process of making the user interface of a program ready for translation into multiple languages. This usually means to factor out texts from the program source code into separate files, often called translation bundles. These files have a key-value structure. The program code then only refers to keys that are resolved into the actual texts from the translation bundle for the selected target language.

Here’s a simple example of two translation bundles, one for English and one for German:

# translations_en.properties
quit_confirm_message=Do you really want to quit?
yes_option=Yes
no_option=No
# translations_de.properties
quit_confirm_message=Wollen Sie die Anwendung wirklich beenden?
yes_option=Ja
no_option=Nein

The actual source code might look like this:

var answer = showDialog(
  t("quit_confirm_message"),
  t("yes_option"),
  t("no_option")
);

Here the function t looks up the key in the currently active translation bundle and returns the translated message as a string.

How not to do it

Last month I discovered an amusing attempt at internationalisation in a third-party code base. It looked similar to this:

# translations_en.properties
a=A
an=An
is=is
not=not
available=available
article=article
# translations_de.properties
a=Ein
an=Ein
is=ist
not=nicht
available=verfügbar
article=Artikel

These translation keys were used like this:

"${t('an')} ${t('article')} ${t('is')}
${available ? '' : t('not')} ${t('available')}."

to produce messages like

"An article is available."
"An article is not available."

… or in German:

"Ein Artikel ist verfügbar."
"Ein Artikel ist nicht verfügbar."

Why is this a clumsy attempt at internationalisation?

Because it uses single words as translation units, and it relies on the fact that English and German have the same sentence structure in this particular case. In general, of course, languages do not have the same sentence structure, not even related languages like English and German.

The author of the code also introduced separate translation keys for “a” and “an”. The German translation for both keys was “ein”. The author was lucky so far that all texts with “a” or “an” in this particular program translated to “ein” in German, not “eine”, “einen”, “einem”, “einer”, or “eines”.

How to do it

So what would be the correct way to do it? The internationalisation should have looked like this:

# translations_en.properties
article_available=An article is available.
article_not_available=An article is not available.
# translations_de.properties
article_available=Ein Artikel ist verfügbar.
article_not_available=Ein Artikel ist nicht verfügbar.
available ? t("article_available")
          : t("article_not_available")

By using whole phrases and sentences as translation units the translations into various languages have the freedom to use their own word orders and grammatical structures.

Internationalization of a React application with react-intl

For the internationalization of a React application I have recently used the seemingly popular react-intl package by Yahoo.

The basic usage is simple. To resolve a message use the FormattedMessage tag in the render method of a React component:

import {FormattedMessage} from "react-intl";

class Greeting extends React.Component {
  render() {
    return (
      <div>
        <FormattedMessage id="greeting.message"
            defaultMessage={"Hello, world!"}/>
      </div>
    );
  }
}

Injecting the “intl” property

If you have a text in your application that can’t be simply resolved with a FormattedMessage tag, because you need it as a string variable in your code, you have to inject the intl property into your React component and then resolve the message via the formatMessage method on the intl property.

To inject this property you have to wrap the component class via the injectIntl() function and then re-assign the wrapped class to the original class identifier:

import {intlShape, injectIntl} from "react-intl";

class SearchField extends React.Component {
  render() {
    const intl = this.props.intl;
    const placeholder = intl.formatMessage({
        id: "search.field.placeholder",
        defaultMessage: "Search"
      });
    return (<input type="search" name="query"
               placeholder={placeholder}/>);
  }
}
SearchField.propTypes = {
    intl: intlShape.isRequired
};
SearchField = injectIntl(SearchField);

Preserving references to components

In one of the components I had captured a reference to a child component with the React ref attribute:

ref={(component) => this.searchInput = component}

After wrapping the parent component class via injectIntl() as described above in order to internationalize it, the internal reference stopped working. It took me a while to figure out how to fix it, since it’s not directly mentioned in the documentation. You have to pass the “withRef: true” option to the injectIntl() call:

SearchForm = injectIntl(SearchForm, {withRef: true});

Here’s a complete example:

import {intlShape, injectIntl} from "react-intl";

class SearchForm extends React.Component {
  render() {
    const intl = this.props.intl;
    const placeholder = intl.formatMessage({
        id: "search.field.placeholder",
        defaultMessage: "Search"
      });
    return (
      <form>
        <input type="search" name="query"
               placeholder={placeholder}
               ref={(c) => this.searchInput = c}/>
      </form>
    );
  }
}
SearchForm.propTypes = {
  intl: intlShape.isRequired
};
SearchForm = injectIntl(SearchForm,
                        {withRef: true});

Conclusion

Although react-intl appears to be one of the more mature internationalization packages for React, the overall experience isn’t too great. Unfortunately, you have to litter the code of your components with dependency injection boilerplate code, and the documentation is lacking.

Translating strings in internationalized applications

Internationalization (“i18n”) and localization (“l10n”) of software is a complex topic with many facets. One aspect of internationalization is the translation of strings in programs into different languages.

Here’s an example of how not to do it (assuming t is a translation lookup function):

StringBuilder sb = new StringBuilder(t("User "));
sb.append(user.name());
sb.append(t(" logged in "));
sb.append(minutes);
sb.append(" ");
if (minutes == 1) {
    sb.append(t("minute"));
} else {
    sb.append(t("minutes"));
}
sb.append(t(" ago."));
return sb.toString();

Translatable strings and concatenation don’t mix well, be it via StringBuilder, the plus operator or in template files like JSPs. Different languages have different sentence structures. You can’t know in advance in which order the parts must appear in the translated text. So the most basic rule is: never construct sentences programmatically from sentence fragments if they are intended for translation.

Here’s a slightly better variant:

if (minutes == 1) {
    return t("User {0} logged in {1} minute ago.", user.name(), minutes);
}
return t("User {0} logged in {1} minutes ago.", user.name(), minutes);

I18n frameworks always offer the possibility to pass arguments to the translation lookup function. This way translators can freely choose the positions of these arguments via placeholders in the translated string.

However, not all languages have pluralization rules similar to English, where you have to handle only two cases (one and zero/many). For example, Russian and Polish use different forms of nouns with different numerals higher than one. Here’s an extensive table listing the plural rules for different languages: The rules are classified into these categories: “one”, “two”, “few”, “many”, “other”. Good i18n frameworks provide translation lookup functions where you can pass the count as an additional argument. The framework then dispatches to different translation keys, depending on the count and the target language:

user.login.minutes.one=...
user.login.minutes.two=...
user.login.minutes.many=...
user.login.minutes.other=...

There are other traps that you have to watch out for, e.g.

  • different punctuation marks: you can’t simply assume that you can convert any translated text into a label by appending “:” to it, or that you can convert any translated text into a quotation by surrounding it with ” and “.
  • gender rules, which can be handled similarly to the pluralization rules

Conclusion

This article gave a small glimpse into the topic of internationalization, to help avoid the most basic mistakes. Check out the documentation of your internationalization framework to see what it can offer.

The perils of \u0027

Adventures (read: pitfalls) of internationalization with Struts2, concerning the principle “stacked smartness doesn’t add up”.

u0027Struts2 is a framework for web application development in Java. It’s considered mature and feature-rich and inherits the internationalization (i18n) capabilities of the Java platform. This is what you would expect. In fact, the i18n features of Struts2 are more powerful than the platform ones, but the power comes with a price.

Examples of the sunshine path

If you read a book like “Struts 2 in Action” written by Donald Brown and others, you’ll come across a chapter named “Understanding internationalization” (it’s chapter 11). You’ll get a great overview with a real-world example of what is possible (placeholder expansion, for example) and if you read a bit further, there is a word of warning:

“You might also want to further investigate the MessageFormat class of the Java platform. We saw its fundamentals in this chapter when we learned of the native Java support for parameterization of message texts and the autoformatting of date and numbers. As we indicated earlier, the MessageFormat class is much richer than we’ve had the time to demonstrate. We recommend consulting the Java documentation if you have further formatting needs as well. “

If you postpone this warning, you’re doomed. It’s not the fault of the book that their examples are the sunshine case (the best circumstances that might happen). The book tries to teach you the basics of Struts2, not its pitfalls.

A pitfall of Struts2 I18N

You will write a web application in Struts2, using the powerful built-in i18n, just to discover that some entries aren’t printed right. Let’s have an example i18n entry:

impossible.action.message=You can't do this

If you include this entry in a webpage using Struts2 i18n tags, you’ll find the apostrophe (unicode character \u0027) missing:

You cant do this

What happened? You didn’t read all about MessageFormat. The apostrophe is a special character for the MessageFormat parser, indicating regions of non-interpreted text (Quoted Strings). As there is only one apostrophe in our example, it just gets omitted and ignored. If there were two of them, both would be omitted and all expansion effort between them would be ceased.

How to overcome the pitfall

You’ll need to escape the apostrophe to have it show up. Here’s the paragraph of the MessageFormat APIDoc:

Within a String, "''" represents a single quote. A QuotedString can contain arbitrary characters except single quotes; the surrounding single quotes are removed. An UnquotedString can contain arbitrary characters except single quotes and left curly brackets. Thus, a string that should result in the formatted message “‘{0}'” can be written as "'''{'0}''" or "'''{0}'''".

That’s bad news. You have to tell your translators to double-type their apostrophes, else they won’t show up. But only the ones represented by \u0027, not the specialized ones of the higher unicode regions like “grave accent”  or “acute accent”. If you already have a large amount of translations, you need to check every apostrophe if it was meant to be printed or to control the MessageFormat parser.

The underlying principle

This unexpected behaviour of an otherwise powerful functionality is a common sign of a principle I call “stacked smartness doesn’t add up”. I will blog about the principle in the near future, so here’s just a short description: A powerful (smart) behaviour makes sense in the original use case, but when (re-)used in another layer of functionality, it becomes a burden, because strange side-effects need to be taken care of.