What you should know before starting a localisation

Like anything concerning people, making software speak multiple languages is more complex than it seems. (That is actually the reason why this blog is in English only!)

Localisation (like any other software feature) doesn't have to be perfect. But to make an informed decision about how much you want or need to invest in it, I think it is good to know the possible pitfalls before you just Ctrl+C Ctrl+V your English GUI into a spreadsheet, toss it over to a translator and try to replace the strings 1:1, thinking you don't need to touch the code.

Reuse thoughtfully

First, it is a good idea to think about possible future localisation from the moment you start creating an application, because identifying and extracting hardcoded strings spread across many implementation layers can be the most expensive part of the whole localisation process. One important aspect that overlaps with software design is planning which parts are reusable.

If you have the same "Are you sure? Yes / No" pop-up used in many different places, it is probably a good candidate for specifying the text in one place only and translating it the same way in all languages.

Or maybe not: Generic action verbs like "remove", "start", "run" etc. (often used on buttons) may need to be translated very differently based on the context they appear in. So a good strategy may be to not rush into reuse, but mark duplicates so they can be refactored when you are sure it is worth it. Remember to prefer duplication over the wrong abstraction.
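One way to postpone the reuse decision is to give every occurrence its own context-specific key, even when the English text is identical, and to list the duplicates for a later review. The sketch below is only an illustration: the key names, the `CATALOG` structure, the `find_duplicates` helper and the German strings are assumptions, and it uses "Open" instead of the verbs mentioned above.

```python
# Hypothetical message catalog: keys encode where the text is used,
# so identical English strings can still diverge in other languages.
CATALOG = {
    "en": {
        "file_menu.open": "Open",    # action: open a file
        "shop_status.open": "Open",  # state: the shop is open
    },
    "de": {
        "file_menu.open": "Öffnen",
        "shop_status.open": "Geöffnet",
    },
}


def find_duplicates(messages: dict[str, str]) -> dict[str, list[str]]:
    """Group keys that share the same text, as candidates for later reuse."""
    by_text: dict[str, list[str]] = {}
    for key, text in messages.items():
        by_text.setdefault(text, []).append(key)
    return {text: sorted(keys) for text, keys in by_text.items() if len(keys) > 1}
```

Running `find_duplicates(CATALOG["en"])` reports both keys under "Open", while the German catalog has no duplicates, which suggests these two keys should not be merged.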

Preserve context

I have learned a few languages and it fascinates me how differently languages express the same meaning. Depending on the context, one language can also use very different words or phrases for something that sounds "the same" in another language. A good translator will try to understand as much context as possible before choosing the right translation. So letting the translators play with a test version of your application, or at least providing them with some screenshots, can improve the quality of their output a lot. And to make their quality work possible, you have to structure the translated strings by whole sentences (that carry a clear, unambiguous meaning), not by words or pairs of words.

Translating sentences brings the problem of how to insert variable data (numbers, names or visually distinct components) in the sentence. To solve it we use placeholders:

"You have {{message count}} messages"
"Document last modified by {{author name}}"
"Click {{details link}} to find out more"

Placeholders should be used consistently. Concatenating strings to be displayed to users without using placeholders should be considered a code smell.
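A minimal sketch of how such placeholders could be filled in, assuming the `{{name}}` syntax from the examples above. The `render` function and the `PLACEHOLDER` pattern are hypothetical helpers written for this illustration, not part of any particular library.

```python
import re

# Matches {{name}}, allowing spaces inside the name and around it.
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def render(template: str, values: dict[str, object]) -> str:
    """Replace every {{name}} in the template with values[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Fail loudly instead of showing a raw placeholder to the user.
            raise KeyError(f"no value for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


english = render("You have {{message count}} messages", {"message count": 5})
slovak = render("Máte {{message count}} správ", {"message count": 5})
```

Because the whole sentence is one string, the translator is free to move the placeholder wherever the target language needs it.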

Grammar differences

But it gets even more complex. The translated sentence may depend on the variable parts: in the example above, we should have a variant "You have 1 message" if the message count is 1. In other languages, pluralization can require more than two variants, for example in Slovak:

You have 1 message = Máte 1 správu
You have 2-4 messages = Máte 2-4 správy
You have 5- messages = Máte 5- správ

There is a great article about the grammar rules to take into account.
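The Slovak variants above could be selected with a small rule function. This is a simplified sketch that only handles non-negative whole numbers; the function names and the `MESSAGES_SK` table are assumptions made for illustration, and a real application would more likely rely on a library that ships plural rules for all languages.

```python
def slovak_plural_category(count: int) -> str:
    """Pick the plural category for a non-negative whole number in Slovak."""
    if count == 1:
        return "one"
    if 2 <= count <= 4:
        return "few"
    return "other"  # 0 and everything from 5 upwards


# One full sentence per category, each with its own placeholder.
MESSAGES_SK = {
    "one": "Máte {count} správu",
    "few": "Máte {count} správy",
    "other": "Máte {count} správ",
}


def messages_text(count: int) -> str:
    """Return the Slovak sentence matching the given message count."""
    template = MESSAGES_SK[slovak_plural_category(count)]
    return template.format(count=count)
```

The important design point is that the rule lives with the language, not in the calling code: an English catalog would need only two categories, and the caller would not change.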

Keep the translations up-to-date

Once you support multiple languages, all the different language (or regional) variants become part of your software. So the impact of any change to the software is multiplied by the number of supported languages (regions). At some point you will need a process and tools (typically a content management system, CMS) that allow the programmers to make changes rapidly and the translators to update the translations where needed, including removing translations that are no longer used.
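Even before introducing a CMS, a small automated check can show where a translation has drifted from the source language. The sketch below assumes catalogs are plain dictionaries and placeholders use the `{{name}}` syntax; `check_translation`, the key names and the sample strings are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def check_translation(source: dict[str, str], translation: dict[str, str]) -> dict[str, list[str]]:
    """Report keys that are missing, obsolete, or whose placeholders differ."""
    missing = sorted(set(source) - set(translation))
    obsolete = sorted(set(translation) - set(source))
    mismatched = sorted(
        key
        for key in set(source) & set(translation)
        if set(PLACEHOLDER.findall(source[key])) != set(PLACEHOLDER.findall(translation[key]))
    )
    return {"missing": missing, "obsolete": obsolete, "placeholder_mismatch": mismatched}


english = {
    "inbox.count": "You have {{message count}} messages",
    "doc.author": "Document last modified by {{author name}}",
    "details.link": "Click {{details link}} to find out more",
}
slovak = {
    "inbox.count": "Máte {{mesage count}} správ",  # typo in the placeholder name
    "old.banner": "Vitajte",  # no longer present in the source catalog
}
report = check_translation(english, slovak)
```

Here `report` lists two missing keys, one obsolete key and one placeholder mismatch. A check like this could run in the build so that stale translations are noticed early.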

Graphics

Another area where the impact of supporting different languages (or even cultures) may not be obvious is the graphical design and layout of texts displayed by an application. The design has to take into account that:

the length of texts in different languages can be very different, English usually being among the shortest ones
word order is different
there are languages with text flowing from the right to the left
even symbols (pictograms or icons) may not be usable universally

In some situations you may need to "translate images" instead of just text strings.

Technical considerations

In addition to the above-mentioned problems directly connected to the differences between the languages themselves, there are some problems caused by how software is implemented and how data are stored and processed by computers.

Character sets

Languages use different sets of letters compared to English, from slight extensions of the Latin alphabet to completely different writing systems like Chinese or Korean. The character encoding has to support all the languages. One of my motivations for starting to use the Java language was that it was one of the first languages to natively integrate the Unicode standard. Unfortunately it chose to use the UTF-16 encoding internally (changed to Latin-1 where possible in Java 9), but wherever possible (like in your APIs and text files) it is recommended to use UTF-8 (see the detailed explanation why). It is important to test your application with texts using characters outside the ASCII and Latin-1 sets, to check that they are not corrupted by an incorrect encoding conversion or by a font that does not contain all the letters.
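A quick way to see what such corruption looks like is to encode a test word and decode it with the wrong encoding. The helpers below (`round_trips`, `misdecode`) are hypothetical names for this sketch; only the standard `str.encode` and `bytes.decode` methods are used.

```python
def round_trips(text: str, encoding: str) -> bool:
    """Return True if the text survives encoding and decoding unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The encoding cannot represent at least one character.
        return False


def misdecode(text: str, written_as: str, read_as: str) -> str:
    """Simulate writing text in one encoding and reading it in another."""
    return text.encode(written_as).decode(read_as, errors="replace")


word = "skúšam"
utf8_ok = round_trips(word, "utf-8")      # UTF-8 can represent every character
latin1_ok = round_trips(word, "latin-1")  # Latin-1 has no "š"
garbled = misdecode(word, "utf-8", "latin-1")
```

The `garbled` value no longer equals the original word, and it is also longer, because every two-byte UTF-8 sequence is read as two separate Latin-1 characters. Test data should include words like this one, since plain ASCII text passes through most wrong conversions unharmed.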

Besides the ability to preserve and correctly display the characters, you should check if and where your application needs to sort text alphabetically. Be careful with the SQL ORDER BY clause: its result depends on the "collation" setting of the database. Supporting multiple alphabetical orders depending on the language may mean you will have to implement the sorting in your application code. Similarly, it is dangerous to convert between "lowercase" and "uppercase" letters: you should check if it works for letters like ŕ/Ŕ - and most non-European alphabets don't have the "uppercase"/"lowercase" concept at all.
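Both pitfalls are easy to reproduce with built-in string operations. The snippet below is only a demonstration of the problem, not a fix; the sample words are assumptions chosen for illustration, and proper language-aware sorting would need a collation library.

```python
# Plain sorting compares Unicode code points, not alphabet positions.
words = ["zebra", "čaj", "auto"]
code_point_order = sorted(words)
# "čaj" ends up after "zebra" because "č" has a higher code point than "z",
# although it belongs right after "c" in the Slovak alphabet.

# Case conversion works for some letters...
upper_r = "ŕ".upper()

# ...but it can change the length of the text...
upper_sharp_s = "ß".upper()

# ...and is a no-op for scripts without the concept of letter case.
upper_cjk = "中文".upper()
```

Code that assumes the uppercase version of a string has the same length, or that uppercasing always changes something, will break on inputs like these.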

Searching is another interesting topic. Users of some languages may expect some special "aliases" to work:

"skusam" (Latin text without diacritical marks) matching the similar-looking "skúšam"
Latin transcription "sladoled" matching Cyrillic "сладолед"
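Both kinds of aliases can be approximated by comparing normalised search keys instead of the raw text. This is a rough sketch: `strip_diacritics` and `search_key` are hypothetical helpers, and `CYRILLIC_TO_LATIN` is a deliberately tiny, incomplete table that only covers the letters of the example word.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters and drop the combining marks (ú -> u, š -> s)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumption: a minimal transliteration table for illustration only.
CYRILLIC_TO_LATIN = {"с": "s", "л": "l", "а": "a", "д": "d", "о": "o", "е": "e"}


def search_key(text: str) -> str:
    """Build a key that ignores case, diacritics and (partly) the script."""
    folded = strip_diacritics(text).casefold()
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in folded)


def matches(query: str, candidate: str) -> bool:
    """Return True if the query and the candidate share the same search key."""
    return search_key(query) == search_key(candidate)
```

With these helpers, `matches("skusam", "skúšam")` and `matches("sladoled", "сладолед")` are both true. Whether such loose matching is desirable depends on the language, so it is better treated as a per-locale setting than as a global rule.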

Date, time and number formats

It is quite obvious that different languages and regions use different formatting, but it is nevertheless often forgotten and operating system defaults are used instead. The best approach is to use the well-known abstractions provided by libraries for your programming language: parse inputs that arrive as strings as early as possible, and format the values back to output strings as late as possible. And of course you have to make sure to pass the correct locale (language+region combination) to the formatters.
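The "parse early, format late" idea can be sketched with the standard `datetime.date` type. The `DATE_PATTERNS` table is a hand-written assumption standing in for the locale data a proper formatting library would provide, and it covers only three locales; `parse_iso_date` and `format_date` are hypothetical helper names.

```python
from datetime import date

# Assumption: illustrative patterns only; real locale data is much richer.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "de-DE": "%d.%m.%Y",
}


def parse_iso_date(raw: str) -> date:
    """Parse the input string once, at the boundary of the application."""
    return date.fromisoformat(raw)


def format_date(value: date, locale_tag: str) -> str:
    """Format at the last moment, with an explicitly passed locale."""
    return value.strftime(DATE_PATTERNS[locale_tag])


# Inside the application the value is a date object, never a string.
deadline = parse_iso_date("2024-03-05")
us_text = format_date(deadline, "en-US")
gb_text = format_date(deadline, "en-GB")
de_text = format_date(deadline, "de-DE")
```

The US and British outputs contain the same digits in a different order, which is exactly the kind of ambiguity that appears when formatted strings are passed around instead of typed values.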

Check all the layers

Localisation is primarily a concern for the so-called front-end layer of the application. But beware, there may be user-facing texts that are fully generated in the "back-end", skipping the "front-end" layer localisation mechanism:

Email messages sent to users
PDF, CSV or other dynamically generated files for download
errors or other messages received from a 3rd party back-end system
product names from a database
country names
...

These usually need to be assigned unique and stable codes (like ISO 3166 for countries) so that we have a reliable and maintainable list of strings to translate.
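As a minimal sketch of this idea, the back-end can store and transmit only the code and resolve the display name at output time. The `COUNTRY_NAMES` table and the `country_name` function are hypothetical, and the table contains just two ISO 3166 alpha-2 codes for illustration.

```python
# Assumption: a tiny in-memory table; real data would come from a
# translation catalog or a locale data library.
COUNTRY_NAMES = {
    "en": {"SK": "Slovakia", "DE": "Germany"},
    "sk": {"SK": "Slovensko", "DE": "Nemecko"},
}


def country_name(code: str, language: str, fallback_language: str = "en") -> str:
    """Translate a stable country code, falling back to English, then to the code."""
    names = COUNTRY_NAMES.get(language, {})
    if code in names:
        return names[code]
    return COUNTRY_NAMES[fallback_language].get(code, code)
```

For example, `country_name("DE", "sk")` gives "Nemecko", while an unsupported language falls back to the English name and an unknown code is returned unchanged, so a missing translation degrades gracefully instead of failing.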

Conclusion

Even though internationalisation/localisation of software is commonly considered a technical implementation detail, it is in fact an important architectural aspect that requires collaboration between different professions. Programmers need to understand how to make their code ready to communicate in different languages. Translators need to take care not to make typos in string placeholders, and graphic designers must know that the resulting view is built up from separate components that need to be flexible enough for different languages to fit into them.

I hope this article can help you when you work on your next application localisation.
