Internationalizing websites
Internationalized content is somewhat new territory for me. Before joining the team here at PRPL, I had never done a project where translation or localization was necessary. Come Monday, I will have developed all three of PRPL's custom internationalized websites. I've learned a lot along the way and thought I would share a bit. This isn't meant to be the final word on the subject; I'm just sweeping over a few points that I believe are important.
In the hope that the wrath of Bobby or Aaron doesn't strike me down, I'm going to expose a little of our sacred planning process. We spend careful time organizing client content into an inventory: massive spreadsheets covering everything you can imagine to do with website content. It's a living, breathing set of documents that is born in planning and evolves through every stage of a project, into and through maintenance. For shame, its one misfortune in the past was that it did not support safe internationalized development.
Developing I18n-Safe Websites
First, it's best to define what an i18n-safe website is, and what it is not. It is not simply a website built on a framework with i18n and l10n (localization) components. That's a good step, but on its own it isn't enough. I18n-safe means that a website has been developed so that languages can be added without code overhauls, whether or not any are built in yet. You get there by making all content on the website readily available for translation. For example:
// Not i18n-safe:
echo 'Welcome back, ' . $username;
// I18n-safe:
echo $this->translate('WELCOME_BACK_1', $username);
That may not make much sense at first, so let me explain. "WELCOME_BACK_1" is the key that gets looked up by whatever is doing the translating. Why use a code instead of the actual text? That comes down to developer preference. Some people prefer verbose strings, others short codes. I favor a constant that conveys the general idea of the content it represents.
I choose this method because it keeps the code readable and flexible for the inevitable client copy changes. The following two entries are interchangeable, and the code still makes sense with either:
msgid "WELCOME_BACK_1"
msgstr "Welcome back, %1$s"
or
msgid "WELCOME_BACK_1"
msgstr "Hi, %1$s! Welcome back!"
Both accomplish the same thing: they are a greeting, and they allow a variable such as a person's name to be placed in the text. As you may already know, "%1$s" is a positional placeholder that stands for the first argument passed in. This matters for copy changes like the one above. Far more importantly, it lets each language structure its sentences differently and put the variable wherever its grammar requires. Languages simply aren't structured the same way, and your internationalization plans need to account for that.
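To make the placeholder idea concrete, here is a minimal Python sketch of a key-based lookup. The `translate()` function, the in-memory `CATALOGS` dictionary and the `ORDER_SHIPPED_1` key are hypothetical stand-ins for GetText or Zend_Translate. The sketch shows that each language's string can place the numbered placeholders in whatever order its grammar needs, while the calling code stays the same.

```python
import re

# Hypothetical catalogs keyed by the same message IDs the code uses.
CATALOGS = {
    "en_US": {
        "WELCOME_BACK_1": "Welcome back, %1$s",
        "ORDER_SHIPPED_1": "%1$s shipped on %2$s",
    },
    "de_DE": {
        "WELCOME_BACK_1": "Willkommen zurück, %1$s",
        # This German phrasing puts the date first, so %2$s comes before %1$s.
        "ORDER_SHIPPED_1": "Am %2$s wurde %1$s versandt",
    },
}
DEFAULT_LOCALE = "en_US"

# Matches printf-style positional placeholders such as %1$s or %2$s.
PLACEHOLDER = re.compile(r"%(\d+)\$s")


def translate(locale: str, key: str, *args: str) -> str:
    """Look up `key` for `locale`, falling back to the default locale and then
    to the key itself, and substitute the positional placeholders."""
    template = CATALOGS.get(locale, {}).get(key) or CATALOGS[DEFAULT_LOCALE].get(key, key)
    # %N$s refers to the N-th argument (1-based), wherever it sits in the string.
    return PLACEHOLDER.sub(lambda m: str(args[int(m.group(1)) - 1]), template)


print(translate("en_US", "ORDER_SHIPPED_1", "Order 42", "March 3"))
print(translate("de_DE", "ORDER_SHIPPED_1", "Bestellung 42", "3. März"))
```

The German entry swaps the order of the two placeholders, yet the call site is identical for both languages. Only the catalog changes.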
Aside from this basic example, here are a few other quick points:
- Find translators who are native speakers of the target language. This is very important if you want your content to read naturally to native speakers.
- New features mean new translations. When quoting new work for an internationalized project, make sure to account for the time translations will take.
- Provide context to the translators. A word or sentence handed over on its own may not be translated properly, so briefly explain the objective or business context of the copy.
- Numbers, dates and currencies are all localized differently. Thankfully, Zend_Locale in the Zend Framework does much of the heavy lifting for PRPL's custom projects.
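Zend_Locale handles this with real locale data from the Unicode CLDR. This small Python sketch only illustrates what "localized differently" means for numbers. It uses a hypothetical, hand-written separator table covering just two locales, and a real project should rely on CLDR-backed data rather than a table like this.

```python
# Hypothetical separator table: (group separator, decimal separator).
# Real locale data (e.g. Unicode CLDR) covers far more than this.
SEPARATORS = {
    "en_US": (",", "."),
    "de_DE": (".", ","),
}


def format_number(value: float, locale: str, decimals: int = 2) -> str:
    """Format `value` with the grouping and decimal marks of `locale`."""
    group, decimal = SEPARATORS[locale]
    # Start from Python's locale-neutral output: "," groups, "." decimal.
    neutral = f"{value:,.{decimals}f}"
    # Swap through a temporary marker so the two replacements don't collide.
    return (
        neutral.replace(",", "\0")
        .replace(".", decimal)
        .replace("\0", group)
    )


print(format_number(1234567.891, "en_US"))  # 1,234,567.89
print(format_number(1234567.891, "de_DE"))  # 1.234.567,89
```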
Automate the Process
Even with the application programmed properly, you still need to make updating translations as easy as possible. No client wants to pay you to sit there doing monkey data entry work, and you don't want to do it either. Standardizing the format in which translators deliver their work will make your life much easier. For example, if you're using GetText, you can feed a spreadsheet into a script that generates the translation MO file for you. In doing so, you reduce bugs from duplicate keys, typos, or general apathy, because let's all admit it: data entry sucks.
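Here is a hypothetical Python sketch of such a script; it is not the script PRPL uses. It assumes the translators' spreadsheet is exported to CSV with a `key` column plus one column per locale (for example `en_US`, `de_DE`). The script rejects duplicate keys, skips blank (not yet translated) cells, and writes a GetText MO file directly with the `struct` module. Many teams would instead generate a PO file and compile it with GNU `msgfmt`; writing the binary here just keeps the sketch self-contained.

```python
import csv
import gettext
import io
import struct

# The empty msgid holds the catalog header; GetText reads the charset from it.
HEADER = "Content-Type: text/plain; charset=UTF-8\n"


def load_catalog(csv_text: str, locale: str) -> dict[str, str]:
    """Collect key -> translation for one locale column of the spreadsheet."""
    catalog: dict[str, str] = {}
    seen: set[str] = set()
    # start=2 because row 1 of the spreadsheet is the header.
    for row_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        key = row["key"].strip()
        if key in seen:
            raise ValueError(f"row {row_no}: duplicate key {key!r}")
        seen.add(key)
        text = (row.get(locale) or "").strip()
        if text:  # blank cells are untranslated; skip them
            catalog[key] = text
    return catalog


def build_mo(catalog: dict[str, str]) -> bytes:
    """Serialize a catalog into the little-endian GNU MO format (no hash table)."""
    entries = {"": HEADER, **catalog}
    keys = sorted(entries)  # MO lookup expects msgids in sorted order
    ids = [k.encode("utf-8") for k in keys]
    strs = [entries[k].encode("utf-8") for k in keys]

    count = len(keys)
    orig_table = 7 * 4                   # right after the 7-word file header
    trans_table = orig_table + count * 8
    offset = trans_table + count * 8     # string data starts after both tables

    tables, data = [b"", b""], b""
    for i, group in enumerate((ids, strs)):
        for s in group:
            tables[i] += struct.pack("<2I", len(s), offset)
            data += s + b"\0"  # NUL-terminated; stored length excludes the NUL
            offset += len(s) + 1

    # magic, revision, count, table offsets, hash table size and offset (unused)
    header = struct.pack("<7I", 0x950412DE, 0, count, orig_table, trans_table, 0, 0)
    return header + tables[0] + tables[1] + data


sheet = (
    "key,en_US,de_DE\n"
    'WELCOME_BACK_1,"Welcome back, %1$s","Willkommen zurück, %1$s"\n'
    'ORDER_SHIPPED_1,"%1$s shipped on %2$s",\n'
)
mo_bytes = build_mo(load_catalog(sheet, "de_DE"))
# Round-trip through Python's own MO reader as a sanity check.
print(gettext.GNUTranslations(io.BytesIO(mo_bytes)).gettext("WELCOME_BACK_1"))
```

In a real workflow you would write `mo_bytes` to something like `de_DE/LC_MESSAGES/messages.mo`, following whatever layout your GetText setup expects.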
Storing Translations of Dynamic Content
The last thing I'll touch on is storing dynamic page content. GetText works well for static content, but for things like page content in multiple languages it just doesn't make sense. A typical database page schema would have title and content fields, something like this:
CREATE TABLE `pages` (
`id` int(11) unsigned auto_increment,
`slug` varchar(100) null,
`title` varchar(100) null,
`content` text null,
PRIMARY KEY (`id`),
KEY (`slug`) );
So where do you put the internationalized content? Surely you wouldn't do something as tragic as serializing an array and stuffing it into the content column, would you? First of all, gross. Second, why retrieve every language when you only need one? Instead, try something similar to the following:
CREATE TABLE `pages` (
`id` int(11) unsigned auto_increment,
`slug` varchar(100),
PRIMARY KEY (`id`),
KEY (`slug`) );
CREATE TABLE `page_content` (
`id` int(11) unsigned auto_increment,
`language_id` tinyint(2) unsigned default 1,
`page_id` int(11) unsigned default 0,
`title` varchar(100) null,
`content` text null,
PRIMARY KEY (`id`),
KEY (`language_id`),
UNIQUE KEY (`page_id`, `language_id`) );
CREATE TABLE `languages` (
`id` tinyint(2) unsigned auto_increment,
`locale` varchar(6),
`title` varchar(40),
PRIMARY KEY (`id`),
KEY (`locale`) );
This way, a page can have content in as many languages as you want. Translators aren't superhuman, though. Often a client will publish English content first, and translations will slowly trickle in. This is to be expected, and you can still provide a seamless user experience by setting a default language and pulling its content as a fallback whenever a translation is missing. Of course, a document store like CouchDB could make this less complex, but that's for another day. I hope this helps.
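Here is a minimal sketch of that fallback. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs self-contained. The tables mirror the `pages`, `page_content` and `languages` tables above with SQLite types, and this sketch adds a unique key on (`page_id`, `language_id`) so each page has at most one row per language. Treating language id 1 as the default and the sample rows are assumptions for illustration. The query fetches the requested locale and the default language together and sorts the requested one first, so one round trip covers both cases.

```python
import sqlite3

SCHEMA = """
CREATE TABLE languages (id INTEGER PRIMARY KEY, locale TEXT UNIQUE, title TEXT);
CREATE TABLE pages (id INTEGER PRIMARY KEY, slug TEXT UNIQUE);
CREATE TABLE page_content (
    id INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL DEFAULT 1 REFERENCES languages (id),
    page_id INTEGER NOT NULL REFERENCES pages (id),
    title TEXT,
    content TEXT,
    UNIQUE (page_id, language_id)
);
"""

DEFAULT_LANGUAGE_ID = 1  # assumption: language 1 is the site default


def get_page(conn: sqlite3.Connection, slug: str, locale: str):
    """Return (title, content, served_locale) for `slug`, preferring `locale`
    and falling back to the default language; None if the page is unknown."""
    # (l.locale = ?) is 1 for the requested locale and 0 for the default,
    # so ORDER BY ... DESC puts the translation ahead of the fallback.
    return conn.execute(
        """
        SELECT pc.title, pc.content, l.locale
        FROM pages AS p
        JOIN page_content AS pc ON pc.page_id = p.id
        JOIN languages AS l ON l.id = pc.language_id
        WHERE p.slug = ? AND (l.locale = ? OR pc.language_id = ?)
        ORDER BY (l.locale = ?) DESC
        LIMIT 1
        """,
        (slug, locale, DEFAULT_LANGUAGE_ID, locale),
    ).fetchone()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO languages VALUES (?, ?, ?)",
                     [(1, "en_US", "English"), (2, "de_DE", "Deutsch")])
    conn.executemany("INSERT INTO pages VALUES (?, ?)", [(1, "about"), (2, "contact")])
    conn.executemany(
        "INSERT INTO page_content (language_id, page_id, title, content) VALUES (?, ?, ?, ?)",
        [
            (1, 1, "About us", "Who we are."),
            (2, 1, "Über uns", "Wer wir sind."),
            (1, 2, "Contact", "Get in touch."),  # German not translated yet
        ],
    )
    return conn


conn = demo_connection()
print(get_page(conn, "about", "de_DE"))    # ('Über uns', 'Wer wir sind.', 'de_DE')
print(get_page(conn, "contact", "de_DE"))  # ('Contact', 'Get in touch.', 'en_US')
```

Returning the served locale alongside the content lets the page template flag untranslated pages, or set the right `lang` attribute, when it falls back.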
If you have your own style, I'd love to hear it.
Comments
David Rogers says:
Published on Jul 24, 2009 @ 12:23pm
Ya makes me proud, boy... :]
Internationalizing Your ZF Application « Rob’s Blog | Purple says:
Published on Sep 15, 2009 @ 6:17pm
[...] lined up. Coincidentally enough, it goes perfectly with my blog post from about two months ago on Internationalizing websites. While that was a more topical post and not so much on actual implementation, I thought I’d [...]