Building a better localization system for GameMaker

We recently added localization support to Ghost Croquet!

In doing so, I have also improved the localization system that I use in my projects (and perhaps you could too).

This is a post about the process, architecture, and challenges.

Problems & existing solutions

Suppose we have a menu.

I could have thought of an entirely hypothetical menu, but I don't have to, because I have a game open right here.

Let's have a look at the multiplayer menu in Ghost Croquet.

We use a somewhat imGUI-inspired menu system that I made, so our multiplayer/lobby list menu code looks like this (sans comments and some naming choices):

function mdef_multiplayer() {
    var n = lobbylist_get_entry_count();
    menu_label($"{n} public game{n != 1 ? "s" : ""}:");
    for (var i = 0; i < n; i++) {
        if (menu_button(string("{0}/{1} {2}'s game",
            lobbylist_get_entry_slots_taken(i),
            lobbylist_get_entry_slots_total(i),
            lobbylist_get_entry_title(i),
        ))) lobbylist_join_public(i);
    }
    if (menu_button("Refresh")) lobbylist_refresh_entries();
    if (menu_button("Host public")) lobbylist_host_public();
    if (menu_button("Host private")) lobbylist_host_private();
    if (menu_exit()) lobbylist_close();
}

Pretty neat, right? One of these days I'll get around to writing proper documentation for it so that it can be released and used in other games too. But that's beside the point of this post.

We eventually decided that it'd be nice to have the game available in languages besides English - particularly since it's not exactly a text-heavy game (~1000 words, a few hundred of which are the level editor manual and messages).

So how's this code going to look after we add in localization?

There are at least a few existing localization systems for GameMaker (example list).

A short (and potentially slightly biased) poll across developer friends suggests that Lexicon is a relatively popular option (along with just rolling your own), so let's use that as an example.

The code would look something like this if we used Lexicon:

function mdef_multiplayer() {
    var n = lobbylist_get_entry_count();
    menu_label(lexicon_text("multiplayer.header", n));
    for (var i = 0; i < n; i++) {
        if (menu_button(lexicon_text("multiplayer.entry",
            lobbylist_get_entry_slots_taken(i),
            lobbylist_get_entry_slots_total(i),
            lobbylist_get_entry_title(i),
        ))) lobbylist_join_public(i);
    }
    if (menu_button(lexicon_text("multiplayer.refresh"))) lobbylist_refresh_entries();
    if (menu_button(lexicon_text("multiplayer.host_public"))) lobbylist_host_public();
    if (menu_button(lexicon_text("multiplayer.host_private"))) lobbylist_host_private();
    if (menu_exit()) lobbylist_close();
}

There are a few things that I don't like here, subjectively (and don't take this as a dunk on Lexicon - these exist in most systems):

  1. lexicon_text is a bit of a mouthful for what is possibly going to become the most-called function in the game.
    Fortunately, GameMaker has macros, so we could shorten it to LT or even __ like some other extensions do.
  2. Having the multiplayer. prefix duplicated across the menu is redundant, but kind of necessary - you cannot assume that every English string is unique or that if a word can be used in multiple contexts in English, other languages can also use the same word for all of them.
  3. The header now has to read # public game(s) unless we add a special case for English - different languages have different rules for pluralization. More on this shortly.
  4. It is now slightly harder to quickly tell what the menu's going to read without peeking into the English translation file, or to quickly look up text that you see in-game without searching the translation file.
  5. If we made a typo somewhere or a string is missing from the current locale, what is the function going to return? An empty string? A string from the parent language? The localization key?

    The documentation doesn't say that at the time of writing, but a quick look at the source code suggests that it's language.key for Lexicon.
    (and also this function is marked as forceinline! Isn't it kind of big for that?)

    Now, you might be thinking "Making mistakes? Couldn't be me!", and you might be correct, but it can happen to anyone. Even to a critically acclaimed title like "Halo: The Master Chief Collection":

    Missing challenge name string in Halo: The Master Chief Collection
    (...are these received from a server?)

Most of these would have been like this regardless of the solution used - this is how localization is usually done. But can we (I) do better?

Getting shorter

The first thing I would like to get rid of is the duplicated multiplayer. prefix.

So that you don't think I'm going to butcher Lexicon for this, let's assume that our localization functions are prefixed with testloc_ or something.

So instead of having one big map for all of the language strings, let's split it into "groups" (main menu, multiplayer, etc.). Then each group could have its own little struct that holds a handful of strings and has a get(key, ...values to insert) function.

This would make the code a little shorter:

function mdef_multiplayer() {
    var g = testloc_get_group("multiplayer");
    var n = lobbylist_get_entry_count();
    menu_label(g.get("entry_count", n));
    for (var i = 0; i < n; i++) {
        if (menu_button(g.get("entry",
            lobbylist_get_entry_slots_taken(i),
            lobbylist_get_entry_slots_total(i),
            lobbylist_get_entry_title(i),
        ))) lobbylist_join_public(i);
    }
    if (menu_button(g.get("refresh"))) lobbylist_refresh_entries();
    if (menu_button(g.get("host_public"))) lobbylist_host_public();
    if (menu_button(g.get("host_private"))) lobbylist_host_private();
    if (menu_exit()) lobbylist_close();
}

So that's a little nicer, but can we get shorter? I mean, apart from renaming the get function to s or something.

The answer is yes: recent GameMaker versions have methods, and methods can be bound to structs/instances. That means that we can have a function that returns a bound get function of a group. And thus:

function mdef_multiplayer() {
    var L = testloc_get_group_func("multiplayer");
    var n = lobbylist_get_entry_count();
    menu_label(L("EntryCount", n));
    for (var i = 0; i < n; i++) {
        if (menu_button(L("entry",
            lobbylist_get_entry_slots_taken(i),
            lobbylist_get_entry_slots_total(i),
            lobbylist_get_entry_title(i),
        ))) lobbylist_join_public(i);
    }
    if (menu_button(L("refresh"))) lobbylist_refresh_entries();
    if (menu_button(L("host_public"))) lobbylist_host_public();
    if (menu_button(L("host_private"))) lobbylist_host_private();
    if (menu_exit()) lobbylist_close();
}
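
For illustration, the group and its bound getter could be sketched roughly like this. This is not the extension's actual code: TestlocGroup, its strings field, and global.testloc_groups are made-up names, and string_ext is assumed to be available (it is in GameMaker versions that also have the $"..." template strings used above). Missing keys simply fall back to the key in this sketch.

```gml
// Hypothetical sketch; all names below are assumptions for illustration.
global.testloc_groups = {};

function TestlocGroup(_name) constructor {
    name = _name;
    strings = {}; // key -> template text for the current language

    /// get(key, ...values to insert)
    static get = function(_key) {
        // Use the stored text if there is one, otherwise show the key itself.
        var _text = variable_struct_exists(strings, _key) ? strings[$ _key] : _key;
        if (argument_count <= 1) return _text;
        // Collect the remaining arguments for the {0}, {1}, ... placeholders.
        var _values = array_create(argument_count - 1);
        for (var i = 1; i < argument_count; i++) {
            _values[i - 1] = argument[i];
        }
        return string_ext(_text, _values);
    };
}

function testloc_get_group(_name) {
    // Create the group on first request so that callers never get undefined.
    if (!variable_struct_exists(global.testloc_groups, _name)) {
        global.testloc_groups[$ _name] = new TestlocGroup(_name);
    }
    return global.testloc_groups[$ _name];
}

function testloc_get_group_func(_name) {
    var _group = testloc_get_group(_name);
    // method() binds get to the group, so calling L("key") runs with the group as self.
    return method(_group, _group.get);
}
```

The binding is what makes the short form possible: the returned function remembers its group, so L("refresh") and g.get("refresh") end up doing the same lookup.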

Base text and fallback

Managing localization strings externally is preferable if you have a writer on your project, but if it's one of the developers writing the strings (or at least their initial revisions) or the strings are already in the source code, you might as well keep them in the code.

Thus,

menu_label(L("EntryCount", n));

would become

menu_label(L("EntryCount", "{0} public game(s)", n));

and that's good, but

if (menu_button(L("refresh"))) lobbylist_refresh_entries();

would become

if (menu_button(L("refresh", "Refresh"))) lobbylist_refresh_entries();

and that's less good.

Having contemplated the situation for a bit, I have settled on a workaround: if the key is the same as the value and you don't need template insertions, you can omit the value argument. And with that, the code would look like this:

function mdef_multiplayer() {
    var L = testloc_get_group_func("multiplayer");
    var n = lobbylist_get_entry_count();
    menu_label(L("EntryCount", "{0} public game(s)", n));
    for (var i = 0; i < n; i++) {
        if (menu_button(L("Entry", "{0}/{1} {2}'s game",
            lobbylist_get_entry_slots_taken(i),
            lobbylist_get_entry_slots_total(i),
            lobbylist_get_entry_title(i),
        ))) lobbylist_join_public(i);
    }
    if (menu_button(L("Refresh"))) lobbylist_refresh_entries();
    if (menu_button(L("Host public"))) lobbylist_host_public();
    if (menu_button(L("Host private"))) lobbylist_host_private();
    if (menu_exit()) lobbylist_close();
}

And that's pretty close to how the extension has it! But we're not done yet.
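
The lookup order implied here (translation first, then the base text from the code, then the key itself) could be sketched as a standalone function. testloc_lookup and its parameters are hypothetical names for illustration, and values are passed as an array to keep the sketch short; the actual extension may resolve things differently.

```gml
/// Hypothetical sketch of the fallback order for a single string.
/// _strings: struct with translated texts for one group
/// _base:    optional base (English) text from the source code
/// _values:  optional array of values for {0}, {1}, ... placeholders
function testloc_lookup(_strings, _key, _base = undefined, _values = undefined) {
    var _text;
    if (variable_struct_exists(_strings, _key)) {
        _text = _strings[$ _key]; // translated text for the current language
    } else if (!is_undefined(_base)) {
        _text = _base;            // not translated: use the text from the code
    } else {
        _text = _key;             // key doubles as the text, like L("Refresh")
    }
    return is_array(_values) ? string_ext(_text, _values) : _text;
}

// Usage (e.g. in a Create event), with nothing translated yet:
var _translated = {};
show_debug_message(testloc_lookup(_translated, "Refresh"));
// Refresh
show_debug_message(testloc_lookup(_translated, "Entry", "{0}/{1} {2}'s game", [1, 4, "Alex"]));
// 1/4 Alex's game
```

A side effect of this order is that a missing or mistyped key never shows up as an empty string: the worst case is the base text or the key itself.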

String extraction

So the strings are now in the code, which means that they are not in an external file.

And you wouldn't want to copy keys to a template file by hand, would you?

Auto-generating the files to be translated isn't a new idea - POEdit can do that, for example, though only for simpler (func("key")) localization systems that can be matched with a regular expression.

I wanted something nicer, so I wrote a tool that figures out what your macros are and then reads the code while keeping those macros in mind.

This means that you can do something like this

#macro LF cmn_loc_get_func
#macro LC_SETTINGS "Settings"
// ...
var L = LF(LC_SETTINGS + ".Audio");
label = L("Volume", "Volume: {0}%", 65);

and still get a Settings.Audio group with a Volume key and "Volume: {0}%" as its text in your output file!

As for formats, I have settled on INI, JSON (of a few kinds), and CSV as the most common options.

So, for example, if you chose JSON, the function shown earlier would produce something like the following:

{
    "multiplayer": {
        "EntryCount": "{0} public game(s)",
        "Entry": "{0}/{1} {2}'s game",
        "Refresh": "Refresh",
        "Host public": "Host public",
        "Host private": "Host private"
    }
}

Pluralization

And now, back to those "public game(s)".

On the Unicode website, you can find a table that lists pluralization rules for over 200 languages and is assumed to be generally-correct (with occasional missing rules for less-common languages).

After thinking about this for a bit, I figured that trying to parse (and subsequently maintain) formulas from this table might not be such a good idea, so I wrote a slightly more GML-like parser that can take plural group expression definitions like this (English)

one: # == 1

or like this (Ukrainian)

one: is_int(#) and # % 10 == 1 and # % 100 != 11
few: is_int(#) and # % 10 in [2..4] and # % 100 not in [12..14]
many: is_int(#) and (# % 10 == 0 or # % 10 in [5..9] or # % 100 in [11..14])

and use them inside template strings. And thus, {0} public game(s) becomes

menu_label(L("EntryCount", "{0} public {0, plural,\n"
    + "one {game}\n"
    + "other {games}\n"
+ "}:", n));

And the Ukrainian localization would be able to use

{0} онлайн {0, plural,
one {гра}
few {гри}
other {ігор}
}:

to have the text read "1 онлайн гра", "2 онлайн гри", ..., "5 онлайн ігор" as the language's rules have it.

However, if you are writing a system that'll only be used in a game or two, it is much easier to hard-code the pluralization rules for your supported languages. Parsers are cool, but time-consuming to make.
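
A hard-coded version for just English and Ukrainian might look something like the sketch below. The function names are made up for illustration, the rules are transcribed from the definitions shown above, and negative counts are not considered.

```gml
/// Hypothetical sketch: returns a plural category name for a count.
function testloc_plural_category(_lang, _n) {
    switch (_lang) {
        case "en":
            return (_n == 1) ? "one" : "other";
        case "uk":
            if (frac(_n) != 0) return "other"; // the rules below are for integers only
            var _mod10 = _n mod 10;
            var _mod100 = _n mod 100;
            if (_mod10 == 1 && _mod100 != 11) return "one";
            if (_mod10 >= 2 && _mod10 <= 4 && !(_mod100 >= 12 && _mod100 <= 14)) return "few";
            return "many"; // ends in 0 or 5..9, or in 11..14
    }
    return "other"; // unknown language: no distinction
}

/// Picks a form from a struct of category -> text, falling back to "other".
function testloc_plural_pick(_forms, _lang, _n) {
    var _category = testloc_plural_category(_lang, _n);
    return variable_struct_exists(_forms, _category) ? _forms[$ _category] : _forms[$ "other"];
}

// Usage (e.g. in a Create event). The accessor is used instead of a struct
// literal because "other" is a keyword in GML.
var _forms = {};
_forms[$ "one"] = "гра";
_forms[$ "few"] = "гри";
_forms[$ "other"] = "ігор";
show_debug_message("5 онлайн " + testloc_plural_pick(_forms, "uk", 5));
// 5 онлайн ігор
```

The fallback to "other" is why the Ukrainian template above can get away with listing only one, few, and other: counts in the many category have no branch of their own and land on other.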

Context/comments

Another thing that's nice to have but is often overlooked is context!

It's nice to include explanations of what exactly something means or when the text appears in-game, which spares the translators from having to pester you for this information (or from working off potentially inaccurate guesses).

So, for example, if you have

if (menu_button(L("Entry", "{0}/{1} {2}'s game",
    lobbylist_get_entry_slots_taken(i),
    lobbylist_get_entry_slots_total(i),
    lobbylist_get_entry_title(i),
))) { /** @loc
        Shows for each lobby that you can join.
        The variables are:
        {0}: Current number of players
        {1}: Maximum number of players
        {2}: Steam name of the person hosting the lobby
    */
    lobbylist_join_public(i);
}

then the extractor will pick up the text from that @loc comment and put it next to the string in the generated file (if the format permits).
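
As an example of a format that does permit it, the Chrome extension flavor of JSON stores each string as an object with a message and an optional description. The entry above might come out roughly as follows; the exact key naming and line breaks produced by the generator are assumptions here.

```json
{
    "Entry": {
        "message": "{0}/{1} {2}'s game",
        "description": "Shows for each lobby that you can join.\nThe variables are:\n{0}: Current number of players\n{1}: Maximum number of players\n{2}: Steam name of the person hosting the lobby"
    }
}
```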

Crowdin

When I made an earlier version of this system for use in Nuclear Throne, the primary use case was DIY translations, so I generated a single lang-example.ini file that people can base their translations on.

This time, I wanted to try something a little easier for people to collaborate on.

After looking at a few localization platforms, I decided to try Crowdin as it seemed pretty nice:

  • If you give it multiple files, strings will be shown in groups
    (instead of a single big, intimidating pile - such as Minecraft's 138 pages of strings)
  • Context/comments are supported (including multi-line snippets) and are shown next to the strings in the editor.
  • But also you can upload screenshots and mark strings on them!
    That's also a good way to show where and how something appears.
  • A fairly generous free tier - for Ghost Croquet's word count, we could host a few dozen languages in Crowdin before we'd have to pay a subscription.

And thus, after figuring out what input format looks the cleanest in the editor (JSON key-value pairs for Chrome extensions, oddly enough), I tweaked the generator to output sets of these, uploaded them to Crowdin, and now we can edit the translations!


Editing localization strings in Crowdin editor

The workflow's pretty reasonable - you set things up for translation, people translate them, and you download the translations in one format or another.

Conclusions and links

Overall, we have gone from this (no localization support)

function mdef_multiplayer() {
    var n = lobbylist_get_entry_count();
    menu_label($"{n} public game{n != 1 ? "s" : ""}:");
    for (var i = 0; i < n; i++) {
        if (menu_button(string("{0}/{1} {2}'s game",
            lobbylist_get_entry_slots_taken(i),
            lobbylist_get_entry_slots_total(i),
            lobbylist_get_entry_title(i),
        ))) lobbylist_join_public(i);
    }
    if (menu_button("Refresh")) lobbylist_refresh_entries();
    if (menu_button("Host public")) lobbylist_host_public();
    if (menu_button("Host private")) lobbylist_host_private();
    if (menu_exit()) lobbylist_close();
}

to this (full localization support)

function mdef_multiplayer() {
    var L = cmn_loc_get_func("Menu.Multiplayer");
    var n = lobbylist_get_entry_count();
    menu_label(L("EntryCount", "{0} public {0, plural,\n"
        + "one {game}\n"
        + "other {games}\n"
    + "}:", n));
    for (var i = 0; i < n; i++) {
        if (menu_button(L("Entry", "{0}/{1} {2}'s game",
            lobbylist_get_entry_slots_taken(i),
            lobbylist_get_entry_slots_total(i),
            lobbylist_get_entry_title(i),
        ))) lobbylist_join_public(i);
    }
    if (menu_button(L("Refresh"))) lobbylist_refresh_entries();
    if (menu_button(L("Host public"))) lobbylist_host_public();
    if (menu_button(L("Host private"))) lobbylist_host_private();
    if (menu_exit()) lobbylist_close();
}

So I would consider this a success!

There's only a little more code per script/event, we have localized the game into a few languages, and the system should be intuitive enough for other people to use.

This extension can be found on itch.io.

Its documentation can be found on my website.

And if you'd like to read more about integrating localization in Ghost Croquet, check out the Steam announcement!

Thanks for reading!


5 thoughts on “Building a better localization system for GameMaker”

  1. it’s strangely empowering to know that the system i hacked together basically on my own is pretty similar to the one that you ended up making (except for plurals, which i expect i’ll have to deal with somehow eventually)

    • Thank you! I forgot to respond at the time, but I’ve had a look at your generator and that’s just about how the early version of this system worked too – just little regular expressions, none of this multi-step parsing nonsense.

      If it wasn’t for my desire to support macros and string concatenation, I would probably still stick to that approach.

      As you have remarked earlier, “item x5” is a fairly clean-looking way to avoid dealing with plurals.
