GameMaker: Split string on delimiter

From time to time I see people come up with increasingly strange solutions to the simple problem of splitting a string into multiple substrings on a delimiter.

So here's a post about the algorithm, with code samples for all existing versions of GameMaker.

The algorithm

The idea, as outlined in the above illustration, is pretty simple:

  • On each iteration, the function finds the offset of the first occurrence of the delimiter in the string.
  • The part of the string from the start up to the delimiter is added to the resulting list.
  • The string is then trimmed so that it starts right after the delimiter.
  • On the last iteration (when the delimiter is no longer found in the string), the remaining string is added to the resulting list.
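
As a rough illustration of the steps above, here is a hand-traced sketch for the input "a,b,c" with "," as the delimiter. The variable names mirror the implementations further down, the `part` variable is only there for illustration, and the values in the comments assume GameMaker's 1-based string indexing:

```gml
var s = "a,b,c", d = ",";
var dl = string_length(d);            // 1
var p = string_pos(d, s);             // 2: the first "," is the 2nd character
var part = string_copy(s, 1, p - 1);  // "a": everything before the delimiter
s = string_delete(s, 1, p - 1 + dl);  // "b,c": the part and the delimiter are dropped
p = string_pos(d, s);                 // 2 again, now relative to "b,c"
part = string_copy(s, 1, p - 1);      // "b"
s = string_delete(s, 1, p - 1 + dl);  // "c"
p = string_pos(d, s);                 // 0: no delimiter left, so "c" is the last part
```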

Implementations follow.

GameMaker 8.1+

In recent and semi-recent versions of GameMaker, the implementation is pretty straightforward - store (or create) a list for results, note the length of the delimiter, and perform the loop exactly as outlined above:

/// string_split(:string, delimiter:string, ?list<string>):list<string>
var s = argument[0], d = argument[1], r;
if (argument_count >= 3) {
    r = argument[2];
    ds_list_clear(r);
} else r = ds_list_create();
var p = string_pos(d, s);
var dl = string_length(d);
if (dl) while (p) {
    p -= 1;
    ds_list_add(r, string_copy(s, 1, p));
    s = string_delete(s, 1, p + dl);
    p = string_pos(d, s);
}
ds_list_add(r, s);
return r;
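
A minimal usage sketch, assuming the script above is saved as string_split. The input strings and the `parts` variable are made up for illustration:

```gml
var parts = string_split("one,two,three", ",");
var n = ds_list_size(parts), i;
for (i = 0; i < n; i += 1) {
    show_debug_message(ds_list_find_value(parts, i)); // one, two, three
}
// passing the list as the third argument clears and reuses it:
string_split("x;y", ";", parts);
// the list was created by string_split, so it is up to the caller to free it:
ds_list_destroy(parts);
```

Reusing one list across calls avoids creating and destroying a data structure every time, which is presumably why the optional third argument is there.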

The if (dl) check handles the edge case of an empty string being passed in as the delimiter: the loop is skipped and the whole input is returned as a single item. If you would rather split the string into individual characters in that case (like ECMAScript and some other implementations do), change the code from that point onward to the following:

if (dl) {
    while (p) {
        p -= 1;
        ds_list_add(r, string_copy(s, 1, p));
        s = string_delete(s, 1, p + dl);
        p = string_pos(d, s);
    }
    ds_list_add(r, s);
} else {
    p = 1; // characters are 1-indexed; don't rely on what string_pos returned for ""
    repeat (string_length(s)) {
        ds_list_add(r, string_char_at(s, p));
        p += 1;
    }
}
return r;

Legacy versions (8.0 and older)

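Things are much the same, except there was no support for optional arguments or var name = value at the time, so you've got to work with what you've got. Since the list argument cannot be omitted, pass a negative value (such as -1) to have a new list created: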

/// string_split(:string, delimiter:string, list<string>):list<string>
var s, d, r, p, dl;
s = argument0;
d = argument1;
r = argument2;
if (r < 0) r = ds_list_create(); else ds_list_clear(r);
p = string_pos(d, s);
dl = string_length(d);
if (dl) while (p) {
    p -= 1;
    ds_list_add(r, string_copy(s, 1, p));
    s = string_delete(s, 1, p + dl);
    p = string_pos(d, s);
}
ds_list_add(r, s);
return r;
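
A short usage sketch for the legacy script, assuming it is saved as string_split. The `list` variable and the input strings are illustrative only:

```gml
var list, i;
// -1 is not a valid list index, so the script creates a new list:
list = string_split("10|20|30", "|", -1);
for (i = 0; i < ds_list_size(list); i += 1) {
    show_message(ds_list_find_value(list, i)); // 10, 20, 30
}
// passing an existing list clears and reuses it:
string_split("x y", " ", list);
ds_list_destroy(list);
```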

Future versions (GMS 2.x)

As of writing this post, it's not out yet, but one of the upcoming versions of GameMaker: Studio 2 is said to introduce a string_pos_ext(substring, string, startoffset) function, which would allow the split to be performed with roughly half as many string operations:

/// string_split(:string, delimiter:string, ?list<string>):list<string>
var s = argument[0], d = argument[1], r;
if (argument_count >= 3) {
    r = argument[2];
    ds_list_clear(r);
} else r = ds_list_create();
var p = string_pos(d, s), o = 1;
var dl = string_length(d);
if (dl) while (p) {
    ds_list_add(r, string_copy(s, o, p - o));
    o = p + dl;
    p = string_pos_ext(d, s, o);
}
ds_list_add(r, string_delete(s, 1, o - 1));
return r;

Array variant (GameMaker: Studio 1.x)

Some people will argue that returning an array is better than passing/returning a list.
At the cost of a minor overhead you can have just that:

/// string_split(:string, delimiter:string):array<string>
var s = argument[0], d = argument[1];
var rl = global.string_split_list;
var p = string_pos(d, s);
var dl = string_length(d);
ds_list_clear(rl);
if (dl) while (p) {
    p -= 1;
    ds_list_add(rl, string_copy(s, 1, p));
    s = string_delete(s, 1, p + dl);
    p = string_pos(d, s);
}
ds_list_add(rl, s);
// create an array and store results:
var rn = ds_list_size(rl), rw;
if (os_browser != browser_not_a_browser) {
    rw[0] = rl[|0]; // initial allocation
    for (p = 1; p < rn; p++) rw[p] = rl[|p];
} else {
    p = rn; while (--p >= 0) rw[p] = rl[|p];
}
return rw;

Here, global.string_split_list needs to be assigned a new list (ds_list_create()) on game start, as it will be reused on each call.

If you are wondering what's up with the os_browser check, that's a behaviour difference between native and JS-based targets:
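
A minimal setup-and-usage sketch, assuming the array variant is saved as string_split. Where exactly the first line runs (for example, the Create event of a controller object in the first room) is up to your project, and the `words` variable is illustrative:

```gml
// run once at game start, before the first string_split call:
global.string_split_list = ds_list_create();

// later, anywhere in the game:
var words = string_split("left right up down", " ");
var n = array_length_1d(words);   // 4
show_debug_message(words[n - 1]); // down
```

Since the result is an array rather than a data structure, there is nothing for the caller to destroy afterwards.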

  • On native, any write operation outside the array's bounds results in the array being resized to be large enough to fit the new element, meaning that it's more efficient to write from the highest to the lowest index (since the write to the highest index will create an array large enough to fit all elements).
  • On JS, arrays have to be initialized from lowest to highest, as a write outside the [0...length] range turns the array into a hashtable, which gets slower on larger arrays.

As for copying from a list to an array instead of creating a large enough array to begin with, the reason is simple - there's no fast and reliable way of counting the non-overlapping occurrences of a delimiter in a string, because string_count("..", "1...2") would yield 2.
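
To make the mismatch concrete, here is a small sketch that compares the two numbers. It assumes the array variant above is saved as string_split and that global.string_split_list has been set up; the variable names are illustrative:

```gml
var guess = string_count("..", "1...2") + 1; // 3, going by the count of 2 mentioned above
var parts = string_split("1...2", "..");
var actual = array_length_1d(parts);         // 2
// The first ".." is consumed as a delimiter, leaving ".2" with no ".." in it,
// so the pieces are "1" and ".2", and an array pre-sized to 3 would be too large.
```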

Array variant (GameMaker: Studio 2.x)

GameMaker: Studio 2 corrects the aforementioned oddity by introducing a function specifically for making an array of a given size, so the os_browser check is no longer needed. Combined with the aforementioned string_pos_ext, you get this:

/// string_split(:string, delimiter:string):array<string>
var s = argument[0], d = argument[1];
var rl = global.string_split_list;
var p = string_pos(d, s), o = 1;
var dl = string_length(d);
ds_list_clear(rl);
if (dl) while (p) {
    ds_list_add(rl, string_copy(s, o, p - o));
    o = p + dl;
    p = string_pos_ext(d, s, o);
}
ds_list_add(rl, string_delete(s, 1, o - 1));
// create an array and store results:
var rn = ds_list_size(rl);
var rw = array_create(rn);
for (p = 0; p < rn; p++) rw[p] = rl[|p];
return rw;

Additional notes

  • Two common naive approaches to the problem involve doing a series of string_copy comparisons (to catch the delimiter; slow) or having a second nested loop for per-character comparison between strings (a little faster).
  • With incredibly large strings (thousands of characters), it is likely to be more beneficial to rewrite the function to be buffer-based instead (if in GMS 1.x/2.x), but that is also the point where you would usually stop and reconsider why you are working with so much data in such a format to begin with.

In conclusion

Despite this being a pretty trivial algorithm, there are various small improvements to be found as GameMaker versions advance.

Have fun!
