GameMaker: Minifying JSON

If you've spent some time working with JSON in GameMaker: Studio, you may have noticed that the built-in functions aren't exactly interested in keeping output compact - they always write numbers with six decimal places, meaning that 3.5 always becomes 3.500000 even though the extra zeroes serve no purpose whatsoever.

The problem becomes more prominent when encoding data that is largely numbers - say, when writing down a shuffled sequence of numbers:

var map = ds_map_create();
//
var list = ds_list_create();
for (var i = 0; i < 32; i++) ds_list_add(list, i);
ds_list_shuffle(list);
ds_map_add_list(map, "order", list);
//
var json = json_encode(map);
ds_map_destroy(map); // destroys the list in the map as well
show_debug_message(json);

The output is as follows:

{ "order": [ 30.000000, 22.000000, 2.000000, 8.000000, 17.000000, 14.000000, 25.000000, 9.000000, 20.000000, 29.000000, 10.000000, 26.000000, 6.000000, 15.000000, 21.000000, 1.000000, 11.000000, 3.000000, 24.000000, 12.000000, 19.000000, 31.000000, 7.000000, 28.000000, 18.000000, 4.000000, 0.000000, 5.000000, 27.000000, 16.000000, 23.000000, 13.000000 ] }

Here, out of 357 bytes, 260 bytes are whitespace or unneeded "precision".
A little inconvenient if you don't have a compression algorithm on hand.
But, of course, that can be helped with a script.

(post revised in 2018)

The idea

The idea is simple enough: you loop over the string, strip whitespace, and strip any trailing zeroes on numeric values, all while making sure that you don't "minify" anything inside strings.

A naive implementation

Here's the script from the original 2016 version of this post.

/// json_minify(json_string)
// (PLEASE REFER TO THE NEW VERSION INSTEAD)
var s = argument0;
var i = 1, n = string_length(s);
var r = buffer_create(n + 1, buffer_grow, 1); // string buffer, for performance.
while (i <= n) {
    var q = i;
    var c = string_ord_at(s, i);
    i += 1;
    switch (c) {
        case 9: case 10: case 13: case 32: // (insignificant whitespace)
            break;
        case ord('"'): // string
            while (i <= n) {
                c = string_ord_at(s, i);
                if (c != ord("\")) { // regular characters
                    i += 1;
                    if (c == ord('"')) break; // string ends
                } else i += 2; // skip over escape characters, e.g. `\"`
            }
            buffer_write(r, buffer_text, string_copy(s, q, i - q));
            break;
        default:
            if (c >= ord("0") && c <= ord("9")) { // numbers
                var pre = true; // whether reading pre-dot or not
                var till = q; // index at which meaningful part of the number ends
                while (i <= n) {
                    c = string_ord_at(s, i);
                    if (c == ord(".")) {
                        pre = false;
                        i += 1;
                    } else if (c >= ord("0") && c <= ord("9")) {
                        // write all pre-dot, and till the last non-zero after dot:
                        if (pre || c != ord("0")) till = i;
                        i += 1;
                    } else break;
                }
                buffer_write(r, buffer_text, string_copy(s, q, till - q + 1));
            } else buffer_write(r, buffer_text, string_char_at(s, q)); // other things
    } // switch
} // while
//
buffer_write(r, buffer_u8, 0); // string delimiter `\0`
buffer_seek(r, buffer_seek_start, 0); // rewind the buffer
s = buffer_read(r, buffer_string); // read the string written to it
buffer_delete(r); // remove the string buffer
return s;

It was done more or less the same way as it would be in other languages - read characters from a string, write matching characters and substrings into a buffer (some languages have a separate StringBuilder class, in GM you just use a "grow" type buffer).

And that would seem fine, but there's a catch - GameMaker stores strings in UTF-8 format.

While this is a good choice for a number of reasons, it also means that each character takes up between 1 and 4 bytes depending on its "group", and thus operations like string_char_at or string_copy may be doing a small loop to locate the exact offset in a string.
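As a small illustration of the character/byte difference (a sketch; chr(239) is used here only to produce a non-ASCII character, "ï", without depending on how the source file is encoded):

```gml
var s = "na" + chr(239) + "ve"; // "naïve"
show_debug_message(string_length(s)); // number of characters
show_debug_message(string_byte_length(s)); // number of UTF-8 bytes; "ï" takes two
```

The two counts differ as soon as the string contains anything outside ASCII, so a character index may not map directly to a byte offset.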

There are two usual solutions to this. If you very specifically need to peek bytes out of a string, you can use string_byte_at, and if you need subregion manipulations, you can put the string into a buffer.
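For the first option, a minimal sketch might look like the following. The script name string_count_ws_bytes is made up for illustration; it simply counts whitespace bytes anywhere in the string (including inside JSON strings) to show byte-wise access. Comparing raw bytes against ASCII codes is safe here because every byte of a multi-byte UTF-8 sequence is 128 or higher.

```gml
/// string_count_ws_bytes(str)
// (hypothetical helper, not part of the minifier)
var s = argument0;
var n = string_byte_length(s); // length in bytes, not characters
var count = 0;
for (var i = 1; i <= n; i++) { // byte indexes start at 1
    switch (string_byte_at(s, i)) {
        case 9: case 10: case 13: case 32: // `\t\n\r `
            count += 1;
            break;
    }
}
return count;
```

This is enough for read-only scanning; copying out sub-ranges is where a buffer becomes the more convenient tool.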

2018 implementation

So here we take the second approach:

/// json_minify(json_string)
// initialization
// in old versions of GMS, you'd have this run separately instead.
// in GMS2 it'd need to be @"..." instead of just "..."
gml_pragma("global", "
global.g_json_minify_fb = buffer_create(1024, buffer_fast, 1);
global.g_json_minify_rb = buffer_create(1024, buffer_grow, 1);
");
var src = argument0;
// copy text to string buffer:
var rb = global.g_json_minify_rb;
buffer_seek(rb, buffer_seek_start, 0);
buffer_write(rb, buffer_string, src);
var size = buffer_tell(rb) - 1;
// then copy it to "fast" input buffer for peeking:
var fb = global.g_json_minify_fb;
if (buffer_get_size(fb) < size) buffer_resize(fb, size);
buffer_copy(rb, 0, size, fb, 0);
//
var rbpos = 0; // writing position in output buffer
var start = 0; // start offset in input buffer
var pos = 0; // reading position in input buffer
var next; // number of bytes to be copied
while (pos < size) {
    var c = buffer_peek(fb, pos++, buffer_u8);
    switch (c) {
        case 9: case 10: case 13: case 32: // `\t\n\r `
            // flush:
            next = pos - 1 - start;
            buffer_copy(fb, start, next, rb, rbpos);
            rbpos += next;
            // skip over trailing whitespace:
            while (pos < size) {
                switch (buffer_peek(fb, pos, buffer_u8)) {
                    case 9: case 10: case 13: case 32: pos += 1; continue;
                    // default -> break
                } break;
            }
            start = pos;
            break;
        case 34: // `"`
            while (pos < size) {
                switch (buffer_peek(fb, pos++, buffer_u8)) {
                    case 92: pos++; continue; // `\` -> skip the escaped character, e.g. `\"`
                    case 34: break; // `"` -> break
                    default: continue; // else
                } break;
            }
            break;
        default:
            if (c >= ord("0") && c <= ord("9")) { // `0`..`9`
                var pre = true; // whether reading pre-dot or not
                var till = pos - 1; // index at which meaningful part of the number ends
                while (pos < size) {
                    c = buffer_peek(fb, pos, buffer_u8);
                    if (c == ord(".")) {
                        pre = false; // now reading digits after the dot
                        pos += 1;
                    } else if (c >= ord("0") && c <= ord("9")) {
                        // write all pre-dot, and till the last non-zero after dot:
                        if (pre || c != ord("0")) till = pos;
                        pos += 1;
                    } else break;
                }
                if (till + 1 < pos) { // flush if number can be shortened
                    next = till + 1 - start;
                    buffer_copy(fb, start, next, rb, rbpos);
                    rbpos += next;
                    start = pos;
                }
            }
    } // switch (c)
} // while (pos < size)
if (start == 0) return src; // source string was unchanged
if (start < pos) { // flush if there's more data left
    next = pos - start;
    buffer_copy(fb, start, next, rb, rbpos);
    rbpos += next;
}
buffer_poke(rb, rbpos, buffer_u8, 0); // terminating byte
buffer_seek(rb, buffer_seek_start, 0);
return buffer_read(rb, buffer_string);

A few notes:

  • A buffer_fast buffer does not allow writing strings to it, so we write the string into the output/string buffer first, and then copy it over. This also makes sure that the output buffer is large enough to fit the string.
  • buffer_copy is used to copy sections from the original buffer to the output buffer.
  • There's no need to check the output buffer size because the result cannot be longer than the source.

When run on the aforementioned snippet, it yields

{"order":[30,22,2,8,17,14,25,9,20,29,10,26,6,15,21,1,11,3,24,12,19,31,7,28,18,4,0,5,27,16,23,13]}

which is 97 bytes, or a little over a quarter of the original size.
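Assuming the script above is saved as json_minify and map is a ds_map that still exists (like the one from the first snippet, before it is destroyed), comparing the sizes yourself might look like this. It is a usage sketch rather than part of the script:

```gml
var json = json_encode(map);
var mini = json_minify(json);
// log the byte sizes before and after minifying:
show_debug_message(string(string_byte_length(json)) + " -> "
    + string(string_byte_length(mini)) + " bytes");
```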

In conclusion

In "real-world" uses there are usually slightly more strings and property names, so minifying may not compact the data this well, but 30..45% size reduction is still common, and nice to have.

Have fun!

8 thoughts on “GameMaker: Minifying JSON”

  1. A lot of good content on this site! I’ve been reading through and a lot of the snippets have been helpful. Correct me if I’m wrong, but would this shorten a value like 35.09 to 35? Seems like this function can only be used safely when you know the json holds integers, or when you know it’s safe to round off your numbers.

    • 35.09 would stay as 35.09 – that is handled in the number branch (by finding the last “noteworthy” digit in the number).

    • Alas, no. I at one point tried porting LZMAv1 (used in 7-zip), but ran into issues supposedly related to signed/unsigned bit math differences between GML and C# [from which it was ported]. Since debugging compression algorithms is not something that is even remotely bearable to do, and it was something that could only slightly benefit several projects, I abandoned the idea at the time. Perhaps later on…
