Introducing: HaxMin


It's been a long, long time. How have you been?
Meanwhile, I have been studying and working on a number of things (some of which you can see on the Works page, which is finally linked), and I am going to highlight some of them here over the next few days.
Today's one is HaxMin, a JavaScript minifier and obfuscator written in Haxe. You can go straight to its GitHub repository or read the full post below.

A bit of background

If you have ever developed a JavaScript application with the Haxe programming language, you might have noticed that reducing the output size can be a little bit... problematic, especially if you ever use Reflect. Consider the following Haxe code:

class Main {
	var adr:String;
	var fld:String;
	function new() {
		adr = "fld"; 
		Reflect.setField(this, adr, "Value");
		trace(Reflect.field(this, adr));
		trace(fld);
	}
	static function main() new Main();
}

After compilation, it results in the following JS:

(function () { "use strict";
var Main = function() {
	this.adr = "fld";
	this[this.adr] = "Value";
	console.log(Reflect.field(this,"fld"));
	console.log(this.fld);
};
Main.main = function() {
	new Main();
}
var Reflect = function() { }
Reflect.field = function(o,field) {
	var v = null;
	try {
		v = o[field];
	} catch( e ) {
	}
	return v;
}
Main.main();
})();

Nice and clean, but not really... compact. Fortunately, we can fix that with JavaScript minifiers such as Closure Compiler or YUI Compressor. After processing the code through Closure, it looks like this:

(function(){var a=function(){this.adr="fld";this[this.adr]="Value";console.log(b.field(this,"fld"));console.log(this.fld)};a.main=function(){new a};var b=function(){};b.field=function(a,b){var c=null;try{c=a[b]}catch(d){}return c};a.main()})();

Much better! Almost a third smaller (244 bytes), although some identifier names are left untouched, and, mind you, those are the class fields, which may matter the most. But Closure also has an "advanced" mode, so why not use that, right? After processing, the code looks like this:

(function(){function b(){}function a(){this.a="fld";this[this.a]="Value";console.log(b.field(this,"fld"));console.log(this.c)}a.b=function(){new a};b.field=function(a,b){var c=null;try{c=a[b]}catch(d){}return c};a.b()})();

Fancy! Even smaller (222 bytes), and this would have been it, if not for a small detail: the code no longer works. Let's look at a pretty-printed version:

(function() {
  function b() {
  }
  function a() {
    this.a = "fld";
    this[this.a] = "Value";
    console.log(b.field(this, "fld"));
    console.log(this.c);
  }
  a.b = function() {
    new a;
  };
  b.field = function(a, b) {
    var c = null;
    try {
      c = a[b];
    } catch (d) {
    }
    return c;
  };
  a.b();
})();

As the two console.log lines show, it's no surprise that it does not work. The field fld was renamed to c where it is accessed directly (this.c), but the references going through Reflect and array access still use the string "fld". This effectively splits one field into two. Nor is it surprising in general: it would take extra analysis to tell that the same field is accessed both by name (array access) and as a regular field.
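To see the split in isolation, here is a hand-reduced JavaScript version of the advanced-mode output above. It is simplified for illustration and is not literal Closure output. The write goes through the string "fld", while the direct read was renamed to c:

```javascript
// Name-based read, like the Reflect.field helper in the compiled output above.
function field(o, name) {
  return o[name];
}

// After "advanced" renaming: adr became a and fld became c,
// but the string "fld" used for name-based access was left as-is.
function Renamed() {
  this.a = "fld";
  this[this.a] = "Value"; // writes obj.fld
}

const obj = new Renamed();
const viaReflect = field(obj, "fld"); // reads obj.fld
const viaField = obj.c;               // nothing ever wrote obj.c

console.log(viaReflect, viaField); // prints: Value undefined
```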
Maybe we could just stop using Reflect altogether, but that is not easy either. Although it is categorized as a "helper" API, Reflect is at the heart of almost every tweening or animation library, as well as many others.
So, to summarize the problems with Reflect and minification:

  • Field and property access breaks with advanced renaming.
  • Getter/setter compatibility breaks with almost any renaming.
  • Simple optimization remains functional but may not always be sufficient.
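The getter/setter point deserves a closer look, because accessor names are typically derived at runtime by string concatenation. The following JavaScript sketch uses a hypothetical getProperty helper (not Haxe's actual runtime code) that looks up "get_" + name. It shows what happens when a renamer treats get_hp as an unrelated identifier:

```javascript
// Hypothetical helper: resolves a property by name, preferring an accessor
// whose name is built at runtime as "get_" + name.
function getProperty(o, name) {
  const getter = o["get_" + name];
  return typeof getter === "function" ? getter.call(o) : o[name];
}

// Before minification: property "hp" backed by the accessor get_hp.
const before = { get_hp() { return 42; } };

// Naive renaming: the string "hp" became "a", but get_hp was treated as an
// unrelated identifier and became "b", so "get_" + "a" no longer exists.
const naive = { b() { return 42; } };

// Prefix-aware renaming (assumption): get_hp is renamed together with hp.
const prefixAware = { get_a() { return 42; } };

console.log(getProperty(before, "hp"));     // 42
console.log(getProperty(naive, "a"));       // undefined
console.log(getProperty(prefixAware, "a")); // 42
```

Keeping the prefix is an assumption made for this sketch. The post itself only states that get_ / set_ names are counted towards their base names and are also given new names.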

After some research, it became apparent that existing tools do not quite support the required features, so I thought: why not make my own?

And, some days and a number of bugfixes and improvements later, the program is ready.

The logic in things

The principle behind HaxMin is actually pretty simple:

  1. Incoming source code is parsed into a list of tokens.
    One could argue that parsing is not necessary, but it gives the program structure, so that minification and obfuscation can be something better than one giant loop.
  2. Identifiers (variable, field, and method names) are counted and ordered by their number of occurrences in the code. Identifiers prefixed with get_ / set_ are counted towards their unprefixed versions. On a second pass, strings that consist solely of identifiers (e.g. "main" as opposed to main) are also counted towards the total.
  3. Each unique identifier is given a new name depending on its frequency: the most commonly used names are renamed to single-letter ones, then two-letter ones, and so on. Reserved words and whitelisted names (more about these later) are skipped. Names prefixed with get_ / set_ are also given new names.
  4. To keep things from being too simple, the new names are also shuffled within their groups (grouped by length). This adds some pleasing randomization and a space of 4096 possible names for identifiers that now get two letters.
  5. Identifiers are replaced with their new names. String contents are rewritten too if they fully match the name of an identifier being renamed (so they keep working with Reflect) or are a dot-separated combination of identifier names (e.g. com.site.package, as used by the Type API).
  6. The resulting code is produced by passing the modified token list through a "printer" function, which compacts the code where appropriate while keeping it functional and strict-mode compatible.
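Steps 1 and 6 can be illustrated with a deliberately small JavaScript sketch. A tokenizer turns source into a flat token list, and a printer glues the tokens back together, adding a space only where two tokens would otherwise merge. This is not HaxMin's code (HaxMin is written in Haxe), and tokenize and print are hypothetical names. The sketch assumes input with explicit semicolons, as Haxe's JS output has, and no regular expression or template literals. Dropping newlines would break code that relies on automatic semicolon insertion.

```javascript
// Punctuators sorted longest-first so that ">>>=" wins over ">>" and ">".
const PUNCTUATORS = [
  ">>>=", "...", "===", "!==", "**=", "<<=", ">>=", ">>>", "=>", "==", "!=",
  "<=", ">=", "&&", "||", "??", "++", "--", "+=", "-=", "*=", "/=", "%=",
  "&=", "|=", "^=", "<<", ">>", "**", "{", "}", "(", ")", "[", "]", ";", ",",
  "<", ">", "+", "-", "*", "/", "%", "&", "|", "^", "!", "~", "?", ":", "=", ".",
].sort((a, b) => b.length - a.length);

// Each rule: token type (null = discard) and a regex anchored at the start.
const RULES = [
  [null, /^\s+/],                             // whitespace, including newlines
  [null, /^(?:\/\/[^\n]*|\/\*[\s\S]*?\*\/)/], // line and block comments
  ["ident", /^[A-Za-z_$][\w$]*/],
  ["number", /^(?:0[xX][\da-fA-F]+|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)/],
  ["string", /^(?:"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')/],
];

// Step 1 (simplified): turn source text into {type, value} tokens.
function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const rest = src.slice(i);
    let consumed = 0;
    for (const [type, re] of RULES) {
      const m = re.exec(rest);
      if (!m) continue;
      if (type) tokens.push({ type, value: m[0] });
      consumed = m[0].length;
      break;
    }
    if (consumed === 0) {
      const p = PUNCTUATORS.find((op) => rest.startsWith(op));
      if (!p) throw new Error(`Unexpected character ${src[i]} at ${i}`);
      tokens.push({ type: "punct", value: p });
      consumed = p.length;
    }
    i += consumed;
  }
  return tokens;
}

// Two word-like tokens written back to back would fuse into one.
const wordLike = (t) => t.type === "ident" || t.type === "number";

// Step 6 (simplified): emit tokens back to back, adding a space only where
// needed ("var a", and "a+ +b" / "a- -b" instead of "a++b" / "a--b").
function print(tokens) {
  let out = "";
  let prev = null;
  for (const t of tokens) {
    const needsSpace = prev !== null && (
      (wordLike(prev) && wordLike(t)) ||
      (prev.value.endsWith("+") && t.value.startsWith("+")) ||
      (prev.value.endsWith("-") && t.value.startsWith("-")));
    out += (needsSpace ? " " : "") + t.value;
    prev = t;
  }
  return out;
}

const source = `
var Main = function() {
  this.adr = "fld"; // comment
  return typeof this.adr;
};`;
console.log(print(tokenize(source)));
// prints: var Main=function(){this.adr="fld";return typeof this.adr;};
```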

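Steps 2 to 4 can likewise be sketched in JavaScript. Everything below is hypothetical. The RESERVED list is abbreviated. The name alphabet is an assumption: letters, _ and $, plus digits after the first character. Renaming accessors with their prefix is not shown, and the shuffle only mixes the names actually handed out. The sketch illustrates frequency-ordered, shortest-first naming with a shuffle inside each length group:

```javascript
// Abbreviated list of words that must never be produced as new names.
const RESERVED = new Set([
  "break", "case", "catch", "class", "const", "continue", "debugger", "default",
  "delete", "do", "else", "enum", "export", "extends", "false", "finally", "for",
  "function", "if", "implements", "import", "in", "instanceof", "interface", "let",
  "new", "null", "package", "private", "protected", "public", "return", "static",
  "super", "switch", "this", "throw", "true", "try", "typeof", "var", "void",
  "while", "with", "yield",
]);

// get_x / set_x count towards x.
const baseName = (name) => name.replace(/^(?:get|set)_(?=.)/, "");

// Step 2: count identifier tokens, skipping reserved and whitelisted names.
function countIdentifiers(tokens, keep) {
  const counts = new Map();
  for (const t of tokens) {
    if (t.type !== "ident") continue;
    const name = baseName(t.value);
    if (RESERVED.has(name) || keep.has(name)) continue;
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  // Second pass: string literals whose whole content is a counted identifier.
  for (const t of tokens) {
    if (t.type !== "string") continue;
    const content = t.value.slice(1, -1);
    if (counts.has(content)) counts.set(content, counts.get(content) + 1);
  }
  return counts;
}

// Assumed alphabet: first character a-z, A-Z, _ or $; later ones may be digits.
const FIRST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_$";
const REST = FIRST + "0123456789";

// n-th candidate name, shortest first: a ... $, then aa, ab, ..., a9, ba, ...
function nameAt(n) {
  let count = FIRST.length; // how many names exist at the current length
  let len = 1;
  while (n >= count) {
    n -= count;
    count *= REST.length;
    len++;
  }
  let tail = "";
  for (let i = 1; i < len; i++) {
    tail = REST[n % REST.length] + tail;
    n = Math.floor(n / REST.length);
  }
  return FIRST[n] + tail;
}

// Step 4: Fisher-Yates shuffle applied separately to each run of equal-length names.
function shuffleByLength(names, random) {
  const out = names.slice();
  let start = 0;
  while (start < out.length) {
    let end = start;
    while (end < out.length && out[end].length === out[start].length) end++;
    for (let i = end - 1; i > start; i--) {
      const j = start + Math.floor(random() * (i - start + 1));
      [out[i], out[j]] = [out[j], out[i]];
    }
    start = end;
  }
  return out;
}

// Step 3: the most frequent identifiers receive the shortest names.
function assignNames(counts, keep, random = Math.random) {
  const byFreq = [...counts.keys()].sort((a, b) => counts.get(b) - counts.get(a));
  const candidates = [];
  for (let n = 0; candidates.length < byFreq.length; n++) {
    const name = nameAt(n);
    if (!RESERVED.has(name) && !keep.has(name)) candidates.push(name);
  }
  const names = shuffleByLength(candidates, random);
  return new Map(byFreq.map((id, i) => [id, names[i]]));
}

// Tokens as a tokenizer might produce them (hand-written here).
const tokens = [
  { type: "ident", value: "fld" }, { type: "ident", value: "adr" },
  { type: "ident", value: "fld" }, { type: "ident", value: "get_fld" },
  { type: "ident", value: "this" }, { type: "string", value: '"adr"' },
  { type: "ident", value: "console" },
];
const keep = new Set(["console", "log"]); // whitelist
const counts = countIdentifiers(tokens, keep); // fld: 3, adr: 2
const mapping = assignNames(counts, keep);
console.log(mapping); // two distinct one-letter names, order randomized
```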
In practice

Theory is good, but practice is better, isn't it? So let's pass the compiled JS from earlier through HaxMin. The result looks like this:

(function(){"use strict";var k=function(){this.l="h";this[this.l]="Value";console.log(g.i(this,this.l));console.log(this.h)};k.$=function(){new k()};var g=function(){};g.i=function(j,i){var _=null;try{_=j[i]}catch(e){};return _};k.$()})();

At 239 bytes, you could call it fairly... fascinating. In fact, identifier renaming not only lowers the readability of the code (for the curious) by a whole level, but even saves five bytes versus the standard Closure optimization (244 bytes), while keeping the code strict-mode compatible, mind you.
Oh, and it still works too!

Obviously, strict mode is not a must in many cases, and Closure noticeably does a better job of cutting optional syntax, so why not combine the two? After passing the HaxMin output through Closure at the "simple optimization" level, we get the following:

(function(){var a=function(){this.l="h";this[this.l]="Value";console.log(b.i(this,this.l));console.log(this.h)};a.$=function(){new a};var b=function(){};b.i=function(a,b){var c=null;try{c=a[b]}catch(d){}return c};a.$()})();

"use strict" is gone, and so are some parentheses, and as a result the code is now just... 223 bytes. That's only a single byte more than Closure's "advanced optimization" result (and with longer identifier names, it would have come out smaller). And the code still works. And you can keep using Reflect-based functionality without worrying about the minifier or obfuscator breaking the code in the process.

Notes and things

To answer a few likely questions:

  • Identifier renaming cannot reliably tell plain strings apart from ones used with Reflect. This means that a string "score" will be renamed along with a variable named score, even if the string is not used to reference it. This is easily fixed by either changing the string slightly (splitting it in two, adding or removing spaces or characters, etc.) or adding the name to the whitelist.
  • To keep extern names (and anything called from outside the processed code) intact, use whitelists. For convenience, you can also add names to the whitelist via command-line arguments.
  • While HaxMin includes identifier name obfuscation, that obfuscation is primarily aimed at reducing file size. If you are after something that makes code extremely hard to read, you can use HaxMin together with other programs intended for that purpose (provided that they do not attempt to rename identifiers).
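The first note follows directly from step 5 of the algorithm. Below is a minimal JavaScript sketch of that string rewriting. It assumes a hypothetical rewriteString helper and an already computed rename mapping, where a whitelisted name would simply be absent. It also assumes a dotted path is only rewritten when every part is a renamed identifier. The sketch shows why "score" is caught while "sc" + "ore" is not:

```javascript
// Rewrites a quoted string literal token if its content is an identifier that
// is being renamed, or a dot-separated path made only of such identifiers.
function rewriteString(literal, mapping) {
  const quote = literal[0];
  const parts = literal.slice(1, -1).split(".");
  if (!parts.every((part) => mapping.has(part))) return literal; // leave as-is
  return quote + parts.map((part) => mapping.get(part)).join(".") + quote;
}

// Hypothetical rename mapping; a whitelisted name would simply be absent.
const mapping = new Map([
  ["score", "q"], ["com", "c"], ["game", "g"], ["Main", "m"],
]);

const samples = ['"score"', '"com.game.Main"', '"Value"', '"sc"', "'ore'"];
const rewritten = samples.map((s) => rewriteString(s, mapping));
console.log(rewritten.join(" "));
// prints: "q" "c.g.m" "Value" "sc" 'ore'
```

At runtime "sc" + "ore" still evaluates to "score", which is why splitting a string works as an escape hatch.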

In conclusion

Overall, HaxMin has its uses for both Haxe-generated JS and hand-written JS. So, should it be of interest, feel free to grab the project from GitHub, stare at the code (spoiler: it's terrific), or leave a comment. Have fun!

Github repository


9 thoughts on “Introducing: HaxMin”

  1. Ok I found the problem: with a try{}catch(){};finally : there’s a semi colon before finally that should not be there. Is this something on your side? If so do you want me to create a new issue on github?

  2. Hi, it may be a late question, but I wanted to give your project a try.
    So my case is: I use several external js libs. How can I make sure the same names get used everywhere? Whitelist all keywords? Merge all js files before using Haxmin? Thanks!

    • Currently you would need to merge them beforehand. I’ll look into adding support for processing multiple files in future.

        • Hi!
          I’ve just generated a single js file from multiple js sources (using uglifyjs). Now I passed this file through HaxMin but it does not work.
          So here is my noob questions:
          Do I need to use a whitelist and then how do I use a whitelist? For instance if I use pixi.js what do I need to add to the whitelist?
          Thanks!

  3. Great job! Finally we can minify code without breaking it in such aggressive mode. I wonder how minification affects JS script performance. I heard that enabling “strict mode” can enable some optimizations.

  4. Wow, that’s really pretty awesome! I’m not have really much knowledge about writing an own script, but like to “tune up” some Apps by using iFunBox and a “to do” from thus guys who know.
    This is a gift for ppl like me :-D
    I agree: keep up your good work!
