Web Reflection: packed

My JavaScript book is out! Don't miss the opportunity to upgrade your beginner or average dev skills.

Showing posts with label packed. Show all posts

Tuesday, August 16, 2011

Last Version Of JSON Hpack

Update

created github repository with (currently) JavaScript, PHP5 and Python versions.

Update after quick chat on twitter with @devongovett who pointed out there is a similar standard called JSONDB I have created a JSONH(Flat) version. It looks slightly faster on mobile so I may opt for this one rather than Array of keys at index 0.
The whole array is flat and it changes from [{a:"A"},{a:"B"}] to [1,"a","A","B"] where the empty collection would be [0] rather than [[]].

Also more details here on how to JSONH Hybrid JS Objects.

A while ago I proposed a homogeneous collections optimizer nick named JSON.hpack.

What I wasn't expecting is that actually different projects and developers adopted this technique to shrink down JSON size.

Basic Advantage Of JSON.hpack

Gzip and deflate work really good with repeated chunks of strings and this is why homogeneous collections have really good compression ratio there.
However, gzip and deflate compression does not come for free!
If we compress everything on server side we can easily test the CPU overload compared with uncompressed data.
Via JSON.hpack we can still serve small or huge amount of dynamic and static data without necessarily use realtime compression.

Basic Notions On Compressors

There is no ideal algorithm yet for compressed data, any of them has pros and cons.
A really good compression ratio may cost a lot and the algorithm is efficient if at least decompression is fast. 7-Zip is one example, it takes more than normal zip to create a single file, but final ratio is usually much better and decompression extremely fast.
An incremental compressor as GIF is is both fast to encode and fast to decode. However, it's average compression ratio is really poor compared with PNG, which again is not fast as GIF to encode, but almost as fast as GIF to decode and capable of bringing much more complex data inside.
On the client side we may like a truly fast compressor in order to send data to the server where more horses power can decompress in a reasonable time. Still servers have not unlimited resources.

My Latest Improvements Over JSON.hpack

On the web is all about network layer latency, completely unpredictable specially in these smartphones/pads days.
We also need to consider high traffic, if things go really well, and most important mobile platforms computation power, basically the equivalent of a Pentium 3 with a GeForce card from 2001.

Which Is The Best Compromise

The original version of JSON.hpack is able to understand which compression level is the best one for the current collection of objects. Unfortunately this is slow both on the server side and even more on the client side.
In my opinion an intermediate layer as JSON.hpack is should bring advantages as fast as possible in both client and server.
I probably failed the first time because I was more focused on coolness rather than efficiency.
As example, if it takes 3X CPU load to save 5% of bytes compared with the most basic compression ratio, something is wrong because it's simply not worth it.
As summary, the best compromise for the latest version of this compressor is to be freaking fast with small overhead and providing a good average compression ratio.

Welcome JSONH

In a single call this object is able to pack and unpack homogeneous collections faster than native JSON and specially on mobile platforms.

How Is It Possible

To be honest I have no idea and I was surprised as well. All I could think about is the fact that JSONH makes data flat which means no recursions per each object in the original list.
This seems to boost up performances while packing and make JSON.parse life easier while unpacking.
The extreme simplification of the algorithm may have helped a lot as well.

JSONH Source Code

now on github!
~~And I had no time to create the equivalent C#, PHP, and Python version~~.
In any case you can see how simple is the logic and I bet anybody can easily reproduce those couple of loops in whatever programming language it is.
The minzipped size is 323 bytes but advantages over network calls can be massive. As example, if we check the console and the converted size in the test page, we can see the JSONH version of the same collection is 54% smaller ... and for a faster stringify and parse? ... it cannot be that good, isn't it :)

JSONH Is Suitable For

any RESTful API that returns homogenous collections

gzip on the fly costs too much due high traffic

map applications and routes, [{"latitude":1.23,"longitude":5.67},{"latitude":2.23,"longitude":6.67}]
will be [["latitude","longitude"],1.23,5.67,2.23,6.67]

any other case I am not thinking about right now

As Summary

It is good to take old projects created a while ago and think what could be done better in current days. It's both about re-thinking with different skills and experience over real world cases. I am not sure I made everybody happy with this latest version but I am pretty sure I won't ask client or server side to be slower than native JSON + native gzip compression since at that point all advantages will be simply lost.
This revisited version of JSONH is surprisingly faster, smaller, and easier to implement/maintain than precedent one so ... enjoy if you need it ;)

Saturday, February 21, 2009

On My Vicious JavaScript Code

Disclaimer

This post is a reply to kangax question:I’m sorry but why such vicious coding style?
I'd like to explain my point of view about JavaScript code style so, if interested, please read until the end, thank you :)

Some Background

I study and use JavaScript since year 2000 and one of the most interesting thing about this language is that even if it is "that old" you never stop to learn something new (techniques, strategies, adapted patterns, etc).
At the same time, the "biggest trouble" of this language is its weight in a page, which is the reason we all use minified and gzipped or packed versions of our application. Coming from a "33Kb modem internet era" I always tried to pay attention about the size and I studied and created both a JavaScript minifier and a compressor.

Compressors Rules

Unless we are not using an "incremental" compressor, as GIF is, we should consider that packer algorithm, as gzip and deflate, are dictionary based compressors (Huffman coding).
These techniques to minimize our scripts size give us best compression ratio if we use a reduced dictionary. In few words, if we have these two piece of codes:


// code A
function tS(obj){
    return obj.toString();
};

// code B
function toString(obj){
    return obj.toString();
};

even if the length of the code A is smaller, it will use 1 or more bytes than code B once compressed.
The reason is really simple. Try to imagine that both codes are converted in this way:


// code A
0 1(2){3 2.4();};[function,tS,obj,return,toString]

// code B
0 1(2){3 2.1();};[function,toString,obj,return]

Code B has a reduced dictionary, and reduced dictionary means best ratio. That's why when I introduced years ago the little bytefx library I defined it "packer friendly" (variables name are similar in the entire code).

How Compressors Changed My Way To Code in JavaScript

Accordingly, and probably against every common programming naming convention practice, those studies let me think that as long as a variable name is meaningful enough, it does not matter if it is perfectly clear to recognize or not, since in every case when we want to understand a code, we need to read it from the beginning to the end. Here there are a couple of example:


// (pseudo)compilable language practices
var splitted = "a.b.c".split(".");
function argsToArray(args){
    return Array.prototype.slice.call(args);
};

// my code style
var split = "a.b.c".split(".");
function slice(arguments){
    return Array.prototype.slice.call(arguments);
};

The split variable is meaningful enough, it is something that has been splitted via split method, and the same is for the slice function: I use it to call the slice Array method over a generic ArrayLike object, as the Object arguments is.

Confusion, Ambiguity, WTF About arguments? Not Really!

The reason I chose inside the slice function the variable name arguments is simple: in that function I do not need its own ArrayObject arguments, while I will use that function mainly with an ArrayObject arguments variable from other functions.


function doStuff(){
    var Array = slice(arguments); // other stuff
};

How can be then that arguments variable considered ambiguous?
Moreover, that function "will cost zero once compressed", because in every code I've seen so far the generic function arguments variable, plus the slice method from Array's prototype, are always part of the rest of the code. So, in terms of bytes, I have added a useful function in only 8 extra bytes: (){();};

OMG! I even called a variable Array inside the doStuff function!!!

Yes I did indeed, and the reason is still the same: almost every library or piece of code use the Array keyword but if I do not need the global Array constructor in that scope, why should I leave such meaningful dictionary word untouched?
Moreover, the Array global constructor is used mainly because of its prototype, but once I have an Array variable called Array I can simply use directly its methods instead of the global one, considering those are exactly the same.


function doStuff(){
    var Array = slice(arguments);
    Array.push.apply(this, Array);
    return this
};

Finally, the day we will need the native global constructor, String, Object, Array, Function, we can always use the safe window.Array or, if we decided to name a variable window inside a scope, the self.Array or the this.Array or the in line global scope resolution call:


var toString = function(){return this.Object.prototype.toString}();

Is It Just About Naming Convention?

I use different strategies to be sure my code will be the smallest possible. As example, and as showed above on toString assignment, if I need a runtime called function which returns a value and it is not that big function, I do not use parenthesis around.
If an assignment or a return is before an end bracket "}" I do nt usually put the semi colon at the end. For the interpreter, it does not matter, because it transform in its own way the function.


alert(function(){return this.Object.prototype.toString});

// produced output
function(){
    return this.Object.prototype.toString;
}
// indentation, plus semicolons where necessary

The rest is usually managed by the minifier, so the fact that a return"a" does not need a space between return and the String "a" is superfluous if we use, for example, a clever minifier as the YUI Compressor is. The same is for loop with a single line to execute, I do not use brackets (I love Python indentation style).

for(var key in obj)
    if(typeof obj[key] == "string")
        obj[key] = parse(obj[key]);

My Style Cons

tools like JSLint often fail to recognize my code as valid so I do not use JSLint (... but fortunately I do not need it, I now my code is beautiful :P)

in a Team we should all respect Team code conventions, so my code style could not be the best choice if other programmers do not know JavaScript that much and/or use a specific/different code style

if the development product will not use minifier + compressors my code style could produce bigger size instead of smaller (Object instead of obj or o)

my code could require a better lecture than a quick one, but this is true for every library if we would like to understand properly a trick, the entire scope, an assignment, etc, etc ...

My Style Pros

final size, minified and packed/gzipped, is most of the case reduced if compared with other "code scripters"

more big is the library I would like to extend, less my implementation costs, thanks to a wide range of words to use (common like key, Object, arguments, Function, etc plus others)

variable names are often more meaningful than other and instead of function(str, callback, obj){} I bet we cannot understand variables type with a function like function(String, Function, Object){}. So, if possible and when necessary, my code style is closer to strict programming language than every other, allowing me to save time writing directly String as variable name, instead of /* String */ str

Conclusion

I know for somebody my vicious style could be considered a non-sense or something too "extreme" or hard to maintain. But, on the other hand, I am not developing public libraries (I prefer to extend them rather than reinvent the wheel if it is not necesary at all) and my implementations, for me, are much more easy to understand/maintain than others since I often perform linear (sometime tricky) operations which make absolutely sense to me and in a single line when it is possible. You like it? You don't? It does not matter, at least now you know how much pervert a programmer mind could be for an "out of the common rules language" as JavaScript is.
Enjoy my craziness :D

Update
Since telling you I have studied these cases was not enough, here is a truly simple demonstration for the Anonymous user that left the comment:


<?php
$output = array();
for($i = 0; $i < 80; $i++)
    $output[] = 'testMe'; 
$str = implode('.', $output);
echo 'WebReflection: '.strlen(gzcompress($str, 9)); // 21
echo '
';
$output = array();
for($i = 0; $i < 80; $i++)
    $output[] = 'test'.str_pad($i + 1, 2, '0', STR_PAD_LEFT); 
$str = implode('.', $output);
echo 'Anonymous: '.strlen(gzcompress($str, 9));     // 144
?>

21 bytes against 144 so please, if you think my point is wrong, demonstrate it :)

Monday, March 31, 2008

Bases, merits, and defects of packed code

How many times we have seen an included JavaScript apparently incomprehensible?
These examples could explain better what I mean:
packed.it via MyMin


eval((function(M,i,n){return '0("1");'.replace(/\w+/g,function(m){return (n[m]!=i[m]&&i[m])||(i[m]=M[parseInt(m,36)])})})('alert.test'.split('.'),{},Object.prototype))

Dean Edwards packer


eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--)d[c]=k[c]||c;k=[(function(e){return d[e]})];e=(function(){return'\w+'});c=1};while(c--)if(k[c])p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c]);return p}('1('0');',2,2,'test|alert'.split('|'),0,{}))

These are only two parsers which aim is to reduce code size using an inline decompression technique that is able to evaluate source code after common keywords replacement.

Basis of portable and packed code

The main goal of these services is to compress a source reducing occurrences of repeated words, using different technique to create both compressed string and inline function with decompression algo.
In this paragraph we will see a rudimentary example on how can be possible to create our own compressor. Let's go with the first function:


function pack(code, keywords){

 var re = /\w+/g,
  filter = {},
  key;

 // foreach word or number in string
 while(key = re.exec(code))
  // make found one unique
  // if does not exists it creates them
  // otherwise it overwrite them
  filter[key[0]] = 0;
  // for example, if code is "a b a", filter 
  // will contain only two properties, a and b
  // with value 0 for both of them

 // with found list of unique words or numbers
 for(key in filter)
  // avoid inherited Object.prototype parameters or methods
  if(filter.hasOwnProperty(key))
   // add key to array
   // save into filter key position in the array
   // converting them into base 36 string
   filter[key] = (keywords.push(key) - 1).toString(36);
   // for example, if code is "a b a", filter.a value will be 0, and filter.b will be 1
 
 // for each word or number
 // return keyword index in base 36 format
 return code.replace(re, function(key){
  return filter[key];
  // for example, if code is "a b a"
  // returned code will be "0 1 0"
 });
};


var myCode  = 'alert("this alert will show this text");',
 myKeywords = [],
 packed = pack(myCode, myKeywords);
alert(packed); // 0("1 0 2 3 1 4");

Reading comments, we can understand the logic behind a client side based compressor.
And since compression operation follows usually a "one to many" logic, where one is the operation to compress, and many are clients that will decompress code, more simple and fast will be the decompression operation, more powerful, portable, and compatible will be our algorithm.


function unpack(code, keywords){

 // foreach word or number
 return code.replace(/\w+/g, function(key){
  // return related keywords value
  // converting base 36 string into
  // base 10 index
  return keywords[parseInt(key, 36)];
  // for example, if code is "0 1 0"
  // returned string will be like
  // keywords[0] + " " + keywords[1] + " " + keywords[0]
 });
};

(unpack(packed, myKeywords) === myCode); // true

The last ring of our chain, is a function that is able to create automatically final result code:


function portablePack(code){
 var
  // keywords container
  keywords = [],
  
  // packed version of the code
  packed  = pack(code, keywords),
  
  // object to make returned string evaluable
  safe  = {"'":"\\'", "\\":"\\\\", "\n":"\\n", "\r":"\\r"};
  
 // to solve problems with some special char inside the string
 // make packed result more safe replacing chars using safe object keys values
 packed = packed.replace(/'|\\|\n|\r/g, function(match){
  return safe[match];
 });
 
 // return an evaluable string
 return "eval((function(p,l){return p.replace(/\\w+/g,function(k){return l[parseInt(k,36)]})})('" + packed + "','" + keywords.join(".") + "'.split('.')))";
 
 // created string has to contain an inline function
 // that should be able to return original code
 // to eval function. Created string will be
 // something like
 // eval((function(packedString,keywords){return unpack(packedString,keywords)})("packed string", "key.words.list".split(".")))
};

This is our first home made client side compression example, not really so efficient, but good enough to understand the logic behind.

Merits of client side compression technique

It simply does not necessary require server side operations to optimize the result size of our scripts, that are every day more than ever "thanks" to the Web 2.0 era.
Required bandwidth will be less than before, while download speed will be increased, and more code we pack, more possibilities we have that ratio between source and packed code will be greater, thanks to common names for common tasks programming routines.

Defects of packed code

Nowadays, quite every browser supports gzip or deflate runtime decompression.
These compression algorithms are really efficient thanks to decompression speed, 10 to 100 faster than pure JavaScript operations plus evaluation, and thanks to their support provided by every server side program language.
Every time we download a client side packed code, even if file will be saved in browser cache, it has to be executed every time we will visit that page again.
So, if our goal is to increase page interaction speed during navigation, JavaScript decompression delay will be a problem.
If this problem will be hilarious or heavy, it depends only on client hardware and its browser performances.
At the same time, using a gzip or deflate compression over a packed code, will not truly increase performances and result will be bigger than clear code gzip compression.

The reason is that common compression algorithms create compressed code using a dictionary that will contain common, repeated, words or letters, found in original string.

Since JavaScript compressors usually replace numbers or words with a unique identfier to be able to recreate original string, resulted string will contain much more characters pairs than before and every baseN encoded key, that is not human friendly and for this reason rarely wrote in original code, could be one more byte inside final compressed string.

Finally, client side compressors are not, usually, 100% compatible with every kind of code, and this is another reason to prefer minifiers + gzip|deflate to obtain the best, fastest, and finally smallest, result.

Conclusion

Nowadays, pure JavaScript client side runtime decompression technique is not really a necessary, but in some case, it could do things that gzip or deflate will never be able to do, for example merging both JavaScript and CSS using a single and common keywords list to produce a unique file that could contain both JavaScript and CSS with the best size result requiring only 1 download instead of 2 (1 for JavaScript, 1 for CSS). That is what packed.it is able to do since 2007, but never forget speed issue and please remember that more great will be packed code, more delay there will be in every page that will use them.