Web Reflection: bad



Sunday, August 19, 2012

Why JSON Won ... And Is Good As It Is

I keep seeing developers complain about one thing or another in the JSON protocol and, don't get me wrong, I've been the first one to try implementing all sorts of alternatives, starting from JSOMON and many others ... OK?

Well, after so many years of client/server development, it's not that I've given up on thinking "something could be better or different"; it's just that I have learned the hard way all the reasons JSON is damn good as it is, and here are just a few of them.

Reliable Serialization ?

No, 'cause YAGNI. There are few serialization processes I know that kinda work as expected, and have done since forever; PHP serialize is a good example.
Recursion is not a problem: solving it is part of the serialization process, as is handling classes together with their protected and private properties. You can save almost any object with its state, even if this object won't be, as a reference, the same one you serialized ... and I would say: of course!
There are also two handy methods, __sleep and __wakeup, which let you save an object's state in a meaningful way and retrieve it back, or perform some action during deserialization.

Are these things available in JSON? Thank gosh, NO! JSON should not take care of recursive objects ... or better, it's freaking OK if it's not compatible with them, 'cause recursion is a developer matter or issue, not a protocol one!
All JSON can do is provide a way to intercept serialization, so that any object with a .toJSON() method can return its own state and, any time JSON.parse() is performed, a reviver can bring back, if truly necessary, its recursive property.
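
As a minimal sketch of that interception, assuming a hypothetical Item constructor whose instances point back to themselves and an arbitrary "@self" marker (neither comes from any library), .toJSON() can return a flat snapshot and a reviver passed to JSON.parse() can restore the recursive property:

```javascript
// Hypothetical constructor: every instance references itself
function Item(name) {
  this.name = name;
  this.self = this; // circular: JSON.stringify would throw without toJSON
}

Item.prototype.toJSON = function () {
  // return a flat snapshot, replacing the circular reference with a marker
  return { name: this.name, self: "@self" };
};

var json = JSON.stringify(new Item("root"));

var restored = JSON.parse(json, function (key, value) {
  // the root value is visited last, with an empty key
  if (key === "" && value && value.self === "@self") {
    value.self = value; // bring the recursive property back
  }
  return value;
});
```

The marker and the decision to restore it are entirely up to whoever owns the data; any other environment simply sees a string.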

So, at the end of the day, JSON implementations might already provide something similar to __sleep and __wakeup, but it should be the JSON string owner, the service, the developer, who takes care of these problems, and simply because ....

Universal Compatibility

JSON is a protocol and, as a protocol, it should be as compatible as possible with all languages, not only the C-like ones or others with similar comment syntax ... there won't ever be comments in JSON, 'cause the moment you need comments you don't need a transport protocol: programming languages have always ignored developers' comments ... and also, for compatibility reasons, not all programming languages would like to have // or /* */ or even # as inline or multiline comments ... why would they?

Especially in the .NET world, most documentation is written in a pseudo XML. Can you imagine bothering to write such a redundant markup language for something often ignored by developers? Would you like to have that "crap" as part of the data you are sending or receiving via JSON, as part of that protocol? I personally don't ... thanks! 'cause I believe a transport protocol should be as compact as possible and without problems.
Here JSON wins once again 'cause it's compatible, with its few universal rules, with basically everything.

Different Environments

This is the best goal ever reached by a protocol: the fact that every programming language can somehow represent what JSON transports.
Lists, Arrays, Dictionaries, Objects, Maps, Hashes, call them what you want: these are the most used, cross-language entities we all deal with on a daily basis, together with booleans, strings, and numbers.

OK, OK, numbers especially are quite generic, but you must admit that the world is still OK with a generic Int32 or Float32 number. In 64-bit compatible environments these numbers could be of a different type, but only if you will never deal with 32-bit environments ... make your choice ... you want a truly big number? Go for it, and lose the possibility to "talk" with any other 32-bit env ... not a big deal if you own your data, kinda pointless memory and CPU consumption if you deserialize everything as 64 bits ... but I am pretty sure you know what you are doing so ... JSON is good in that case too.
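
To see the trade-off in a JavaScript environment, where every number is a 64-bit float, here is a small sketch. The id field and its value are made up for illustration; the point is only that an integer above 2^53 does not survive JSON.parse(), while the same digits sent as a string do:

```javascript
// 2^53 + 1: cannot be represented exactly as a JavaScript number
var digits = "9007199254740993";

// sent as a JSON number, the value is silently rounded while parsing
var asNumber = JSON.parse('{"id":' + digits + '}').id;
var survivedAsNumber = String(asNumber) === digits; // false

// sent as a JSON string, every environment receives the same digits
var asString = JSON.parse('{"id":"' + digits + '"}').id;
var survivedAsString = asString === digits; // true
```

If you own both ends and know what you are doing, the number form is fine; the string form is the one that any other environment can read back unchanged.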

No Classes

And again, thank gosh! You don't want a protocol that deals with classes, trust me, 'cause you cannot write a class in all possible programming languages, can you? And if you can, even in those programming languages where classes never existed, it's only 'cause classes are simply an abstract concept, represented by the word "class" but representable in a billion ways in other languages (e.g. via just objects in JavaScript).
Class and namespace issues, if you want, are there in any case.
The good part of JSON, once again, is the ability to intercept the serialize and unserialize processes so that, if you like to send instances rather than just objects, you can use all the tools provided by the implementation. I am showing a JavaScript example in this case:


function MyClass() {
  // doesn't matter what we do here
  // for post purpose, we do something
  this.initialized = true;
}
MyClass.prototype.toJSON = function () {
  this.__class__ = "window.MyClass";
  return this;
};

var myClassObject = JSON.stringify(new MyClass);
// '{"initialized":true,"__class__":"window.MyClass"}'

Once we send this serialized version of our instance to any other client, the .__class__ property can be ignored or simply used to understand what kind of object it was.

Still in JavaScript, we can easily deserialize the string this way:


function myReviver(key, value) {
  if (!key) {
    var instance = myReviver.instance;
    delete instance.__class__;
    delete myReviver.instance;
    return instance;
  }
  if (key == "__class__") {
    myReviver.instance = myReviver.createInstance(
      this, this.__class__
    );
  }
  return value;
}

myReviver.createInstance = "__proto__" in {} ?
  function (obj, className) {
    obj.__proto__ = myReviver.getPrototype(className);
    return obj;
  } :
  function(Bridge) {
    return function (obj, className) {
      Bridge.prototype = myReviver.getPrototype(className);
      return new Bridge(obj);
    };
  }(function(obj){
    for (var key in obj) this[key] = obj[key];
  })
;

myReviver.getPrototype = function (global) {
  return function (className) {
    for (var
      Class = global,
      nmsp = className.split("."),
      i = 0; i < nmsp.length; i++
    ) {
      // simply throws an error if it does not exist
      Class = Class[nmsp[i]];
    }
    return Class.prototype;
  };
}(this);

JSON.parse(myClassObject, myReviver) instanceof MyClass;
// true

Just imagine that __class__ could be any property name, prefixed as @class could be, or with your own namespace value, e.g. @my.name.Space ... so there are no conflicts if more than one JSON user is performing the same operations, right?
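
A sketch of that idea, assuming an "@class" key and a hypothetical explicit registry (register, revive, and Point are illustrative names, not part of any JSON implementation). Unlike the example above, it looks constructors up in the registry instead of walking the global object, and toJSON returns a copy rather than modifying the instance:

```javascript
// hypothetical registry: maps the names put on the wire to constructors
var registry = {};
var hasOwn = Object.prototype.hasOwnProperty;

function register(name, Class) {
  registry[name] = Class;
  Class.prototype.toJSON = function () {
    // copy own properties only, and tag the copy with the class name
    var copy = { "@class": name };
    for (var key in this) {
      if (hasOwn.call(this, key)) copy[key] = this[key];
    }
    return copy;
  };
}

function revive(key, value) {
  // only objects tagged with a registered name are turned into instances
  if (value && typeof value === "object" &&
      hasOwn.call(registry, value["@class"])) {
    var instance = Object.create(registry[value["@class"]].prototype);
    for (var k in value) {
      if (k !== "@class" && hasOwn.call(value, k)) instance[k] = value[k];
    }
    return instance;
  }
  return value;
}

function Point(x, y) {
  this.x = x;
  this.y = y;
}
register("my.name.Space.Point", Point);

var wire = JSON.stringify({ origin: new Point(1, 2) });
var back = JSON.parse(wire, revive);
var isPoint = back.origin instanceof Point; // true
```

Any client that does not know the "@class" convention still receives a plain object with x and y, which is the whole point.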

Simulating __wakeup Call

Since the last example is about __sleep, which in JavaScript at least is easily implemented through the .toJSON() method, you might decide to implement a __wakeup mechanism too. Here is what you could add to the proposed reviver:


function myReviver(key, value) {
  if (!key) {
    var instance = myReviver.instance;
    delete instance.__class__;
    delete myReviver.instance;
    // this is basically last call before the return
    // if __wakeup was set during serialization
    if (instance.__wakeup) {
      // we can remove the prototype shadowing
      delete instance.__wakeup;
      // and invoke it
      instance.__wakeup();
    }
    return instance;
  }
  if (key == "__class__") {
    myReviver.instance = myReviver.createInstance(
      this, this.__class__
    );
  }
  return value;
}

Confused? Oh well, it's easier than it looks ...


// JSON cannot bring functions
// a prototype can have methods, of course!
MyClass.prototype.__wakeup = function () {
  // do what you need to do here
  alert("Good Morning!");
};

// slightly modified toJSON method
MyClass.prototype.toJSON = function () {
  this.__class__ = "window.MyClass";
  // add __wakeup own property
  this.__wakeup = true;
  return this;
};

Once again, any other environment can understand what's traveling in terms of data, but we can recreate a proper instance whenever we want.

How To Serialize

This is a good question you should ask yourself. Do you want to obtain exactly the same object once unserialized? Is that important for the purpose of your application? Yes? Follow my examples ... No? Don't bother: the less you preprocess while serializing and unserializing objects, the faster, easier, and slimmer the data will be.

If you use weird objects and you expect your own thing to happen ... just use the tools you have to intercept before and after JSON serialization and put there everything you want. Otherwise just try to deal with things that any other language could understand, or you risk thinking JSON is your own protocol that's missing this or that, while you are probably, and simply, overcomplicating whatever you are doing.

You Own Your Logic

The last chapter simply demonstrates that with a tiny effort we can achieve basically everything we want to ... and the cool part is that JSON, as it is, does not stop us from creating more complex structures to pass once stringified, or to recreate once parsed. This is the beauty of this protocol so please, if you think there's something missing, think twice before proposing yet another JSON alternative: it works, everywhere, properly, and it's a protocol, not a JS protocol, not an X language protocol ... just, a bloody, protocol!

Thanks for your patience

Wednesday, June 30, 2010

JavaScript Random Hints

Update: some points have been made clearer, thanks to Dmitry Soshnikov for the suggestions.

Forget the global undefined

Too many developers rely on the undefined variable in ES3. All they should do is set undefined = true in the global scope and see whether the application, or all its unit tests, break or not. I am going to demonstrate how simple it is, at least in ES3, to redefine the global undefined by mistake.


// inside whatever closure/scope
function setItem(key, value) {
  this[key] = value;
}

// later in the scope, setItem may be reused
// through call or apply for whatever object
setItem.call(myObject, myKey, "whatever value");

// if for some reason the first argument used as context
// is undefined, "this" will point to the global object

// if for some reason the second argument used as key
// is undefined, the accessor will cast it to the string "undefined"

// the result of the latter call with these two
// common or simple conditions
// is the equivalent of:

window.undefined = "whatever value";

// where window["undefined"] or window[undefined]
// are the equivalent of window.undefined

undefined; // "whatever value"

Here is a list of side effects every time we deal with undefined:

it is not safe to compare variables against undefined, since it is simply an implicitly declared, unassigned variable, as var u; could be, but in the global scope

it is not possible to minify it since it is a well known variable

its access is slow, since it requires a scope lookup, potentially up to the global one, every time we write that bloody variable name in our code, wherever it is

Here is a list of best practices to avoid the usage of undefined:

define your own undefined variable, if necessary, via var undefined;, and only if you are sure that nothing can change its value in the current scope (e.g. eval)

the first point will allow minifiers/compilers to shrink the undefined variable into, possibly, one char, so it is size safe

if a null value can be considered undefined as well, given that neither undefined nor null supports property access such as unknown.stuff, compare the potentially undefined variable against null, since by specs null == null && null == undefined && null != 0 && null != "" && null != NaN && null != false && null != whateverThatIsNotNullOrUndefined

About the latter point, null is a constant, so no lookup is performed since we cannot re-assign the null value. Every time somebody tells you something like: "doooode, JSLint is complaining about that 'v == null'", simply tell them that JSLint is suggesting a bad practice and point this person to this post :D
What about typeof v === "undefined"? Bullshit! A typeof operation with an eqeqeq against a string that cannot be minified ... are you a programmer who knows the language, or do you think JSLint, as an automation tool, is the bible? In the second case, I have already explained in JSLint: The Bad Part why this tool is not always ideal: have a look!
It must be said that in ES5 the global undefined is no longer enumerable/writable/configurable, so the example shown will fail: the assignment has no effect and, in strict mode, the this reference, when null or undefined is passed through call/apply, will be null or undefined as well (errors).
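
The advice above can be sketched as follows, with isNullOrUndefined and describe as hypothetical helper names. The first relies on the v == null comparison, the second on a local undefined that the global one cannot affect:

```javascript
function isNullOrUndefined(v) {
  // by specs, only null and undefined are == null
  return v == null;
}

var describe = (function () {
  // local and never assigned: safe from the global one, and minifiers can rename it
  var undefined;
  return function (v) {
    return v === undefined ? "undefined" : typeof v;
  };
}());

var checks = [
  isNullOrUndefined(null),   // true
  isNullOrUndefined(void 0), // true
  isNullOrUndefined(0),      // false
  isNullOrUndefined(""),     // false
  isNullOrUndefined(false),  // false
  isNullOrUndefined(NaN)     // false
];
```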

Cache the bloody variable or namespace !

It does not matter how fast and cool CPUs are nowadays, it's about common sense.
If you spot something like:


my.lib.utils.Do.stuff(some);
my.lib.utils.Do.stuff(thing);

fix it ASAP!
This is a list of side effects caused by duplicated access to whatever it is:

a namespace requires a lookup, usually up to the global scope, and this costs time behind the scenes

minifiers/compilers cannot optimize anything so far, since properties cannot be shrunk, so this technique is prone to a bigger application size

getters are always invoked, and 99.9% of the time this is not what we are looking for. JavaScript has a beautiful and easy interface exposed to developers, but behind the scenes there are getters 90% of the time, which means slower performance for everybody.

About the latter point, we can simply check, in one of the fastest browsers on the market, how much a simple node.childNodes[0] could cost.
If you are not familiar with C++, just imagine this piece of JavaScript running every time we access an index of some Array:


Array.prototype.item = function (index) {
  var
      undefined,
      pos = 0,
      // useless if we return lastItem but for some reason there ...
      n = this.slice(0, 1)
  ;
  // optimized for multiple access with the same index
  if (this._isItemCacheValid) {
      if (index == this._lastItemOffset)
          return this._lastItem;
    
      var
          diff = index - this._lastItemOffset,
          dist = Math.abs(diff)
      ;
      if (dist < index) {
          n = this._lastItem;
          pos = this._lastItemOffset;
      }
  }
  if (this._isLengthCacheValid) {
      if (index >= this._cachedLength)
          return undefined;

      var
          diff = index - pos,
          dist = Math.abs(diff)
      ;
      if (dist > (this._cachedLength || (this._cachedLength = this.length)) - 1 - index) {
          n = this[this._cachedLength - 1];
          pos = this._cachedLength - 1;
      }
  }
  if (pos <= index) {
      while (n && pos < index) {
          n = this[pos];
          ++pos;
      }
  } else {
      while (n && pos > index) {
          n = this[pos];
          --pos;
      }
  }
  if (n) {
      this._lastItem = n;
      this._lastItemOffset = pos;
      this._isItemCacheValid = true;
      return n;
  }
  return undefined;
};

Array.prototype._lastItemOffset = 0;

[1,2,3].item(0);

Now, compare the code above with what we usually do, which is simply arr[0] ... and consider that this is just a single item access on a ChildNodeList collection ... how many other operations do we want to perform through DOM searches, namespaces, Array access, etc etc? Cache It Whenever It Is Possible! This should be point number one in every "performance oriented" article or book.
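
The getter cost mentioned in the list above can be made visible with a counter. This is only a sketch: the my.lib.utils namespace is rebuilt here with a hypothetical ES5 getter on its last step, standing in for whatever a host object does behind the scenes:

```javascript
var getterCalls = 0;
var my = { lib: { utils: {} } };

// hypothetical namespace whose last step is an ES5 getter
Object.defineProperty(my.lib.utils, "Do", {
  get: function () {
    getterCalls++;
    return { stuff: function (value) { return value; } };
  }
});

// duplicated access: the getter runs once per line
my.lib.utils.Do.stuff("some");
my.lib.utils.Do.stuff("thing");
var callsWithoutCache = getterCalls; // 2

// cached access: the getter runs once, however many calls follow
var Do = my.lib.utils.Do;
Do.stuff("some");
Do.stuff("thing");
var callsWithCache = getterCalls - callsWithoutCache; // 1
```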

The only thing to consider when we cache is object methods: if caching would "de-context" a method that uses the this reference inside, we can simply cache the object, and that is going to be enough. But if we access a property twice, as often happens with the domNode.style property, for example, cache it!


// my.name.space.Do.stuff is a method
// of my.name.space.Do where this is used

// WRONG
var stuff = my.name.space.Do.stuff;
stuff(); // global this in ES3, error in ES5 strict mode

// BETTER
var Do = my.name.space.Do;
Do.stuff();

Use the in operator

For the same getter/access reason, this classic check can be harmful:


if (someObject.property) {
    // do stuff with property
}

Especially if we are dealing with host objects, some accesses could cause errors (e.g. (domNode || unknown).constructor in IE or similar operations), while a classic:


if ("property" in object) {
    // do stuff with object.property
}

can "save the world", since we do not access the property but simply check whether it is accessible ... a tiny difference, extremely important, and fast in any case.
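
A small sketch of the difference, using a hypothetical object whose property is an ES5 getter that counts how often it runs. Keep in mind that the two checks are not interchangeable: in is also true for inherited properties and for properties holding falsy values:

```javascript
var accessed = 0;
var object = {};

// hypothetical property backed by a getter that returns a falsy value
Object.defineProperty(object, "property", {
  get: function () {
    accessed++;
    return 0;
  }
});

// the in operator only checks presence: the getter is not invoked
var viaIn = "property" in object; // true
var accessedAfterIn = accessed;   // 0

// the classic check invokes the getter and also tests truthiness
var viaAccess = !!object.property;  // false, since 0 is falsy
var accessedAfterAccess = accessed; // 1
```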

Avoid redundant Function Expressions

We are kinda lucky here, since functions, as first class objects, are truly fast to create in JavaScript. They do not require a class to be used, nor an object or special tricks: they are simply variables that can be invoked, executing what has been defined inside their body through an activation context process, plus named arguments, the length of these arguments, the name of the function, if any, plus the arguments variable if accessed in the body, and "almost nothing else" ... but we can already see that functions do not come for free, can't we?

Here are a couple of common function expression mistakes.

Closure inside a Loop


// WRONG
// the classic way to avoid
// unexpected behavior on lazy evaluation
// the equivalent of 20 functions
for (var i = 0; i < 10; ++i) {
  (function (i) {
      setTimeout(function () {
          alert(i);
      }, 15);
  }(i)); // trap it!
}

// BETTER
// 11 functions rather than 20
// same behavior, better performances
for (var
  getTimeout = function (i) {
      return function () {
          alert(i);
      };
  },
  i = 0; i < 10; ++i
) {
  setTimeout(getTimeout(i), 15);
}

Array.extras Misunderstood


// WRONG
// a new expression for each Array.extra operation
// a lookup to access another this reference
var self = this;
what.forEach(function (value, index, what) {
  // do something
  self[index] = value;
});
ever.forEach(function (value, index, what) {
  // do something
  self[index] = value;
});

// BETTER
// 1 function against N
// this reference through the native interface
// easier to debug/maintain/improve/change
function forThisEachCase(value, index, what) {
  this[index] = value;
}
what.forEach(forThisEachCase, this);
ever.forEach(forThisEachCase, this);
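
A runnable sketch of the BETTER version, with hypothetical what, ever, and target values standing in for the real ones. The native forEach passes value, index, and the array to the callback, and its second argument becomes this inside it:

```javascript
var target = {};
var what = ["a", "b"];
var ever = ["c"];
var seenArguments = [];

// one shared function, reused for every Array.extra call
function forThisEachCase(value, index, list) {
  // record what the native forEach actually passes
  seenArguments.push([value, index, list === what || list === ever]);
  this[index] = value;
}

what.forEach(forThisEachCase, target);
ever.forEach(forThisEachCase, target);
// target[0] is "c" (written last), target[1] is "b"
```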

Use Natives !!!

Newcomers are lazy. It does not matter if they are noobs or they have 10 years of Java, PHP, Python, C#, or Ruby under their belt: they will always look for a framework able to do truly simple stuff, for the simple reason that they don't yet know/get JavaScript, which is different from every other common programming language. This is the best starting point to slow down every little operation.
Many frameworks offer classes, mixins, and native wrappers whose aim is often to invert arguments, for whatever reason, to simplify operations (e.g. the classic $.each in jQuery, which makes junior developers think that the native forEach will pass the index as first argument and use this as the current item reference).
Even if it takes three lines based on native prototypes/functionality rather than 1 magic method call, go for it!
Especially if standard, natives will never change, while libraries are constantly improving and their APIs changing as well, for whatever valid reason.
If you need a for in loop, do the for in knowing what you are doing, ignoring JSLint if necessary, 'cause you are dealing with objects that inherit from objects and you may be interested in inherited properties/methods as well.
If the problem is the list of properties, we can always create safer ways to interact with what we would like to enumerate, for example:


var SafeLoop = (function (id) {
  function SafeLoop() {
      this[id] = [];
  }
  SafeLoop.prototype.keys = function() {
      return this[id];
  };
  SafeLoop.prototype.enum = function (key) {
      var enumerable = this.keys();
      enumerable.push.apply(
          enumerable,
          typeof key !== "object" ? arguments : key
      );
      return this;
  };
  return SafeLoop;
}(Math.random()));

var o = new SafeLoop().enum("a", "b");
o.a = o.b = o.c = o.d = 123;
o.e = 456;

// enum accepts N arguments or an array
o.enum(["c", "d"]);

// fast and safe, without an hasOwnProperty call for each item
for (var key = o.keys(), i = key.length; i--;) {
  alert([key[i], o[key[i]]]);
  // d, c, b, and a with 123
}

// extend the prototype if necessary
SafeLoop.prototype.forEach = function (callback, context) {
  for (var
      enumerable = this.keys(),
      i = 0, length = enumerable.length,
      key;
      i < length; ++i
  ) {
      key = enumerable[i];
      callback.call(context, this[key], key, this);
  }
};

o.forEach(function (value, key, o) {
  alert([value, key, o]);
  // 123,a|b|c|d,[object Object]
});

The code above is just an example "AS IS" and there are many parts to improve. The concept is that JavaScript allows us to define what we need in such a simple way that, most of the time, we don't want to include and "move" a whole framework to do something as simple as loops, for example, do we? If we do, well, we are creating redundant function expressions, including extra bytes for just some extra functionality, and potentially making the application slower ... remember: we are in the mobile era, CPUs are not those you have in your MacBook Pro, and frameworks should be used only when we have real benefits, e.g. selector engines or much more complicated methods. Do you agree?