Web Reflection: CDN


Sunday, August 14, 2011

Once Again On Script Loaders

It's a long story I would like to summarize in a few concrete points ...

Three Ways To Include A Script In Your Page

First of all, you may not need loaders at all.
More likely, what you need is an easy, cross platform build process, and JSBuilder is only one of them.

The Most Common Practice

This way lets users download and see the content first, while still letting developers start the JS logic ASAP, with no mandatory need to listen for the DOMContentLoaded or onload event.


<!doctype html>

<html>

    <head>

        <!-- head content here -->

    </head>

    <body>

        <!-- body content here -->

        <script src="app.minzipped.js">/* app here */</script>

    </body>

</html>

The "May Be Better" Practice

I keep saying that a web application that does not work without JavaScript should never be accessed by a user. For example, if a form won't submit without JavaScript, what's the point of showing it before it can possibly work? If your page strongly depends on JavaScript, don't be afraid to let the user wait slightly longer before the layout is shown. The alternative is a broken experience as a welcome. Accordingly, use the DOMContentLoaded listener with this layout:


<!doctype html>

<html>

    <head>

        <!-- head content here -->

        <script src="app.minzipped.js">/* app here */</script>

    </head>

    <body>

        <!-- body content here -->

    </body>

</html>
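A minimal sketch of the listener this layout relies on, assuming the application exposes a single initApp entry point. Both onDomReady and initApp are hypothetical names, and the document is passed as a parameter only so the helper can be tried outside a browser too.

```javascript
// Hypothetical helper: run `callback` once the DOM has been parsed.
// `doc` is normally the global document object.
function onDomReady(doc, callback) {
    if (doc.readyState === "loading") {
        // the parser is still working: wait for the standard event
        doc.addEventListener("DOMContentLoaded", function listener() {
            doc.removeEventListener("DOMContentLoaded", listener, false);
            callback();
        }, false);
    } else {
        // "interactive" or "complete": the DOM is already available
        callback();
    }
}

// inside app.minzipped.js, included in the head:
// onDomReady(document, initApp);
```

The readyState check is there for the case where the script is evaluated after the event has already fired, for example when the same file is injected later.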

If you don't trust the DOMContentLoaded listener you can combine both layouts:


<!doctype html>

<html>

    <head>

        <!-- head content here -->

        <script src="app.minzipped.js">/* app here */</script>

    </head>

    <body>

        <!-- body content here -->

        <script>initApp();</script>

    </body>

</html>

The Optional "defer" Attribute

We can try to avoid the blocking problem using the defer attribute. However, this attribute is not yet widely supported across browsers and the result may be unexpected.
Since this attribute basically tells the browser not to block downloads, in the very near future it could be specified either on a script in the head or on one right before the end of the body.
Everything I have said about a possibly broken UX is still valid so ... use it carefully.

The Loading Practice

The classic example is twitter on mobile browsers, or any native application with a loading bootstrap screen. Flash based websites have used this technique for ages too, and users are used to it.
If the amount of JavaScript plus CSS and assets is massive, both previous techniques will fail.
The first one will fail because the user doesn't know when the script will be loaded, plus it's blocking, so the page won't respond. Bye bye user.
The second approach will result in too long a wait over a blank page ... bye bye user.
This loading approach will entertain the user for a little while: it will be lightweight, fast to show, and it can hold the user for up to "5 seconds" with a cleared cache ( hopefully much less next time, with cache, but if it takes more we should really think about splitting the logic and lazy loading as much as possible ).


<!doctype html>

<html>

    <head>

        <!-- head content here -->

        <!-- most basic CSS -->

    </head>

    <body>

        <!-- most basic content -->

        <!-- "animated gif" or loader spin -->

        <script src="bigstuff.minzipped.js">/* code */</script>

        <!-- optional BIG CSS -->

    </body>

</html>

This page should be as attractive as possible and no interaction that depends on JavaScript should be shown.

Why Script Loaders

Because a complex website may have complex logic split across different files.
The main page may rely on jQuery, commonLogic, mainPageLogic.
Any sub section of the site may depend on jQuery, commonLogic, subSectionLogic, adHocSectionLogic, etc.
The build approach will fail big time here because every page will have to download a different script for each of these combinations.
Moreover, thanks to CDNs some libraries can be cached across domains, for example jQuery served from a common CDN.
In this scenario a script loader is the best solution:


$LAB

    .script("http://commoncdn.com/jquery")

    .script("commonLogic.js")

    .wait()

    .script("subSectionLogic.js")

    .wait()

    .script("adHocSectionLogic.js")

    .wait(function () {

        // eventually ready to go

        // in this section

    })

;

The example above is based on LAB.js, a widely adopted library I have indirectly contributed to as well, by solving one conflict with the jQuery.ready() method.

script() and wait()

LAB.js has been created with performance in mind: every script is pre downloaded as soon as it's defined in the chained logic.
The wait() method is a sort of "JS interpretation break point" and it's really useful when a script depends on another script.
Let's say commonLogic is just a set of functions while subSectionLogic starts with a jQuery(document).ready(function () { ... }) call: LAB.js will ensure that the latter script won't be executed until jQuery is ready ... got it?
LAB.js is about 2.1Kb once minzipped and the best way to use it is to include LAB.js as the very first script in whatever page.
AFAIK LAB.js is not yet hosted on any major CDN but I do believe that will happen soon.

Preload Compatibility

LAB.js uses different techniques to ensure both pre downloads and wait() behavior. Unfortunately some of the adopted fallbacks look inevitably weak to me.
For example, I am not a big fan of "empty setTimeouts" solutions, since these are used as workarounds for unpredictable behaviors.
~~One of these behaviors is the readyState script property that on "complete" state may have or may have not already interpreted the script on "onreadystatechange" notification.~~
If we have a really slow machine, as my netbook is, the timeout used to decide that the script has already been parsed may not be enough.
I don't want to bother you with details, I simply would like you to understand why I came out with an alternative loader.
Before I reach that point I want to show an alternative technique to get rid of wait() calls.

Update
It looks like a few setTimeout calls will be removed soon and, apparently, the setTimeout I pointed out has nothing to do with wait: my bad.
In any case I don't fancy empty timers, and since LAB.js logic is focused on cross browser parallel pre-downloads it is a bit bigger in size than what I needed for my purpose.

Avoiding wait() Calls

JavaScript lets us successfully download and parse scripts like this without problems:


function initApplication() {

    jQuery(document).ready(function () {

        // whatever we need to do

    });

}

Please note that no error will be thrown even if jQuery has not been loaded yet.
The only way to get an error is to invoke the function initApplication() without the jQuery library in the global scope.
In a few words, we are not in the Java or C# world, where the compiler will complain if some namespace is accessed in any defined method without being present/included as a dependency before ... we are in JavaScript, much cooler, isn't it? ;)
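To make the point concrete, here is a small sketch that runs without any library at all. The name notLoadedYetLibrary is made up for the example, and globalThis plays the role the window object has in a browser.

```javascript
// The function body mentions a library that does not exist yet.
// Defining the function is fine: names are resolved only when it runs.
function initApplication() {
    return notLoadedYetLibrary.start();
}

var errorName = null;
try {
    initApplication(); // invoked too early: the name cannot be resolved
} catch (e) {
    errorName = e.name; // "ReferenceError"
}

// later on, the library script finally lands in the global scope
globalThis.notLoadedYetLibrary = {
    start: function () { return "started"; }
};

var outcome = initApplication(); // "started"
```

Parsing never complained; only the early invocation did.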
Accordingly, if the current page initialization is wrapped in a single function we could simply use a single wait call at the end.


$LAB // no direct jQuery calls in the global scope

    .script("http://commoncdn.com/jquery")

    .script("commonLogic.js")

    .script("subSectionLogic.js")

    .script("adHocSectionLogic.js")

    .wait(function () {

        initApplication();

    })

;

The potential wait() problem I am worried about is still there, but at least in a single exit point rather than distributed through the whole loading process ... still, bear with me please.

The Namespace Problem

The generic init function can be part of a namespace as well. If we have namespaces the problem is different, because we cannot assign my.namespace.logic.init = function () {} before the my.namespace object has been defined.
In this case we either create a global function for each namespace initialization/assignment or we impose a wait() call between every included namespace based file.
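A sketch of the problem and of the first option, under the assumption that each file only declares a global function. The names root, defineNamespace and defineLogic are hypothetical, and root stands in for the global object.

```javascript
var root = {}; // stands in for the global object in this sketch

// the problem: the parent objects do not exist yet
var problem = null;
try {
    root.my.namespace.logic = { init: function () {} };
} catch (e) {
    problem = e.name; // "TypeError": root.my is undefined
}

// first option: every file only declares a global function ...
function defineLogic() {          // e.g. from logic.js, parsed first
    root.my.namespace.logic = {
        init: function () { return "logic ready"; }
    };
}
function defineNamespace() {      // e.g. from namespace.js, parsed later
    root.my = { namespace: {} };
}

// ... and the single final wait() callback runs them in dependency order
defineNamespace();
defineLogic();
var initResult = root.my.namespace.logic.init(); // "logic ready"
```

The order in which the files are parsed no longer matters; only the order of the calls in the final callback does.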

yal.js - Yet Another ( JavaScript ) Loader

Update
yal.js now on github

As written on the yal.js landing page, I have been dealing with JS loaders for a while.
This library created a sort of "little twitter war" between me and @getify, where Kyle's main arguments were "why another loader?" followed by "LAB.js has better performances".

Why yal.js

It's really a tiny script that took me 1 hour, tests included, plus 20 minutes of refactoring in order to implement a sort of "forced preload" alternative ( which kinda works but I personally don't like, and neither does Kyle ).
yal.js is just an alternative to LAB.js and we all like alternatives, don't we?
The main focus of yal.js is being as small and as cross browser as possible using KISS and YAGNI principles.

No Empty Timers Essential Script Logic

yal.js is based on the script "onload" event, whose behavior is already defined as standard and which is widely compatible.
If it is not usable in some older browser, the more reliable "loaded" state of the readyState property is used instead. This state always comes after the "loading" or "complete" one.
I could not trigger any crash or problem with this approach and, together with the next point, there is no need to use unpredictable timers.
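The detection described above can be sketched as follows. This is not the yal.js source: whenLoaded is a hypothetical helper that simply follows the description, reacting to onload where available and to the "loaded" readyState otherwise.

```javascript
// Hypothetical helper: call `callback` once, when `script` has been loaded.
// `script` is normally a <script> element created via document.createElement.
function whenLoaded(script, callback) {
    function once() {
        // drop both handlers so the callback cannot fire twice
        script.onload = script.onreadystatechange = null;
        callback(script);
    }
    // standard path
    script.onload = once;
    // fallback path for older browsers exposing readyState on scripts
    script.onreadystatechange = function () {
        if (script.readyState === "loaded") {
            once();
        }
    };
}
```

No timer is involved: the callback is driven only by the events the browser fires.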

Simplified Wait Logic

In the basic version of the script any wait() call will block other scripts. These won't be pre downloaded until the previous call has been completed.
However, if we consider we may not even need wait calls:


yal // no direct jQuery calls in the global scope

    .script("http://commoncdn.com/jquery")

    .script("commonLogic.js")

    .script("subSectionLogic.js")

    .script("adHocSectionLogic.js")

    .wait(function () {

        initApplication();

    })

;

yal will perform parallel downloads the same way LAB.js does and, yal being just 1.5Kb smaller, performance should be slightly better with yal than with LAB.js.
Also, given my bad experience with the "complete" state, I feel a bit more secure knowing that when wait() is invoked in yal.js everything before it has surely been interpreted and executed already ( but please prove me wrong, if you want, with a concrete example I can test online, thanks ).
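To visualize the basic wait() behavior described in this section, here is a toy model of a chained loader. It is an illustration only, not how yal.js or LAB.js are actually written: createLoader and the inject(url, onload) function it receives are hypothetical, and inject stands for "append a script tag and tell me when it has loaded".

```javascript
// Toy model: script() steps start at once, a wait() step blocks whatever
// comes after it until every script requested before it has loaded.
function createLoader(inject) {
    var queue = [],   // steps not processed yet: urls or wait callbacks
        pending = 0;  // scripts requested but not loaded yet
    function next() {
        while (queue.length) {
            if (typeof queue[0] === "string") {
                pending++;
                inject(queue.shift(), function () {
                    pending--;
                    next();
                });
            } else if (pending === 0) {
                queue.shift()(); // everything before this wait() is loaded
            } else {
                return;          // blocked by the wait() step
            }
        }
    }
    var loader = {
        script: function (url) { queue.push(url); next(); return loader; },
        wait: function (callback) {
            queue.push(callback || function () {});
            next();
            return loader;
        }
    };
    return loader;
}
```

With a single wait() at the very end of the chain, nothing is ever blocked: every script is requested immediately and the callback fires after the last one has loaded.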

Just What I Need

For my random and sporadic personal projects yal.js fits all my requirements. I do not use the forced parallel downloads and I don't care. I asked Kyle for a way to grab a subset of LAB.js, getting rid of all the extra features that are surely useful for all possible cases out there but totally unnecessary for mine. Unfortunately that was not going to happen any time soon, so I created the simplest solution for all I personally needed.

As Summary

I am actually sorry Kyle took my little loader as a "non sense waste of time", and if that's what you think as well, or if you need much more from a loader, feel free to ignore it and go happily with LAB.js.
Also, I am not excluding that the day LAB.js is on any major CDN I will start using it, since at that point there won't be any overhead at all, across domains.

Finally, in this post I have tried to summarize different techniques and approaches to solve a very common problem in this RIA era. I hope you appreciated it.

Saturday, August 13, 2011

How To JSONP A Static File

Update: I have discussed this subject separately and I agree that the url could be used as the unique id as well.
In this case the server should use the static url as unique id:


StaticJSONP.notify("http://cdn.com/static/article/id.js",{..data..});

So that on client side we can use the simplified signature:


StaticJSONP.request(

    "http://cdn.com/static/article/id.js",

    function (uid, data) {

    }

);

The callback will receive the uid in any case so that we can create a single callback and handle behaviors accordingly.
The script has been updated in order to accept 2 arguments but, if necessary, the explicit unique id is still supported.

Under the list of "incomplete and never posted stuff" I found this article, which I have finally reviewed.
I know it's "not that compact" but I really would like you to follow the reasoning that led me to a solution for a not so common, but quite nasty, problem.

Back in 2001, my early attempts to include callbacks remotely were based on server side runtime compilation of some JavaScript data passed through a single function.


<?php // demo purpose only code



// do something meaningful with server data



// create runtime the output data

$pairs = array();

foreach ($data as $key => $value) {

    // entries must be comma separated and values escaped

    $pairs[] = $key.':"'.addslashes($value).'"';

}

$output = '{'.implode(',', $pairs).'}';



echo 'jsCallback('.$output.')';



?>

The technique above became deprecated a few years ago thanks to the widely adopted JSON format and its native or coded implementations in hundreds of programming languages.
Moreover, it became the wrong way to do it thanks to a definitely better solution, as JSONP has been since the very beginning.
Here is an example of what JSONP services do today:


<?php // still demo purpose only code



echo $_GET['callback'].'('.json_encode($data).')';



?>

JSONP Advantages

The callback parameter is defined on the client side, which means it can be "namespaced" or it can be unique per each JSONP request.
If we consider the first example, every script in the page has to rely on a single global jsCallback function.
At that time I was using my code and my code only, so problems like conflicts, or the possibility that another library could have defined a different jsCallback in the global scope, did not exist.
Today I still use "my code and my code only" :D when it comes to my personal projects, but at least I am more aware than ever of the conflicts between multiple libraries the primordial technique may cause, even if all these libraries are my own.
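The "unique per each JSONP request" idea can be sketched like this. It is a simplified client, not any specific library: jsonp, the generated callback names and the inject(src) function (whatever appends the script tag to the page) are all assumptions made for the example.

```javascript
var jsonpCounter = 0;

// Hypothetical minimal JSONP client. `inject(src)` is a parameter
// so the sketch stays independent from the DOM.
function jsonp(url, callback, inject) {
    // a fresh global name per request: no conflicts between libraries
    var name = "jsonp_callback_" + (jsonpCounter++);
    globalThis[name] = function (data) {
        delete globalThis[name]; // one shot: clean the global scope
        callback(data);
    };
    // the server echoes this name back, wrapping the JSON data
    inject(url + (url.indexOf("?") < 0 ? "?" : "&") + "callback=" + name);
    return name;
}
```

The flexibility comes entirely from the server reading the callback name at request time, which is the part a precompiled static file cannot do.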

JSONP Disadvantages

Well, the same reason that makes JSONP a powerful and more suitable technique is the one that could make JSONP the wrong solution.
If we still consider the first code example, nobody could stop me from being "really smart" and precompiling that file into a static one.


// static_service.js by cronjob 2011-08-14T10:00:00.000Z

jsCallback({category:'post',author:'WebReflection',title:'JSONP Limits'});

While precompiled static content may or may not be what we need for our application/service, it is clear that if no server side language is involved the common JSONP approach will fail, due to the limitations of "the single exit point" any callback in the main page depends on: the jsCallback function.

Advantages Of Precompiled Static Files

The fastest way to serve a file from a generic domain is as a static one.
A static file can be cached either by the disk, rather than being sought and retrieved each time, or directly in the server RAM.
Also, a static file does not require any programming language at all: the only code that will be executed is the one in charge of serving the file over the network, aka the HTTP Server.
The most common real world example of static files is a generic CDN, where the purpose is indeed to support as many requests per second as possible and where static files are most likely the solution.
The only extra code that may be involved is the one in charge of statistics on the HTTP Server layer, but every file can be easily mirrored or stored in any sort of RAID configuration and be served as fast as possible.

Another real world example could be a system like blogger.com, where pages do not necessarily need to be served dynamically.
Most of the content in whatever blog system can be precompiled, and many services/blog applications are indeed doing it.

The same goes for any other application/service that does not require real-time data computation, where different cron jobs behind the scenes are in charge of refreshing the content every N minutes or more.
If we think about any big traffic website we could do this basic analysis:
If we think about any big traffic website we could do this basic analysis:


# really poor/basic web server performances analysis



# cost of realtime computation

1% of average CPU + RAM + DISK ACCESS per each user

# performances

MAX_USERS = 100;

AVERAGE_MAX_USERS = 100;



# cost of a threaded cron job

20% of average CPU + RAM + DISK ACCESS per iteration

# cost of static file serving

0.1% of CPU + RAM + DISK ACCESS per user

# performances

MAX_USERS_NOCRON = 1000;

MAX_USERS_WHILECRON = 800; # MAX_USERS_NOCRON - 20%

AVERAGE_MAX_USERS = 900;

If we consider that we may choose to delegate the cron job to a separate server inside the intranet, and that the only operation per each changed static file will be a LOCK FILE $f EXCLUSIVE, WRITE NEW CONTENT INTO $f, UNLOCK FILE $f EXCLUSIVE, so that basically only DISK ACCESS is involved, we can even increase AVERAGE_MAX_USERS to 950 or more.
I know this is a sort of off topic and virtual/conceptual analysis but please bear with me, I will bring you there soon.
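The arithmetic behind the figures above, spelled out. Costs are expressed in tenths of a percent so that everything stays an integer. Reading the 900 as the plain mean of the two static figures (the cron job being busy half of the time) is my interpretation of the numbers, and maxUsers is a hypothetical helper.

```javascript
// all costs are in tenths of a percent of the machine,
// so the whole server is a budget of 1000 units
var SERVER_BUDGET = 1000;

// how many users fit once `reserved` units are taken by something else
function maxUsers(costPerUser, reserved) {
    return Math.floor((SERVER_BUDGET - (reserved || 0)) / costPerUser);
}

var realtimeUsers  = maxUsers(10);      // 1% per user    -> 100
var staticNoCron   = maxUsers(1);       // 0.1% per user  -> 1000
var staticWithCron = maxUsers(1, 200);  // cron takes 20% -> 800

// plain mean of the two static figures
var averageStatic = (staticNoCron + staticWithCron) / 2; // 900
```

Even with the cron job running, the static setup serves eight times the users of the realtime one in this conceptual model.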

Static Content And RESTful APIs

There is a huge amount of services out there based on JSONP. Many of them require real-time data but many probably do not. Especially in the latter case, I bet nobody is implementing the technique I am going to describe.

A Real World Example

Let's imagine I work for Amazon and I am in charge of the RESTful API able to provide any sort of article related data.
If we think about it, a generic online shopping cart article is nothing more than a group of static info that will rarely change much during the day, the week, the month, or even the year.
Do online users really need to be notified in real time, and per each request, about the current user rating, reviews, related content, article description, author, and any sort of "doesn't change so frequently" info related to the article itself? NO.
The only field that should be as up to date as possible is the price but still, does the price change so frequently during the lifecycle of an Amazon article? NO.
Can my infrastructure be so smart that if, and only if, a single field of this article changes, the related static file is updated so that everybody will instantly receive the new info? YES.
... but how can I do that if JSONP does not scale with static files?

My StaticJSONP Proposal

The only difference from a normal JSONP request is that any sort of library should be able to be notified through the same callback call.
Since the client side library is in charge of creating the requested url, and since the same library knows in advance both what it is going to ask for and what it is going to receive, all this library needs is to be synchronized with the unique id the static server file will invoke. I am going to tell you more but, as a quick preview, this is how the static server file will look:


StaticJSONP.notify("unique_request_id", {the:response_data});

Server Side Structure Example

Let's say we would like to keep the folder structure as clear as possible. In this Amazon example we can think about splitting articles by categories.


# / as web server root



/book/102304.js # the book id

/book/102311.js

/book/102319.js



/gadgets/1456.js

/gadgets/4567.js

A well organized folder structure will result in both better readability for humans and easier access for most common filesystems.
Every pre compiled file on the list will contain a call to the global StaticJSONP object, e.g.


// book id 102311

StaticJSONP.notify("amazon_apiv2_info_book_102311",{...data...});

The StaticJSONP Object

The main, and only, purpose of this tiny piece of script that almost fits in a tweet once minzipped (282 bytes) is to:

let any library, framework, custom code, be able to request a static file

avoid multiple scripts injection / concurrent JSONP for the same file if this has not been notified yet

notify any registered callback with the result
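The three points above can be sketched in a few lines. This is my own reconstruction for illustration, not the actual 282 bytes script: createStaticJSONP and the inject(uri) function it receives (whatever appends the script tag to the page) are assumptions.

```javascript
// Hypothetical factory; in a page the result would be assigned
// to the global StaticJSONP name the static files invoke.
function createStaticJSONP(inject) {
    var pending = {}; // unique id -> callbacks waiting for that file
    return {
        request: function (uri, uid, callback) {
            if (typeof uid === "function") {
                // simplified signature: the url is the unique id
                callback = uid;
                uid = uri;
            }
            if (Object.prototype.hasOwnProperty.call(pending, uid)) {
                // same file already on its way: just register
                pending[uid].push(callback);
            } else {
                pending[uid] = [callback];
                inject(uri);
            }
        },
        notify: function (uid, data) {
            var callbacks = pending[uid] || [];
            delete pending[uid];
            for (var i = 0; i < callbacks.length; i++) {
                callbacks[i](uid, data);
            }
        }
    };
}
```

Two libraries asking for the same unique id before the file arrives cause a single injection, and both are notified with the same data.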

Here is an example of a StaticJSONP interaction on the client side:


var

    // just as example

    result = [],



    // library 1

    client1 = function (uri, uid, delay) {

        function exec() {

            StaticJSONP.request(uri, uid, function (uid, evt) {

                result.push("client1: " + evt.data);

            });

        }

        delay ?

            setTimeout(exec, delay) :

            exec()

        ;

    },



    // library 2

    client2 = function (uri, uid, delay) {

        function exec() {

            StaticJSONP.request(uri, uid, function (uid, evt) {

                result.push("client2: " + evt.data);

            });

        }

        delay ?

            setTimeout(exec, delay) :

            exec()

        ;

    }

;

// library 1 does its business

client1("static/1.js", "static_service_1", 250);

// so does library 2

client2("static/2.js", "static_service_2", 250);



setTimeout(function () {

    // suddenly both requires same service/file

    client1("static/3.js", "static_service_3", 0);

    client2("static/3.js", "static_service_3", 0);

    

    setTimeout(function () {

        alert(result.join("\n"));

    }, 500);

}, 1000);

It is possible to test the live demo ... just wait a little bit and you will see this alert:


// order may be different accordingly

// with website response time x file

client1: 1

client2: 2

client1: 3

client2: 3

If you monitor network traffic you will see that static/3.js is downloaded only once.
If the response is really big and the connection not so good ( 3G or worse than 3G ) it may happen that the same file is requested again while the first request has not finished yet.
Since the whole purpose of StaticJSONP is to simplify life on the server side, any redundant request will be avoided on the client side.

The Unique ID ...

StaticJSONP can be easily integrated with a normal JSONP service.
For example, if we need to obtain the list of best sellers, assuming this list is not static due to too frequent changes, we can do something like this:


// this code is an example purpose only

// it won't work anywhere



// JSONP callback to best sellers

JSONP("http://amazon/restful/books/bestSellers", function (evt) {

    // the evt contains a data property

    var data = evt.data;



    // data is a list of books title and ids

    for (var i = 0, li = []; i < data.length; i++) {

        li[i] = '<a href="javascript:getBookInfo(' + data[i].id + ')">' + data[i].title + '</a>';

    }



    // show the content

    document.body.innerHTML = '<ul><li>' + li.join('</li><li>') + '</li></ul>';





});



// the function to retrieve more info

function getBookInfo(book_id) {

    StaticJSONP.request(

        

        // the url to call

        "http://amazon/restful/static/books/" + book_id + ".js",



        // the unique id accordingly with the current RESTful API

        "amazon_apiv2_info_book_" + book_id,



        // the callback to execute once the server respond

        function (uid, evt) {

            // evt contain all book related data

            // we can show it wherever we want

        }

    );

}

Now just imagine how many users in the world are performing similar requests right now to the same list of books, being best sellers ...

Unique ID Practices

It is really important to understand the reason StaticJSONP requires a unique id.
First of all it is neither possible nor convenient to "magically retrieve it from the url", because any RESTful API out there may have a "different shape".
The unique id is a sort of trusted, pre-agreed, and aligned piece of information the client side library must be aware of, since there is no way to change it on the server side, the file being created statically.
It is also important to prefix the id so that debugging will be easier on the client side.
However, the combination used to generate the unique id itself may be already ... well, unique, so it's up to us on both client and server side to define it in a possibly consistent way.
The reason I did not use the whole uri + id info in the StaticJSONP request method is simple:
if both gadgets/102.js and books/102.js contain a unique 102 id, there is no way on the client side to understand which article has been requested, and both gadgets and books registered callbacks will be notified, one out of two surely with the wrong data.
It's really not complicated to namespace a unique id prefix and this should be the way to go imho.
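A tiny sketch of such a prefix convention, reusing the "amazon_apiv2_info_" prefix from the earlier example. The uniqueId helper and the category names are hypothetical.

```javascript
// Hypothetical convention, agreed once between the static file
// generator and the client side library.
function uniqueId(category, id) {
    return "amazon_apiv2_info_" + category + "_" + id;
}

var bookId   = uniqueId("book", 102);    // "amazon_apiv2_info_book_102"
var gadgetId = uniqueId("gadgets", 102); // "amazon_apiv2_info_gadgets_102"

// with the bare id 102 both requests would share the same key;
// with the prefix each list of callbacks stays separate
var registered = {};
registered[bookId]   = ["book callback"];
registered[gadgetId] = ["gadget callback"];
```

The same function, or the same rule, has to be applied by whatever generates the static files on the server.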

Conclusion

It's usually really difficult to agree unanimously on a solution for a specific problem, and I am not expecting that from tomorrow everyone will adopt this technique to speed up server side file serving over common "JSONP queries". Still, I hope you understood the reason this approach may be needed, and also how to properly implement a solution that does not cause client side conflicts, that scales, that does not increase the final application size in any relevant way, and that is ready to go for the day when, and if, you are going to need it. Enjoy