Wednesday, July 8, 2015

Async Made Easy

Asynchronous Code Doesn't Have to be Complicated
One of the best things about functional programming with Node is how simple it can be to perform lots of potentially slow functions asynchronously.

That's also one of the worst things.

It's very easy to write really bad code or code that doesn't act the way you expect if you lose sight of the asynchronicity of it all. Fortunately, there are great utility packages like async that make it simple to write good code that acts properly.

Take the following example. Let's say I have a web service that applies some function to its input and returns a new value. For test purposes, let's use the following Node+Express web application that has a single route which reverses a given string.

var express = require("express");
var app = express();

function reverse(s) {
    var o = "";
    for (var i = s.length - 1; i >= 0; i--) {
        o += s[i];
    }
    return o;
}

app.get("/reverse/:s", function(req, res, next) {

    res.setHeader("Content-Type", "text/plain");
    res.send(reverse(req.params.s));

});

// ------------ start listening
var server = app.listen(3000, function() {
    console.log("listening on port %d", server.address().port);
});

Now if I have another application that wants to call that reversal web service with 1000 strings and wait for them all to complete before moving on, what should I do? Making HTTP calls is an asynchronous operation. I make the call and then supply a callback function to let me know when it finishes. Here's the code that sets up the test: a function that makes the web service call and invokes a callback when complete, and an array of 1000 random test strings.

var http = require('http');

function makeString() {
    var text = "";
    var possible = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    for( var i=0; i < 25; i++ ) {
        text += possible.charAt(Math.floor(Math.random() * possible.length));
    }
    return text;
}

function reverseIt(str, callback) {

    return http.get({
        host: 'localhost',
        port: 3000,
        path: '/reverse/' + str
    }, function(response) {
        var body = '';
        response.on('data', function(d) {
            body += d;
        });
        response.on('end', function() {
            callback(body);
        });
    });

}

var testStrings = [];
for (var i = 0; i < 1000; i++) {
    testStrings.push(makeString());
}
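One thing reverseIt leaves out is any way to report a failure. Here is a minimal sketch of an error-aware variant, assuming the usual Node error-first callback convention. The name reverseItSafe and its port parameter are hypothetical additions of mine for illustration, and I use 127.0.0.1 instead of localhost only to sidestep IPv4/IPv6 name resolution differences.

```javascript
var http = require('http');

// Hypothetical variant of reverseIt: same request, but the callback follows
// the error-first convention, callback(err, body).
function reverseItSafe(str, port, callback) {
    var finished = false;

    // Make sure the caller's callback runs at most once.
    function finish(err, body) {
        if (finished) {
            return;
        }
        finished = true;
        callback(err, body);
    }

    var req = http.get({
        host: '127.0.0.1',
        port: port,
        path: '/reverse/' + encodeURIComponent(str)
    }, function(response) {
        var body = '';
        response.setEncoding('utf8');
        response.on('data', function(d) {
            body += d;
        });
        response.on('end', function() {
            if (response.statusCode !== 200) {
                return finish(new Error('unexpected status ' + response.statusCode));
            }
            finish(null, body);
        });
    });

    // Connection failures (for example, the service is not running) arrive here.
    req.on('error', function(err) {
        finish(err);
    });

    return req;
}
```

A caller would check the first argument before using the second, for example reverseItSafe("abc", 3000, function(err, r) { ... }).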

First, look at what is probably the most commonly used - and incorrect - way of making all the web service calls.

// ----- incorrect - no idea when they are all done

for (var j = 0; j < testStrings.length; j++) {

    reverseIt(testStrings[j], function(r) {
        console.log("got: " + r);
    });

}

console.log("finished!");

The problem is that you have no idea when all the calls complete. The console.log("finished!") is output almost immediately after you start the program, and the individual logs of the backwards strings keep coming out. Don't do this.
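You can see the same ordering problem without a web service at all. This is a small sketch that swaps the HTTP call for a timer; fakeReverseIt is a hypothetical stand-in for reverseIt, not part of the test program above.

```javascript
// Stand-in for reverseIt: reverses the string, but only reports the result
// on a later turn of the event loop, the way an HTTP response would.
function fakeReverseIt(str, callback) {
    setTimeout(function() {
        callback(str.split('').reverse().join(''));
    }, 10);
}

var events = [];
var inputs = ['abc', 'def', 'ghi'];

for (var k = 0; k < inputs.length; k++) {
    fakeReverseIt(inputs[k], function(r) {
        events.push('got: ' + r);
    });
}

// The loop has only scheduled the work; no callback has run yet.
events.push('finished!');
console.log(events); // prints [ 'finished!' ]
```

The three "got" entries are only added after the current block of code has returned to the event loop, which is why "finished!" always comes first.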

After realizing that the first method fails, people usually use the callback from each completed web service invocation to make the next call, again and again until all the strings are processed. Something like this:

// ----- does everything in series.
//       really slow, but you know when it's finished.

function rev(idx, callback) {
    reverseIt(testStrings[idx], function(r) {
        console.log("got: " + r);
        if (idx === testStrings.length - 1) {
            callback();
        } else {
            rev(idx+1, callback);
        }
    });
}

console.time("type2");
rev(0, function() {
    console.log("did them all!");
    console.timeEnd("type2");
});

Defining a wrapper function that makes the call to reverseIt and then calls itself in reverseIt's callback does allow you to know when all 1000 calls have completed. But doing one request at a time throws away all the performance gains made possible by the asynchronous nature of the http calls. I use the console.time and console.timeEnd functions to show that this is the slowest of the "correct" methods. In my local tests, the complete set of 1000 string reversals took on average around 675 milliseconds.
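The recursion in rev can be pulled out into a reusable helper. The following is only a sketch: eachInSeries and slowEcho are hypothetical names, and a timer stands in for the web service call so the example runs on its own.

```javascript
// Hypothetical helper: runs worker(item, next) on one item at a time and
// calls done() after the last item has finished.
function eachInSeries(items, worker, done) {
    function step(idx) {
        if (idx === items.length) {
            return done();
        }
        worker(items[idx], function() {
            step(idx + 1);
        });
    }
    step(0);
}

// Simulated slow call, standing in for reverseIt.
function slowEcho(item, callback) {
    setTimeout(function() {
        callback(item);
    }, 5);
}

var trace = [];
eachInSeries(['a', 'b', 'c'], function(item, next) {
    trace.push('start ' + item);
    slowEcho(item, function() {
        trace.push('end ' + item);
        next();
    });
}, function() {
    // Each item ends before the next one starts.
    console.log(trace.join(', '));
    // prints: start a, end a, start b, end b, start c, end c
});
```

The trace makes the cost visible: nothing overlaps, so the total time is the sum of every individual call.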

What is the better way? We can still make all the web service calls asynchronously if we set up some way of tracking their completion. Something like this:

// ----- better

function revManager(arr, callback) {

    var counter = 0;
    return {
        go: function() {
            for (var i = 0; i < arr.length; i++) {
                reverseIt(arr[i], function(r) {
                    console.log("got: " + r);
                    counter++;
                    if (counter === arr.length) {
                        callback();
                    }
                });
            }
        }
    };

}

console.time("type3");
revManager(testStrings, function() {
    console.log("all done!");
    console.timeEnd("type3");
}).go();

This method creates a manager for reversing all the strings that invokes a callback only when all of them have completed. To keep track of how many have finished, it sets up a closure with a counter variable and returns an object containing a function to start the process, the go function. That function loops over the array of strings and calls reverseIt on each one, much like in the incorrect example above. But in this callback function for reverseIt, the counter variable is incremented and then checked to determine if all have finished. If they have, the manager's callback is invoked. That's the point when we are sure that all 1000 have completed. The timing on this method shows that it completes much faster: my tests averaged 203 milliseconds, less than a third of the serial version's 675.
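The counter idea is not specific to reversing strings. Below is a sketch of a more general version, under my own assumptions: runAll and fakeReverse are hypothetical names, the results are collected in input order, and an empty array is handled up front because otherwise no worker callback would ever fire and the final callback would never run.

```javascript
// Hypothetical generalization of revManager: start every task at once,
// count completions, and hand back the results in input order.
function runAll(items, worker, done) {
    var results = new Array(items.length);
    var remaining = items.length;

    // With nothing to do, no worker callback would ever fire, so finish now.
    if (remaining === 0) {
        return done(results);
    }

    items.forEach(function(item, idx) {
        worker(item, function(result) {
            results[idx] = result; // idx is captured separately for each item
            remaining--;
            if (remaining === 0) {
                done(results);
            }
        });
    });
}

// Simulated call whose response time varies, standing in for reverseIt.
function fakeReverse(str, callback) {
    setTimeout(function() {
        callback(str.split('').reverse().join(''));
    }, str.length * 5);
}

runAll(['abcd', 'xy', 'pqr'], fakeReverse, function(results) {
    console.log(results); // prints [ 'dcba', 'yx', 'rqp' ]
});
```

Even though the shortest string finishes first here, each result lands in the slot that matches its input position.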

But way back up near the top I mentioned how utility packages like async make this easier. So now that I've done it the hard way, take a look at how async makes it easy with a general-purpose each function:

// ----- easy with async

var async = require('async');

console.time("type4");
async.each(testStrings, function(item, cb) {
    reverseIt(item, function(r) {
        console.log("got: " + r);
        cb();
    });
}, function() {
    console.log("complete!");
    console.timeEnd("type4");
});

Async has a bunch of methods for different patterns, but the each function corresponds to the desired behavior in this example. It takes three parameters. The first is the array of items, and the second is what async calls the iterator function. This function is called once per item and is passed the item along with a callback. The general contract is that the iterator function should do its thing and then call the callback when complete, passing an error as the first argument if something went wrong. The third parameter to each is the callback function that runs when the entire array has been processed, or as soon as any iterator call reports an error.
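To make that contract concrete, here is a rough miniature of an each-style helper. It is my own sketch and not async's actual source; miniEach is a hypothetical name, and timers stand in for the web service calls.

```javascript
// Hypothetical miniature of an each-style helper. The iterator receives
// (item, cb) and must call cb() on success or cb(err) on failure.
function miniEach(items, iterator, done) {
    var remaining = items.length;
    var finished = false;

    if (remaining === 0) {
        return done(null);
    }

    items.forEach(function(item) {
        iterator(item, function(err) {
            if (finished) {
                return; // the final callback has already been invoked
            }
            if (err) {
                finished = true;
                return done(err);
            }
            remaining--;
            if (remaining === 0) {
                finished = true;
                done(null);
            }
        });
    });
}

miniEach([1, 2, 3], function(n, cb) {
    setTimeout(function() {
        cb(n === 2 ? new Error('item 2 failed') : null);
    }, n * 5);
}, function(err) {
    console.log(err ? 'stopped: ' + err.message : 'complete!');
    // prints: stopped: item 2 failed
});
```

The final callback runs exactly once, either after every item succeeds or at the first reported error. Items that were already started still run to completion; their results are simply ignored.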

The timing indicates that async's each function performs exactly the same as my own function, but is much more generic and allows me to write fewer lines of code. That's a win-win!

Follow my example and use the time you save by writing less code to enjoy today's comic!

Amphibian.com comic for 8 July, 2015
