Saturday, April 16, 2011

Building by the Byte - the HTML5 File API

One of the major features needed in JavaScript to make it truly useful as an application language is file processing. I'm talking about handling the contents of a file totally in your web browser. No server needed. Even non-text format files. Now with HTML5 we have this capability in the File API! I've been thinking about the awesome new possibilities opened up by this development, and put together an example of what it can be used for.

First, let's talk about the browser support. The latest Chrome, Safari, and Firefox browsers support the new JavaScript File API. Internet Explorer? Nope. Get a real browser.

The first thing to understand is the FileReader object. It's a new built-in object, sort of like the XMLHttpRequest object. Like the familiar XHR, FileReader is designed to work asynchronously. That means you need to give the object your own onload function, which gets called when the browser is done reading the file. Think about it - it could take a while to process a file and you don't necessarily need your app tied up waiting for it. Look at this simple example...
var reader = new FileReader();
reader.onload = function(event) {
 // file is loaded, contents are in event.target.result
 // do something with it!
};
reader.readAsBinaryString(file);
Now you're probably asking a few questions at this point. Where did you get the file object? How does JavaScript handle binary data? What if there's an error? How do they get the peanut butter inside the peanut butter cups? I can answer all but that last one.

First, there are a few ways to get a file object. My favorite is to grab one simply by dragging it into the browser window. This is accomplished via the dataTransfer property of the event object. For example, let's say you have the following div in your page...

<div id="drophere" style="text-align: center; width: 200px; height: 100px;">drop a file here</div>
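And then you have some JavaScript like this...
var dropTarget = document.getElementById('drophere');
dropTarget.ondragover = function (evt) {
 evt.preventDefault(); // without this the browser never fires the drop event
};
dropTarget.ondrop = function (evt) {
 evt.preventDefault();
 var file = evt.dataTransfer.files[0];
 // now you've got a file object, which is the file you dropped
 return false; // don't let the browser navigate away
};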
Now just drag a file into your browser window and drop it on your div. Awesome! You've got a file. Now just combine this function with the previous one and you're all set to process anything you can drag in. Well, almost. There's still that binary data issue. JavaScript doesn't really have a data structure designed for binary data.
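
Before moving on: dragging isn't the only way to get a file object. A file input element hands you the same kind of object through its files list. Here's a minimal sketch of a helper that works with either kind of event. The name firstFileFrom and the pickhere id are made up for illustration, and the wiring assumes your page has an <input type="file" id="pickhere">.

```javascript
// Hypothetical helper: returns the first File from a drop event
// (evt.dataTransfer.files) or a file input's change event (evt.target.files).
function firstFileFrom(evt) {
  var list = (evt.dataTransfer && evt.dataTransfer.files) ||
             (evt.target && evt.target.files);
  return list && list.length > 0 ? list[0] : null;
}

// Only wire things up when running inside a page.
if (typeof document !== 'undefined') {
  // Assumed markup: <input type="file" id="pickhere">
  var picker = document.getElementById('pickhere');
  if (picker) {
    picker.onchange = function (evt) {
      var file = firstFileFrom(evt);
      // file is null if the user cancelled the dialog
    };
  }
}
```

Either way you end up with the same kind of file object, so everything that follows applies to both.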

This is where you break out the FileReader and pass in that file object. Add the code from the first example into the second example....

function handleDrop(evt) {
 evt.preventDefault();
 var file = evt.dataTransfer.files[0];
 var reader = new FileReader();
 reader.onload = function(event) {
  // file is loaded, contents are in event.target.result
  // do something with it!
 };
 reader.readAsBinaryString(file);
 return false; // don't let the browser navigate away
}
// hook it up to the div from before
document.getElementById('drophere').ondrop = handleDrop;
So when you get the event.target.result object (in the reader's onload function), what will it be? It's actually going to be a String where each character code is between 0 and 255. To read the "bytes" of the file, just loop through all the characters calling charCodeAt on each one. I made an object to help with all the functions you might want to do with the "byte array"...

function DataReader(a) {
 
 this.bytes = a;
 this.index = 0;
 this.byteRead = 0;
 this.bitIndex = 0;
 this.endian = "big";
 
}

DataReader.prototype.readByte = function() {
 if (this.eof()) return;
 var ret = this.bytes.charCodeAt(this.index);
 this.index++;
 return ret;
}

DataReader.prototype.readBytes = function(howMany) {
 if (this.eof()) return;
 var ret = new Array();
 for (var i = 0; i < howMany; i++) {
  ret.push(this.readByte());
 }
 return ret;
}


DataReader.prototype.readInteger = function(numBytes) {
 
 if (this.eof()) return;
 
 var howMany = 4; // default to a 4-byte integer
 if (numBytes) {
  howMany = numBytes;
 }
 
 var ret = 0;
 if (this.endian == "little") {
  var origIndex = this.index;
  for (var n = this.index + howMany - 1; n >= origIndex; n--) {
   ret = ((ret << 8) | this.bytes.charCodeAt(n));
   this.index++;
  }
 } else {
  for (var n = 0; n < howMany; n++) {
   ret = ((ret << 8) | this.bytes.charCodeAt(this.index));
   this.index++;
  }
 }
 return ret >>> 0; // << and | work on signed 32-bit values, so convert back to unsigned
 
}

DataReader.prototype.readString = function(len) {
 if (!len || this.eof()) return;
 var ret = this.bytes.substring(this.index, this.index + len);
 this.index += len;
 return ret;
}

DataReader.prototype.readNullTerminatedString = function() {
 if (this.eof()) return;
 var slen = 0;
 var n = this.index;
 var finished = false;
 while (!finished && n <= this.bytes.length) {
  var c = this.bytes.charCodeAt(n);
  if (c == 0) {
   finished = true;
  }
  slen++;
  n++;
 }
 var ret = this.bytes.substring(this.index, this.index + (slen - 1));
 this.index += slen;
 return ret;
 
}

DataReader.prototype.skip = function(num) {
 if (this.eof()) return;
 this.index += num;
}

DataReader.prototype.eof = function() {
 // true once every byte has been consumed (the last valid index is length - 1)
 return (this.index >= this.bytes.length);
}
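
To give a feel for how these methods fit together, here's a rough sketch that uses a reader like the one above to pull the image dimensions out of a PNG file. It assumes the PNG layout from the specification (an 8-byte signature followed by an IHDR chunk) and a reader left in its default big-endian mode. The name readPngHeader is hypothetical.

```javascript
// The fixed 8 bytes that start every PNG file.
var PNG_SIGNATURE = [137, 80, 78, 71, 13, 10, 26, 10];

// Hypothetical helper: expects an object with the readBytes, readInteger,
// readString and readByte methods of the DataReader defined above.
// Returns null if the data does not look like a PNG.
function readPngHeader(reader) {
  var sig = reader.readBytes(8);
  for (var i = 0; i < PNG_SIGNATURE.length; i++) {
    if (!sig || sig[i] !== PNG_SIGNATURE[i]) return null;
  }
  var length = reader.readInteger(4); // chunk data length, big-endian
  var type = reader.readString(4);    // chunk type as four ASCII characters
  if (type !== 'IHDR') return null;   // IHDR must be the first chunk
  return {
    chunkLength: length,
    width: reader.readInteger(4),
    height: reader.readInteger(4),
    bitDepth: reader.readByte(),
    colorType: reader.readByte()
  };
}

// Usage inside the reader's onload function:
// var header = readPngHeader(new DataReader(event.target.result));
```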

I should mention that there are other options for processing the file. If you use readAsText instead of readAsBinaryString, you get a normal string containing the decoded text of the file. That's only really useful if you know the file contains nothing but text. A third option is readAsDataURL, which gives you the whole file encoded as a data: URL. You can use this to directly set the src attribute of an img tag with the dropped file. Again, this has limited usefulness. Getting the binary string is the most powerful.
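
Here's a minimal sketch of the readAsDataURL route, in case you just want to show a dropped image. The function name showAsImage is made up, and the img argument is assumed to be an img element already on your page.

```javascript
// Hypothetical helper: reads the file as a data: URL and, once the read
// finishes, points the given image element at it.
function showAsImage(file, img) {
  var reader = new FileReader();
  reader.onload = function (event) {
    // event.target.result is a string starting with "data:"
    img.src = event.target.result;
  };
  reader.readAsDataURL(file);
}
```

Inside a drop handler you could call showAsImage(evt.dataTransfer.files[0], document.getElementById('preview')), where preview is an assumed img element.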

This is a good time to talk about the onerror function. If you try the above example in Chrome using a local HTML file, it won't work. You'll get an error. You'll only know that if you specify an onerror function as well as an onload. Don't expect a whole lot of detail in the error, however.
reader.onerror = function (event) {
 console.log(this.error.code);
};
You'll see a "4" in the console. That's helpful... In this case it means that Chrome, by default, does not allow local files (your test HTML file) to access other local files (the file you drop in). Firefox does. It's not a big deal: you can either test using a local server instead of just loading the file from disk, or add the "--allow-file-access-from-files" flag to Chrome when you start it. Security thing.
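
Since a bare number isn't much to go on, here's a small sketch of a helper that turns the reader's error into something more readable. It assumes the error object may carry name and message properties (newer browsers report a DOMException there) and falls back to the numeric code otherwise. The name describeReaderError is made up.

```javascript
// Hypothetical helper: builds a log-friendly description of a FileReader error.
// Assumption: error may have name/message; older implementations only have code.
function describeReaderError(error) {
  if (!error) return 'unknown error';
  if (error.name) {
    return error.name + (error.message ? ': ' + error.message : '');
  }
  return 'error code ' + error.code;
}

// Usage:
// reader.onerror = function () {
//   console.log(describeReaderError(this.error));
// };
```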

Okay, okay, okay...now what can you build with this? Well, some really amazing things. I put together this nifty example that will read PNG files and display them in the browser not as images, but as a bunch of DIVs (one for each pixel). To accomplish this, I just needed two things. One, the PNG specification which can be found here. And two, a way to inflate compressed data blocks inside the files. For that part, I used my pure JavaScript Inflater that I talked about in my last post.


If you don't have your own PNG file handy, use this one: http://www.amphibian.com/blogstuff/small_dr_frog.png

Make sure you check out the page source to see how it all works. It turns out that PNG files are fairly easy to work with once you have the data inflated.
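
For a sense of why PNG is approachable, here's a minimal sketch that walks the chunk structure of a PNG held in a binary string. It assumes the layout from the PNG specification: an 8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The name listPngChunks is hypothetical, and this is not the code behind the example page.

```javascript
// Hypothetical helper: lists the chunks in a PNG binary string without
// decoding them. Each entry records the chunk type, its data length and
// the offset where its data begins.
function listPngChunks(bytes) {
  // Read an unsigned 32-bit big-endian integer starting at pos.
  function uint32At(pos) {
    return ((bytes.charCodeAt(pos) << 24) |
            (bytes.charCodeAt(pos + 1) << 16) |
            (bytes.charCodeAt(pos + 2) << 8) |
             bytes.charCodeAt(pos + 3)) >>> 0;
  }
  var chunks = [];
  var pos = 8; // skip the 8-byte PNG signature
  // Every chunk needs at least 12 bytes: length, type and CRC.
  while (pos + 12 <= bytes.length) {
    var length = uint32At(pos);
    var type = bytes.substring(pos + 4, pos + 8);
    chunks.push({ type: type, length: length, dataStart: pos + 8 });
    if (type === 'IEND') break; // IEND marks the end of the image
    pos += 12 + length;         // 4 length + 4 type + data + 4 CRC
  }
  return chunks;
}
```

The IDAT entries mark where the compressed pixel data lives; those are the bytes you'd hand to an inflater.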

I know my example is not particularly practical, but I hope it can at least inspire you to make something of your own that uses these splendid new HTML5 features. Use your imagination and let me know what you come up with!