Over on the Mozilla Webdev blog, I just posted about a new library of ours, django-mozilla-product-details. This tongue twister allows you to periodically update the latest Mozilla product version information as well as language details from our SVN server.
The geeks among you are surely wondering, isn't that going to lead to a lot of useless traffic if the data does not change as frequently as it is being updated?
You are right. Because re-downloading unchanged data is evil and because we like our servers, we are using a fun little trick to keep the data transferred as little as possible:
Every time the update script is run, we first issue a HEAD request to the SVN server: A HEAD request is a type of HTTP request that asks for some location from a server, but instead of receiving the actual data in return (an HTML document, for example, or some binary data), the server only returns the response headers, not the actual data.
From these headers, which are very small, we can read the Last-Modified timestamp and compare that to the time we last updated our local copy of the product data. If the timestamp hasn't changed since then, there's no need for us to download further data.
Instead of blindly downloading the data files on every update, we send the time of our last successful update along to the server, in a If-Modified-Since
HTTP request header. If the files have changed since then, the server will send us the updated list, but if nothing has changed in the meantime, the server will just return a "304 Not Modified
" status.
This is how we ensure that (almost) no matter how often you choose to update the product data, neither your nor our resources will be wasted.
This is not only a good idea for this specific library: Next time you consume RSS feeds or other "pull" data from various places on the Internet, make sure to query for updates before downloading unnecessary data. Caveat: This method only works if the server can handle an If-Modified-Since header. Servers that serve bogus timestamps or no such header at all leave you no choice but to download and investigate the feed itself.
Update: A few readers pointed out that the If-Modified-Since request header would be an even better method to update the data conditionally than an initial HEAD request. They are, of course, right, which is why I updated the library accordingly. Thanks, everyone!