Cryptographic hash functions play an important part in application security: Usually, user passwords are hashed and stored in the database. When someone logs in, their input is hashed as well and compared to the database content. A weak hash is almost as bad as no hash at all: If someone steals (part of) your user database, they can analyze the hashed values to recover the actual passwords, and then use them, without the owners' knowledge, to log into your application on their behalf.

As part of a proactive web application security model, it is therefore important to stay ahead of the game attackers play and to store passwords with a sufficiently strong hash. Since cryptanalysts are spending great effort on breaking hash algorithms (with the help of increasingly fast and cheap computers), SHA-1 is by now considered only borderline in strength. That is not a good position to be in if you want to write future-proof apps.

Django (our web app framework of choice at Mozilla) does not support anything stronger than its default, SHA-1, and has, in the past, WONTFIXed attempts to increase hash strengths, citing strong backwards compatibility as the reason. As long as Django targets Python 2.4 as its greatest common denominator, this is unlikely to change. Writing a full-blown, custom authentication backend for the purpose is an option (the Mozilla Add-ons project chose to do that), but it seemed overkill to me, given that with the exception of the hash strength, Django's built-in authentication code works just fine.

So I decided to monkey-patch their auth model at run time, to add SHA-256 support to my application (while staying backwards-compatible with older password hashes possibly existing in the database).

The code is simple, and I made an effort to keep it as noninvasive as possible, so that it can be removed easily in case Django ever does get support for stronger hashes down the road. Let me know what you think. (The code is embedded from a Gist on GitHub.)
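Independent of the embedded code, the general idea of storing salted SHA-256 hashes next to older SHA-1 hashes can be sketched without any framework. This is a minimal illustration, not the actual monkey patch: it assumes the `algorithm$salt$hash` storage format that Django's auth model used at the time, and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def make_password_hash(raw_password, algorithm="sha256", salt=None):
    """Return 'algorithm$salt$hexdigest' for the given password."""
    if salt is None:
        salt = os.urandom(8).hex()  # random per-password salt
    data = (salt + raw_password).encode("utf-8")
    digest = hashlib.new(algorithm, data).hexdigest()
    return "$".join([algorithm, salt, digest])


def check_password_hash(raw_password, stored):
    """Verify a password against a stored hash, whatever its algorithm."""
    # The algorithm name is read from the stored value, so older
    # 'sha1$...' entries keep working next to new 'sha256$...' ones.
    algorithm, salt, _digest = stored.split("$")
    candidate = make_password_hash(raw_password, algorithm, salt)
    return hmac.compare_digest(candidate, stored)
```

Because the algorithm travels with each stored value, new passwords can default to SHA-256 while existing SHA-1 hashes still verify.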

Read more…

Note: Several people asked where the link is to actually add feedback to the site. This is, of course, a good point. As mentioned in the comments: The designated entry point for the feedback application is going to be an extension bundled with Firefox 4 Beta. For more information, please read Aakash's blog post. To try out the application already, feel free to add happy or sad feedback to the test site.

This morning, we published the Firefox Input application. It is a little web application soliciting feedback from our Firefox Beta Program users. The aim is to make it as easy as possible for people to tell us what specifically they like or dislike about an upcoming version of Firefox.

The application was, as far as software goes, developed very rapidly: We made it from requirements to production in a mere three weeks. What made this possible was a number of reusable components that allowed us to avoid reinventing the wheel and stay focused on making the application awesome.

A few key components of the Input application:

  • Django. I can't stress this enough, but Django is a fantastic web application framework. It makes it incredibly easy to set up a web application quickly and securely. Their built-in admin pages save me days of work that I would otherwise have to spend to allow project admins to edit the application data.
  • Jinja2 and Jingo. The only big drawback of Django is its template language: The instant you build nontrivial web applications, it gets in your way. Luckily, like all parts of Django, it is replaceable: Jinja2 and Jeff Balogh's jingo interface come to the rescue. The two of them are already in use over at AMO and also serve us well on Input.
  • Term extraction. Firefox Input extracts key words from all feedback. Sure, you can just split the sentences into words, but if you want to avoid collecting all sorts of meaningless particles ("the", "a", "if", ...), it becomes a little more complicated. We are using the topia.termextract library, which gladly does the heavy lifting for us. Only caveat: It only works for English, so once the application is localized, we need a different solution for the other languages.
  • Search. For the longest time, there was no generic way to do search in a Django app (other than straight SQL queries). In the meantime, haystack has started to fill that gap. We use it on Input in conjunction with Whoosh, a pure-Python search library. That is very easy to set up, at the expense of scalability -- if we outgrow it, however, it will be easy to switch search engines with virtually no code changes at all. Thumbs up!
  • Product details. Only very recently we released a Mozilla product details library for Django, and this is the first application to rely intimately on up-to-date product data: Input only lets users of the latest beta versions of Firefox add feedback, so it auto-updates its product data periodically to gather feedback for the newest versions as quickly as possible.
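To illustrate the term extraction point above: the naive "split into words and drop the particles" approach might look like the following sketch. It is a far simpler stand-in for what a dedicated library does, and both the stop word list and the function name are made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "if", "and", "or", "is", "it", "to", "of"}


def naive_terms(text):
    """Count the words in text, ignoring common English particles."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)
```

Even this small version shows why the problem is language-specific: the stop word list only makes sense for English.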

As always, the source code of Firefox Input is openly and freely available. If you notice any problems with it, feel free to fork it on github, or file a bug in our bug tracker.

Read more…

Need to add a robots.txt file to your Django project to tell Google and friends what and what not to index on your site?

Here are three ways to add a robots.txt file to Django.

1) The (almost) one-liner

In an article, Paul Bissex suggests adding this rule to your urls.py file:

from django.http import HttpResponse

urlpatterns = patterns('',
    ...
    (r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", mimetype="text/plain")),
)

The advantage of this solution is, it is a simple one-liner disallowing all bots, with no extra files to be created, and no clutter anywhere. It's as simple as it gets.

The disadvantage, obviously, is that it does not scale. The instant you have more than one rule to add, this approach quickly gets out of hand. Also, one could argue that urls.py is not the right place for content of any kind.

2) Direct to template

This one is the most intuitive approach: Just drop a robots.txt file into your main templates directory and link to it via direct_to_template:

from django.views.generic.simple import direct_to_template

urlpatterns = patterns('',
    (r'^robots\.txt$', direct_to_template,
     {'template': 'robots.txt', 'mimetype': 'text/plain'}),
)

Just remember to set the MIME type appropriately to text/plain, and off you go.

The advantage is its simplicity, and if you already have a robots.txt file you want to reuse, there's no overhead for that.

Disadvantage: If your robots file changes somewhat frequently, you need to push changes to your web server every time. That can get tedious. Also, this approach does not save you from typos or the like.

3) The django-robots app

Finally, there's a full-blown django app available that you can install and drop into your INSTALLED_APPS: It is called django-robots.

For small projects, this would be overkill, but if you have a lot of rules, or if you need a site admin to change them without pushing changes to the web server, this is your app of choice.

Which one is right for me?

Depending on how complicated your rule set is, any one of these solutions may be the best fit for you. Just choose the one that you are the most comfortable with and that fits the way you are using robots.txt in your application.

Read more…

Over on the Mozilla Webdev blog, I just posted about a new library of ours, django-mozilla-product-details. This tongue twister allows you to periodically update the latest Mozilla product version information as well as language details from our SVN server.

The geeks among you are surely wondering, isn't that going to lead to a lot of useless traffic if the data does not change as frequently as it is being updated?

You are right. Because re-downloading unchanged data is evil and because we like our servers, we are using a fun little trick to keep the data transferred as little as possible:

In the first version of the library, every run of the update script began with a HEAD request to the SVN server. A HEAD request is a type of HTTP request that asks the server for some location, but instead of sending the actual data in return (an HTML document, for example, or some binary data), the server returns only the response headers.

From these headers, which are very small, we could read the Last-Modified timestamp and compare it to the time we last updated our local copy of the product data. If the timestamp had not changed since then, there was no need to download further data.

Instead of blindly downloading the data files on every update, we now send the time of our last successful update along to the server in an If-Modified-Since HTTP request header. If the files have changed since then, the server will send us the updated list, but if nothing has changed in the meantime, the server will just return a "304 Not Modified" status.

This is how we ensure that (almost) no matter how often you choose to update the product data, neither your nor our resources will be wasted.
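The conditional request described above can be sketched with Python's standard library alone. This is not the library's actual code, just a minimal illustration; the function names are hypothetical, and `last_update` is assumed to be a Unix timestamp.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_update):
    """Build a GET request asking only for data newer than last_update."""
    # HTTP dates must be in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    headers = {"If-Modified-Since": formatdate(last_update, usegmt=True)}
    return urllib.request.Request(url, headers=headers)


def fetch_if_modified(url, last_update):
    """Return the response body, or None on "304 Not Modified"."""
    request = build_conditional_request(url, last_update)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # urllib reports every non-2xx status, including 304, as HTTPError.
        if error.code == 304:
            return None
        raise
```

A caller would store the time of each successful download and pass it back in on the next run; a `None` result means the local copy is still current.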

This is not only a good idea for this specific library: Next time you consume RSS feeds or other "pull" data from various places on the Internet, make sure to query for updates before downloading unnecessary data. Caveat: This method only works if the server can handle an If-Modified-Since header. Servers that serve bogus timestamps or no such header at all leave you no choice but to download and investigate the feed itself.

Update: A few readers pointed out that the If-Modified-Since request header would be an even better method to update the data conditionally than an initial HEAD request. They are, of course, right, which is why I updated the library accordingly. Thanks, everyone!

Read more…

So you wrote a fancy little Django app and want to run it on a lighttpd webserver? There's plenty of documentation on this topic online, including official Django documentation.

Problem is, most of these sources do not mention how to use virtualenv, but the cool kids don't install their packages into the global site-packages directory. So I put some scripts together for your enjoyment.

I assume that you've put your django app somewhere convenient, and that you have a virtualenv containing its packages (including django itself).


You want to set up this file so that it adds the virtualenv's site-packages directory to Python's module search path: site.addsitedir('path/to/mysite-env/lib/python2.6/site-packages'). Note that you need to point directly to the site-packages dir inside the virtualenv, not just to the main virtualenv dir. This line needs to come before the Django-provided from django... import, because you can't import Django files if Python doesn't know where they are.
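As a rough sketch of that ordering (not the original script), the search path setup might look like this. The helper name and the constant are hypothetical, and the path is the same placeholder as above.

```python
import os
import site
import sys

# Placeholder path; point this at the site-packages dir of your virtualenv.
VENV_SITE_PACKAGES = "path/to/mysite-env/lib/python2.6/site-packages"


def activate_site_packages(path):
    """Add a site-packages directory (and its .pth files) to sys.path."""
    site.addsitedir(os.path.abspath(path))


if __name__ == "__main__":
    activate_site_packages(VENV_SITE_PACKAGES)
    # Only from here on can packages from the virtualenv be imported,
    # which is why the Django imports have to come after this call.
    print(sys.path[-1])
```

Unlike appending to sys.path by hand, site.addsitedir also processes any .pth files in the directory, which some installed packages rely on.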


The lighttpd setup will result in mysite.fcgi showing up in all your URLs unless you set FORCE_SCRIPT_NAME correctly. If your Django app is supposed to live right at the root of your domain, for example, set this to the empty string.


This is an initscript (for Debian, but you can modify it to work with most distros, I presume). Copy it to /etc/init.d, adjust the settings on top (and possibly other places, depending on your setup), then start the Django fastcgi servers. Note that you need to have the flup package installed in your virtualenv.

4. lighttpd-vhost.conf

Set up your lighttpd vhost pretty much like the Django documentation suggests. Match up the host and port with the settings from your init script. By using mod_alias for the media and admin media paths, you'll have lighttpd serve those files itself instead of passing the requests on to Django.

That's it! You've deployed your first Django application on lighttpd. If you have any questions or suggestions, feel free to comment here or fork my code.

You can look at all the scripts together over on github or download them in a package.

Read more…

Here's another beautiful specimen in my little collection of what I have been calling "fail pets" for a while now: GitHub.

Github Fail-Unicorn

I wonder if their pink fail-unicorn is somehow related to the similarly colored (but less angry) Django Pony. A distant relative, maybe -- especially since the "original" Django pony was, in fact, a unicorn.

(Before someone feels the urge to remind me: yes, to my knowledge, GitHub is written in Ruby, not Python/Django.)

Read more…