Day 329 – Ready for the Sunset

A family of tourists, getting ready to watch the sun set on the Pacific coast. I love silhouette photos like this: It’s fun to see the different characters with their body shapes and postures.

Categories: OSU OSL Crosspost, Project 365

Day 15 – Open Your Mind, Open Your Heart, Open Your Source

This is one of my favorite coffee mugs: I bought it back in 2002 at the LinuxTag open source conference in Karlsruhe, Germany. The motto of that year’s conference — “Open Your Mind, Open Your Heart, Open Your Source” — hints at what this conference was still very much about: Convincing decision makers, in particular in government organizations, to recognize the potential in open source software and treat it as an opportunity rather than a threat. Luckily, we’ve come a long way since then.

This is a simple “portrait” photo taken in the Saturday-morning sun, with no alterations whatsoever.

Better web forms with HTML5 and Firefox 4

Surely, you’ve heard of many fancy new features that HTML5 and related technologies bring to the Web now and in the future: open video on the web, canvas, transitions, what have you.

But sometimes it’s the smallest things that have the biggest impact. Besides these hyped features, HTML5 also introduces a number of semantic form fields. Before, the only textual input the web knew was, well, plain text. It was up to the web application developer to enforce certain rules around that, like making sure the input is a number, or not empty, or even a valid website address (URL).

Firefox 4 understands these new input types and helps the user by enforcing correct values even before the user submits the form. Handling validation on the client enables a consistent form validation UI across websites and keeps the user from repeatedly submitting forms and waiting for the server-side form validation to pass or fail. (NB: This does not relieve developers of performing server-side checks in order to ensure the security of their web application.)

Here is what this looks like in a recent prototype of the Firefox Input site:

Another fun little feature, also pictured, is the placeholder text attribute. The grayed-out placeholder in a text box shows you an example of what you might enter into this field. Rather than explaining correct values in a huge label or a side note next to the field, developers can show their users much more easily what data they would like them to enter into the form fields.

All of this makes for fewer mistakes entering data into web forms, which is both beneficial to the user (getting the job done faster) and the developer (collecting better data). Win-win!
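On the server-side checks mentioned above: here is a minimal sketch of what repeating the browser's checks on the server might look like in Python. The field names (`description`, `url`) and the function name are made up for illustration; they are not taken from the Firefox Input code.

```python
from urllib.parse import urlparse


def validate_feedback(form):
    """Return a dict of field -> error message; an empty dict means the data passed."""
    errors = {}

    # Required text field: must not be empty or whitespace only.
    if not form.get("description", "").strip():
        errors["description"] = "This field is required."

    # URL field (hypothetical): optional, but if present it needs
    # an http(s) scheme and a host name.
    url = form.get("url", "").strip()
    if url:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            errors["url"] = "Enter a valid URL."

    return errors
```

The browser-side validation then becomes a convenience for the user, while the server still refuses anything that slips past it.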

For much more detail on HTML5 forms, placeholders, validation, etc., take a look at Mark Pilgrim’s excellent Dive Into HTML5. Also, don’t miss out on Anthony Ricaud’s in-depth description of HTML5 forms in Firefox on the Mozilla Hacks blog.

Categories: Mozilla Crosspost, OSU OSL Crosspost, Tech Talk

Adding Support for Stronger Password Hashes To Django

Cryptographic hash functions play an important part in application security: Usually, user passwords are hashed and stored in the database. When someone logs in, their input is hashed as well and compared to the database content. A weak hash is almost as bad as no hash at all: If someone steals (part of) your user database, they can analyze the hashed values to detect the actual password–and then use it, without the owner’s knowledge, to log into your application on their behalf.

As part of a proactive web application security model, it is therefore important to stay ahead of the game attackers play and use sufficiently strong hashing to store passwords. Since cryptanalysts are spending great efforts on breaking hash algorithms (with the help of increasingly fast and cheap computers), SHA-1 is by now considered only borderline in strength. Not a good position to be in if you want to write future-proof apps.

Django (our web app framework of choice at Mozilla) does not support anything stronger than its default, SHA-1, and has, in the past, WONTFIXed attempts to increase hash strengths, citing strong backwards compatibility as the reason. As long as Django targets Python 2.4 as its lowest common denominator, this is unlikely to change. Writing a full-blown, custom authentication backend for the purpose is an option (the Mozilla Add-ons project chose to do that), but it seemed overkill to me, given that with the exception of the hash strength, Django’s built-in authentication code works just fine.

So I decided to monkey-patch their auth model at run time, to add SHA-256 support to my application (while staying backwards-compatible with older password hashes possibly existing in the database).

The code is simple, and I made an effort to keep it as uninvasive as possible, so that it can be removed easily in case Django ever does get support for stronger hashes down the road. Let me know what you think:

(Embedded from a Gist on Github).
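As a rough illustration of the idea (this is not the code from the Gist, and it does not touch Django at all), here is a standalone Python 3 sketch. It assumes passwords are stored as an `algorithm$salt$hexdigest` string, which is the format Django's auth module used at the time; the function names simply mirror what such helpers might be called.

```python
import hashlib
import hmac
import os


def make_password(raw_password, algorithm="sha256", salt=None):
    """Build an "algorithm$salt$hexdigest" string for storage."""
    if salt is None:
        salt = os.urandom(8).hex()
    digest = hashlib.new(algorithm, (salt + raw_password).encode("utf-8")).hexdigest()
    return "%s$%s$%s" % (algorithm, salt, digest)


def check_password(raw_password, stored):
    """Verify a password against a stored value, using whatever algorithm it names."""
    algorithm, salt, _digest = stored.split("$", 2)
    candidate = make_password(raw_password, algorithm, salt)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored)
```

Because the algorithm name travels with each stored value, old `sha1$...` entries keep verifying while newly set passwords get `sha256$...`, which is the backwards compatibility described above.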

Decoding HTML Entities to Text in Python

A while ago, I had to import some HTML into a Python script and found out that—while there is cgi.escape() for encoding to HTML—there did not seem to be an easy or well-documented way for decoding HTML entities in Python.

Silly, right?

Turns out, there are at least three ways of doing it, and which one you use probably depends on your particular app’s needs.

1) Overkill: BeautifulSoup

BeautifulSoup is an HTML parser that will also decode entities for you, like this:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)

The advantage is its fault-tolerance. If your input document is malformed, it will do its best to extract a meaningful DOM tree from it. The disadvantage is, if you just have a few short strings to convert, introducing the dependency on an entire HTML parsing library into your project seems overkill.

2) Duct Tape: htmlentitydefs

Python comes with a list of known HTML entity names and their corresponding unicode codepoints. You can use that together with a simple regex to replace entities with unicode characters:

import htmlentitydefs, re
mystring = re.sub('&([^;]+);', lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]), mystring)
print mystring.encode('utf-8')

Of course, this works. But I hear you saying, how in the world is this not in the standard library? And the geeks among you will also have noticed that this does not work with numerical entities. While &copy; will give you ©, &#169; will fail miserably. If you’re handling random, user-entered HTML, this is not a great option.
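If you do want to stay on the duct-tape route, the regex can be stretched to cover numerical entities too. Here is a minimal sketch in Python 3 (where `htmlentitydefs` lives on as `html.entities` and `unichr` is just `chr`); the function name is my own, and unknown or out-of-range entities are simply left alone.

```python
import re
from html.entities import name2codepoint

# Matches &#169; (decimal), &#xA9; (hexadecimal) and &copy; (named).
ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace named, decimal, and hexadecimal entities with their characters."""

    def replace(match):
        body = match.group(1)
        try:
            if body[0] == "#":
                if body[1] in "xX":
                    return chr(int(body[2:], 16))
                return chr(int(body[1:]))
            if body in name2codepoint:
                return chr(name2codepoint[body])
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: fall through.
            pass
        # Unknown name or invalid number: leave the original text untouched.
        return match.group(0)

    return ENTITY_RE.sub(replace, text)
```

Calling `decode_entities()` on a string containing `&copy;`, `&#169;` or `&#xA9;` yields the same copyright character in all three cases.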

3) Standard library to the rescue: HTMLParser

After all this, I’ll give you the option I like best. The standard lib’s very own HTMLParser has an undocumented function unescape() which does exactly what you think it does:

>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> s = h.unescape('&copy; 2010')
>>> s
u'\xa9 2010'
>>> print s
© 2010
>>> s = h.unescape('&#169; 2010')
>>> s
u'\xa9 2010'

So unless you need the advanced parsing capabilities of BeautifulSoup or want to show off your mad regex skills, this might be your best bet for squeezing unicode out of HTML snippets in Python.
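For reference, Python 3 (3.4 and later) exposes the same functionality as a documented function, `html.unescape()`, with `html.escape()` as the counterpart for encoding. A short sketch of the equivalent calls:

```python
import html

# Named and numerical entities both decode to the same character.
named = html.unescape("&copy; 2010")
numeric = html.unescape("&#169; 2010")
print(named)
print(numeric)

# The reverse direction, for encoding text to HTML.
print(html.escape("<b>bold & brash</b>"))
```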

Next Steps for the Copy ShortURL Addon

The Copy ShortURL Add-on has been on AMO for a week now and was recently approved to be public, so now I have a user base to please ;)

I am inclined to drop the code onto github, where I’d get a proper version history along with a bug tracker. Update: It’s on github now!

For now though, here are a few ideas I have for the add-on, in no particular order and with no promise that I’m about to implement any of this right away:

  • Allow other URL shortening services. tinyURL is all fun and games, and I chose it over bit.ly because it does not require an API key — but if you have one at hand, you should be able to use any service you like. Even if only by setting an about:config preference.
  • Incorporate selected sites that support short URLs but do not publish them as a header. Zappos (zapp.me), for example. Others seem to have a short URL available (such as: NY Times (nyti.ms), Amazon (amzn.to), ESPN (es.pn)) but only use them on their twitter account and not on every webpage, so there might be nothing we can do :( .
  • When shortening, need to make sure not to use the current URL but the canonical URL if such a header exists. (Fixed!)

Let me know what you think! I’d like to know if any other things come to your minds, or which of the above you’d find especially useful.

Copy Short URL Add-on

Update: The add-on is now on AMO! Check it out! Also, feedback is greatly appreciated!


This week during the Mozilla Summit in Whistler, British Columbia, there was a “Rocket Your Firefox” Jetpack contest. The idea: make a new add-on using the Jetpack SDK, submit it, win a prize.

So I went ahead and made a jetpack called “Copy Short URL” and it does what it sounds like:

On any webpage, you get a new item in the right click menu called “copy short URL”. When you click it, the add-on looks for a canonical short URL exposed in the page header. Currently, a number of major websites expose their own short URLs for any entry on their webpages, among these: youtube (“youtu.be/…”), flickr (“flic.kr/…”), Arstechnica, Techcrunch, and many more.

If, however, the site does not name its own short URL, the add-on automatically falls back to making a short URL using tinyurl.com.
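The add-on itself is written with the Jetpack SDK, but the lookup logic can be sketched in a few lines of Python. This is only an illustration under a few assumptions of mine: that sites advertise their short URL through a `<link>` tag with `rel="shortlink"` or `rel="shorturl"`, that TinyURL can be asked for a short URL through its `api-create.php` endpoint, and the class and function names are invented here. The sketch only builds the TinyURL request; it does not send it.

```python
from html.parser import HTMLParser
from urllib.parse import quote

# Assumption: pages advertise their short URL via one of these rel values.
SHORT_RELS = {"shortlink", "shorturl"}


class ShortLinkFinder(HTMLParser):
    """Remember the first short-URL <link> and the first rel=canonical <link>."""

    def __init__(self):
        super().__init__()
        self.short_url = None
        self.canonical_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = set((attrs.get("rel") or "").lower().split())
        href = attrs.get("href")
        if not href:
            return
        if rels & SHORT_RELS and self.short_url is None:
            self.short_url = href
        if "canonical" in rels and self.canonical_url is None:
            self.canonical_url = href


def short_url_or_request(page_url, page_html):
    """Return ("found", url), or ("shorten", request_url) for the fallback."""
    finder = ShortLinkFinder()
    finder.feed(page_html)
    if finder.short_url:
        return "found", finder.short_url
    # When falling back, shorten the canonical URL if the page names one.
    target = finder.canonical_url or page_url
    # Assumed TinyURL endpoint; the request is built here but not sent.
    return "shorten", "https://tinyurl.com/api-create.php?url=" + quote(target, safe="")
```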

Either way, after a fraction of a second, you end up with a short URL in your clipboard, ready to be used in forum posts, tweets, or wherever else you please.

My add-on won the contest in the “most useful” category. The prize was an awesome jetpack sweatshirt:

If you want to check out the add-on, it is currently available (open source, of course) on the add-ons builder website. I also uploaded the add-on to AMO.

Hope you find it useful!

Under the Hood of Firefox Input

Note: Several people asked where the link is to actually add feedback to the site. This is, of course, a good point. As mentioned in the comments: The designated entry point for the feedback application is going to be an extension bundled with Firefox 4 Beta. For more information, please read Aakash’s blog post. To try out the application already, feel free to add happy or sad feedback to the test site.


This morning, we published the Firefox Input application. It is a little web application soliciting feedback from our Firefox Beta Program users. The aim is to make it as easy as possible for people to tell us what specifically they like or dislike about an upcoming version of Firefox.

The application was, as far as software goes, developed very rapidly: We made it from requirements to production in a mere three weeks. What made this possible was a number of reusable components that allowed us to avoid reinventing the wheel and stay focused on making the application awesome.

A few key components of the Input application:

  • Django. I can’t stress this enough, but Django is a fantastic web application framework. It makes it incredibly easy to set up a web application quickly and securely. Their built-in admin pages save me days of work that I would otherwise have to spend to allow project admins to edit the application data.
  • Jinja2 and Jingo. The only big drawback of Django is its template language: The instant you make nontrivial web applications, it gets in your way. Luckily, like all parts of Django, it is replaceable: Jinja2 and Jeff Balogh’s jingo interface come to the rescue. The two of them are already in use over at AMO and also serve us well on Input.
  • Term extraction. Firefox Input extracts key words from all feedback. Sure, you can just split the sentences into words, but if you want to avoid collecting all sorts of meaningless particles (“the”, “a”, “if”, …), it becomes a little more complicated. We are using the topia.termextract library, which gladly does the heavy lifting for us. Only caveat: It only works for English, so once the application is localized, we need a different solution for the other languages.
  • Search. For the longest time, there was no generic way to do search in a Django app (other than straight SQL queries). In the meantime, haystack has started to fill that gap. We use it on Input in conjunction with Whoosh, a pure-Python search library. That is very easy to set up, at the expense of scalability — if we outgrow it, however, it will be easy to switch search engines with virtually no code changes at all. Thumbs up!
  • Product details. Only very recently we released a Mozilla product details library for Django, and this is the first application to rely intimately on up-to-date product data: Input only lets users of the latest beta versions of Firefox add feedback, so it auto-updates its product data periodically to gather feedback for the newest versions as quickly as possible.
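
To make the term extraction point a bit more concrete, here is the naive approach (split into words, throw away the particles) as a small Python sketch. It is not how topia.termextract works, which does considerably more than this, and the stop-word list and function name are made up for the example.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "if", "and", "or", "is", "it", "to", "of", "on", "in", "my"}


def naive_terms(text, limit=5):
    """Return the most common words in text that are not stop words, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _count in counts.most_common(limit)]
```

Even on short feedback snippets, this quickly shows why a hand-rolled word list only gets you so far: every filler word you forgot to list ends up in the results.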

As always, the source code of Firefox Input is openly and freely available. If you notice any problems with it, feel free to fork it on github, or file a bug in our bug tracker.