I totally forgot to blog about this!

Remember how I curate a collection of fail pets from across the Interwebs? Sean Rintel, a researcher at the University of Queensland in Australia, has put some thought into the UX implications of whimsical error messages and published his findings in UX Magazine: The Evolution of Fail Pets: Strategic Whimsy and Brand Awareness in Error Messages.

In his article, Rintel credits me with coining the term "fail pet".

Attentive readers may also notice that Mozilla's strategy of (rightly) attributing Adobe Flash's crashes to Flash itself by putting a "sad brick" in its place worked formidably: Rintel (just like most users, I am sure) assumes this message comes from Adobe, not Mozilla:

Thanks, Sean, for the mention, and I hope you all enjoy his article.

Read more…

Note: This is a cross-post of an article I published on the Mozilla Webdev blog this week.

During the course of this week, a number of high-profile websites (like LinkedIn and last.fm) have disclosed possible password leaks from their databases. The suspected leaks put huge amounts of important, private user data at risk.

What's common to both these cases is the weak security they employed to "safekeep" their users' login credentials. In the case of LinkedIn, it is alleged that an unsalted SHA-1 hash was used; in the case of last.fm, the technology was, allegedly, an even worse unsalted MD5 hash.

Neither of the two technologies follows any sort of modern industry standard, and if these companies did in fact use them in this fashion, that exhibits a gross disregard for the protection of user data. Let's take a look at the most obvious mistakes our protagonists made here, and then we'll discuss the password hashing standards that Mozilla web projects routinely apply in order to mitigate these risks. <!--more-->

A trivial no-no: Plain-text passwords

This one's easy: Nobody should store plain-text passwords in a database. If you do, and someone steals the data through any sort of security hole, they've got all your users' plain-text passwords. (That a bunch of companies still do this should make you scream and run the other way whenever you encounter it.) Our two protagonists above know that too, so they remembered that they read something about hashing somewhere at some point. "Hey, this makes our passwords look different! I am sure it's secure! Let's do it!"

Poor: Straight hashing

Smart mathematicians came up with something called a hashing function or "one-way function" H: password -> H(password). MD5 and SHA-1 mentioned above are examples of those. The idea is that you give this function an input (the password), and it gives you back a "hash value". It is easy to calculate this hash value when you have the original input, but prohibitively hard to do the opposite. So we calculate the hash value of every password and only store that. If someone steals the database, they will only have the hashes, not the passwords. And because the passwords are practically impossible to recover from the hashes, the stolen data is useless.
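
For illustration, here is what straight hashing looks like in Python, using the standard hashlib module (a sketch of the weak scheme described above, not a recommendation):

import hashlib

# Unsalted, "straight" SHA-1 -- the weak scheme described above.
digest = hashlib.sha1('cheesecake1').hexdigest()
print digest  # identical passwords yield identical hashes, for every user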

"Great!" But wait, there's a catch. For starters, people pick poor passwords. Write this one in stone, as it'll be true as long as passwords exist. So a smart attacker can start with a copy of Merriam-Webster, throw in a few numbers here and there, calculate the hashes for all those words (remember, it's easy and fast) and start comparing those hashes against the database they just stole. Because your password was "cheesecake1", they just guessed it. Whoops! To add insult to injury, they just guessed everyone's password who also used the same phrase, because the hashes for the same password are the same for every user.

Worse yet, you can actually buy(!) precomputed lists of straight hashes (called Rainbow Tables) for alphanumeric passwords up to about 10 characters in length. Thought "FhTsfdl31a" was a safe password? Think again.

This attack is called an offline dictionary attack and is well-known to the security community.
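
In code, the attack is embarrassingly simple. A minimal sketch (the stolen table and word list are hypothetical, of course):

import hashlib

# Hypothetical stolen database: unsalted SHA-1 hashes, keyed by hash value.
stolen = {hashlib.sha1('cheesecake1').hexdigest(): 'some_user'}

# Walk the dictionary, throw in a few number suffixes, compare. Fast.
for word in ('password', 'letmein', 'cheesecake'):
    for suffix in ('', '1', '123'):
        digest = hashlib.sha1(word + suffix).hexdigest()
        if digest in stolen:
            print '%s uses password %r' % (stolen[digest], word + suffix)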

Even passwords taste better with salt

The standard way to deal with this is by adding a per-user salt. That's a long, random string added to the password at hashing time: H: password -> H(password + salt). You then store salt and hash in the database, making the hash different for every user, even if they happen to use the same password. In addition, the smart attacker cannot pre-compute the hashes anymore, because they don't know your salt in advance. So after stealing the data, they'll have to try every possible password for every possible user, using each user's personal salt value.
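
A sketch of the salted scheme in Python (the salt length and the continued use of SHA-1 are illustrative choices, mirroring the running example):

import hashlib, os

def hash_password(password):
    salt = os.urandom(16).encode('hex')  # fresh random salt per user
    return salt, hashlib.sha1(password + salt).hexdigest()

def check_password(password, salt, digest):
    # Recompute with the stored salt and compare.
    return hashlib.sha1(password + salt).hexdigest() == digest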

Great! I mean it, if you use this method, you're already scores better than our protagonists.

The 21st century: Slow hashes

But alas, there's another catch: Generic hash functions like MD5 and SHA-1 are built to be fast. And because computers keep getting faster, millions of hashes can be calculated very, very quickly, making a brute-force attack even on salted passwords more and more feasible.
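
To put rough numbers on that, a quick and unscientific timing sketch (assuming the py-bcrypt package; exact figures will vary by machine):

import timeit

# A million unsalted SHA-1 hashes take a second or two...
print timeit.timeit("hashlib.sha1('cheesecake1').hexdigest()",
                    setup='import hashlib', number=1000000)

# ...versus a handful of bcrypt hashes in the same time.
print timeit.timeit("bcrypt.hashpw('cheesecake1', salt)",
                    setup='import bcrypt; salt = bcrypt.gensalt()', number=10)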

So here's what we do at Mozilla: Our WebApp Security team performed some research and set forth a set of secure coding guidelines (they are public, go check them out, I'll wait). These guidelines suggest the use of HMAC + bcrypt as a reasonably secure password storage method.

The hashing function has two steps. First, the password is hashed with an algorithm called HMAC, together with a local salt: H: password -> HMAC(local_salt, password). The local salt is a random value that is stored only on the server, never in the database. Why is this good? If an attacker steals one of our password databases, they would also need to separately attack one of our web servers to get file access in order to discover this local salt value. If they don't manage to pull off two successful attacks, their stolen data is largely useless.

As a second step, this hashed value (or strengthened password, as some call it) is then hashed again with a slow hashing function called bcrypt. The key point here is slow. Unlike general-purpose hash functions, bcrypt intentionally takes a relatively long time to be calculated. Unless an attacker has millions of years to spend, they won't be able to try out a whole lot of passwords after they steal a password database. Plus, bcrypt hashes are also salted, so no two bcrypt hashes of the same password look the same.

So the whole function looks like: H: password -> bcrypt(HMAC(local_salt, password), bcrypt_salt).
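
In Python, a minimal sketch of both steps together might look like this (assuming the py-bcrypt package; the local salt value is made up, and in production it lives in a config file on the web server, never in the database):

import hashlib, hmac
import bcrypt  # py-bcrypt, an assumed binding; any bcrypt library will do

LOCAL_SALT = 'long-random-secret-kept-out-of-the-database'  # hypothetical

def create_hash(password):
    # Step 1: strengthen the password with HMAC and the local salt.
    strengthened = hmac.new(LOCAL_SALT, password, hashlib.sha256).hexdigest()
    # Step 2: bcrypt it; gensalt() embeds a fresh salt and the cost factor.
    return bcrypt.hashpw(strengthened, bcrypt.gensalt())

def check_hash(password, stored):
    strengthened = hmac.new(LOCAL_SALT, password, hashlib.sha256).hexdigest()
    # Hashing the candidate with the stored hash's own salt must reproduce it.
    return bcrypt.hashpw(strengthened, stored) == stored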

We wrote a reference implementation for this for Django: django-sha2. Like all Mozilla projects, it is open source, and you are more than welcome to study, use, and contribute to it!

What about Mozilla Persona?

Funny you should mention it. Mozilla Persona (née BrowserID) is a new way for people to log in. Persona is the password specialist and takes the burden (and risk) of worrying about passwords away from sites altogether. Read more about Mozilla Persona.

So you think you're cool and can't be cracked? Challenge accepted!

Make no mistake: just like everybody else, we're not invincible at Mozilla. But because we actually take our users' data seriously, we take precautions like this to mitigate the effects of an attack, even in the unfortunate event of a successful security breach in one of our systems.

If you're responsible for user data, you should, too.

If you'd like to discuss this post, please leave a comment at the Mozilla Webdev blog. Thanks!

Read more…

In my home network, I use IPv4 addresses out of the 10.x.y.z/8 private IP block. After AT&T U-Verse contacted me multiple times to make me reconfigure my network so they could establish a large-scale NAT and give me a private IP address rather than a public one (this might be material for a whole separate post), I reluctantly switched ISPs and now have Comcast. I did, however, keep AT&T for television. Now, U-Verse is an IPTV provider, so I had to put the two services (Internet and IPTV) onto the same wire, which, as it turned out, was not as easy as it sounds. <!--more-->

tl;dr: This is a "war story" more than a crisp tutorial. If you really just want to see the ebtables rules I ended up using, scroll all the way to the end.

IPTV uses IP Multicast, a technology that allows a single data stream to be sent to a number of devices at the same time. If your AT&T-provided router is the centerpiece of your network, this works well: The router is intelligent enough to determine which one or more receivers (and on what LAN port) want to receive the data stream, and it only sends data to that device (and on that wire).

Multicast, the way it is supposed to work: The source server (red) sending the same stream to multiple, but not all, receivers (green).

Turns out, my dd-wrt-powered Cisco E2000 router is--out of the box--not that intelligent and, like most consumer devices, will simply turn such multicast packets into broadcast packets. That means it takes the incoming data stream and delivers it to all attached ports and devices. On a wired network, that's sad, but not too big a deal: Other computers and devices will see these packets, determine they are not addressed to them, and drop them automatically.

Once your wifi becomes involved, this is a much bigger problem: The IPTV stream's unwanted packets easily saturate the wifi's capacity and keep any wifi device from doing its job while it is busy discarding packets. This goes so far as to make it entirely impossible to even connect to the wireless network anymore. Besides: Massive, bogus wireless traffic empties device batteries and fills up the (limited and shared) frequency spectrum for no useful reason.

Suddenly, everyone gets the (encrypted) data stream. Whoops.

One solution for this is to install only managed switches that support IGMP snooping and thus limit multicast traffic to the relevant ports. I wasn't too keen on replacing a bunch of really expensive new hardware, though.

In comes ebtables, part of netfilter (the Linux kernel-level firewall package). First I wrote a simple rule intended to keep all multicast packets (no matter their source) from exiting on the wireless device (eth1, in this case).

ebtables -A FORWARD -o eth1 -d Multicast -j DROP

This works in principle, but has some ugly drawbacks:

  1. -d Multicast translates into a destination address pattern that also covers (intentional) broadcast packets (i.e., every broadcast packet is a multicast packet, but not vice versa). These broadcasts are important and power DHCP, SMB networking, Bonjour, and the like. With a rule like this, none of those services will work anymore on the wifi you were trying to protect.
  2. -o eth1 keeps us from flooding the wifi, but will do nothing to keep the needless packets sent to wired devices in check. While we're in the business of filtering packets, might as well do that too.

So let's create a new VLAN in the dd-wrt settings that only contains the incoming port (here: W) and the IPTV receiver's port (here: 1). We bridge it to the same network, because the incoming port is not only the source of IPTV but also our connection to the Internet, so the remaining ports still need to be able to connect to it.

dd-wrt vlan settings

Then we tweak our filters:

ebtables -A FORWARD -d Broadcast -j ACCEPT
ebtables -A FORWARD -p ipv4 --ip-src ! 10.0.0.0/24 -o ! vlan1 -d Multicast -j DROP

This first accepts all broadcast packets (which it would do by default anyway, if it weren't for our multicast rule), then drops any other multicast packet whose output device is not vlan1 and whose source IP address is not local.

With this modified rule, we make sure that any internal applications can still function properly, while we tightly restrict where external multicast packets flow.

That was easy, wasn't it!

Some illustrations courtesy of Wikipedia.

Read more…

Day 329 - Ready for the Sunset

A family of tourists, getting ready to watch the sun set on the Pacific coast. I love silhouette photos like this: It's fun to see the different characters with their body shapes and postures.

Read more…

Day 15 - Open Your Heart, Open Your Mind, Open Your Source

This is one of my favorite coffee mugs: I bought it back in 2002 at the LinuxTag open source conference in Karlsruhe, Germany. The motto of that year's conference -- "Open Your Mind, Open Your Heart, Open Your Source" -- hints at what this conference was still very much about: Convincing decision makers, in particular in government organizations, to recognize the potential in open source software and treat it as an opportunity rather than a threat. Luckily, we've come a long way since then.

This is a simple Saturday-morning-sun "portrait" photo, with no alterations whatsoever.

Read more…

Surely, you've heard of many fancy new features that HTML5 and related technologies bring to the Web now and in the future: open video on the web, canvas, transitions, what have you.

But sometimes it's the smallest things that have the biggest impact. Besides these hyped features, HTML5 also introduces a number of semantic form fields (such as email, url, and number). Before, the only textual input the web knew was, well, plain text. It was up to the web application developer to enforce certain rules around that, like making sure the input is a number, or not empty, or even a valid website address (URL).

Firefox 4 understands these new input types and helps the user by enforcing correct values even before the user submits the form. Handling validation on the client enables a consistent form validation UI across websites and keeps the user from constantly submitting forms and waiting for the server-side validation to pass or fail. (NB: This does not relieve developers of performing server-side checks in order to ensure the security of their web applications.)

Here is what this looks like in a recent prototype of the Firefox Input site:

Another fun little feature, also pictured, is the placeholder attribute. The grayed-out placeholder in a text box shows you an example of what you might enter into this field. Rather than explaining correct values in a huge label or a side note next to the field, developers can show their users much more easily what data they would like them to enter into the form fields.

All of this makes for fewer mistakes entering data into web forms, which is both beneficial to the user (getting the job done faster) and the developer (collecting better data). Win-win!

For much more detail on HTML5 forms, placeholders, validation, etc., take a look at Mark Pilgrim's excellent Dive Into HTML5. Also, don't miss out on Anthony Ricaud's in-depth description of HTML5 forms in Firefox on the Mozilla Hacks blog.

Read more…

Cryptographic hash functions play an important part in application security: Usually, user passwords are hashed and stored in the database. When someone logs in, their input is hashed as well and compared to the database content. A weak hash is almost as bad as no hash at all: If someone steals (part of) your user database, they can analyze the hashed values to recover the actual passwords--and then use them, without the owners' knowledge, to log into your application as those users.

As part of a proactive web application security model, it is therefore important to stay ahead of the game attackers play and use sufficiently strong hashing to store passwords. Since cryptanalysts are spending great effort on breaking hash algorithms (with the help of increasingly fast and cheap computers), SHA-1 is meanwhile considered only borderline in strength. Not a good position to be in if you want to write future-proof apps.

Django (our web app framework of choice at Mozilla) does not support anything stronger than its default, SHA-1, and has, in the past, WONTFIXed attempts to increase hash strengths, citing strong backwards compatibility as the reason. As long as Django targets Python 2.4 as its lowest common denominator, this is unlikely to change. Writing a full-blown, custom authentication backend for the purpose is an option (the Mozilla Add-ons project chose to do that), but it seemed like overkill to me, given that, with the exception of the hash strength, Django's built-in authentication code works just fine.

So I decided to monkey-patch their auth model at run time, to add SHA-256 support to my application (while staying backwards-compatible with older password hashes possibly existing in the database).

The code is simple, and I made an effort to keep it as non-invasive as possible, so that it can be removed easily in case Django ever does get support for stronger hashes down the road. Let me know what you think: (Embedded from a Gist on Github).
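
The Gist isn't reproduced here, but the approach is small enough to sketch roughly (a simplified sketch, not the exact Gist contents). Django stores passwords as "algo$salt$hash" and routes all hashing through get_hexdigest(), so teaching that one function a new algorithm, and making set_password() use it, keeps check_password() working for old and new hashes alike:

import hashlib
from random import random

from django.contrib.auth import models as auth_models

_get_hexdigest = auth_models.get_hexdigest

def get_hexdigest(algorithm, salt, raw_password):
    if algorithm == 'sha256':
        return hashlib.sha256(salt + raw_password).hexdigest()
    return _get_hexdigest(algorithm, salt, raw_password)  # md5, sha1, crypt

def set_password(self, raw_password):
    # Same salt generation scheme Django uses for its built-in algorithms.
    salt = get_hexdigest('sha256', str(random()), str(random()))[:5]
    hsh = get_hexdigest('sha256', salt, raw_password)
    self.password = 'sha256$%s$%s' % (salt, hsh)

auth_models.get_hexdigest = get_hexdigest
auth_models.User.set_password = set_password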

Read more…

A while ago, I had to import some HTML into a Python script and found out that—while there is cgi.escape() for encoding to HTML—there did not seem to be an easy or well-documented way to decode HTML entities in Python.

Silly, right?

Turns out, there are at least three ways of doing it, and which one you use probably depends on your particular app's needs.

1) Overkill: BeautifulSoup

BeautifulSoup is an HTML parser that will also decode entities for you, like this:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3
soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)

The advantage is its fault tolerance. If your input document is malformed, it will do its best to extract a meaningful DOM tree from it. The disadvantage is that, if you just have a few short strings to convert, introducing a dependency on an entire HTML parsing library into your project seems like overkill.

2) Duct Tape: htmlentitydefs

Python comes with a list of known HTML entity names and their corresponding unicode codepoints. You can use that together with a simple regex to replace entities with unicode characters:

import htmlentitydefs, re

mystring = 'Copyright &copy; 2010'
# Replace each named entity with its Unicode code point.
mystring = re.sub('&([^;]+);', lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]), mystring)
print mystring.encode('utf-8')

Of course, this works. But I hear you saying, how in the world is this not in the standard library? And the geeks among you have also noticed that this will not work with numeric entities: while &copy; will give you ©, &#169; will fail miserably (with a KeyError, since "#169" is not a named entity). If you're handling random, user-entered HTML, this is not a great option.
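
If you wanted to patch up the numeric case yourself, a quick sketch (decimal entities only; unknown entity names will still raise a KeyError):

import htmlentitydefs, re

def _substitute(m):
    ent = m.group(1)
    if ent.startswith('#'):  # decimal numeric entity, e.g. &#169;
        return unichr(int(ent[1:]))
    return unichr(htmlentitydefs.name2codepoint[ent])

print re.sub('&([^;]+);', _substitute, u'&copy; and &#169;').encode('utf-8')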

3) Standard library to the rescue: HTMLParser

After all this, I'll give you the option I like best. The standard lib's very own HTMLParser has an undocumented method unescape() which does exactly what you think it does:
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> s = h.unescape('&copy; 2010')
>>> s
u'\xa9 2010'
>>> print s
© 2010
>>> s = h.unescape('&#169; 2010')
>>> s
u'\xa9 2010'

So unless you need the advanced parsing capabilities of BeautifulSoup or want to show off your mad regex skills, this might be your best bet for squeezing unicode out of HTML snippets in Python.

Read more…

The Copy ShortURL Add-on has been on AMO for a week now and was recently approved to be public, so now I have a user base to please ;)

I am inclined to drop the code onto github, where I'd get a proper version history along with a bug tracker. Update: It's on github now!

For now though, here are a few ideas I have for the add-on, in no particular order and with no promise that I'm about to implement any of this right away:

  • Allow other URL shortening services. TinyURL is all fun and games, and I chose it over bit.ly because it does not require an API key -- but if you have one at hand, you should be able to use any service you like, even if only by setting an about:config preference.
  • Incorporate selected sites that support short URLs but do not publish them as a header. Zappos (zapp.me), for example. Others seem to have short URLs available (such as the NY Times (nyti.ms), Amazon (amzn.to), and ESPN (es.pn)) but only use them on their Twitter accounts and not on every webpage, so there might be nothing we can do :(.
  • When shortening, make sure to use not the current URL but the canonical URL, if such a header exists. (Fixed!)

Let me know what you think! I'd like to know if any other ideas come to mind, or which of the above you'd find especially useful.

Read more…

Update: The add-on is now on AMO! Check it out! Also, feedback is greatly appreciated!


This week during the Mozilla Summit in Whistler, British Columbia, there was a "Rocket Your Firefox" Jetpack contest. The idea: make a new add-on using the Jetpack SDK, submit it, win a prize.

So I went ahead and made a jetpack called "Copy Short URL" and it does what it sounds like:

On any webpage, you get a new item in the right-click menu called "copy short URL". When you click it, the add-on looks for a canonical short URL exposed in the page header. Currently, a number of major websites expose their own short URLs for any entry on their webpages, among them YouTube ("youtu.be/..."), Flickr ("flic.kr/..."), Ars Technica, TechCrunch, and many more. If, however, the site does not name its own short URL, the add-on automatically falls back to making a short URL using tinyurl.com. Either way, after a fraction of a second, you end up with a short URL in your clipboard, ready to be used in forum posts, tweets, or wherever else you please.
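
For the curious, the lookup logic boils down to something like this (sketched here in Python for readability; the actual add-on is JavaScript on the Jetpack SDK, and the regex is a deliberate simplification):

import re
import urllib, urllib2

def short_url(page_url, html):
    # Prefer a short URL the site itself advertises in its <head>,
    # e.g. <link rel="shortlink" href="http://youtu.be/...">.
    # Simplified: assumes double-quoted attributes and rel before href.
    m = re.search(r'<link[^>]+rel="shortlink"[^>]*href="([^"]+)"',
                  html, re.IGNORECASE)
    if m:
        return m.group(1)
    # Otherwise, fall back to TinyURL's public API.
    query = urllib.urlencode({'url': page_url})
    return urllib2.urlopen('http://tinyurl.com/api-create.php?' + query).read()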

My add-on won the contest in the "most useful" category. The prize was an awesome jetpack sweatshirt:

If you want to check out the add-on, it is currently available (open source, of course) on the add-ons builder website. I also uploaded the add-on to AMO.

Hope you find it useful!

Read more…