O Hai again, Mozilla!

I made it! Starting today, I’ll be working from Mozilla Headquarters in Mountain View, California.

Mozilla HQ

If you haven’t run into me yet, feel free to stop by and say hi. I was very happy to get such a warm welcome today, and I actually found it quite fun to be introduced as one of the “new hires” in today’s all-hands meeting. :)

Categories: fredericiana, Mozilla Crosspost

What “Stuttgart 20” Has in Common with Stuttgart 21

Earlier this week, I watched a TV documentary from the 1960s about Deutsche Bundesbahn’s plans to build the new high-speed line from Mannheim to Stuttgart. It was quite amusing, for example the claim that on this line-to-be, the modern IC trains would finally be able to run at their full speed of 160 km/h (breathtaking!).

But then it became strangely topical: A Stuttgart politician was asked whether the planned construction was really necessary, also in light of the massive opposition from the public. He said roughly the following (paraphrased from memory):

The Stuttgart region is a high-tech region whose competitiveness depends on its infrastructure. If we want Stuttgart to keep holding its own against the other technology regions in Germany and across Europe in the future, we need to be reachable by modern means of transportation.

You could have heard that comment word for word (more than 40 years later!) in the Stuttgart 21 debate. In my opinion, it is an entirely valid point. Ironically, it was not nearly as relevant in the 1960s as it is today, now that the state capital is doing its utmost to cluster high-tech industry of every stripe in its vicinity.

One thing is certain: Resting on the “Daimler laurels” (we don’t need infrastructure investments, everyone gets to work at Daimler on time anyway) could do great damage in the long run, if people realize too late that the relevance of the Stuttgart region in the 21st century was what was really at stake. While the politicians currently have to put up with the accusation of trying to force their way through a wall, the project’s opponents should not make the mistake of aimlessly adding more bricks to that wall.

In any case, I am curious to see how the mediation process progresses. So far, the mere fact that it did not fail on the very first day already counts as a success. Success is, after all, a matter of definition.

Categories: Deutschland

Adding Support for Stronger Password Hashes To Django

Cryptographic hash functions play an important part in application security: Usually, user passwords are hashed and stored in the database. When someone logs in, their input is hashed as well and compared to the database content. A weak hash is almost as bad as no hash at all: If someone steals (part of) your user database, they can analyze the hashed values to recover the actual passwords. They can then use those passwords to log into your application on the users’ behalf, without the owners’ knowledge.
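To make that hash-then-compare flow concrete, here is a minimal Python 3 sketch using only `hashlib`, `hmac` and `os`. The storage layout `algorithm$salt$hexdigest` is an assumption, loosely modeled on the salted format Django’s auth module used at the time. The helper names `hash_password` and `verify_password` are hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(raw_password, algorithm="sha256"):
    """Return an 'algorithm$salt$hexdigest' string to store in the user table."""
    # A random per-user salt, stored next to the hash, so that identical
    # passwords do not produce identical database entries.
    salt = os.urandom(8).hex()
    digest = hashlib.new(algorithm, (salt + raw_password).encode("utf-8")).hexdigest()
    return f"{algorithm}${salt}${digest}"


def verify_password(raw_password, stored):
    """Hash the login attempt the same way and compare it to the stored value."""
    algorithm, salt, digest = stored.split("$")
    candidate = hashlib.new(algorithm, (salt + raw_password).encode("utf-8")).hexdigest()
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(candidate, digest)


stored = hash_password("correct horse")
print(verify_password("correct horse", stored))  # True
print(verify_password("wrong guess", stored))    # False
```

Recording the algorithm name in the stored string is what later allows old SHA-1 entries and new SHA-256 entries to coexist in the same table. Plain salted SHA digests are fast to compute, and today a deliberately slow key-derivation function (for example `hashlib.pbkdf2_hmac`) is generally preferred for passwords.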

As part of a proactive web application security model, it is therefore important to stay ahead of the game attackers play and use sufficiently strong hashing to store passwords. Cryptanalysts are spending great effort on breaking cryptographic algorithms, helped by increasingly fast and cheap computers, so SHA-1 is now considered only borderline in strength. Not a good position to be in if you want to write future-proof apps.

Django (our web app framework of choice at Mozilla) does not support anything stronger than its default, SHA-1. In the past, it has also WONTFIXed attempts to increase hash strength, citing backwards compatibility as the reason. As long as Django supports Python 2.4 as its lowest common denominator, this is unlikely to change (SHA-256 only arrived in the standard library with the hashlib module in Python 2.5). Writing a full-blown, custom authentication backend for the purpose is an option (the Mozilla Add-ons project chose to do that). That seemed like overkill to me, though, given that Django’s built-in authentication code works just fine apart from the hash strength.

So I decided to monkey-patch Django’s auth model at runtime to add SHA-256 support to my application, while staying backwards-compatible with older password hashes that may already exist in the database.

The code is simple, and I made an effort to keep it as uninvasive as possible, so that it can be removed easily in case Django ever does get support for stronger hashes down the road. Let me know what you think:

(Embedded from a Gist on Github).
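The gist itself is not reproduced here. As a rough illustration of the general technique only, here is a self-contained Python 3 sketch of monkey-patching with a backwards-compatible fallback. Everything in it is a hypothetical stand-in: `auth_models`, `get_hexdigest` and `check_password` are not Django’s actual API and not the code from the gist.

```python
import hashlib
import hmac
import types

# Hypothetical stand-in for a framework's auth module (NOT Django's real code).
auth_models = types.SimpleNamespace()


def _builtin_get_hexdigest(algorithm, salt, raw_password):
    """The framework's original digest function: it only knows SHA-1."""
    if algorithm == "sha1":
        return hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()
    raise ValueError("Unknown password algorithm: %s" % algorithm)


auth_models.get_hexdigest = _builtin_get_hexdigest


def check_password(raw_password, enc_password):
    """Framework code; it looks up auth_models.get_hexdigest at call time."""
    algorithm, salt, hexdigest = enc_password.split("$")
    candidate = auth_models.get_hexdigest(algorithm, salt, raw_password)
    return hmac.compare_digest(candidate, hexdigest)


# --- The monkey-patch, applied once at application startup ---
_original_get_hexdigest = auth_models.get_hexdigest


def _get_hexdigest_with_sha256(algorithm, salt, raw_password):
    if algorithm == "sha256":
        return hashlib.sha256((salt + raw_password).encode("utf-8")).hexdigest()
    # Everything else (e.g. legacy sha1 hashes) goes to the original function.
    return _original_get_hexdigest(algorithm, salt, raw_password)


auth_models.get_hexdigest = _get_hexdigest_with_sha256

legacy = "sha1$ab$" + hashlib.sha1(b"abpw").hexdigest()
modern = "sha256$cd$" + hashlib.sha256(b"cdpw").hexdigest()
print(check_password("pw", legacy))  # True: old hashes still verify
print(check_password("pw", modern))  # True: new SHA-256 hashes verify too
```

Because the patch keeps a reference to the original function, undoing it later is a single line: `auth_models.get_hexdigest = _original_get_hexdigest`.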

Decoding HTML Entities to Text in Python

A while ago, I had to import some HTML into a Python script and found out that, although there is cgi.escape() for encoding to HTML, there did not seem to be an easy or well-documented way to decode HTML entities in Python.

Silly, right?

Turns out, there are at least three ways of doing it, and which one you use probably depends on your particular app’s needs.

1) Overkill: BeautifulSoup

BeautifulSoup is an HTML parser that will also decode entities for you, like this:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x
soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)

The advantage is its fault tolerance: If your input document is malformed, it will do its best to extract a meaningful DOM tree from it. The disadvantage is that, if you just have a few short strings to convert, introducing a dependency on an entire HTML parsing library into your project seems like overkill.

2) Duct Tape: htmlentitydefs

Python comes with a mapping of known HTML entity names to their corresponding Unicode code points (htmlentitydefs.name2codepoint). You can use that together with a simple regex to replace entities with Unicode characters:

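import htmlentitydefs, re
mystring = u'Copyright &copy; 2010'
# Replace each named entity with its Unicode character.
# Raises KeyError for numeric entities (&#169;) and unknown entity names.
mystring = re.sub('&([^;]+);', lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]), mystring)
print mystring.encode('utf-8')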

Of course, this works. But I hear you say: How in the world is this not in the standard library? And the geeks among you have also noticed that this will not work with numeric entities: While &copy; will give you ©, &#169; will fail miserably with a KeyError. If you’re handling random, user-entered HTML, this is not a great option.
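For comparison, here is a minimal Python 3 sketch of the same regex approach (in Python 3, `htmlentitydefs` lives at `html.entities`). It also handles decimal and hexadecimal numeric entities. The function name `decode_entities` is hypothetical. Unknown names are left untouched instead of raising, but an out-of-range numeric reference will still raise a `ValueError` from `chr()`.

```python
import re
from html.entities import name2codepoint

# Matches &#x...; (hex), &#...; (decimal) and &name; (named) references.
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace named (&copy;), decimal (&#169;) and hex (&#xA9;) entities."""
    def replace(match):
        ref = match.group(1)
        if ref[:2] in ("#x", "#X"):
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        if ref in name2codepoint:
            return chr(name2codepoint[ref])
        return match.group(0)  # unknown name: leave it as-is
    return _ENTITY_RE.sub(replace, text)


print(decode_entities("&copy; 2010, &#169; 2010, &#xA9; 2010, &bogus;"))
# © 2010, © 2010, © 2010, &bogus;
```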

3) Standard library to the rescue: HTMLParser

After all this, I’ll give you the option I like best. The standard library’s very own HTMLParser has an undocumented method, unescape(), which does exactly what you think it does:

>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> s = h.unescape('&copy; 2010')
>>> s
u'\xa9 2010'
>>> print s
© 2010
>>> s = h.unescape('&#169; 2010')
>>> s
u'\xa9 2010'
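Because HTMLParser.unescape() was never documented, it did not last. Python 3.4 added the documented `html.unescape()`, and the HTMLParser method was deprecated in 3.4 and removed in Python 3.9. Here is a short sketch of the Python 3 equivalent of the session above:

```python
import html

for snippet in ("&copy; 2010", "&#169; 2010", "&#xa9; 2010"):
    # html.unescape decodes named, decimal and hexadecimal references.
    print(html.unescape(snippet))  # prints "© 2010" each time
```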

So unless you need the advanced parsing capabilities of BeautifulSoup or want to show off your mad regex skills, this might be your best bet for squeezing unicode out of HTML snippets in Python.

Bye bye, Munich

Time flies! Believe it or not, yesterday marked the end of our Munich “era”. We moved out of our apartment and are getting ready to move to the San Francisco Bay Area in just a few days.

Thank you, Munich, for having us. It’s been a blast.

Categories: Munich, photo