Updating the Mozilla Public License

Today, Mozilla is starting the public process on revising its signature code license, the Mozilla Public License or MPL. Mitchell Baker, chair of the board of the Mozilla Foundation and author of the original MPL 1.0, has more information about the process on her blog.

The discussion is happening on the website mpl.mozilla.org.

I am happy about this for a number of reasons. Of course, I made the website (the design is borrowed from mozilla.org), so I am naturally happy to see it being available to a wider audience.

But I also hope that the revision process itself will be successful. While the MPL has been a remarkable help in Mozilla desktop projects' success, it is unpleasant (to say the least) to use in web applications, for a number of reasons:

The hideous license block. The MPL is a file-based license. It allows any file in the project, even in the same directory, to be licensed differently. Therefore, each MPL-licensed code file must carry a comment block of more than 30 lines at the top. For big code modules, that's fine. For web applications, whose files often have only a handful of lines, this balloons the whole code base and makes files horribly unreadable. Sadly, the current license only allows an exception from that rule if adding the block is impossible "due to [the file's] structure", which would essentially only be the case if that file type did not allow comments.

The copyleft. This one is debatable, but it's a fact that some open source communities (the Python community is one prominent example) do not appreciate strong copyleft provisions. While the MPL (unlike the GNU GPL) does not have a tendency to "taint" other code, it is not at all compatible with the BSD or MIT licenses' notion of "take it and do (almost) whatever you please with it". (As you may have noticed, the file-based nature of the MPL is both a curse and a blessing here.) I hope that the revision process can make it clearer how this applies to hosted applications (i.e., mostly web applications).

I am excited to see what the broad community discussion will bring to light over the next few months.

March 01, 2010 Fred Wenzel

pdftk 1.41 for Mac OS X 10.6

Update: The author of pdftk, Sid Steward, left the following comment:

A new version of pdftk is available (1.43) that fixes many bugs. This release also features an installer [for] OS X 10.6. Please visit to learn more and download: www.pdflabs.com.

This blog post will stick around for the time being, but I (the author of this blog) advise you to always run the latest version so that you can enjoy the latest bug fixes.

OS X Leopard users: Sorry, neither this version nor the installer offered on pdflabs.com works on OS X before 10.6. You might be able to compile from source though. Let us know if you are successful.

Because I am a remote employee, I get to juggle PDF files quite a bit. pdftk has proven to be a great tool for common PDF manipulations (changing page order, combining files, rotating pages, etc.). Sadly, a current version for Mac OS X is not available on its homepage. In addition, it is annoying (to say the least) to compile, which is why all three third-party package management systems that I know of (MacPorts, fink, and homebrew) either did not have it at all or shipped broken versions, last time I checked.

Now I wouldn't be a geek if that kept me from compiling it myself. I took some hints from anoved.net who was nice enough to also provide a compiled binary, but sadly did not include the shared libraries it relies on.

Instead, I made an installer package that'll install pdftk itself as well as the handful of libraries you need into /usr/local. Once you have run it, open Terminal.app; typing pdftk should greet you as follows:

$ pdftk
SYNOPSIS
       pdftk <input PDF files | - | PROMPT>
            [input_pw <input PDF owner passwords | PROMPT>]
            [<operation> <operation arguments>]
            [output <output filename | - | PROMPT>]
            [encrypt_40bit | encrypt_128bit]
(...)

You can download the updated package here: pdftk1.41_OSX10.6.dmg

(MD5 hash: ea945c606b356305834edc651ddb893d)

I only tested it on OS X 10.6.2. If you use it on older versions, please let me know in the comments whether it worked.

February 17, 2010 Fred Wenzel

MongoDB / NoSQL for Web Applications

As mentioned earlier, I dove a little into the world of non-relational databases for web applications. One of the more interesting ones seems to be MongoDB. By the way, a video of the presentation I attended is now online as well.

MongoDB not only seems to be "fully buzz-word compatible" ("MongoDB (from 'humongous') is a scalable, high-performance, open source, schema-free, document-oriented database"), it also looks like an interesting alternative storage backend for web applications, for various reasons I'd like to outline here.

Note that I haven't worked extensively with MongoDB, nor have any of the significant web applications I worked with used non-relational databases yet. So you are very welcome to point out anything I got wrong in the comments.

First, some terminology: Schema-free and document-oriented essentially mean that your data is stored as a loose collection of items in a bucket, not as rows in a table. Different items in the bucket can be uniform (in OOP terms, instances of the same class), but they needn't be. In MongoDB, if you access a non-existent object, it'll spring into existence as an empty object. Likewise for a non-existent attribute.

How can that help us? Web applications have a much faster development cycle than traditional applications (an observation reflected, for example, in the recent development changes on AMO). With all feature changes, related database changes have to be applied just as frequently, each time write-locking the site for up to several minutes, depending on how big the changes are. In a schema-free database, code changes can be rolled out smoothly and can start using new fields right away, on the affected items only. For example, in MongoDB, adding a nickname to the user profiles would be trivial, and every user object that never had a nickname before would be assumed to have an empty one by default. The tedious task of keeping the database schema in sync between development and deployment basically goes away entirely.
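To make the nickname example a little more concrete, here is a minimal sketch in plain Python. It deliberately uses ordinary dictionaries as stand-ins for documents instead of a real MongoDB driver, and the names `users` and `get_nickname` are made up for illustration.

```python
# Documents in a schema-free store, modeled as plain dicts.
# This is a stand-in for illustration, not a real MongoDB client.
users = [
    {"name": "alice"},                      # stored before nicknames existed
    {"name": "bob", "nickname": "bobby"},   # stored after the feature shipped
]


def get_nickname(user):
    """Return the user's nickname, treating a missing field as empty."""
    return user.get("nickname", "")


for user in users:
    print(user["name"], repr(get_nickname(user)))
```

The point is that no migration touches the old documents: the code that reads them simply decides what a missing field means.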

In traditional databases, we have gotten accustomed to the so-called ACID properties: Atomicity, Consistency, Isolation, Durability. By relaxing these properties, unconventional databases can score performance-wise, because less locking and less database-level abstraction is needed. Some examples of ACID relaxations that I gathered about MongoDB are:

MongoDB does not have transactions, which affects both Atomicity and Isolation. This will let other threads observe intermediate changes while they happen, but in web applications that is often not a big deal.
MongoDB relies on eventual consistency, not strict consistency. That means that when a write occurs and the write command returns, we cannot be 100% sure that from that moment on, all other processes will see only the updated data. They will only eventually be able to see the changes. This affects caching, because we can't invalidate and re-fill our caches immediately, but again, in web applications it's often not a big deal if updates take a few seconds to propagate.
Durability is also relaxed in the interest of speed: As we all know, accessing RAM takes a few nanoseconds, while hitting the hard drive is easily many thousands of times (!) slower. Therefore, MongoDB won't make sure your data is on the hard drive immediately. As a result, you can lose data that you thought was already written if your server goes down in the period between writing and actual storing to the hard drive. Luckily, that doesn't happen too often.

As you see, if our application is not a banking web site and we are willing to part with some of the guarantees that traditional databases offer, we can use a database like MongoDB, which fits the way modern web applications are developed much more closely than regular RDBMSes do. Whether that's an option is something every project needs to decide on a case-by-case basis.

February 15, 2010 Fred Wenzel

On Hackability

There is a Belarusian version of this article provided by PC.

One of the talks I really enjoyed at the recent FOSDEM was Paul and Tristan's presentation on Hackability. (Tristan uploaded the English slides to slideshare, as well as the French ones).

Essentially, it was a great promotion for keeping the Web (and Firefox as the tool we view it through) open, both legally and technically, with its building blocks visible and interchangeable. If you can't open it, you don't own it.

As a result, this also means the "view source" function is not there to feed the user's idle curiosity, it is a vital and irreplaceable part of the Web. Likewise, a tool like Firebug does not exist to "break" other people's websites. Instead, it helps us to use the web the way it was meant to be used.

Recently, a colleague of mine (don't remember who, sorry) linked to a little website called patchculture.org that, in spite of its simple appearance, promotes exactly that: using the Web the way it was meant to be used, fixing and improving the Web on our way through other people's sites and, better yet, sharing our changes with the people who own the sites. Their steps are easy: 1) Install Firebug, 2) change a website, 3) email a patch to the owner.

Sounds easy (to geek ears, anyway) but is harder than it looks. For starters, how do I get my changes out of Firebug? It's a concept we could call "diffability". If I have to write a book describing what I did to some website's DOM nodes and CSS rules, I am far less likely to fix someone else's website for them than when there is an easy way for me to do it. Granted: Even if Firebug let me export a unified diff, owners of non-trivial, framework-based web sites wouldn't be able to just go ahead and apply it on their codebase. However, diffs are ~~human~~ engineer readable. Without losing a ton of words, the website owner could look at the changes I made and choose to apply them to their software in the appropriate spots.
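To illustrate what such an export could look like, here is a small sketch using Python's standard `difflib` module. The stylesheet contents and the file labels are invented for the example; Firebug itself offers nothing like this, which is exactly the complaint.

```python
import difflib

# Hypothetical stylesheet before and after an edit made in the browser.
original = """body { color: #333; }
a { color: blue; }
""".splitlines(keepends=True)

patched = """body { color: #333; }
a { color: #0645ad; text-decoration: underline; }
""".splitlines(keepends=True)

# unified_diff yields the header lines, a hunk marker, and the changed lines.
diff = list(difflib.unified_diff(
    original, patched,
    fromfile="style.css (live site)",
    tofile="style.css (my fix)",
))
print("".join(diff))
```

Even if the site owner cannot apply this output mechanically, the minus and plus lines say in a few characters what a prose description would need a paragraph for.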

Second, how do I make my changes stick? We Open Source developers are of course some of the more altruistically inclined citizens of the Web. Still, if you are going to fix someone's website, you are likely to do so to lower your own annoyance level first, then everybody else's. Therefore, you want your changes to "stick", whether or not the website owner decides to accept and deploy them.

Thankfully, this is achievable, though it involves a little bit of a hassle. There are add-ons out there, most notably Stylish (for CSS-based changes) and Greasemonkey (for JS-based changes). These two were recently joined by Jetpack Page Mods. While Greasemonkey is a solid platform with tons of contributions, I see its biggest flaw in the lack of a solid standard library that takes the pain out of JavaScript, a problem Jetpack mitigates by shipping with jQuery included. In comparison, using jQuery with Greasemonkey is many things, none of which is "beautiful". If Greasemonkey wants to stay the technology of choice for "web hackers", it needs a standard library. Only then will it fill its place as a lightweight extension engine in the future (yes, in spite of its recent inclusion in Chrome). It would be a twisted situation if it became easier to write full-blown (Jetpack-based) extensions than to write a user script. It's the reason I am already writing small website changes as Jetpacks and not GM scripts, and I am not the only one. But because competition is good for business, on the Web as much as elsewhere, I hope the Greasemonkey guys stay on top of their game.

In summary:

Let's make and keep the Web open and hackable!
We can change web sites, but it's hard to share what we did. A great way towards more open hacking would be a diff engine in Firebug, even if it only exports pseudo-diffs, and even if the diffs can't be applied with one click unless you run a fully static website.
Finally, it's possible but hard to make changes stick. Greasemonkey is a strong contender in the field, but if they want to keep being the number one "hackability engine", they'll need to make writing scripts easier by adding a decent standard library. After all, it is not the 20th century anymore.

February 10, 2010 Fred Wenzel

FOSDEM 2010

I spent last weekend at FOSDEM 2010, the tenth installment of the "Free and Open Source Software Developers' European Meeting". It was my first time there, and it was great. It was a full-blown conference and meeting point for both big and small open source projects from all over Europe.

Let me outline some of the highlights:

As expected, the Mozilla presentations were well attended, and the Mozilla Europe team presented great HTML 5 features that'll make the future of the Web (and web developers' future) bright. Another presentation focused on the importance of Hackability for making the future of technology what we want, not what we are being fed.
Sunday I spent some time on the NoSQL track. It started off with a good presentation on what non-relational databases can do for you, and why they are not supposed to replace SQL. While NoSQL is a buzz word, it's important to note that there is potential for faster, smoother applications by dropping the rigid framework that relational databases impose on us developers when its advantages are not needed.
Another NoSQL-related presentation, Introduction to MongoDB, showed off the features of this particular schema-free, document-oriented database. I found it highly interesting for web applications and am looking forward to giving it a shot on an upcoming project.
Finally, two Facebook engineers explained which Open Source projects they have used and improved to scale their infrastructure to accommodate their enormous user base. What's impressive is that they have introduced improvements on almost all parts of the software stack. In order to serve pictures faster, for example, they wrote a file system that allows them to grab a file in a single read. Another interesting technology is HipHop, their PHP-to-C++ compiler. This ensures that they can hire PHP developers, yet have a ridiculously fast web application. That's probably as ugly as it sounds, but luckily not everybody has to do it ;)

On some of these issues, I am going to go into more detail in followup posts.

I also went to some presentations that affect my work on the Mozilla project slightly less:

One of the keynotes, Evil on the Internet, was as insightful as it was scary. Not only are the scams out there on the Internet getting smarter and harder to detect, it is also frightening how long some scam sites stay online if no one feels responsible for them.
Professor Andrew Tanenbaum showed off his MINIX microkernel, version 3, for which he recently received a significant research grant from the European Union. He would also like to see Firefox ported to MINIX. Anyone want to help him out? :)

All in all, FOSDEM 2010 was a great success. Thanks to all the volunteers who made it happen!

January 31, 2010 Fred Wenzel

Deploying a Django Application on Lighttpd with fastcgi and virtualenv

So you wrote a fancy little Django app and want to run it on a lighttpd webserver? There's plenty of documentation on this topic online, including official Django documentation.

Problem is, most of these sources do not mention how to use virtualenv, but the cool kids don't install their packages into the global site-packages directory. So I put some scripts together for your enjoyment.

I assume that you've put your django app somewhere convenient, and that you have a virtualenv containing its packages (including django itself).

1. manage.py

You want to set up this file so it adds the virtualenv's site-packages path to its site-packages: site.addsitedir('path/to/mysite-env/lib/python2.6/site-packages'). Note that you need to point directly to the site-packages dir inside the virtualenv, not only the main virtualenv dir. For obvious reasons, this line needs to come before the django-provided from django... import, because you can't import django files if Python doesn't know where they are.
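As a rough sketch, the top of such a manage.py could look like the following. The path is the placeholder from above and needs to be replaced with your own virtualenv's site-packages directory; the constant name `VENV_SITE_PACKAGES` is just something I made up for readability.

```python
import site
import sys

# Placeholder path: point this at the site-packages directory inside
# your own virtualenv, not at the virtualenv's top-level directory.
VENV_SITE_PACKAGES = "path/to/mysite-env/lib/python2.6/site-packages"

# Appends the directory to sys.path and processes any .pth files in it.
site.addsitedir(VENV_SITE_PACKAGES)

# The Django-provided "from django... import" line goes below this point,
# because only now can Python find the packages inside the virtualenv.
```

Using `site.addsitedir` rather than appending to `sys.path` by hand has the advantage that `.pth` files inside the virtualenv are honored as well.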

2. settings.py

The lighttpd setup will result in mysite.fcgi showing up in all your URLs, unless you set FORCE_SCRIPT_NAME correctly. If your Django app is supposed to live right at the root of your domain, for example, set this to the empty string.

3. django-servers.sh

This is an initscript (for Debian, but you can modify it to work with most distros, I presume). Copy it to /etc/init.d, adjust the settings on top (and possibly other places, depending on your setup), then start the Django fastcgi servers. Note that you need to have the flup package installed in your virtualenv.

4. lighttpd-vhost.conf

Set up your lighttpd vhost pretty much like the Django documentation suggests. Match up the host and port with the settings from your init script. By using mod_alias for the media and admin media paths, you'll have lighttpd serve them instead of passing them on to Django as well.

That's it! You've deployed your first Django application on lighttpd. If you have any questions or suggestions, feel free to comment here or fork my code.

You can look at all the scripts together over on github or download them in a package.

January 29, 2010 Fred Wenzel

Annoying Browser-Related Blog Spam

Over recent weeks I've been getting frequent blog spam along the lines of:

Hi. I just noticed that your site looks like it has a few code problems at the very bottom of your site's page. I'm not sure if everybody is getting this same problem when browsing your blog? I am employing a totally different browser than most people, referred to as Opera, so that is what might be causing it? I just wanted to make sure you know. Thanks for posting some great postings and I'll try to return back with a completely different browser to check things out!

(emphasis: mine)

Not only does my blog display just fine in Opera (yes, I checked), I get even more bogus comments at times claiming that my blog looks horrible in Firefox, of all browsers. Dear spammers, now you're just making fools of yourselves.

The main thing identifying this kind of comment as spam (other than the bogus claim that my blog doesn't render correctly in non-Internet-Explorer browsers) is the URL these comments come with. Usually, they promise a "free" iPod, MacBook, car, house, airplane or ride to the moon (exaggeration: mine).

I wonder how many bloggers actually publish these, thinking it's well-meant advice. :(

Photo credit: "Spam" CC by-sa licensed by twicepics on flickr

January 21, 2010 Fred Wenzel

Things-Bugzilla or: Embedding Python into AppleScript

To keep track of my ever-growing to-do list, I am using a fabulous little application called "Things". And most of my work-related to-do items are bugs in Mozilla's bugzilla bug tracker.

So, me being a geek and all, I quite naturally wanted to integrate the two and wrote a little AppleScript that asks the user for a bugzilla.mozilla.org bug number, obtains its bug title, and makes a new to-do item for it in Things' Inbox folder.

The script is available as a gist on github. Click here to download it.

If you look at the code, you'll notice that I went ahead and embedded some Python code into the script to do the heavy lifting. The problem with AppleScript is not only that it has a hideous syntax, it also completely lacks a standard library for things like downloading websites and regex-parsing strings. Let's look at it a little closer:

set bugtitle to do shell script "echo \"" & bugzilla_url & "\" | /usr/bin/env python -c \"
import re
import sys
import urllib2
bug = urllib2.urlopen(sys.stdin.read())
title = re.search('<title>([^<]+)</title>', bug.read()).group(1)
title = title.replace('&ndash;', '-')
print title
\""

set bugtitle to do shell script "..." means: assign whatever this shell expression returns to the variable bugtitle. This way, we just need to print our final result to stdout and can keep using it in AppleScript.
echo \"" & bugzilla_url & "\" | /usr/bin/env python feeds some input data into the Python script through stdin. We read that a few lines later with sys.stdin.read(). Another method, especially for more than one input value, would be to pass command-line parameters, placed all the way at the end of the Python block (after the source code).
Finally, in python -c \"mycode\" the -c marks an inline code block to be executed by the Python interpreter. Other languages, such as Perl, PHP, or Ruby, have similar operating modes, so you can use those as well.
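The command-line parameter variant mentioned above can be tried out without AppleScript. The following sketch (written for Python 3, unlike the Python 2 snippet embedded in the script) launches an inline `-c` program from Python itself and hands it one argument; the inline program and the sample argument are invented for the demonstration.

```python
import subprocess
import sys

# Inline program, as it would appear after -c. Anything placed after the
# source code on the command line shows up in sys.argv[1:].
inline_code = "import sys; print(sys.argv[1].upper())"

result = subprocess.run(
    [sys.executable, "-c", inline_code, "bug 12345"],
    capture_output=True,
    text=True,
    check=True,
)

# What the calling script (AppleScript, in the post) would receive on stdout.
output = result.stdout.strip()
print(output)
```

With several inputs, this tends to be easier to handle than stdin, since each value arrives as its own list element and needs no parsing.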

If you want to install the Things-Bugzilla AppleScript, make sure to download the entire Gist as it also contains an install script for your convenience.

January 14, 2010 Fred Wenzel

Managing Young Sys Admins At Oregon State Open Source Lab

A few days ago techtarget published a short interview about the OSU Open Source Lab, where I worked while studying at OSU:

"Lance Albertson, architect and systems administrator at the Oregon State University Open Source Lab, uses a sys admin staff of 18-21-year-old undergrads to manage servers for some high-profile, open-source projects (Linux Master Kernel, Linux Foundation, Apache Software Foundation, and Drupal to name a few). In this Q&A, Albertson talks about the challenges of using young sys admins and the lab's plans to move from Cfengine to Puppet for systems management."

(via slashdot).

I must say, the work I've seen student sys admins do at the OSL is outstanding, and I've met some of the sharpest people there I've ever worked with. Glad to hear they are still going strong.

Thanks for the link, Justin!

January 12, 2010 Fred Wenzel

Using SVN repositories as git submodules

With the Subversion VCS, one way to import external modules or libraries into a code tree is by defining the svn:externals property of your repository. Subversion will then check out the specified revision (or the latest revision) of the other repository into your source tree when checking out your code.

Submodules are basically the same thing in the "git" world.

And since git can talk to subversion repositories with git svn, we should be able to specify a third-party SVN repository as a submodule in our own git repository, right? Sadly the answer is currently: No.

Here is a workaround that I have been using to at least achieve a similar effect, while keeping both SVN and git happy. This assumes that you have a local git repository that is also available somewhere else "up-stream" (such as github), and you want to import an external SVN repository into your source tree.

1: Add a tracking branch for SVN

Add a section referring to your desired SVN repository to your .git/config file:

(...)
[svn-remote "product-details"]
    url = http://svn.mozilla.org/libs
    fetch = product-details:refs/remotes/product-details-svn

Note that in the fetch line, the part before the colon refers to the branch you want to check out of SVN (for example: trunk), and the part after that will be our local remote branch location, i.e. product-details-svn will be our remote branch name.

Now, fetch the remote data from SVN, specifying a revision range unless you want to check out the entire history of that repository:

git svn fetch product-details -r59506:HEAD

git svn will fetch the requested revisions into the remote branch product-details-svn.

2: Clone the tracking branch locally

Now we have a checked-out SVN tracking branch, but to use it as a submodule, we must make a real git repository from it -- a branch of our current repository will keep everything in one place and work as well. So let's check out the tracking branch into a new branch:

git checkout -b product-details-git product-details-svn

As git branch can confirm, you'll now have (at least) two branches: master and product-details-git.

3: Import the branch as a submodule

Now let's make the new branch available upstream:

git push --all

After that's been pushed, we can import the new branch as a submodule where we want it in the tree:

git checkout master
git submodule add -b product-details-git ../reponame.git my/submodules/dir/product-details

Note that ../reponame.git refers to the up-stream repository's name, and -b ... defines the name of the branch we've created earlier. Git will check out your remote repository and point to the right branch automatically.

Don't forget to git commit and you're done!

Updating from SVN

Updating the "external" from SVN is unfortunately a tedious three-step process :(. First, fetch changes from SVN:

git svn fetch product-details

Second, merge these changes into your local git branch and push the changes up-stream:

git checkout product-details-git
git merge product-details-svn
git push origin HEAD

And finally, update the submodule and "pin it" at the newer revision:

git checkout master
cd my/submodules/dir/product-details
git pull origin product-details-git
cd ..
git add product-details
git commit -m 'updating product details'
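Until someone comes up with something better, the three steps can at least be scripted. Below is a minimal sketch of a wrapper in Python; it only replays the commands from this post, assumes it is started from the top of the main repository, and defaults to a dry run that prints the commands instead of executing them. The function names and the `dry_run` switch are my own invention.

```python
import subprocess

# Names taken from the walkthrough above; adjust them for your own setup.
SVN_REMOTE = "product-details"
SVN_BRANCH = "product-details-svn"
GIT_BRANCH = "product-details-git"
SUBMODULE = "my/submodules/dir/product-details"


def update_steps():
    """Return (working directory, command) pairs for the three update steps."""
    return [
        # Step 1: fetch changes from SVN.
        (".", ["git", "svn", "fetch", SVN_REMOTE]),
        # Step 2: merge them into the git branch and push up-stream.
        (".", ["git", "checkout", GIT_BRANCH]),
        (".", ["git", "merge", SVN_BRANCH]),
        (".", ["git", "push", "origin", "HEAD"]),
        # Step 3: update the submodule and pin it at the newer revision.
        (".", ["git", "checkout", "master"]),
        (SUBMODULE, ["git", "pull", "origin", GIT_BRANCH]),
        (".", ["git", "add", SUBMODULE]),
        (".", ["git", "commit", "-m", "updating product details"]),
    ]


def run_update(dry_run=True):
    """Print each step; also execute it unless dry_run is set."""
    for cwd, cmd in update_steps():
        print("({}) $ {}".format(cwd, " ".join(cmd)))
        if not dry_run:
            # check=True aborts at the first failing command.
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_update(dry_run=True)
```

Calling `run_update(dry_run=False)` runs the commands for real and stops at the first one that fails, for instance when the merge produces a conflict.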

Improvements?

This post is as much a set of instructions as it is a call for improvements. If you have an easier way to do this, or if you know how to speed up or simplify any of this, a comment to this post would be very much appreciated!