Updating the Mozilla Public License

Today, Mozilla is starting the public process of revising its signature code license, the Mozilla Public License or MPL. Mitchell Baker, chair of the board of the Mozilla Foundation and author of the original MPL 1.0, has more information about the process on her blog.

The discussion is happening on the website mpl.mozilla.org.

I am happy about this for a number of reasons. For one, I made the website (the design is borrowed from mozilla.org), so I am naturally happy to see it available to a wider audience.

But I also hope that the revision process itself will be successful. While the MPL has been a remarkable help in Mozilla desktop projects’ success, it is unpleasant (to say the least) to use in web applications, for a number of reasons:

The hideous license block. The MPL is a file-based license. It allows any file in the project, even in the same directory, to be licensed differently. Therefore, each MPL-licensed code file must carry a comment block of more than 30 lines at the top. For big code modules, that’s fine. For web applications, whose files often have only a handful of lines, this balloons the whole code base and makes files horribly unreadable. Sadly, the current license only allows an exception to that rule if including the block is impossible “due to [the file’s] structure”, which would essentially only be the case if that file type did not allow comments.

The copyleft. This one is debatable, but it’s a fact that some open source communities, the Python community being one prominent example, do not appreciate strong copyleft provisions. While the MPL (unlike the GNU GPL) does not have a tendency to “taint” other code, it is still not at all compatible with the BSD or MIT licenses’ notion of “take it and do (almost) whatever you please with it”. (As you may have noticed, the file-based nature of the MPL is both a curse and a blessing here.) I hope that the revision process can make it clearer how the copyleft applies to hosted applications (i.e., mostly web applications).

I am excited to see what the broad community discussion will bring to light over the next few months.

Categories: Mozilla Crosspost, OSU OSL Crosspost, Tech Talk

pdftk 1.41 for Mac OS X 10.6

As a remote employee, I get to juggle PDF files quite a bit. pdftk has proven to be a great tool for common PDF manipulations (changing page order, combining files, rotating pages, etc.). Sadly, a current version for Mac OS X is not available on their homepage. In addition, it is annoying (to say the least) to compile, which is why all three third-party package management systems that I know of (MacPorts, Fink, and Homebrew), last time I checked, either did not have it at all or had broken versions.

Now I wouldn’t be a geek if that kept me from compiling it myself. I took some hints from anoved.net who was nice enough to also provide a compiled binary, but sadly did not include the shared libraries it relies on.

Instead, I made an installer package that’ll install pdftk itself as well as the handful of libraries you need into /usr/local. Once you have run it, you can open Terminal.app, and typing pdftk should greet you as follows:

$ pdftk
SYNOPSIS
       pdftk <input PDF files | - | PROMPT>
            [input_pw <input PDF owner passwords | PROMPT>]
            [<operation> <operation arguments>]
            [output <output filename | - | PROMPT>]
            [encrypt_40bit | encrypt_128bit]
(...)
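
For the “combining files” case mentioned above, a minimal Python sketch might look like the following. It assumes pdftk is on your PATH after installing the package, and it only relies on pdftk’s cat operation and the output keyword from the synopsis. The helper names build_merge_command and merge_pdfs, as well as the file names, are made up for illustration.

```python
import subprocess


def build_merge_command(inputs, output):
    """Return the pdftk argument list that concatenates `inputs` into `output`."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    # Shape of the call: pdftk <input PDF files> cat output <output filename>
    return ["pdftk", *inputs, "cat", "output", output]


def merge_pdfs(inputs, output):
    """Run pdftk; raises CalledProcessError if pdftk exits with an error."""
    subprocess.run(build_merge_command(inputs, output), check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, it does not run it.
    print(" ".join(build_merge_command(["part1.pdf", "part2.pdf"], "combined.pdf")))
```

Building the argument list separately from running it makes it easy to check what would be executed before touching any files.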

You can download the package here: pdftk1.41_OSX10.6.dmg

I only tested it on OS X 10.6.2. If you use it on older versions, please let me know in the comments whether it worked.

Categories: Mozilla Crosspost, OSU OSL Crosspost, Tech Talk

United Airlines Joins the Mile High Club

I can’t decide whether United Airlines is promoting the Mile High Club on its Twitter page, or whether it’s just an ordinary spammer hijacking their account:

(And no, you shouldn’t actually enter that URL into your browser. It’s boring spammy stuff.)

via @cbarrett and countless others on twitter.

Categories: websights

MongoDB / NoSQL for Web Applications

As mentioned earlier, I dove a little into the world of non-relational databases for web applications. One of the more interesting ones seems to be MongoDB. By the way, a video of the presentation I attended is now online as well.

MongoDB not only seems to be “fully buzz-word compatible” (“MongoDB (from ‘humongous’) is a scalable, high-performance, open source, schema-free, document-oriented database”), it also looks like an interesting alternative storage backend for web applications, for various reasons I’d like to outline here.

Note that I haven’t worked extensively with MongoDB, nor have any of the significant web applications I worked with used non-relational databases yet. So you are very welcome to point out anything I got wrong in the comments.

First, some terminology: Schema-free and document-oriented essentially mean that your data is stored as a loose collection of items in a bucket, not as rows in a table. Different items in the bucket can be uniform (in OOP terms, instances of the same class), but they needn’t be. In MongoDB, if you access a non-existent object, it’ll spring into existence as an empty object. Likewise for a non-existent attribute.
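
To make the “bucket of items” idea concrete, here is a small Python sketch. It does not talk to MongoDB at all: a plain list of dicts stands in for a collection, and the find helper and the sample data are hypothetical names chosen for illustration.

```python
# A "collection" modelled as a list of dicts (a stand-in, not MongoDB itself).
# Note that the two documents do not share the same set of fields.
users = [
    {"name": "alice", "email": "alice@example.com"},
    {"name": "bob", "twitter": "@bob", "languages": ["en", "de"]},
]


def find(collection, **criteria):
    """Return all documents that contain every given field with an equal value."""
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Documents lacking a field simply do not match a query on that field.
bobs = find(users, name="bob")
with_email = find(users, email="alice@example.com")
```

The point is only that nothing forces the two user documents to share a schema; a query on a field skips the documents that lack it.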

How can that help us? Web applications have a much faster development cycle than traditional applications (an observation reflected, for example, in the recent development changes on AMO). With every feature change, the related database changes have to be applied just as frequently, each time write-locking the site for up to several minutes, depending on how big the changes are. In a schema-free database, code changes can be rolled out smoothly and can start using new fields right away, on the affected items only. For example, in MongoDB, adding a nickname to the user profiles would be trivial, and every user object that never had a nickname before would be assumed to have an empty one by default. The tedious task of keeping the database schema in sync between development and deployment basically goes away entirely.
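
The nickname example could look roughly like this in application code. Again this is a sketch with plain Python dicts standing in for stored documents, not actual MongoDB driver code; display_name and set_nickname are hypothetical helpers.

```python
def display_name(user):
    """Prefer the nickname; documents without one behave as if it were empty."""
    nickname = user.get("nickname", "")
    return nickname or user["name"]


def set_nickname(user, nickname):
    """Add the new field to this one document only; no other document changes."""
    user["nickname"] = nickname


# A document written before the feature existed, and one written after.
old_user = {"name": "Frederic"}
new_user = {"name": "Mitchell", "nickname": "mitch"}

names_before = [display_name(old_user), display_name(new_user)]
set_nickname(old_user, "fred")
names_after = [display_name(old_user), display_name(new_user)]
```

The new code handles old and new documents alike, so the feature can ship without a migration step that touches every stored user.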

In traditional databases, we have gotten accustomed to the so-called ACID properties: Atomicity, Consistency, Isolation, Durability. By relaxing these properties, unconventional databases can score performance-wise, because less locking and less database-level abstraction are needed. Some examples of ACID relaxations that I gathered about MongoDB are:

  • MongoDB does not have transactions, which affects both Atomicity and Isolation. This will let other threads observe intermediate changes while they happen, but in web applications that is often not a big deal.
  • MongoDB relies on eventual consistency, not strict consistency. That means that when a write occurs and the write command returns, we cannot be 100% sure that from that moment on, all other processes will see only the updated data. They will only eventually be able to see the changes. This affects caching, because we can’t invalidate and re-fill our caches immediately, but again, in web applications it’s often not a big deal if updates take a few seconds to propagate.
  • Durability is also relaxed in the interest of speed: As we all know, accessing RAM takes a few nanoseconds, while hitting the hard drive is easily many thousands of times (!) slower. Therefore, MongoDB won’t make sure your data is on the hard drive immediately. As a result, you can lose data that you thought was already written if your server goes down in the period between the write and the data actually being stored on the hard drive. Luckily, that doesn’t happen too often.
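
The durability trade-off from the last point can be illustrated with a toy model. This is emphatically not how MongoDB is implemented; LazyStore and its methods are invented for this sketch, which only shows why a write that was acknowledged from memory can still be lost in a crash.

```python
class LazyStore:
    """Toy key-value store that acknowledges writes before they reach 'disk'."""

    def __init__(self):
        self.memory = {}  # what readers see; updated on every write
        self.disk = {}    # what would survive a crash

    def write(self, key, value):
        # Fast path: returns as soon as the value is in memory.
        self.memory[key] = value

    def flush(self):
        # Slow path, done only occasionally: persist everything written so far.
        self.disk.update(self.memory)

    def crash(self):
        # After a restart, only flushed data comes back.
        self.memory = dict(self.disk)


store = LazyStore()
store.write("a", 1)
store.flush()
store.write("b", 2)  # acknowledged, but never flushed
store.crash()
survivors = sorted(store.memory)
```

Here the write of "b" returned successfully, yet only "a" survives the simulated crash, because the crash happened between the write and the next flush.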

As you see, if our application is not a banking web site and we are willing to part with some of the guarantees that traditional databases offer, we can use a database like MongoDB, which fits the way modern web applications are developed much more closely than regular RDBMSes do. Whether that’s an option is something every project needs to decide on a case-by-case basis.