As mentioned earlier, I dove a little into the world of non-relational databases for web applications. One of the more interesting ones seems to be MongoDB. By the way, a video of the presentation I attended is now online as well.

MongoDB not only seems to be "fully buzzword-compatible" ("MongoDB (from 'humongous') is a scalable, high-performance, open source, schema-free, document-oriented database"), it also looks like an interesting alternative storage backend for web applications, for various reasons I'd like to outline here.

Note that I haven't worked extensively with MongoDB, nor have any of the significant web applications I worked on used non-relational databases yet. So you are very welcome to point out anything I got wrong in the comments.

First, some terminology: schema-free and document-oriented essentially mean that your data is stored as a loose collection of items in a bucket, not as rows in a table. Different items in the bucket can be uniform (in OOP terms, instances of the same class), but they needn't be. In MongoDB, a collection that doesn't exist yet springs into existence the first time you store something in it. Likewise, an attribute that an item never had simply reads as missing instead of causing an error.
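To make the "bucket of items" idea a bit more concrete, here is a minimal sketch in plain Python. The dicts and the list only stand in for documents and a collection; the `find` helper and all field names are made up for illustration, and none of this is the MongoDB API.

```python
# Plain dicts stand in for documents, a list stands in for a collection.
# This illustrates the data model only; it is NOT the MongoDB API.
bucket = [
    {"type": "user", "name": "alice", "email": "alice@example.org"},
    {"type": "user", "name": "bob"},  # same kind of item, but no email field
    {"type": "addon", "title": "Example Add-on", "tags": ["demo"]},
]


def find(collection, **criteria):
    """Return every document whose fields match all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


users = find(bucket, type="user")

# A missing attribute is not an error, it just reads as "nothing there".
emails = [user.get("email") for user in users]
```

Items of different shapes live side by side, and code that reads them has to be prepared for a field being absent.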

How can that help us? Web applications have a much faster development cycle than traditional applications (an observation reflected, for example, in the recent development changes on AMO). With every feature change, the related database changes have to be applied just as frequently, each time write-locking the site for up to several minutes, depending on how big the changes are. In a schema-free database, code changes can be rolled out smoothly and can start using new fields right away, on the affected items only. For example, in MongoDB, adding a nickname to the user profiles would be trivial, and every user object that never had a nickname before would be assumed to have an empty one by default. The tedious task of keeping the database schema in sync between development and deployment basically goes away entirely.
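The nickname example could look roughly like this. Again, this is only a sketch with plain Python dicts standing in for user documents; `nickname_of` and `set_nickname` are hypothetical helpers, not part of any MongoDB driver.

```python
# Plain dicts stand in for user documents; this is NOT the MongoDB API.
users = [
    {"name": "alice"},                     # stored before nicknames existed
    {"name": "bob", "nickname": "bobby"},  # written by the new code
]


def nickname_of(user):
    """Read the nickname, treating a missing field as an empty nickname."""
    return user.get("nickname", "")


def set_nickname(user, nickname):
    """Only the item being changed gains the new field; nothing else is touched."""
    user["nickname"] = nickname


nicknames_before = [nickname_of(user) for user in users]
set_nickname(users[0], "al")
nicknames_after = [nickname_of(user) for user in users]
```

No migration step runs over the existing items: old ones keep working through the default, and an item picks up the field the first time somebody sets it.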

In traditional databases, we have gotten accustomed to the so-called ACID properties: Atomicity, Consistency, Isolation, Durability. By relaxing these properties, unconventional databases can score performance-wise, because less locking and less database-level abstraction are needed. Some examples of ACID relaxations that I gathered about MongoDB are:

  • MongoDB does not have transactions, which affects both Atomicity and Isolation. This will let other threads observe intermediate changes while they happen, but in web applications that is often not a big deal.
  • MongoDB relies on eventual consistency, not strict consistency. That means that when a write occurs and the write command returns, we cannot be 100% sure that from that moment on, all other processes will see only the updated data. They will only eventually be able to see the changes. This affects caching, because we can't invalidate and re-fill our caches immediately, but again, in web applications it's often not a big deal if updates take a few seconds to propagate.
  • Durability is also relaxed in the interest of speed: As we all know, accessing RAM takes a few nanoseconds, while hitting the hard drive is easily many thousands of times (!) slower. Therefore, MongoDB won't make sure your data is on the hard drive immediately. As a result, you can lose data that you thought was already written if your server goes down in the period between writing and actual storing to the hard drive. Luckily, that doesn't happen too often.
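To illustrate the durability trade-off from the last point, here is a toy model in Python. It is a sketch under my own assumptions, not how MongoDB is actually implemented: the class `BufferedStore` and its methods are invented for this example, and two dicts stand in for RAM and the hard drive.

```python
class BufferedStore:
    """Toy store that acknowledges writes before they reach the 'disk'."""

    def __init__(self):
        self.memory = {}  # fast, but lost when the server goes down
        self.disk = {}    # slow, but survives a crash

    def write(self, key, value):
        # The write is acknowledged as soon as it is in memory.
        self.memory[key] = value
        return True

    def flush(self):
        # Runs every now and then in the background in a real system.
        self.disk.update(self.memory)

    def crash_and_restart(self):
        # After a crash, only what had reached the disk is still there.
        self.memory = dict(self.disk)


store = BufferedStore()
store.write("a", 1)
store.flush()
acknowledged = store.write("b", 2)  # acknowledged, but never flushed
store.crash_and_restart()
survived = sorted(store.memory)
```

The write of `"b"` was reported as successful and is gone anyway. That is exactly the window between writing and actually storing to the hard drive described above.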

As you can see, if our application is not a banking web site and we are willing to part with some of the guarantees that traditional databases offer, we can use a database like MongoDB, which fits the way modern web applications are developed much more closely than regular RDBMSes do. Whether that's an option is something every project needs to decide on a case-by-case basis.


I spent last weekend at FOSDEM 2010, the tenth installment of the "Free and Open Source Software Developers' European Meeting". It was my first time there, and it was great: a full-blown conference and meeting point for both big and small open source projects from all over Europe.

Let me outline some of the highlights:

  • As expected, the Mozilla presentations were well attended, and the Mozilla Europe team presented great HTML 5 features that'll make the future of the Web (and web developers' future) bright. Another presentation focused on the importance of hackability for making the future of technology what we want, not what we are being fed.
  • On Sunday I spent some time on the NoSQL track. It started off with a good presentation on what non-relational databases can do for you, and why they are not supposed to replace SQL. While NoSQL is a buzzword, it's important to note that there is potential for faster, smoother applications by dropping the rigid framework that relational databases impose on us developers when its advantages are not needed.
  • Another NoSQL-related presentation, Introduction to MongoDB, showed off the features of this particular schema-free, document-oriented database. I found it highly interesting for web applications and am looking forward to giving it a shot on an upcoming project.
  • Finally, two Facebook engineers explained which open source projects they have used and improved to scale their infrastructure to accommodate their enormous user base. What's impressive is that they have introduced improvements in almost all parts of the software stack. In order to serve pictures faster, for example, they wrote a file system that allows them to grab a file in a single read. Another interesting technology is HipHop, their PHP-to-C++ compiler. This ensures that they can hire PHP developers, yet have a ridiculously fast web application. That's probably as ugly as it sounds, but luckily not everybody has to do it ;)

I am going to go into more detail on some of these topics in follow-up posts.

I also went to some presentations that affect my work on the Mozilla project slightly less:

  • One of the keynotes, Evil on the Internet, was as insightful as it was scary. Not only are the scams out there on the Internet getting smarter and harder to detect, it is also frightening how long some scam sites stay online if no one feels responsible for them.
  • Professor Andrew Tanenbaum showed off his MINIX microkernel, version 3, for which he recently received a significant research grant from the European Union. He would also like to see Firefox ported to MINIX. Anyone want to help him out? :)

All in all, FOSDEM 2010 was a great success. Thanks to all the volunteers who made it happen!
