Don’t Forget to Clean Up After Yourself

On a growing number of projects at Mozilla, we use a tool called Hudson that runs a complete set of tests on the code with every check-in. The beauty of this is that if you accidentally break something, you (and everyone else) will know immediately, so you can fix it quickly. We also use a bunch of plugins with Hudson, one of which assigns points to every check-in: if all tests pass, you get a positive number of points; if you break something, you get a negative score.

An innocent little commit of mine gained me a whopping -100 points (yes, that is minus 100) today.

How did that happen? The build broke badly, but not because I wrote a pile of horrendous code or because I didn’t test before committing. In fact, I’ve made it a habit to commit like this:

./manage.py test && git push origin master

This fun little one-liner will result in my code being pushed to the origin repository if and only if all tests pass.
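
For readers less familiar with the shell's && operator, its short-circuit behaviour can be sketched in Python. This is only an illustration of the one-liner above, not a replacement for it; the helper name run_if_ok is made up for this example.

```python
import subprocess


def run_if_ok(first, second):
    """Run `first`; run `second` only if `first` exited with status 0.

    Mirrors the shell's `first && second`. Returns the exit status of the
    last command that actually ran. (`run_if_ok` is a hypothetical helper.)
    """
    status = subprocess.call(first)
    if status != 0:
        # The first command failed, so the second one is never started.
        return status
    return subprocess.call(second)


# Rough equivalent of: ./manage.py test && git push origin master
#   run_if_ok(["./manage.py", "test"], ["git", "push", "origin", "master"])
```

A non-zero return value means that either the tests failed (and nothing was pushed) or the push itself failed.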

So in my case, all tests passed locally and then broke horribly once the server ran them again. After a little research, it turned out that when I deleted a now-unneeded Python file, I did not remove its compiled cousin, the .pyc file, along with it. Sadly, this module was still imported somewhere else, and because Python still found the .pyc file locally, it did not mind the original .py file being gone, so all tests passed. On the server, however, with its completely clean environment, the module wasn’t found, and dozens of tests failed (all of them with an ImportError).
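
The effect is easy to reproduce in isolation. The following is a minimal sketch, assuming the bytecode file sits directly next to where the source used to be (the layout described above); stale_demo is a made-up module name. Note that Python 3 normally writes bytecode into a __pycache__ directory, and orphaned files in there are ignored, so the sketch writes the .pyc to the old location explicitly.

```python
import importlib
import os
import py_compile
import sys
import tempfile


def import_survives_source_deletion(module_name="stale_demo"):
    """Return True if `module_name` still imports after its .py is deleted.

    `stale_demo` is a hypothetical module created only for this demo.
    """
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, module_name + ".py")
    compiled = os.path.join(workdir, module_name + ".pyc")

    with open(source, "w") as handle:
        handle.write("ANSWER = 42\n")

    # Put the bytecode right next to the source instead of in __pycache__.
    py_compile.compile(source, cfile=compiled, doraise=True)
    os.remove(source)  # the "now unneeded" Python file goes away

    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(module_name)
        return module.ANSWER == 42
    except ImportError:
        return False
    finally:
        # Leave the interpreter state as we found it.
        sys.path.remove(workdir)
        sys.modules.pop(module_name, None)
```

If the function returns True, the import was satisfied by the leftover .pyc alone, which is exactly the situation a fresh checkout on the build server never sees.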

What’s the lesson? In the short term, I should wipe my .pyc files before running tests. One way to do that would be adding something like

find . -type f -name '*.pyc' -print0 | xargs -0 rm -f

to my ever-growing commit one-liner, but a more general solution would be to do this inside the test-running script. On the other hand, since that script is written in Python, some of the imports that could break may already have been performed by the time the cleanup code runs.
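
One way to sidestep the import-order problem is to do the cleanup at the very top of the test-running script, using nothing but the standard library, before any project module is imported. Here is a minimal sketch under that assumption; remove_pyc_files and its only_orphans flag are hypothetical names, not part of any existing test runner.

```python
import os


def remove_pyc_files(root=".", only_orphans=False):
    """Delete .pyc files below `root` and return the paths that were removed.

    With only_orphans=True, a .pyc is kept when a .py with the same base
    name sits next to it, so only bytecode without source is removed.
    Assumption: .pyc files live next to their sources. Files inside
    __pycache__ have names like foo.cpython-312.pyc, never match a sibling
    .py, and are therefore always removed (Python regenerates them).
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            source = os.path.join(dirpath, name[:-1])  # foo.pyc -> foo.py
            if only_orphans and os.path.exists(source):
                continue
            path = os.path.join(dirpath, name)
            os.remove(path)
            removed.append(path)
    return removed


# At the top of the test script, before importing any project code:
#   remove_pyc_files(".", only_orphans=True)
```

The only_orphans mode would have been enough to catch the broken build described here, and it avoids recompiling every module on each test run.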

In general, run your tests in as clean an environment as possible. While any useful test framework will make sure your database is in a consistent state for every test run, you also need to ensure that you start from a clean baseline of your code, especially if Hudson, the merciless butler, will rub it in your face if you don’t ;) .
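
A rough way to approximate such a clean baseline locally is to run the tests in a throwaway copy of the project that contains no bytecode at all. The sketch below is an assumption about how one might do that, not what Hudson does; run_in_clean_copy is a made-up helper, and unlike a fresh checkout it still copies any other untracked files.

```python
import os
import shutil
import subprocess
import tempfile


def run_in_clean_copy(project_dir, command):
    """Copy `project_dir` without bytecode and run `command` in the copy.

    Returns the command's exit status. The copy is deleted afterwards and
    the original directory is left untouched. (Hypothetical helper.)
    """
    scratch = tempfile.mkdtemp()
    clean_tree = os.path.join(scratch, "checkout")
    try:
        shutil.copytree(
            project_dir,
            clean_tree,
            # Skip compiled files in both the old and the new layout.
            ignore=shutil.ignore_patterns("*.pyc", "__pycache__"),
        )
        return subprocess.call(command, cwd=clean_tree)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
```

For a Django-style project this might be called as run_in_clean_copy(".", [sys.executable, "manage.py", "test"]), with sys imported first; a stale .pyc can no longer mask a missing module because none are copied.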



14 Responses to “Don’t Forget to Clean Up After Yourself”

  1. If you’re already using find to look up the files you’d like to delete, there’s no need for xargs:

    find . -type f -name '*.pyc' -exec rm {} \;

    :)

    Oh, and I have lots of these one-(or-more)-liners placed in handy shell scripts in the root directory of my local workspace, for cleaning out a project’s working directory, running the test suite, checking the code into our main repository, and so on.

  2. I’ll echo Jean Pierre’s “don’t use xargs” statement and add “don’t use exec” to it:

    find . -type f -name '*.pyc' -delete

  3. Thanks! Well, at least I don’t earn a useless use of cat award, though I admit I was “wasting” a PID. ;)

  4. find has a -delete option? Unbelievable, you never stop learning. ;)

  5. You could safely delete any .pyc files that didn’t have a corresponding .py. I can’t figure out how you might do that, because you can’t exactly do:

    find . -name '*.pyc' -type f -exec test ! -e $(basename {} .pyc).py && rm {} \;

    because the order of execution just doesn’t work like that… but at least in Python code that’d be safe. I feel like py.test actually has an option to do this automatically.

  6. If you need to use `find` you are doing it wrong:

    rm **/*pyc

    and if your shell doesn’t use that… well there’s a reason to switch to zsh :)

  7. For your commit line, in hg I’d recommend writing an hg extension that does what you want, or add a local pre-push hook.

    Now, I have no idea how much of that git would support.

  8. Assuming you have .pyc listed in your .gitignore (and you should – they’re generated files, and shouldn’t be checked in) you should just be able to say “git clean -X”, and it will blow away every file whose name isn’t in the index, and which matches a .gitignore entry.

    On my Python projects, I usually run “git clean -dxf” which blows away absolutely everything that’s not under source-control: files, directories, temp-files, whatever. It’s a bit of a sledge-hammer-walnut approach, so you should take care before running it (hint: replace -f with --dry-run).

  9. Ah, those .pyc files. Yeah, I’ve been bitten by that before.

  10. Dave: I just switched to zsh last weekend — and MAN I’m impressed. How could I ever live without it? :)

  11. Axel: Thanks for the suggestion, I’ll look into it!

    Screwtape: That’s not a bad idea in general, but it’d also blow away my config files, so maybe not ;)

    Also, zsh fans: Fiiiine, I’ll give it a shot :) But I’ll need to translate my .bashrc into a .zshrc first…

  12. I agree with Frédéric that a manual find command, however easily or elegantly it can be written, isn’t the ideal solution here. Eventually you’ll forget about it again.

    I can highly recommend going down the “run your tests on as clean an environment as possible” road. I’ve used the zope.testing test runner a lot in the past; it deletes stale .pyc files before running tests. You might want to consider teaching your test runner to do the same, or use one that already has this feature (Ian mentioned py.test as well).

  13. Fred: start with this zshrc: http://grml.org/zsh/

    And Git has hooks, too: http://progit.org/book/ch7-3.html
