Python Programming, news on the Voidspace Python Projects and all things techie.
Parametrized Tests and unittest
Yet another blog entry on unittest; this is the last one in my list so I'm not planning any more for a while. Something that both nose and py.test provide that unittest (the Python standard library testing framework) doesn't is a built-in mechanism for writing parametrized tests. The technique that both nose and py.test use (currently anyway) is to allow your test methods to be generators that yield a series of tests. The testing framework then runs all these tests for you.
Whenever I've needed to run a series of similar tests with different input parameters I've always used a simple loop; something like:
def testSomething(self):
    for x in range(100):
        for y in range(100):
            self.assertSomethingForXandY(x, y)
The problem with this approach is that as soon as you have a failure for any of the x, y value combinations the test will stop running. In some circumstances it would be much better for all the tests to run and have all the failures reported instead of just the first.
With nose, you could write the above test like this:
def testSomething(self):
    for x in range(100):
        for y in range(100):
            yield self.assertSomethingForXandY, x, y
Nose would detect that the test is a generator and collect the functions (along with their arguments) that it yields and run them independently. The disadvantage of this approach is that you can't know up front how many tests you have (and indeed it could change every time you run the tests) and neither are they isolated from each other (they share the fixture).
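To make the collection step concrete, here is a minimal sketch of what a runner could do with a generator test. It is not nose's actual implementation: collect, run_independently, assert_sum_is_small and test_sums are all names made up for this illustration.

```python
import inspect


def collect(test_func):
    """Expand a test function into a list of (callable, args) pairs."""
    if inspect.isgeneratorfunction(test_func):
        # Each yielded tuple is (callable, arg1, arg2, ...).
        return [(item[0], tuple(item[1:])) for item in test_func()]
    return [(test_func, ())]


def run_independently(cases):
    """Run every case, recording failures instead of stopping at the first."""
    failures = []
    for func, args in cases:
        try:
            func(*args)
        except AssertionError as exc:
            failures.append((func.__name__, args, str(exc)))
    return failures


def assert_sum_is_small(x, y):
    assert x + y < 3, '%s + %s is too big' % (x, y)


def test_sums():
    # A generator test in the nose style: yield the check plus its arguments.
    for x in range(3):
        for y in range(3):
            yield assert_sum_is_small, x, y


cases = collect(test_sums)
failures = run_independently(cases)
```

Note that the number of cases is only known after the generator has been run to exhaustion, which is exactly the "can't know up front" problem.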
Although unittest doesn't include an equivalent it is easy to achieve the same thing and there are several possible approaches. The two I've come up with, prompted by another discussion on the Testing in Python mailing list and with Brandon Craig Rhodes, are available as params.py from my unittest-ext sandbox project (where I tinker with unittest related stuff from time to time). (Note - after showing you two possible approaches I'll show you some better ways that other people have found for solving the same problem.)
The first uses a metaclass in concert with a decorator. You decorate methods with a list of dictionaries - for every dictionary in the list the method will be called with the parameters from the dictionary. (It'll be easier to understand when I show you some code using it.) The metaclass examines all decorated methods at class creation time and adds new test methods to the class.
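import unittest
from types import FunctionType

class Paramaterizer(type):
    def __new__(meta, class_name, bases, attrs):
        # iterate over a copy, as new test methods are added to attrs below
        for name, item in list(attrs.items()):
            if not isinstance(item, FunctionType):
                continue
            # only methods decorated with with_params have a params attribute
            params = getattr(item, 'params', None)
            if params is None:
                continue
            for index, args in enumerate(params):
                # default arguments bind the current args and name to this test
                def test(self, args=args, name=name):
                    assertMethod = getattr(self, name)
                    assertMethod(**args)
                test.__doc__ = """%s with args: %s""" % (name, args)
                test_name = 'test_%s_%s' % (name, index + 1)
                test.__name__ = test_name
                if test_name in attrs:
                    raise Exception('Test class %s already has a method called: %s' %
                                    (class_name, test_name))
                attrs[test_name] = test
        return type.__new__(meta, class_name, bases, attrs)

def with_params(params):
    def decorator(func):
        func.params = params
        return func
    return decorator

class TestCaseWithParams(unittest.TestCase):
    __metaclass__ = Paramaterizer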
You don't need to use the metaclass directly; instead subclass TestCaseWithParams:
class Test(TestCaseWithParams):

    @with_params([dict(a=1, b=2), dict(a=3, b=3), dict(a=5, b=4)])
    def assertEqualWithParams(self, a, b):
        self.assertEqual(a, b)

    @with_params([dict(a=1, b=0), dict(a=3, b=2)])
    def assertZeroDivisionWithParams(self, a, b):
        self.assertRaises(ZeroDivisionError, lambda: a/b)
The disadvantage of this approach is that you have to know (or calculate) all of the parameters at class creation time instead of when the test runs. The advantage is that the number of tests is known ahead of running the tests - so countTestCases on the TestSuite works as normal and each failure is recorded individually.
Another approach is to use the same generator technique as nose / py.test with a decorator that runs all the tests yielded by the generator.
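import sys
import unittest

# the decorator name is illustrative
def generative(func):
    def inner(self):
        failures = []
        errors = []
        for test, args in func(self):
            try:
                test(*args)
            except self.failureException as e:
                failures.append((test.__name__, args, e))
            except:
                # using sys.exc_info means we also catch string exceptions
                e = sys.exc_info()[1]
                errors.append((test.__name__, args, e))
        msg = '\n'.join('%s%s: %s: %s' % (name, args, e.__class__.__name__, e) for (name, args, e) in failures + errors)
        if errors:
            raise Exception(msg)
        if failures:
            raise self.failureException(msg)
    return inner

class Test2(unittest.TestCase):
    @generative
    def testSomething(self):
        for a, b in ((1, 2), (3, 3), (5, 4)):
            yield self.assertEqual, (a, b)
        def raises():
            raise Exception('deliberate error')
        yield raises, ()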
This is a bit less 'heavy' than using a metaclass. Decorated tests are all run to completion. If any test fails or errors then an appropriate failure is raised - with the message listing all the failures. It has the advantage of allowing tests to be created at test execution time, but the disadvantage of all failures only counting as a single failure. The total number of tests counted will only count generative tests as a single test. If you run the code above you'll see how errors are reported and it is ok (could do better - must try harder). It is also easy to use with any unittest based test framework.
Of course other people have come up with better ideas - which I may evaluate for integrating into unittest. They do still suffer from the problem of a non-deterministic number of tests (breaking the countTestCases part of the unittest protocol) but this is unavoidable with this feature.
Konrad Delong posted one solution to his blog: Reporting assertions as a way for test parameterization. The code is here. He uses a decorator to collect the failures / errors and modifies TestCase.run to be aware of them. I like this technique.
Robert Collins has a different solution, which at the heart uses a similar technique but is more general and powerful. This is his testscenarios project. (Every time I try to actually find the code on a launchpad project I go round in circles for a while first. Anyway - it's here.) The project is described thusly:
testscenarios provides clean dependency injection for python unittest style tests. This can be used for interface testing (testing many implementations via a single test suite) or for classic dependency injection (provide tests with dependencies externally to the test code itself, allowing easy testing in different situations).
Instead of just individual tests it allows you to parameterize whole test cases - so you can do 'interface' testing where you swap out the backend implementation and check that all tests pass for various different backends.
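The 'interface' testing idea can be sketched with nothing but the standard library. To be clear, this is not the testscenarios API; StackContract, apply_backends and the choice of list and deque as backends are all assumptions made up for this illustration. One set of tests is written once and a TestCase subclass is generated per backend.

```python
import unittest
from collections import deque


class StackContract:
    """Tests that every backend must pass; 'make' is supplied per backend."""
    make = None

    def test_starts_empty(self):
        self.assertEqual(len(self.make()), 0)

    def test_push_then_pop(self):
        stack = self.make()
        stack.append(1)
        stack.append(2)
        self.assertEqual(stack.pop(), 2)


def apply_backends(contract, backends):
    """Build one TestCase subclass per (name, factory) pair and collect them."""
    suite = unittest.TestSuite()
    for name, factory in backends:
        # staticmethod stops a plain function factory being bound to the test.
        attrs = {'make': staticmethod(factory)}
        case = type('%s_%s' % (contract.__name__, name),
                    (contract, unittest.TestCase), attrs)
        suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(case))
    return suite


suite = apply_backends(StackContract, [('list', list), ('deque', deque)])
result = unittest.TestResult()
suite.run(result)
```

Because the contract class does not itself inherit from TestCase, it is never collected on its own; only the generated per-backend classes are.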
The basic nose / py.test technique for generator tests is a dirty hack. They introspect the test method code objects to see which of them are generators. Holger Krekel, core developer of py.test, also thinks that they offer little real advantage over loops and is looking to replace them in py.test with a more powerful system. This uses pytest_generate_tests and he describes it in: Parametrizing Python tests, generalized.
This new system is more powerful, but it seems to make the simple cases more difficult. If Holger is right in that a generalized mechanism that only caters for the simple cases doesn't really have much advantage then this new system may indeed be a winner.
Yes Zeth, the testing framework. doctest is for testing documentation and makes an awful unit testing tool, especially for test first as practised in test driven development (TDD). Of course not everyone shares my opinions on this matter.
Gadgets: Samsung SSD, Sharkoon SATA Adaptor, Mimo USB Monitor and Powermate USB Volume Knob(!)
Over the last few months I've bought a few new gadgets, and they're well overdue a review; so here goes.
As I'm sure you're aware Solid State Drives are hard drives using flash memory instead of mechanical disks; this eliminates the need for spin up, plus makes seek times and data rates potentially much faster and power consumption less. I wanted this for my Apple Macbook Pro, which only had a 120gig hard drive. Advantages for me would be a bigger hard drive, a faster hard drive, and through less heat / power a longer battery life as well.
Fitting it was a royal pain in the *ss. I followed the instructions from this article: Upgrade Your MacBook Pro's Hard Drive. They're pretty good, the only place I deviated from them was that once I got inside my Mac the bluetooth module wasn't on top of the hard drive I was removing. This was a good thing.
The hardest part was levering the keyboard top panel from off the innards. This really didn't want to come off, and it is attached by a ribbon cable to the motherboard so you can't be too violent in your attempts to pry it free. It came eventually. Scraping the ribbon cable that is glued to the top of the existing drive free is also slightly nerve-wracking.
Choosing an SSD is almost as painful as fitting. The current crop of drives are the first that are within the realms of affordable (although still expensive), but many of them suffer from real performance issues once you have written a certain amount of data (random write access becomes far slower than even normal hard drives). This AnandTech Article is essential reading on the subject. It was written before the PB22 came out, and the conclusion it came to is that only the OCZ Vertex and the Intel X25-M are worth having. From what I've read the PB22 doesn't suffer the same problems that plague the earlier drives and it is cheaper than both the Vertex and the X25-M so I decided to go for it.
And as for performance, well. XBench reported (results here) an average of 3x faster than a standard Macbook Pro on all the drive benchmarks. The difference in general is noticeable but perhaps not overwhelming. The most striking change was in launching Microsoft Office for the Mac; it launched in about 3 seconds instead of 12! The disappointing thing is that starting my Windows VM (VMWare Fusion) is not much faster, although shutting it down is (which was already pretty fast). Even worse, booting my Mac up (something I don't do very often) - if you include the fifteen to twenty second freeze on start which arrived with the new drive - took about the same amount of time.
In the end, whilst trying to fix a different problem with another of my new gadgets I reset the PRAM on my Macbook, which fixed it! Now on the once a month occasion I restart my laptop it will happen really quickly. Overall the biggest difference that fitting the SSD made was that I now have a bigger hard drive. Everything is faster but possibly not enough to make it worth the cost; it seems that other than Word most of the apps I start are network or CPU bound. The downside is that after investing in the SSD I probably have to wait another couple of years before I replace my Macbook.
When I ordered the SSD I also ordered a 2.5" SATA adaptor to go with it. I asked the salesman if the adaptor would work with the SSD and he did suggest that buying an SSD and then using it through a USB adaptor didn't sound that sensible. Actually I wanted the adaptor to clone the internal drive of my Mac onto the SSD before fitting it. The nice thing about the Sharkoon is that it has connectors for SATA drives and 2.5" / 3.5" IDE drives. Like many geeks I often have random hard drives lying around and this will allow me to use them. It worked fine (without needing a driver) on Mac OS X, despite not advertising Mac compatibility. It even comes with some funky rubber sheaths for attached drives if you want to leave one connected for anything other than a short period of time.
To clone the internal drive in the laptop onto the SSD I used Carbon Copy Cloner. Cloning a 120gig drive (CCC claimed it would do a block level clone but actually did a file level clone) took hours. It was slightly worrying to see the occupied size of the new drive was about 200meg less than the original - but I imagine this is a consequence of smaller blocks on the SSD and CCC doing a file level clone. Anyway it worked fine.
Mimo monitors make a range of 800x480 pixel USB monitors. I wanted the 740 touchscreen monitor for a home media server project. The 740 was out of stock so I ended up with the 710 and the media server project got shelved (I ditched wireless for my main computer as it was sporadically unreliable and with a wired connection to the desktop no need for a separate server).
The monitor is a fantastic second monitor for my laptop but I only use it when I have a power source. Rather than see it unused I have it attached to my desktop (technically my sixth monitor) showing my twitter stream via Tweetdeck.
This photo shows the Mimo and the Powermate volume control (see below).
It turned out to be an irresistible but expensive toy, quelle surprise. Definitely useful though and in constant use, so it's fared better than some of the expensive toys I've bought in the past (Nintendo DS I'm talking to you).
Unfortunately there is a problem with the DisplayLink driver and the Mac OS X 10.5.7 update. Some details of the problem here and more here (apparently it is a known issue with 10.5.7 and not the fault of the driver). Uninstalling and re-installing the driver worked for me, but sometimes the display doesn't work if I restart my laptop with it plugged in (remembering to unplug it before restarting does the trick).
This was another toy. Whenever I am at my computer I almost inevitably have a movie playing and this expensive little knob is a volume control. It has much more granularity than using the keyboard to control the volume and I find it surprisingly useful. You can configure different behaviour (e.g. scrolling) for different applications, but I just use it as a volume control.
In my last post I mentioned my fuzzywuzzy beard not once but twice. Here's a great picture of me and my fuzzywuzzy beard drawn by Scott Meyer, the creator of the Basic Instructions webcomic.
You can order your own custom avatar for $10.
Future adventures of unittest and a mini-rant
There is a general rule that innovation doesn't happen in the standard library. Instead modules or techniques that have already proven themselves in the Python community are adopted into the standard library. This is exemplified in the standard library testing framework, unittest, which until recently stagnated whilst frameworks like nose and py.test pushed forward the boundaries of testing in Python. Features like test discovery and cleanup functions that have been brought into unittest first appeared in other testing frameworks.
The next releases of Python (2.7 / 3.1 / 3.2) will see some great improvements to unittest but it is still far from being perfect. In the spirit of lists of things I don't like about something I do like (and before I rant) here are what I think are the main problems (or the main perceived problems) with unittest:
- It is a single monolithic file, it should be a package
- Hard to extend / write plugins
- People want to write functions not classes - unnecessary boilerplate
- No class (or module level) setUp and tearDown
- No standard mechanism for parameterized (generative) tests
Let's dig into these and see what we can do about them:
Currently unittest.py is 1760 lines of Python and test_unittest.py is 3699 lines. Frankly that's horrible and it makes unittest hard to understand and hard to maintain. Benjamin Peterson (the current Python release manager) has said he will split unittest into a package after Python 3.1 is released. If he doesn't have time for it then I'll do it.
Once you are familiar with the responsibilities of the various moving parts in unittest (TestCase, TestRunner, TestLoader, TestSuite and TestResult) it is pretty easy to extend. Unfortunately it is difficult to extend so that other people can reuse what you have done. If you write a TestResult that writes colorized output to the console and I write one that pushes results to a database then the chances are that a new project will have to choose one or the other and can't use both. It isn't impossible, and there are projects that do it very well, but there is no standard plugin mechanism or culture of sharing extensions. I'd like to look at whether a plugin system with some compatibility with the nose / py.test plugins is plausible or just a pipe dream.
So people want to write functions, not classes? Haha - I shake my fuzzywuzzy beard at you in bewilderment. Do you people dislike OOP? Is the class statement mere boilerplate to you? I mumble incoherent French obscenities in your general direction. (Did you know the French acronym for object-oriented programming is POO?) I find grouping tests by class very useful. Although nose and py.test allow you to organise tests as module level functions most people I know still use classes to group tests. In fact unittest does provide a way for you to write test functions rather than classes - but I'm not telling you what it is.
Class or module level setUp and tearDown is a double edged sword. For expensive fixtures (like big databases) it is a slow pain to have to recreate them for every test (in setUp). What you think you want is to be able to have a class or module level setUp where it is done once and shared between tests. Nose and py.test give you what you think you want (which in general is a good policy I guess), but this does violate test isolation. When unittest runs tests it instantiates the TestCase separately for every test it runs; every test is run in a fresh instance unsullied by previous tests. You can already work round this by creating class attributes instead of instance attributes of course. The Twisted test framework (built on unittest) used to provide for shared fixtures with setUpClass / tearDownClass. When this was discussed recently on Testing in Python, people had this to say of them:
Twisted added setUpClass and tearDownClass to Trial and they have caused us nothing but grief. To be fair, they were added before classmethod was added to Python, which caused much of the pain.
Andrew Bennetts: I agree with Jonathan here. Twisted's setUpClass/tearDownClass were terrible, for the reasons he gave.
They both recommended instead the testresources library for shared resources with unittest. Other than being GPL this looks like a useful library. At some point I'll investigate this and consider how shared fixtures might be usefully added to unittest.
- And the last item, parameterized tests, I leave for a blog entry all of its own...
Now for the rant... I have a lot of admiration for nose and py.test. They have helped popularise Python testing and brought many new and interesting ideas. They haven't had features compelling enough to make me jump from unittest (and until recently IronPython compatibility has been an issue for much of what I've done) but I can understand why many new projects use them.
Something that p*isses me off about them though is the way that their evangelists extol their virtues by denigrating unittest, and in ways that I think are bizarre. If the library / framework that you like so much is so good then let it stand on its own two feet without denigrating the alternatives. The latest person to do this, and so raise my ire, was the otherwise sound-and-sensible Brandon Craig Rhodes in the first part of his Python testing frameworks article on the IBM developerworks site. Of a short unittest example he says:
Look at all of the scaffolding that was necessary to support two actual lines of test code! First, this code requires an import statement that is completely irrelevant to the code under test, since the tests themselves simply ignore the module and use only built-in Python values like True and False. Furthermore, a class is created that does not support or enhance the tests, since they do not actually use their self argument for anything. And, finally, it requires two lines of boilerplate down at the bottom, so that this particular test can be run by itself from the command line.
The scaffolding that he is talking about is two lines at the start of the test module and two lines at the end. One of those lines is the import of unittest (irrelevant??) and the other is the class definition. As I mentioned, most serious users of nose / py.test still write class structured tests and any serious testing module is going to import a whole lot of stuff - including objects from your testing framework. This criticism seems vacuous and unrepresentative of any serious testing environment (four lines may be a lot when your whole testing code is less than ten lines - but in the real world this is less than a non-issue). Unfortunately many of the criticisms of unittest in articles on alternative testing frameworks seem to state this as one of the most important advantages of switching...
Brandon also says later of the assert methods "First, calling a method hurts readability". This I can understand but just plain disagree with I guess and I don't think unittest would be improved by providing a host of assert functions to import rather than having them as methods on TestCase (see my previous comments on OOP and fuzzywuzzy beards). I'm also dubious of the heavy magic done by nose / py.test to support useful error messages when using plain asserts. This magic has portability implications to other implementations like Jython and IronPython.
The other two (whole) lines of boiler-plate that Brandon bemoans are the lines necessary to make a test module executable on its own:
if __name__ == '__main__':
    unittest.main()
This is true: up to and including Python 2.6 these two (whole) lines are needed. However, test discovery and better command line options have been added to unittest in Python 2.7. If we're going to be precise about the matter though, you may need these two lines in your test module under unittest - but in a fresh install of Python the whole nose module itself becomes unnecessary boilerplate compared to unittest...
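With discovery (Python 2.7 and later) a test module needs no `__main__` block at all; from the command line it is `python -m unittest discover`, and the same machinery is available from code as TestLoader.discover. A minimal sketch, written for Python 3, in which the temporary directory, the file name and the discover_and_run helper are all made up for the example:

```python
import os
import tempfile
import textwrap
import unittest

# A test module with no "if __name__ == '__main__'" block.
TEST_SOURCE = textwrap.dedent('''
    import unittest

    class TestTruth(unittest.TestCase):
        def test_true(self):
            self.assertTrue(True)
''')


def discover_and_run(start_dir):
    """Find every test*.py file under start_dir and run what it contains."""
    suite = unittest.defaultTestLoader.discover(start_dir, pattern='test*.py')
    result = unittest.TestResult()
    suite.run(result)
    return result


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test_truth_example.py')
    with open(path, 'w') as handle:
        handle.write(TEST_SOURCE)
    result = discover_and_run(tmp)
```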
I've exchanged emails with Brandon about this, and he suggested we both blog about it - so I eagerly await his response.
This work is licensed under a Creative Commons Attribution-Share Alike 2.0 License.