Python Programming, news on the Voidspace Python Projects and all things techie.

Improvements to unittest: new asserts, better failure messages and more to come

PyCon was simply amazing. As with last year, I really enjoyed the sprints, and this year I worked with Gregory Smith. We integrated a patch contributed by Google that adds a whole bunch of new assert methods that give more useful failure messages when comparing containers, long strings and so on. We wired these into assertEqual, so you don't need to call them directly unless you are comparing instances of subclasses of the specified types.

The changes and discussion are tracked in issue 2578. These improvements are checked in and will be in the Python 2.7 and Python 3.1 releases.

One improvement already in unittest is that assertRaises can be used as a context manager, and we added assertRaisesRegexp, which also works as a context manager. When you are testing APIs, the error messages are often 'part of the API', and this method allows you to test the error messages with a regular expression:

with self.assertRaisesRegexp(TypeError, '^You screwed up$'):
    ThisFunctionTakesAString(None)
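Here is a self-contained version of that pattern that runs on a current Python 3. One naming caveat: Python 2.7 spells the method assertRaisesRegexp, as above, while Python 3.2 renamed it to assertRaisesRegex, so this sketch uses the newer name. The function takes_a_string is hypothetical and stands in for ThisFunctionTakesAString.

```python
import unittest


def takes_a_string(value):
    """Hypothetical API whose error message is treated as part of the API."""
    if not isinstance(value, str):
        raise TypeError('You screwed up')
    return value.upper()


class ErrorMessageTest(unittest.TestCase):
    def test_rejects_none(self):
        # Passes only if TypeError is raised *and* its message matches the regex.
        with self.assertRaisesRegex(TypeError, '^You screwed up$'):
            takes_a_string(None)

    def test_accepts_string(self):
        self.assertEqual(takes_a_string('abc'), 'ABC')
```

Run it with `python -m unittest` as usual. If the function raised TypeError with a different message, the first test would fail rather than pass silently.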

Assert methods added, some more useful than others, are:

  • assertRaisesRegexp
  • assertIn / assertNotIn
  • assertIsNone / assertIsNotNone
  • assertIs / assertIsNot
  • assertDictEqual / assertListEqual / assertSetEqual / assertTupleEqual - used automatically by assertEqual
  • assertMultiLineEqual - for comparing long strings, shows a diff on failure
  • assertSequenceEqual - assert that two sequences have the same members in the same order, without having to be of the same type
  • assertSameElements - assert that two containers have the same elements, ignoring their ordering
  • assertDictContainsSubset
  • assertGreater / assertGreaterEqual / assertLess / assertLessEqual
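The following sketch exercises most of these methods under their current Python 3 names. It is an assumption-laden illustration rather than a test of the 2009 patch. For the order-insensitive comparison it uses assertCountEqual, the method current Python 3 provides, rather than assertSameElements. assertDictContainsSubset is left out because it is not available in recent releases.

```python
import unittest


class NewAssertsTest(unittest.TestCase):
    def test_membership_and_identity(self):
        colours = ['red', 'green']
        self.assertIn('red', colours)
        self.assertNotIn('blue', colours)
        self.assertIsNone(None)
        self.assertIsNotNone(colours)
        self.assertIs(colours, colours)           # the very same object
        self.assertIsNot(colours, list(colours))  # an equal copy is a different object

    def test_ordering(self):
        self.assertGreater(3, 2)
        self.assertGreaterEqual(3, 3)
        self.assertLess(2, 3)
        self.assertLessEqual(2, 2)

    def test_containers(self):
        # Same members in the same order; the sequence types may differ.
        self.assertSequenceEqual([1, 2, 3], (1, 2, 3))
        # Order is ignored, but element counts must match.
        self.assertCountEqual([3, 1, 2], [1, 2, 3])
        # For two dicts, assertEqual dispatches to assertDictEqual.
        self.assertEqual({'a': 1}, {'a': 1})
```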

Example of the new failure messages:

=====================================================
FAIL: testExample (test.test_unittest.Test_TestCase)
-----------------------------------------------------
Traceback (most recent call last):
  File "test_unittest.py", line 2388, in testExample
    self.assertEqual([1, 2, 3], [1, 2, 0])
AssertionError: First differing element 2:
3
0
- [1, 2, 3]
?        ^

+ [1, 2, 0]
?        ^
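To inspect a message like this without a full test run, you can catch the failureException (AssertionError by default) that assertEqual raises. The exact wording differs a little between Python versions. Current releases, for example, add a "Lists differ" summary line before the element details. So treat this as a sketch. The Probe class and failure_message helper are hypothetical names.

```python
import unittest


class Probe(unittest.TestCase):
    def runTest(self):
        # Nothing to run: the instance only exists to call assert methods on.
        pass


def failure_message(first, second):
    """Return the text assertEqual produces when first != second, else None."""
    case = Probe()
    case.maxDiff = None  # show the full diff, however long it is
    try:
        case.assertEqual(first, second)
    except case.failureException as exc:
        return str(exc)
    return None


print(failure_message([1, 2, 3], [1, 2, 0]))
```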

We had already implemented many of these methods in the Resolver Systems test framework, so it's nice to see unittest gain this functionality.

By default the new assertMultiLineEqual is not used for comparing strings (as they can contain binary data in Python 2.X), so you have to call it directly to get the pretty error messages.
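This sketch calls assertMultiLineEqual directly and captures the diff-style message it produces. In Python 3, where str is always text, assertEqual does dispatch to assertMultiLineEqual for two str objects, but the explicit call works either way. The strings and the Probe helper are made up for illustration.

```python
import unittest


class Probe(unittest.TestCase):
    def runTest(self):
        pass


expected = "first line\nsecond line\nthird line\n"
actual = "first line\nsecond lime\nthird line\n"

case = Probe()
try:
    # Called explicitly; on failure the message contains a line-by-line diff.
    case.assertMultiLineEqual(expected, actual)
except case.failureException as exc:
    message = str(exc)
else:
    message = None

print(message)
```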

As well as adding the new methods, we have made the failIf / failUnless synonyms pending deprecation (there really is a PendingDeprecationWarning, which, if enabled, warns you that in the next version of Python using this API will raise a warning!). About time! We have also standardised on the singular form of the assert names (assertEqual rather than assertEquals); I preferred the plural (assert a equals b over assert a is equal to b) but am happy with consistency.

One of the things that has long annoyed me about unittest is that passing in a custom failure message, explaining what the failure means, silences the useful default error message that tells you which objects were involved in the failure. I also fixed this. Unfortunately, because the standard way round this problem has been to craft a custom error message that includes a repr of the objects (for example, with assertEqual you want to see the repr of the two objects compared in the event of a failure), the new behaviour couldn't be made the default.

Instead, a new class attribute, longMessage, has been added. This defaults to False, but if you set it to True then any explicit custom message you pass will be displayed in addition to the default error message. Changing every assert method to use the new system (including the tests, of course) took all day, but it was my first real commit to Python, so I was happy.

As this is a class attribute, you can override it in individual tests by setting longMessage as an instance attribute (self.longMessage) to True or False before calling the assert methods. In addition, I added useful default error messages to assertTrue and assertFalse (currently the default is AssertionError: None!) and changed assertNotEqual to actually use the inequality operator (which exposed a bug in the new ordered dictionary)...
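This sketch shows the difference longMessage makes, setting it per instance as described. One caveat: since Python 3.2 the class-level default has been True, so the sketch sets the attribute explicitly in both cases. The Probe class and message_for helper are hypothetical.

```python
import unittest


class Probe(unittest.TestCase):
    def runTest(self):
        pass


def message_for(long_message):
    case = Probe()
    # Instance attribute overrides the class-level default.
    case.longMessage = long_message
    try:
        case.assertEqual(1, 2, 'totals should match')
    except case.failureException as exc:
        return str(exc)


print(message_for(False))  # the custom message replaces the default one
print(message_for(True))   # the default message is kept, custom one appended
```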

There are further improvements left to go, some of which are implemented and just need to be committed, and some of which still need implementing. None of them are strictly 'innovations', in that they are all features that have long been provided by other test frameworks. As unittest is in the standard library, the real innovations will always happen in these other libraries, and unittest can just pull in the best and proven features. One thing we intend to add is simple test discovery, something that is badly needed.
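Test discovery did arrive in later releases (Python 2.7 and 3.2) as TestLoader.discover, which is also what `python -m unittest discover` uses. Here is a minimal sketch. It writes a throwaway test module into a temporary directory and lets discovery find it. The module name test_discovery_sketch.py is made up.

```python
import os
import tempfile
import textwrap
import unittest

TEST_SOURCE = textwrap.dedent('''
    import unittest

    class SmokeTest(unittest.TestCase):
        def test_truth(self):
            self.assertTrue(True)
''')

with tempfile.TemporaryDirectory() as start_dir:
    # Hypothetical module name; discovery matches file names against 'test*.py'.
    with open(os.path.join(start_dir, 'test_discovery_sketch.py'), 'w') as f:
        f.write(TEST_SOURCE)
    suite = unittest.TestLoader().discover(start_dir, pattern='test*.py')
    result = unittest.TestResult()
    suite.run(result)

print(result.testsRun, result.wasSuccessful())
```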

As these features get applied I'll post new blog entries explaining how to use them.


Posted by Fuzzyman on 2009-04-20 13:48:43



Dynamic Languages and Architecture

I received an interesting question by email from Mark Bloodworth, an Architect Evangelist at Microsoft:

I've been interested in Dynamic Languages for a while now (I blog about Ruby and Python from time to time). I'm presenting at AIC in May about Dynamic Languages and architecture. As part of the preparation, I'm contacting a few people to get their views on where dynamic languages fit best. So, I'd be really interested to hear your thoughts on Dynamic Languages, where you think the sweet spot for their usage is. I'd also be interested to hear what you think about the impact using dynamic languages has.

As is my wont, I wrote far more than was warranted in reply and thought I would post it here for your edification.

Dynamic languages are general purpose languages; Python is used for, and suitable for, many of the same problem domains as C#. There are some problem domains where a statically typed language is preferable to a dynamically typed language (and vice versa), but in the realm of 'general programming' these are the exception rather than the rule.

In some of these domains, real-time programming for example, the requirement is not really about the type system (although strict type systems make provability easier) but about managing resources. You couldn't use Python for a real-time application because resource allocation and deallocation cannot be strictly controlled (the basic container types grow themselves in memory, for example), but you shouldn't use C# or Java either, because garbage collection means that the time taken by any individual operation cannot be specified or controlled.
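Here is a small, CPython-specific illustration of containers growing themselves. sys.getsizeof reports the memory allocated for a list. That figure jumps in steps, because append over-allocates spare slots rather than growing one slot at a time, so the moments when memory is reallocated are not under the programmer's control. The exact byte counts vary with interpreter version and platform, so the sketch looks only at the pattern.

```python
import sys

items = []
sizes = []
for n in range(32):
    items.append(n)
    # Bytes currently allocated for the list object, including spare slots.
    sizes.append(sys.getsizeof(items))

# Each distinct value marks a point where the list reallocated its storage.
reallocations = len(set(sizes))
print(f'{len(sizes)} appends, {reallocations} distinct allocation sizes')
```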

Note

The decision whether to use one language or another is always a trade-off between competing factors. I would argue that developers often make a less than optimal choice for flawed reasons, some of which I outline here.

Choosing one programming language over another is about balancing factors, and social factors are always a part of this. Forcing a team to use Python (or C#) even where it would be appropriate would likely turn out to be a mistake...

Here's an attempt at an answer that you will certainly find unsatisfactory. Areas where you should use a dynamically typed language:

  • Any kind of scripting where you need runtime behaviour
  • Including system administration where creating an object oriented application just to move a bunch of files around is silly

Areas where you should use statically typed languages:

  • Problems which are subject to provability
  • Areas of code that you have already profiled and optimised and need to improve beyond the capabilities of a dynamically typed language

Many developers prefer dynamically typed languages for general programming, including those who have switched from languages like C++ and Java, because they are more flexible and enable simpler solutions. The more I use C# (and I quite like it as a language), the more I become convinced that static typing forces (and therefore encourages) more complex architectures. The simplest thing that could possibly work will usually be implemented in a language with a dynamic type system!

Along with this, dynamically typed languages are typically more concise and expressive, meaning the same concepts can usually be expressed with less code. For those fluent in the use of their tools (whether it be Eclipse or Visual Studio) the code may not take longer to write in a statically typed language, but it certainly takes longer to read. And typically you have to read code more often than you write it.

Readability counts - a slide from a presentation by Guido van Rossum

A few examples off the top of my head:

  • Generics in C# - elegant for sure, but actually a workaround for the problems caused by containers restricted on type. Heterogeneous containers make constructing complex data-structures trivially easy by composing built-in types.
  • Reflection - because of the type restrictions .NET reflection is massively more painful and complex than the beautifully simple introspection capabilities of languages like Python, Ruby and Smalltalk.
  • Dependency Injection and Inversion of Control - sometimes useful in their own right, but often used as a workaround to make testing possible. Not needed for these reasons in dynamically typed, late bound languages where you can override almost all behaviour at runtime for the purposes of testing.
  • Covariance and contravariance - wonderfully complex things to wrap your head around. Not even an issue in a language like Python.
  • Delegates - a work around for not having first class functions.
  • Upcasting and downcasting - not needed if you have runtime behaviour.
  • Although C# is growing support for functional programming, and will maybe grow support for metaprogramming, these have been a strong part of the culture of dynamic languages for years and also continue to grow.
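This loose sketch puts a few of those points into Python form. It is not a rigorous comparison with C#. It shows a heterogeneous container built from built-in types, introspection with getattr and dir, plain functions passed around where C# would use a delegate, and a dependency replaced at runtime for a test instead of being injected. The Greeter class and every name in it are made up for illustration.

```python
import time


class Greeter:
    # Hypothetical class used only for this illustration.
    def greet(self, name):
        return f'Hello, {name} at {time.strftime("%H:%M")}'


# Heterogeneous container: no generics needed to mix types.
record = {'name': 'Stutter', 'tags': ['ironpython', 'twitter'], 'version': (0, 1)}

# Introspection: find and fetch a method by name at runtime.
greeter = Greeter()
method = getattr(greeter, 'greet')
public_methods = [n for n in dir(greeter) if not n.startswith('_')]


# First-class functions: behaviour is passed around directly.
def shout(text):
    return text.upper()


def apply_all(funcs, value):
    for func in funcs:
        value = func(value)
    return value


# Late binding: swap out a dependency for a test, no injection framework needed.
original_strftime = time.strftime
time.strftime = lambda fmt: '12:00'
try:
    greeting = apply_all([method, shout], 'world')
finally:
    time.strftime = original_strftime  # always restore the real function

print(greeting)
```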

I could go on...

However, it is the case that when you develop within a system you learn to think within that system. Programmers used to a static type system use it to think within and reason about the programs they are building. They look with pain on the idea of dynamically typed languages, as these take away part of how they reason about programming. Similarly, those used to dynamically typed languages are much more used to thinking in terms of object behaviour rather than types. Having to shoe-horn this into a type system that feels rigid makes it much harder for them to reason about programming. This is why the two 'camps' fail to see eye to eye: they speak different languages and think in different ways. I do think there are objective differences as well, some of which I have already outlined.

Besides this though, many programmers have been taught that static typing is safer and required for programming large systems. They rule out the use of dynamic languages for reasons that are either not recognised by those who program large systems in dynamic languages or are only partly true:

  • Managing large systems in dynamic languages is impossible

    Those who have moved to dynamic languages from statically typed languages are horrified at the idea of managing a large system in a statically typed language, with its more complex architectural requirements and less readable code.

  • Without type safety dynamic languages are only suitable for advanced programmers

    It's quite a compliment to call us all advanced programmers, but the languages tend to be easier to learn and easier to read, which makes them more suitable for new programmers and more powerful for advanced programmers.

  • Without type safety you're more likely to have bugs

    Type safety only catches a very small proportion of all possible bugs, and largely the ones that are easiest to find. If you think that a program has no bugs just because it compiles, then perhaps you haven't been a programmer for very long! Yes, you can have runtime errors that a compiler would have caught. Even a minimal amount of testing would catch those (and automated tools like PyLint and PyChecker will also help). This does mean that testing is more important in dynamic languages, but I'm a firm believer that automated testing (and preferably test driven development) is preferable whatever language you are using. As dynamic languages make testing much simpler (see chapter 7 of IronPython in Action, which is available free online), if you are a strong believer in testing you will love dynamic languages.

  • Statically typed languages tend to be faster

    Actually this is generally true. The rub is that it is possible to write assembly code programs that run slower than Java, and Python programs that run faster than C#. Performance (in the raw) depends far more on the programmer than it does on the language. In general dynamically typed languages are 'fast enough', and with IronPython moving performance sensitive parts of your application into C# is generally easy. The faster you can produce your first version, the more time you have to spend on optimisation!

    At Resolver Systems we have looked at performance in Resolver One (a large application written almost entirely in IronPython) several times. Every time so far we have got the performance we were aiming for by improving our Python code and haven't yet had to drop down to C#. We will look at performance many times again in the future (in fact we're probably due for another round of optimisation) and maybe we'll have to move some code away from Python - but looking at our algorithms and improving them is always our first step.

    Ruling out a language because you don't think it will be fast enough is a premature optimisation.

  • Tool support is not so good

    In general this just plain isn't true (the idea of the IDE and refactoring tools originate in Smalltalk implementations after all). When people say this I often suspect that what they mean is that they are afraid of moving away from Visual Studio. I wrote a blog entry listing some of the tools for Python in particular: http://ironpython-urls.blogspot.com/2009/03/writing-ironpython-debugger.html

    It is true that because the type system isn't fully known until runtime you can't do some of the same static analysis (although there is an awful lot that a good tool can infer). Martin Fowler, who has seen ThoughtWorks increase the number of projects done in Ruby over the last few years, recently said that programmers within ThoughtWorks who move from Java to Ruby usually end up using editors like TextMate, Vim and Emacs (lightweight tools for lightweight languages), and that he has never heard of any of them missing the refactoring support.

Anyway - I'm not sure if this is what you were looking for, but it's enough typing for one email. I hope it is helpful or interesting.

Parts of that reply are verging on a rant of course. I think part of what I'm ranting against is the inherent complexity in many modern languages and runtimes. Some of the problems I mention, and some of the ones below, are at least partly (or even completely) orthogonal to the type system - but languages, their type systems, and runtimes they are predominantly used on are often so tightly bound together that it is impossible to fully distinguish them. The issues I see are typified in the mainstream statically typed languages: Java, C#, VB.NET, C++, C. The mainstream dynamic languages typically don't have this class of problem: PHP, Perl, Ruby, Python, Javascript.

Specific problems that I have actually encountered with C# and the .NET framework that can't happen with Python include:

  • The differences between value types and reference types - especially the problems around boxing of value types and mutable value types
  • Uninitialized reference types can be null (which can't happen in Python unless you explicitly assign something to be None)
  • Can't overload on return type (partly why you need Func and Action in .NET 3.5)
  • Can't cast between delegates with the same signature

I would include covariance and contravariance as issues that fall within the same 'inherent complexity' of the .NET system. The problem is that we programmers are prone to loving complexity; we mistake it for power. In fact the opposite is true: conceptually simpler systems tend to be more powerful.

With Python all variables are references, no value types, which removes a whole heap (pardon the pun) of complexity. Of course this is a trade-off - in particular having value types allows for certain optimisations. This means that complexity has been moved onto the programmer for the sake of the compiler. Modern JITs (like the one being explored in PyPy for example) can move the complexity into the runtime, allowing for the same optimisations.
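This short sketch illustrates the "everything is a reference" model. Assignment and argument passing never copy, so a mutation is visible through every name that refers to the object. What takes the place of value semantics is immutability. An int is an object too, but it cannot be changed in place, so rebinding one name leaves the other untouched and there is no boxing to reason about. The names used are illustrative only.

```python
def append_item(target):
    # The parameter is just another reference to the caller's list.
    target.append('new')


original = ['a']
alias = original   # no copy: both names refer to one list object
append_item(alias)

n = 1
m = n              # m refers to the same int object as n...
m += 1             # ...but ints are immutable, so += rebinds m to a new object

print(original, alias is original, n, m)
```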

Of course dynamic languages have problems too. A lot of the ones pointed out to me are around issues of programmer discipline. With more flexible languages you can do lots of things that won't work (but you won't be warned until runtime). This is one of the reasons that some .NET programmers have told me they think dynamic languages are only suitable for more advanced programmers. The reason I take issue with that is that one of the trade-offs you are making when you leave a dynamic type system is that you are moving to a system with more inherent complexity. This is not something that can possibly be better for less advanced programmers and can hardly make mistakes less likely!

Of course non-mainstream languages like the functional languages and those with more complicated type systems present whole new fields of problems for the beginner and experienced programmers alike. Steve Yegge talks a lot of sense on the subject in his article Rhinos and Tigers (Static Typing's Paper Tigers).

On the subject of language design, I like Jim Hugunin's [1] quote in his story of Jython:

Guido's sense of the aesthetics of language design is amazing. I've met many fine language designers who could build theoretically beautiful languages that no one would ever use, but Guido is one of those rare people who can build a language that is just slightly less theoretically beautiful but thereby is a joy to write programs in.
[1] Jim also wrote a great foreword to IronPython in Action.


Posted by Fuzzyman on 2009-04-19 23:59:10



Release: Mock 0.5.0 Mocking and Patching Library for Testing

One of the exciting things about PyCon was discovering that Mock is much more widely used than I thought. It's being used at Disney, SourceForge, VMware (well, by at least one developer at VMware) and BATS Trading. Mock is of course heavily used at Resolver Systems; I originally wrote it to simplify the testing patterns we were using and to reduce the proliferation of simple, but all slightly different, mock objects that we had scattered throughout our test framework.

Mock is a library for the creation of simple mock objects that track how they are used so that you can make assertions about them. It uses the action -> assertion pattern rather than the record -> replay pattern. Action -> assertion puts your assertions after you have used the objects, which seems more natural and means that you can make assertions about only the behaviour you are interested in. Mock also contains two decorators (patch and patch_object) which make it easy to safely mock out dependencies in the module under test purely within the scope of the test itself (unpatching is done automatically on exit, whether or not the test passes). One of the changes in this release is that these decorators are now also context managers, so they can be used with the 'with statement'.
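Mock was later added to the standard library as unittest.mock (Python 3.3), so this sketch uses that module rather than the 0.5.0 release described here. The API has moved on since then; for example, patch_object became patch.object. It shows the action -> assertion style and patch used as a context manager. The fetch_status function, and the choice of urllib.request.urlopen as the dependency to replace, are assumptions for illustration.

```python
import urllib.request
from unittest import mock


def fetch_status(url):
    # Hypothetical code under test; it depends on urlopen, which a test shouldn't hit.
    with urllib.request.urlopen(url) as response:
        return response.status


# Action -> assertion: use the mock first, then assert on what happened.
callback = mock.Mock(return_value=42)
callback('spam', retries=3)
callback.assert_called_once_with('spam', retries=3)

# patch as a context manager: urlopen is replaced only inside the with block.
with mock.patch('urllib.request.urlopen') as fake_urlopen:
    # patch creates a MagicMock, so the context-manager protocol is supported.
    fake_urlopen.return_value.__enter__.return_value.status = 200
    status = fetch_status('http://example.com/')

fake_urlopen.assert_called_once_with('http://example.com/')
print(status)
```

Once the with block exits, urllib.request.urlopen is the real function again, whether or not the code inside raised.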

I've just released Mock 0.5.0. This isn't backwards compatible, as it cleans up the API in a few ways, but they're all good changes, I promise.

One of the new features is that the Mock class now supports wrapping objects, using the wraps keyword argument.
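This sketch uses the wraps argument as it exists in today's unittest.mock. Calls pass through to the real object and return its real results, and the mock still records them for later assertions. The Calculator class is hypothetical.

```python
from unittest import mock


class Calculator:
    # Hypothetical real object to be wrapped.
    def add(self, a, b):
        return a + b


real = Calculator()
spy = mock.Mock(wraps=real)

# The call is forwarded to real.add, so the genuine result comes back...
total = spy.add(2, 3)
# ...while the mock still records the call for assertions.
spy.add.assert_called_once_with(2, 3)
print(total)
```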

One of the other big changes is that the documentation is now built with the ever-wonderful Sphinx, so the homepage is new and there is also a PDF of the documentation:

Mock can be installed with:

easy_install mock

The changelog for all changes in this release is:

  • Made DEFAULT part of the public api.
  • Documentation built with Sphinx.
  • side_effect is now called with the same arguments as the mock is called with, and if it returns a non-DEFAULT value, that value is automatically set as mock.return_value.
  • wraps keyword argument used for wrapping objects (and passing calls through to the wrapped object).
  • Mock.reset renamed to Mock.reset_mock, as reset is a common API name.
  • patch / patch_object are now context managers and can be used with the with statement.
  • A new 'create' keyword argument to patch and patch_object that allows them to patch (and unpatch) attributes that don't exist. (Potentially unsafe to use - it can allow you to have tests that pass when they are testing an API that doesn't exist - use at your own risk!)
  • The methods keyword argument to Mock has been removed and merged with spec. The spec argument can now be a list of methods or an object to take the spec from.
  • Nested patches may now be applied in a different order (created mocks passed in the opposite order). This is actually a bugfix.
  • patch and patch_object now take a spec keyword argument. If spec is passed in as 'True' then the Mock created will take the object it is replacing as its spec object. If the object being replaced is a class, then the return value for the mock will also use the class as a spec.
  • A Mock created without a spec will not attempt to mock any magic methods / attributes (they will raise an AttributeError instead).
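This sketch tries several of these changes against today's unittest.mock rather than Mock 0.5.0, so details may differ from the release described here. It shows side_effect receiving the call's arguments, with a DEFAULT return falling back to the mock's return_value. It also shows spec given as a list of names, reset_mock, and patch.object with create=True. The helper function and the settings object are made up for illustration.

```python
import types
from unittest import mock


def double_or_default(value):
    # side_effect receives exactly the arguments the mock was called with.
    if value is None:
        return mock.DEFAULT  # fall back to the mock's normal return_value
    return value * 2


m = mock.Mock(side_effect=double_or_default, return_value='default')
results = [m(21), m(None)]

# spec as a list of names: only these attributes exist on the mock.
restricted = mock.Mock(spec=['save', 'load'])
restricted.save()
try:
    restricted.delete()
    spec_enforced = False
except AttributeError:
    spec_enforced = True

# reset_mock clears the recorded calls.
m.reset_mock()

# create=True lets patch add (and afterwards remove) a missing attribute.
settings = types.SimpleNamespace(debug=False)  # hypothetical patch target
with mock.patch.object(settings, 'verbose', True, create=True):
    inside = settings.verbose
outside = hasattr(settings, 'verbose')

print(results, spec_enforced, m.call_count, inside, outside)
```

As the changelog warns, create=True can let a test pass against an attribute the real code never defines, so use it sparingly.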

Many thanks to all those who gave feedback, feature requests and patches!

What isn't in 0.5.0 is support for mocking magic methods. I do have a technique in mind for this, which I have implemented for the container methods. It is very clean, but different from the pattern used to mock out other methods. As I'm not currently using it, I'm going to wait until I need it and see if it works well in practice.

If you're interested in trying it, the code (with full documentation) is in a 'magics branch':


Posted by Fuzzyman on 2009-04-19 21:43:19



PyCon Videos: IronPython Tutorial

The last videos from PyCon 2009 are going up, which means videos from the tutorials. This includes the three hour IronPython tutorial taken by Jonathan Hartley and me:

The tutorial was great fun, exploring different aspects of programming with .NET and IronPython by creating a Windows Forms Twitter client called Stutter.

I haven't watched the videos, so I have no idea what they are like, but apparently the first five minutes of the first part has no audio.

Of course, if you are interested in programming with IronPython, then the very best tutorial and reference is IronPython in Action.

Once I've done a new release of Mock, finished some work I've started on the standard library unittest module, and got one or two other items off my list, I'll turn the IronPython tutorial into a series of articles. There is a lot of good material in the tutorial. Topics we covered in "Application Development with IronPython" included:

  • Differences between IronPython and CPython, including "Why Use IronPython?"
  • Introduction to the .NET framework - a dynamic language on a statically typed framework
  • GUIs with Windows Forms
  • Databases
  • Network requests and web services (the Twitter REST API)
  • Handling XML
  • Threading


Posted by Fuzzyman on 2009-04-18 14:37:24



Hosted by Webfaction
