Python Programming, news on the Voidspace Python Projects and all things techie.

tox: Testing projects with multiple versions of Python

mock is tested on Python 2.4 upwards. Running all the tests is done with unittest2 test discovery, so up until now I've had several scripts for running the tests with different versions of Python. Not difficult, but a pain.

Enter tox, a new project by the prodigious Holger Krekel.

To use tox you create a simple config file tox.ini specifying the test command and Python versions and tox will create a virtualenv for each Python version and run your tests in it.

I can now run all the mock tests for Python 2.4 - 2.7 on both Mac OS X and Windows with a single command (a single command on each platform - although tox does have Hudson support for continuous integration servers).

As well as running tests, tox can execute arbitrary commands. This means it can run Sphinx commands. For mock I have tox execute two Sphinx commands to build the documentation and run all the doctests (only for Python 2.6 & 2.7 as some of the doctests use the with statement). This way the tox run fails if there are any errors in the documentation, either reStructuredText errors or doctest failures.

A successful run looks something like this (output from Windows shown - with lots of it snipped):

C:\compile\mock
> tox
_________________________________ [tox sdist] _________________________________
[TOX] ***creating sdist package
[TOX] C:\compile\mock$ C:\Python26\python.exe setup.py sdist --formats=zip --dis
t-dir .tox\dist >.tox\log\0.log
[TOX] ***copying new sdistfile to 'C:\\Users\\michael\\.tox\\distshare\\mock-0.7
.0.zip'
_____________________________ [tox testenv:py24] ______________________________
[TOX] ***reusing existing matching virtualenv py24
[TOX] C:\compile\mock\.tox$ py24\Scripts\pip.exe install dist\mock-0.7.0.zip --d
ownload-cache=C:\compile\mock\.tox\_download >py24\log\3.log
[TOX] C:\compile\mock$ .tox\py24\Scripts\unit2 discover
..............................................................s.................
...s
----------------------------------------------------------------------
Ran 84 tests in 0.109s

...

No builder selected, using default: html
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
no targets are out of date.
________________________________ [tox summary] ________________________________
[TOX] py24: commands succeeded
[TOX] py25: commands succeeded
[TOX] py26: commands succeeded
[TOX] py27: commands succeeded
[TOX] congratulation :)

C:\compile\mock

Notice how the output from tox shows us exactly which commands are being executed. The test execution is done with test discovery using the command .tox\py24\Scripts\unit2 discover.

Here's the tox.ini for the mock module:

[tox]
envlist = py24,py25,py26,py27

[testenv]
deps=unittest2
commands=unit2 discover []

[testenv:py26]
commands=
    unit2 discover []
    sphinx-build -b doctest docs html
    sphinx-build docs html
deps =
    unittest2
    sphinx

[testenv:py27]
commands=
    unit2 discover []
    sphinx-build -b doctest docs html
    sphinx-build docs html
deps =
    unittest2
    sphinx

This tells tox that when the tox command is run it is to create four different virtual environments with the four different versions of Python (so all these versions of Python need to be installed). For Python 2.6 and 2.7 there are custom command sets and Python 2.4 and 2.5 use the default [testenv] block.

Note

Setting the PIP_DOWNLOAD_CACHE environment variable to a valid directory allows pip to reuse downloaded packages when it creates and populates virtual environments. tox will reuse virtual environments anyway, but keeping the cache around can be a good way of avoiding unnecessary network traffic for package downloads.

The dependencies for Python 2.4 & 2.5 are just unittest2; for 2.6 & 2.7 Sphinx is also a dependency. The test command is, as we've seen, unit2 discover. The [] after the command allows extra command line options to be passed through from tox to the test runner. For example we could use this to modify the parameters for test discovery:

tox -- -p test_signal\*

Although tox is in its early days it's already useful for me to be able to run tests on four versions of Python, including testing that the Sphinx documentation builds correctly and the code examples work, with a single command.

Posted by Fuzzyman on 2010-07-13 00:21:09


ContextDecorator: creating APIs that work as decorators and context managers

Two of the best additions to Python in recent years are the with statement and decorators. Both context managers (objects used in with statements) and decorators can be used for similar purposes: performing an action before and after executing the decorated function or the code inside the with block. In fact, in many places where I used to use decorators I now prefer the with statement (if I'm lucky enough to be able to ignore Python 2.4 compatibility).

If you're a library or framework creator then it is nice to be able to create APIs that can be used either as decorators or context managers. The patch decorators in mock behave like this, and when I was writing a new variant (patch.dict) I found myself having to figure out again how to do it. It isn't hard, but it's a bit fiddly. Nor is this an uncommon pattern; both py.test and Django have code that behaves like this.

I've written a very simple utility class that does this, called ContextDecorator, and it is now part of contextlib in Python 3.2.

Context managers inheriting from ContextDecorator have to implement __enter__ and __exit__ as normal. __exit__ retains its optional exception handling even when used as a decorator.

Even better, contextlib.contextmanager, which is a decorator for writing context managers as functions, uses ContextDecorator, so the context managers it creates can automatically be used as decorators as well.

I've put both ContextDecorator and the new contextmanager into a package on PyPI, and it works with all versions of Python from 2.4 - 3.1.

Example:

from contextdecorator import ContextDecorator

class mycontext(ContextDecorator):
   def __enter__(self):
      print 'Starting'
      return self

   def __exit__(self, *exc):
      print 'Finishing'
      return False

>>> @mycontext()
... def function():
...    print 'The bit in the middle'
...
>>> function()
Starting
The bit in the middle
Finishing

>>> with mycontext():
...    print 'The bit in the middle'
...
Starting
The bit in the middle
Finishing
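To show why this is "not hard, but a bit fiddly", here is a minimal sketch of how a base class like this could work. It is an illustration under stated assumptions, not the code from the contextdecorator package: the names SketchContextDecorator and recording are made up for the example, it is written in Python 3 syntax, and it uses the with statement itself, so it would not run on Python 2.4.

```python
from functools import wraps


class SketchContextDecorator(object):
    """Illustrative stand-in for ContextDecorator (hypothetical name)."""

    def __call__(self, func):
        # Calling the instance with a function makes it usable as a decorator
        @wraps(func)
        def inner(*args, **kwargs):
            # Enter and exit the context manager around every call to func
            with self:
                return func(*args, **kwargs)
        return inner


class recording(SketchContextDecorator):
    """Example context manager that records what happens, in order."""

    def __init__(self, events):
        self.events = events

    def __enter__(self):
        self.events.append('Starting')
        return self

    def __exit__(self, *exc):
        self.events.append('Finishing')
        return False


events = []


@recording(events)
def function():
    events.append('The bit in the middle')


function()
```

Because the wrapper simply runs the function inside a with block, __exit__ sees any exception raised by the decorated function, just as it would in a normal with statement.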

Existing context managers that already have a base class can be extended by using ContextDecorator as a mixin class:

from contextdecorator import ContextDecorator

class mycontext(ContextBaseClass, ContextDecorator):
   def __enter__(self):
      return self

   def __exit__(self, *exc):
      return False

contextdecorator also contains an implementation of contextlib.contextmanager that uses ContextDecorator. The context managers it creates can be used as decorators as well as in with statements.

from contextdecorator import contextmanager

@contextmanager
def mycontext(*args):
   print 'Started'
   try:
      # decorated function or with
      # statement executed here
      yield
   finally:
      # exception handling here
      print 'Finished!'

>>> @mycontext('some', 'args')
... def function():
...    print 'In the middle'
...
>>> function()
Started
In the middle
Finished!


>>> with mycontext('some', 'args'):
...    print 'In the middle'
...
Started
In the middle
Finished!

Posted by Fuzzyman on 2010-07-01 00:46:45


Unicode and new style string formatting

Python 2.6 and Python 3 gain a new style of string formatting, which is apparently based on the string formatting in C#. I wasn't a big fan of the string formatting in C# and so wasn't very excited about it moving into Python, but as is to be expected it has grown a bit on me.

The 'old-style' string formatting in Python is based on the % operator. In Python the % operator is the modulo operator, so strings have a __mod__ method that implements the string formatting:

>>> some_string = '%s: calls str. %r: calls repr.'
>>> some_string % ('foo', object())
'foo: calls str. <object object at 0x3284a0>: calls repr.'
>>> some_string.__mod__(('foo', object()))
'foo: calls str. <object object at 0x3284e8>: calls repr.'
>>>

In Python 2.6 and 3 strings grow a new format method in addition to the modulo operator:

>>> "The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'

In Python 2.7 and 3.1 onwards you can use empty braces where you are formatting with a sequence. This makes basic string formatting operations as simple as the old equivalent:

>>> '{} {} {}'.format(1, 2, 3)
'1 2 3'

The new style formatting was implemented by Eric Smith. He did a talk on it at PyCon 2010: Advanced String Formatting (video). Without taking anything away from what Eric has achieved, I kind of agree with Maciej Fijalkowski, who said: "I could make that talk in two minutes: Advanced string formatting, don't."

There is unfortunately a problem with the implementation of the new string formatting (there is an open issue for it) that makes it unsuitable for use in some situations.

With the old string formatting the normal Python rules for coercion to Unicode are obeyed. If the string is a byte-string but any of the format arguments are Unicode then the byte-string will be implicitly decoded to Unicode and a Unicode string returned:

>>> 'foo %s baz' % (u'bar',)
u'foo bar baz'

With the new style formatting str.format(...) always returns a byte-string, so if any of the arguments are Unicode strings they will be implicitly encoded:

>>> 'foo {0} baz'.format(u'bar')
'foo bar baz'

In Python 2.X the encoding used for these implicit encodes / decodes is ascii, so non-ascii characters in a string can cause a UnicodeEncodeError or UnicodeDecodeError. As always, the best solution is to not mix Unicode and byte-strings but to keep all strings in Unicode and only perform the encode when actually needed.

So why does this behaviour matter? Well, it particularly matters for framework authors formatting messages based on 'user' input. This is the case with unittest, which creates error messages when tests fail. The error messages internally in unittest are byte-strings and they are often mixed with user supplied messages using string formatting. We use old-style (% based) formatting, so if the user supplies byte-strings then the resulting messages will be byte-strings. If the user supplies Unicode strings then the resulting messages will be in Unicode. Because all the internal unittest messages are ascii only we can guarantee that an implicit decode to Unicode will succeed, so the user can choose the output type by varying the type of the messages they provide.

If we switched to using new-style formatting then we would have to choose Unicode or byte-strings, and the user supplied input would have to be safe to either decode or encode with ascii. So using the old style formatting allows the user to choose the string type of messages and puts no requirements on them. If we used the new style formatting then we would have to choose, and the burden of making sure the messages don't raise Unicode related exceptions would be on the user. The other option would be for us to check the type of the user messages and do the conversion of unittest messages internally. This would complicate the code for very little benefit.

This example shows the difference in practice when you format a byte-string with Unicode arguments:

>>> value = u'\u00a3'
>>> 'foo bar %s' % (value,)
u'foo bar \xa3'
>>>
>>> 'foo bar {0}'.format(value)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
>>>

If value from the examples above comes from a user call into an API then .format(...) creates a more restrictive API.

Note that % formatting is still in Python 3 and is not yet deprecated, although I expect that will happen eventually.

Of course in Python 3, where all strings are Unicode, this particular problem disappears entirely. Another reason to herald the bright new day that Python 3 is ushering in...
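As a quick check of that last point, here is the same comparison as a small Python 3 sketch (Python 3 syntax is an assumption here; the examples above are Python 2). Both styles produce the same str and neither needs an implicit encode or decode:

```python
# In Python 3 every string literal is Unicode, so there is nothing to coerce
value = '\u00a3'

old_style = 'foo bar %s' % (value,)
new_style = 'foo bar {0}'.format(value)
```

Both old_style and new_style end up as the same Unicode string containing the pound sign.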

Posted by Fuzzyman on 2010-04-10 18:10:46


Exception Handling Code for Python 2 and 3

The right way to maintain a library for both Python 2 and 3 is to run your tests under Python 2.6 with Python 3 warnings switched on. This doesn't mean that you have to make Python 2.6 your minimum supported version of Python, but it will warn you where you are doing things that either won't work or will behave differently in Python 3. For example:

$ python -3
Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 1 > 'a'
__main__:1: DeprecationWarning: comparing unequal types not supported in 3.x
False
>>>

Once you have done this you can use 2to3 to convert your codebase. Run your tests again and find any problems. The idea is to fix them in your Python 2 codebase (possibly resorting to compatibility layers with separate libraries for things like string / bytes IO) so that 2to3 is fully able to produce a working Python 3 version of your code. Using distribute you can even have 2to3 run automatically when your package is installed on Python 3.

That is the right way. For smaller modules it is possible, but sometimes not fun, to keep a single codebase that runs fine with both Python 2 and Python 3. The one module I maintain like this is discover, a backport of the new unittest test discovery [1]. There are various tricks to getting around the slightly different syntax and semantics between Python 2 and Python 3. One of these is handling exceptions.

For Python 2.5 and earlier you define your try..except blocks thusly:

try:
    do_something()
except AttributeError, e:
    handle_this(e)
except TypeError, e:
    handle_that(e)
else:
    finish()

For Python 3, and also Python 2.6 if you don't mind being incompatible with earlier versions of Python, you do:

try:
    do_something()
except AttributeError as e:
    handle_this(e)
except TypeError as e:
    handle_that(e)
else:
    finish()

So you can't write exception handling code that will work with both Python 2.5 and 3.X using these constructs. Instead you can use the following nasty trick:

import sys

try:
    do_something()
except:
    ExceptionClass, e = sys.exc_info()[:2]

    if ExceptionClass is AttributeError:
        handle_this(e)
    elif ExceptionClass is TypeError:
        handle_that(e)
    else:
        raise
else:
    finish()

Not very pretty, and don't forget to fix it as soon as you drop support for Python 2.5, but it works fine.
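One caveat with the trick as written: comparing with "is" only matches the exact exception class, whereas a normal except clause also catches subclasses. Here is a hedged sketch of a variant that uses issubclass to stay closer to except semantics. The classify function and the small helper functions are hypothetical names invented for this illustration, and the handlers are replaced by return values so the result is easy to inspect.

```python
import sys


def classify(action):
    """Run action() and report which branch of the handler is taken."""
    try:
        action()
    except:
        exception_class, e = sys.exc_info()[:2]

        # issubclass matches subclasses too, like a real except clause
        if issubclass(exception_class, AttributeError):
            return 'handle_this'
        elif issubclass(exception_class, TypeError):
            return 'handle_that'
        else:
            # Anything we don't recognise is re-raised unchanged
            raise
    else:
        return 'finish'


class MissingField(AttributeError):
    """Hypothetical subclass used to show the difference from 'is'."""


def raise_subclass():
    raise MissingField('no such field')


def raise_type_error():
    raise TypeError('wrong type')


def succeed():
    pass


results = [classify(raise_subclass), classify(raise_type_error), classify(succeed)]
```

With the "is" comparison the MissingField case would fall through to the bare raise; with issubclass it is handled by the AttributeError branch.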

[1] But don't use discover, use unittest2 instead. I have had a report that only one line in Mock needs to be changed for it to work with Python 3, but I haven't got round to running the tests yet.

Posted by Fuzzyman on 2010-03-21 01:27:33


A Rambling Recording on Member Lookup in Python (podcast)

I was thinking about the Python object model, in part as a result of my post on The Python Class Statement. Python is a really easy language to learn, but it also has advanced features like its protocols, descriptors and metaclasses, that make the full object model pretty complex - and that's before you start looking at the corner cases.

It would be really nice to write up a single document describing the Python object model, including all of its intricacies. That sounds too much like hard work, so instead I recorded a rambling hand-wavy description of member lookup in Python. I don't go into full blown detail, but then this is a podcast - it won't seriously mislead you and no-one is going to use it as a reference guide...

This was recorded using the Blue Fire iPhone app whilst I was wandering around outside. I chopped out about half my pauses and coughing using Audacity, so if you think the quality is rough you should have heard the first version.

Topics covered include:

  • Member lookup on instances and classes
  • How the interpreter looks up protocol ('magic') methods
  • __getattr__ and its mysterious cousin __getattribute__
  • Descriptors, bound methods, properties and friends

In the podcast I mention the new technique I have for dynamically mocking magic methods. Magic methods, when they are called for you by the interpreter, are usually looked up directly on the class. Unfortunately Python is not entirely consistent: some magic methods are still looked up on the instance first, before the class. This is gradually being fixed in Python (in 2.7 they are pretty much all fixed), but the inconsistency is a pain for mocking the magic methods.
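The class-first lookup is easy to see for yourself. This is a minimal sketch in Python 3 syntax (an assumption; the post targets Python 2), using a made-up class called Plain. An implicit call through repr() ignores the attribute set on the instance, while an explicit attribute lookup finds it:

```python
class Plain(object):
    def __repr__(self):
        return 'from the class'


p = Plain()
# Setting the magic method on the instance only creates an instance attribute
p.__repr__ = lambda: 'from the instance'

implicit = repr(p)       # the interpreter looks __repr__ up on type(p)
explicit = p.__repr__()  # normal attribute lookup finds the instance attribute
```

Here implicit is 'from the class' and explicit is 'from the instance', which is exactly the mismatch a mocking library has to paper over.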

Mock now allows you to mock the magic methods by assigning an appropriate function, one that takes self as the first argument, to the magic method on the mock instance. By default mocks do not have the magic methods implemented, except the ones Mock uses itself. When you assign to them the mock dynamically grows them on just that instance; all other mock instances are unaffected. Magic methods can then be looked up on the class or the instance, either way works (and you can delete them):

>>> from mock import Mock
>>> m = Mock()
>>> m
<mock.Mock object at 0x429770>
>>> m.__repr__ = lambda self: 'A Mock Object'
>>> m
A Mock Object
>>> m.__repr__()
'A Mock Object'
>>> del m.__repr__
>>> m
<mock.Mock object at 0x429770>

You can also use Mocks for magic methods. Here's an example of mocking out the built-in open function when used as a context manager:

from mock import Mock, patch

@patch('__builtin__.open')
def test_with_statement(self, mock_open):
    # open('filename') returns mock_open.return_value, and that return
    # value is what the with statement uses as its context manager
    manager = mock_open.return_value
    manager.__enter__ = Mock()
    manager.__exit__ = Mock()
    manager.__exit__.return_value = False

    with open('filename') as handle:
        handle.read()

    mock_open.assert_called_with('filename')
    manager.__enter__.assert_called_with()
    manager.__enter__.return_value.read.assert_called_with()
    manager.__exit__.assert_called_with(None, None, None)
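For comparison, here is a hedged sketch of the same idea on Python 3.3 or later, where mock ships in the standard library as unittest.mock. This is an assumption about your environment rather than anything the unreleased version described here requires. MagicMock comes with the context manager methods preconfigured, and read_first is a hypothetical function under test:

```python
from unittest.mock import patch


def read_first(filename):
    # Hypothetical code under test: uses open() as a context manager
    with open(filename) as handle:
        return handle.read()


with patch('builtins.open') as mock_open:
    # The object returned by open(...) is the context manager
    manager = mock_open.return_value
    manager.__enter__.return_value.read.return_value = 'data'

    result = read_first('filename')

mock_open.assert_called_with('filename')
manager.__enter__.assert_called_with()
manager.__enter__.return_value.read.assert_called_with()
manager.__exit__.assert_called_with(None, None, None)
```

Note that the assertions are made on mock_open.return_value, because the with statement enters whatever open() returns, not the patched open itself.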

The version of mock with magic method support hasn't yet been released, but you can pull it from the Google Code SVN repo. When I have time to write docs it will be released as 0.7.0.

There's a bit of trickery involved in making this work. If you're interested in how it's done look at the implementation of __new__ and __setattr__.

Posted by Fuzzyman on 2010-01-10 20:25:22


Notes on the Python Class Statement

Python classes are created at runtime, usually when you execute a script, or import the module they are defined in. Class creation is done primarily with the class statement. The class statement is executed by the Python runtime to create the class. Functions and names assigned in the body of the class statement become methods and attributes of the class.

You can easily see that the code inside the body of the class is executed, and that it can contain arbitrary code, by putting a print statement inside the class body:

>>> class ClassName(object):
...     print 'hello world'
...
hello world

Any assignments that happen in the body of the class definition create class members. Class and function definitions both cause names to be assigned, so classes defined inside the body of another class statement can be accessed as class attributes and functions defined inside the body of a class become methods.

Here's a trivial example that simply assigns a value to the name X:

>>> class SomeClass(object):
...     X = 3
...
>>> SomeClass.X
3

We can combine the fact that arbitrary code is executed with the assignment rule to conditionally define class members:

>>> import sys
>>> class SomeClass(object):
...     if sys.platform == 'darwin':
...         X = 3
...     else:
...         X = 4
...
>>> SomeClass.X
3

What happens in class creation (in Python 2; the rules change slightly in Python 3 as the metaclass mechanism is improved) is this: the class body is executed, and the resulting collection of names and values is passed as a dictionary (along with the class name and a tuple of the base classes) to the metaclass, which is 'called' (if the metaclass is a type, which it usually is, the metaclass is instantiated). The resulting class object is then assigned to the class name in the scope in which the class was defined. The resulting class is an object like everything else in Python.

The dictionary of members becomes the class __dict__ (__slots__ affects the instances, not the class, which still gets a __dict__). This dictionary is protected by being wrapped in a dictproxy. Although you can fetch members directly from the dictproxy you can't directly assign or delete members; instead you have to go through the normal attribute setting / deleting mechanisms:

>>> class SomeClass(object):
...  X = 3
...
>>> SomeClass.__dict__
<dictproxy object at 0x50b7b0>
>>> SomeClass.__dict__.keys()
['__dict__', 'X', '__module__', '__weakref__', '__doc__']
>>> SomeClass.__dict__['X']
3
>>> SomeClass.__dict__['Y'] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dictproxy' object does not support item assignment
>>> del SomeClass.__dict__['X']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dictproxy' object does not support item deletion
>>> SomeClass.Y = 4
>>> del SomeClass.X
>>> # X has now gone from the __dict__ and Y appeared
>>> SomeClass.__dict__.keys()
['__module__', 'Y', '__dict__', '__weakref__', '__doc__']
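The class creation steps described above can be sketched by hand: execute the body to collect a dictionary of names, then call the metaclass (here plain type) with the name, the bases and that dictionary. This is a simplified illustration in Python 3 syntax, not what the interpreter literally does; the names body and namespace are made up for the example. In Python 3 the read-only wrapper is called mappingproxy rather than dictproxy.

```python
body = """
X = 3

def double(self):
    return self.X * 2
"""

# Step 1: execute the class body, collecting the names it assigns
namespace = {}
exec(body, {}, namespace)

# Step 2: call the metaclass with the name, the base classes and the members
SomeClass = type('SomeClass', (object,), namespace)

# Step 3: the members are now visible through the read-only class __dict__
try:
    SomeClass.__dict__['Y'] = 4
    assignment_allowed = True
except TypeError:
    assignment_allowed = False

# Normal attribute assignment still works
SomeClass.Y = 4
```

After this SomeClass behaves like a class written with the class statement: SomeClass().double() returns 6 and Y appears in SomeClass.__dict__.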

An interesting example of assignment creating class members is what happens when you put a list comprehension inside a class body. An implementation detail of list comprehensions (in Python 2) is that variables used in the list comprehension 'leak' into their surrounding scope. A list comprehension in a class body creates an unexpected class member:

>>> class SomeClass(object):
...     [foo for foo in (1, 2, 3)]
...
>>> SomeClass.foo
3

The same isn't true of generator expressions where the variable doesn't leak:

>>> class AnotherClass(object):
...     list(bar for bar in (1, 2, 3))
...
>>> AnotherClass.bar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'AnotherClass' has no attribute 'bar'

The variable leaking from list comprehensions is a side-effect and should not be relied on.

Whilst the code in the class statement is being executed it creates a temporary namespace. Code can refer to names already assigned as if they were local variables.

>>> class SomeClass(object):
...  X = 3
...  b = [a * X for a in (1, 2, 3)]
...
>>> SomeClass.b
[3, 6, 9]

A common use for this is to create aliases, where you give the same member two or more names. In this example cost is an alias to the calculate_price method:

>>> class SomeClass(object):
...     def calculate_price(self, quantity):
...         return quantity * 10.0
...     cost = calculate_price
...
>>> instance = SomeClass()
>>> instance.calculate_price(20)
200.0
>>> instance.cost(20)
200.0

It is also the standard way of creating properties before Python 2.6:

>>> class SomeClass(object):
...     _value = None
...     def get(self):
...         return self._value
...     def set(self, value):
...         self._value = value
...     value = property(get, set)
...

The value property is created using the get and set functions from the scope that forms the class members.

Unfortunately we have a problem with generator expressions. Generator expressions create their own scope, causing names to be looked up lexically and ignoring the temporary class scope.

>>> class AnotherClass(object):
...  X = 3
...  b = list(a * X for a in (1, 2, 3))
...
Traceback (most recent call last):
  File "<stdin>", line 3, in AnotherClass
  File "<stdin>", line 3, in <genexpr>
NameError: global name 'X' is not defined
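The same failure can be reproduced in Python 3, where the generator expression still gets its own scope. The sketch below (Python 3 syntax, with a made-up Workaround class) captures the error and then shows one possible way round it: a default argument is evaluated in the class scope, so it can carry the value into the inner scope. Treat it as an illustration rather than a recommendation.

```python
try:
    class AnotherClass(object):
        X = 3
        # X is looked up lexically, skipping the temporary class scope
        b = list(a * X for a in (1, 2, 3))
except NameError as e:
    message = str(e)


class Workaround(object):
    X = 3
    # The default value X=X is evaluated in the class scope, so the
    # generator expression inside the lambda can see it as a local name
    b = (lambda X=X: list(a * X for a in (1, 2, 3)))()
```

Here message mentions the missing name X, and Workaround.b is [3, 6, 9].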

If you're interested in how metaclasses are involved in class creation then you should read: Metaclasses in five minutes. (Hopefully readable even for non-gurus.)

An interesting reference on why the class statement in Python contains executable code is this article by Guido van Rossum, the creator of Python: How Everything Became an Executable Statement.

Posted by Fuzzyman on 2010-01-10 13:49:13


Fun with Unicode, Latin-1 and a C1 Control Code

Unicode is a rabbit-warren of complexity; almost fractal in nature, the more you learn about it the more complexity you discover. Anyway, all that aside, you can have great fun (i.e. pain) with fairly basic situations even if you are trying to do the right thing.

This particular problem was encountered by Stephan Mitt, one of my colleagues at Comsulting. I helped him find the solution, and with a bit of digging (and some help from #python-dev) worked out why it was happening.

We receive data from customers as CSV files that need importing into a web application. The CSV files are received in latin-1 encoding and we decode and then iterate over them to process a line at a time. Unfortunately the data from the customers included some \x85 characters, which were breaking the CSV parsing.

One of the problems with the latin-1 encoding is that it uses all 256 possible byte values, so it is never possible to detect badly encoded data. Arbitrary binary data will always successfully decode:

>>> data = ''.join(chr(x) for x in range(256))
>>> data.decode('latin-1')
u'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f...'

If you iterate over a standard file object in Python 2 (i.e. one that reads data as bytestrings) then you iterate over it a line at a time. This splits lines on carriage returns (\x0D) and line feeds (\x0A). If you're on Windows then the sequence \x0D\x0A (CRLF) signifies a new line. If you're trying to do-the-right-thing, and decode your data to Unicode before treating it as text, then you might use code a bit like the following to read it:

import codecs

handle = codecs.open(filename, 'r', encoding='latin-1')
for line in handle:
    ...

This was the cause of our problem. When decoding using latin-1, \x85 is transcoded to u'\x85', which Unicode treats as a line break. So if your source data has \x85 embedded in it, and you are splitting on lines, where the lines break will be different depending on whether you are using byte-strings or Unicode strings:

>>> d = 'foo\x85bar'
>>> d.split()
['foo\x85bar']
>>> u = d.decode('latin-1')
>>> u
u'foo\x85bar'
>>> u.split()
[u'foo', u'bar']

This could still be a pitfall in Python 3, where all strings are Unicode, particularly if you are porting an application from Python 2 to Python 3. Suddenly your data will behave differently when you treat it as Unicode. The answer is to do the split manually, specifying which character to use as a line break.
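Here is a small Python 3 sketch of the difference and of the manual split (Python 3 syntax is an assumption; the session above is Python 2). The sample data is invented for the example:

```python
raw = b'foo\x85bar\nbaz'
text = raw.decode('latin-1')

# Bytes only treat \n and \r as line boundaries
bytes_lines = raw.splitlines()

# Unicode strings also break on \x85 (NEL), among other characters
text_lines = text.splitlines()

# Splitting manually on the one character we care about avoids the surprise
manual_lines = text.split('\n')
```

bytes_lines and manual_lines both have two entries, with \x85 left inside the first one, while text_lines has three.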

The problem isn't restricted to \x85. The Unicode spec on newlines shows us why. \x85 is referred to by the acronym NEL (Next Line), which is a C1 Control Code: "Equivalent to CR+LF. Used to mark end-of-line on some IBM mainframes."

In fact NEL belongs to a general class of characters known as Paragraph Separators (Category B). This category includes the characters \x1C, \x1D, \x1E, \x0D, \x0A and \x85. Splitting on lines will split on any of these characters, which may not be what you expect. It certainly wasn't what we expected.

For us the solution was simple; we just strip out any occurrence of \x85 in the binary data before decoding.

Note

Marius Gedminas suggests that the data is probably encoded as Windows 1252 rather than Latin-1. He is probably right.
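Both the fix and the suggestion in the note can be checked with a short Python 3 sketch (Python 3 syntax and the sample bytes are assumptions made for the illustration). Stripping the byte before decoding removes the stray line break, and decoding as Windows 1252 turns \x85 into a horizontal ellipsis, which is not a line boundary at all:

```python
raw = b'foo\x85bar\nbaz'

# Our fix: drop the offending byte, then decode as latin-1
stripped = raw.replace(b'\x85', b'').decode('latin-1')

# The alternative reading: the data was really Windows 1252
as_cp1252 = raw.decode('cp1252')
```

If the data really is Windows 1252 then decoding it as such keeps the character the customer intended, instead of throwing it away.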

There are some interesting notes on Unicode line breaks in this Python bug report: What is an ASCII linebreak?.

Posted by Fuzzyman on 2010-01-07 12:42:27


Python Surprises

In the last few days I've run into several things I didn't know about Python. Not necessarily bad or wrong, just new to me.

>>> object.__new__(int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object.__new__(int) is not safe, use int.__new__()

The same happens for pretty much all the built-in types. I don't think you can achieve this effect from pure-Python code, which is why it is impossible (I think) to write a real singleton in pure-Python. From any singleton instance you can always do this:

object.__new__(type(the_singleton))
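To make that concrete, here is a minimal sketch with a typical __new__-based singleton (Python 3 compatible syntax; the Singleton class is a made-up example, not code from any library). Going through the class gives back the same object, but object.__new__ sidesteps the check entirely:

```python
class Singleton(object):
    _instance = None

    def __new__(cls):
        # Only ever create one instance through the normal route
        if cls._instance is None:
            cls._instance = super(Singleton, cls).__new__(cls)
        return cls._instance


the_singleton = Singleton()
same = Singleton()

# Bypass Singleton.__new__ by calling object.__new__ directly
other = object.__new__(type(the_singleton))
```

same is the_singleton, but other is a second, distinct instance of the same class.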

Anyway, next surprise:

>>> class Meta(type):
...  __slots__ = ['foo']
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    nonempty __slots__ not supported for subtype of 'type'

This was annoying at the time, but caused me to find a better way to achieve what I wanted anyway. These first two show that despite the 'grand-merger' of Python 2.2 you can't treat the built-in types exactly as if they were user-defined classes.

The next one I actually ran into a while back:

>>> @EventHandler[HtmlEventArgs]
  File "<stdin>", line 1
    @EventHandler[HtmlEventArgs]
                 ^
SyntaxError: invalid syntax

This one is annoying. In IronPython EventHandler[HtmlEventArgs] would return a typed event handler for wrapping a function with. Decorator syntax would be very convenient but the only valid syntax is a name followed by optional parentheses and arguments - not any arbitrary expression.

The relevant part of the grammar is:

decorator      ::=  "@" dotted_name ["(" [argument_list [","]] ")"] NEWLINE

This grammar not only prevents indexing but means you can't (for example) define lambda decorators. All it would take is a grammar change and these could work; no actual code would need to be written in support. The reason that Guido didn't allow it is that he didn't want people writing code like:

@(F((foo + bar / 3 )) / [x**2 for x in frobulator])
def function():
    ...

Guido did agree that the rules could be relaxed (here is the python-ideas thread where it was discussed), but then the language moratorium came into effect.
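Under the restricted grammar the usual way round this is to bind the expression to a name first and decorate with the name. A minimal sketch in plain Python, where make_handler, handlers and render are hypothetical stand-ins for something like the typed event handler:

```python
def make_handler(prefix):
    """Hypothetical factory standing in for an indexable decorator source."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            return prefix + func(*args, **kwargs)
        return wrapper
    return decorator


handlers = {'html': make_handler('<html>')}

# '@handlers["html"]' is rejected by the grammar shown above, so bind the
# result of the indexing expression to a plain name first
html_handler = handlers['html']


@html_handler
def render():
    return 'body'


output = render()
```

It costs an extra line and an extra name, which is why being able to write the expression directly after the @ would be more convenient.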

The final surprise was that default object equality comparison is implemented inside the Python runtime instead of there being a default implementation in object. In fact object() instances don't even have the equality / inequality methods (__eq__ / __ne__).

>>> object().__eq__(object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute '__eq__'

However, if you look up __eq__ on the type, as you might if you were trying to delegate up to the default implementation that doesn't exist, then something weird happens:

>>> object.__eq__(object(), object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected 1 arguments, got 2
>>> object.__eq__
<method-wrapper '__eq__' of type object at 0x141fc0>

When you look up __eq__ on object (the type rather than an instance) then you get the __eq__ method of its metaclass (type) bound to object which is an instance of type. As this is a bound method it only takes one argument and calling it with two arguments causes a TypeError.

In fact there is nothing special about __eq__ here, I just didn't realise that member resolution on types would check the metaclass after checking the base classes:

>>> class Meta(type):
...     X = 3
...
>>> class Something(object):
...     __metaclass__ = Meta
...
>>> Something.X # from the metaclass
3
>>> Something.X = 4 # set on the type
>>> Meta.X
3
>>> class SomethingElse(Something): pass
...
>>> SomethingElse.X # fetched from base class not the metaclass
4
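The same lookup order can be checked on Python 3, where the metaclass is declared with the `metaclass` keyword rather than `__metaclass__`. A minimal sketch reusing the names from the session above; it also shows that instances, unlike the class itself, do not fall back to the metaclass:

```python
class Meta(type):
    X = 3

class Something(metaclass=Meta):
    pass

# Looked up on the class: not in Something or its bases, so the
# metaclass attribute is returned.
from_metaclass = Something.X

# Looked up on an instance: the metaclass is never consulted.
try:
    Something().X
except AttributeError:
    instance_sees_x = False
else:
    instance_sees_x = True

# Setting on the class leaves the metaclass attribute alone.
Something.X = 4
meta_value = Meta.X

class SomethingElse(Something):
    pass

# Base classes are checked before falling back to the metaclass.
from_base = SomethingElse.X
```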

Posted by Fuzzyman on 2010-01-04 12:16:12


Mocking Magic Methods and Preserving Function Signatures Whilst Mocking

So, I'm most of the way through one blog entry, my tax return is due, I have a PyCon talk to write and I have a release of ConfigObj [1] just waiting for me to finish updating the docs. Naturally then I should mess around implementing new features for Mock.

These particular features were inspired by an email from Mock user Juho Vepsalainen who had a particular problem with Mock. In case you aren't familiar with it, Mock is a simple mocking library for unit testing. Mock makes creating mock objects, and patching out implementations with mocks at runtime, trivially easy.

I've spent a chunk of time today implementing a module that extends Mock to add new features. Eventually they will become part of Mock itself, but that would require a new release and tedious things like writing documentation.

Note

I've already improved the code in extendmock and merged it into the main mock module. No need for a special MagicMock class any more. You can use mock.py from subversion or wait for the release of version 0.7.

To implement a lot of functionality (mocking any class and recording how it is used), mocks are instances of the Mock class. This can be a problem for code that uses introspection to determine whether something is a function, or that introspects the function signature. If you mock a function or method it will be replaced with a callable object with the signature (*args, **kwargs). This also means that a function which is called incorrectly won't raise an error; you will only catch this in your tests if you specifically check how the object is called (which you usually will because that's the point of mocking it out - but still).

A solution to all these problems is the mocksignature function. This takes a function (or method) and a mock object. It creates a wrapper function with the same signature as the function you pass in. When called this wrapper function calls the mock, so instead of directly patching a mock to replace a function or method you use the function returned by mocksignature. Code that introspects the function you are patching out will still work. Here's an example:

from mock import Mock, patch
from extendmock import mocksignature

from some_module import some_function
mock = Mock()

mock_function = mocksignature(some_function, mock)

@patch('some_module.some_function', mock_function)
def test():
    from some_module import some_function
    some_function('foo', 'bar', 'baz')

test()
mock.assert_called_with('foo', 'bar', 'baz')

To make it more convenient to use I will build support for mocksignature into the patch decorator.
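The mock library later entered the standard library as unittest.mock, and its create_autospec function covers similar ground. The sketch below uses it as a stand-in for mocksignature rather than extendmock itself; `some_function` is a hypothetical function defined inline so that the example is self-contained:

```python
from unittest.mock import create_autospec

def some_function(a, b, c):
    """Hypothetical function standing in for some_module.some_function."""
    return a + b + c

# The replacement accepts the same arguments as the original.
mock_function = create_autospec(some_function, return_value='mocked')

result = mock_function('foo', 'bar', 'baz')
mock_function.assert_called_with('foo', 'bar', 'baz')

# A call that the real function would reject is rejected here too.
try:
    mock_function('foo')
except TypeError:
    rejected = True
else:
    rejected = False
```

Because the signature is enforced, a test fails at the bad call site instead of relying on a later assertion about how the mock was called.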

You can also use mocksignature on instance methods:

from mock import Mock
from extendmock import mocksignature

class Something(object):
    def method(self, a, b):
        pass

s = Something()
mock = Mock()
mock_method = mocksignature(s.method, mock)
s.method = mock_method

s.method(3, 4)
mock.assert_called_with(3, 4)

A limitation of mocksignature is that all arguments are passed to the underlying mock by position. If there are default values they will be explicitly passed in. Keyword arguments are only collected if the function uses **kwargs. See the tests for more details. The important fact is that the function signature is unchanged:

import inspect
from extendmock import mocksignature
from mock import Mock

def f(a, b, c='foo', **kwargs):
    pass
mock = Mock()

new_function = mocksignature(f, mock)
assert inspect.getargspec(f) == inspect.getargspec(new_function)

The limitation on keyword arguments sounds confusing (certainly the way I expressed it above), so it's easier to demonstrate in practice with the call_args attribute:

>>> from mock import Mock
>>> from extendmock import mocksignature
>>>
>>> mock = Mock()
>>>
>>> def f(a=None): pass
...
>>> f2 = mocksignature(f, mock)
>>> f2()
<mock.Mock object at 0x441d70>
>>> mock.call_args
((None,), {})
>>> mock.assert_called_with(None)
>>>

Even though we passed no arguments in, the argument with the default value (a) is called as if None was passed in explicitly. This affects the way you use assert_called_with when using Mock and mocksignature in concert. You can still use mocksignature with functions that collect args with *args and **kwargs:

>>> from extendmock import mocksignature
>>> from mock import Mock
>>>
>>> def f(*args, **kw): pass
...
>>> mock = Mock()
>>> mock.return_value = 3
>>> f2 = mocksignature(f, mock)
>>> f2(1, 'a', None, foo='fish', bar=1.0)
3
>>> mock.call_args
((1, 'a', None), {'foo': 'fish', 'bar': 1.0})
>>>
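One way such a wrapper could be built on Python 3 is sketched below. This is not the extendmock implementation: `signature_wrapper` is a hypothetical name, and the sketch leans on inspect.signature and functools.wraps. It reproduces the behaviour described above, with defaults filled in and passed to the mock by position:

```python
import functools
import inspect
from unittest.mock import Mock

def signature_wrapper(func, mock):
    """Hypothetical stand-in for mocksignature (an assumption, not extendmock)."""
    sig = inspect.signature(func)

    # functools.wraps sets __wrapped__, which inspect.signature follows,
    # so introspection reports the original signature.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)   # raises TypeError on a bad call
        bound.apply_defaults()              # defaults become explicit values
        return mock(*bound.args, **bound.kwargs)
    return wrapper

def f(a, b, c='foo'):
    pass

mock = Mock(return_value=3)
f2 = signature_wrapper(f, mock)

result = f2(1, b=2)
same_signature = inspect.signature(f2) == inspect.signature(f)

# b was passed by keyword and c was omitted, yet the mock sees
# all three arguments by position.
mock.assert_called_with(1, 2, 'foo')
```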

Another problem with Mock is that it currently doesn't support mocking out the Python protocol methods (like __len__, __getitem__ and so on). extendmock contains a new class that adds magic method support to Mock: MagicMock. Here's an example of how you use it:

from extendmock import MagicMock

mock = MagicMock()

_dict = {}
def getitem(self, name):
    return _dict[name]
def setitem(self, name, value):
    _dict[name] = value
def delitem(self, name):
    del _dict[name]

mock.__setitem__ = setitem
mock.__getitem__ = getitem
mock.__delitem__ = delitem

self.assertRaises(KeyError, lambda: mock['foo'])
mock['foo'] = 'bar'
self.assertEquals(_dict, {'foo': 'bar'})
self.assertEquals(mock['foo'], 'bar')
del mock['foo']
self.assertEquals(_dict, {})

You mock magic methods by assigning a function (or a mock object) to the mock instance. Magic methods are looked up on the object class by the Python interpreter. MagicMock has all the magic methods implemented in a way that checks for corresponding instance variables, with sensible behaviour if the instance variable doesn't exist. However, the presence of these magic methods on the class could break some duck-typing (if it checks for the presence or absence of these methods), so I would rather have MagicMock be a separate class instead of integrating this into the Mock class. On the other hand there is no reason why I can't move MagicMock into the mock module next time I do a release.

For all magic methods you mock in this way you have to include self in the function signature. I might change this at a future date, so be warned: this is an experimental implementation. Also note that calls to mocked magic methods aren't recorded in method_calls and don't use object wrapping - all things that may change in the future.
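The mechanism described above (magic methods defined on the class that defer to attributes stored on the instance) can be sketched as follows. `DelegatingMock` is a hypothetical class written for this illustration, not the MagicMock code, and the fallback values are assumptions:

```python
class DelegatingMock(object):
    """Hypothetical sketch of the idea; not the extendmock implementation."""

    def _lookup(self, name):
        # Only the instance dictionary is consulted, never the class.
        return self.__dict__.get(name)

    def __len__(self):
        impl = self._lookup('__len__')
        # Assumed fallback: behave like an empty container.
        return impl(self) if impl is not None else 0

    def __getitem__(self, key):
        impl = self._lookup('__getitem__')
        if impl is None:
            raise TypeError('__getitem__ has not been configured')
        return impl(self, key)


mock = DelegatingMock()
default_length = len(mock)

# The replacement functions take self explicitly, because the class-level
# method passes the instance along when it delegates.
mock.__len__ = lambda self: 3
mock.__getitem__ = lambda self, key: key.upper()

configured_length = len(mock)
item = mock['foo']
```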

One reason that some users have been requesting magic method support is for mocking context managers. Unfortunately __enter__ and __exit__ are looked up differently from the other magic methods in Python 2.5 and 2.6 (they aren't looked up on the class first but on the instance first like normal members). This makes the following technique still the correct way to mock the with statement.

Note

This is no longer true in the magic method support now in trunk. You mock __enter__ and __exit__ in exactly the same way as you do other magic methods.

You can also mock magic methods by assigning a Mock instance to the method you are mocking. For example:

>>> from mock import Mock
>>> mock = Mock()
>>> mock.__getitem__ = Mock()
>>> mock.__getitem__.return_value = 'bar'
>>> mock['foo']
'bar'
>>> mock.__getitem__.assert_called_with('foo')

Mocking the with statement:

mock = Mock()
mock.__enter__ = Mock()
mock.__exit__ = Mock()
mock.__exit__.return_value = False

with mock as m:
    self.assertEqual(m, mock.__enter__.return_value)
mock.__enter__.assert_called_with()
mock.__exit__.assert_called_with(None, None, None)
[1] Mike Driscoll has just written a very good short tutorial for ConfigObj by the way: A brief ConfigObj Tutorial.

Posted by Fuzzyman on 2010-01-03 00:35:50


Django json support

As I mentioned in my last entry I'm now working on a Silverlight application with Django on the backend. This means that we're using Django to serve json to the Silverlight application, so whilst we're using the Django ORM, url routing and authentication we aren't using its templating.

The data model is 'unusual' but makes sense for the app. We've only implemented the first user story, which uses a subset of the data, but you can already start to see the shape of it. Here's a simplified approximation of the data from the point of view of the Django model classes:

from django.db import models

class CompanyType(models.Model):
    type = models.CharField(max_length=255)

class Company(models.Model):
    name = models.CharField(max_length=255)
    company_type = models.ForeignKey(CompanyType)

class Address(models.Model):
    street = models.CharField(max_length=255)
    city = models.CharField(max_length=255)
    postcode = models.CharField(max_length=255)
    company = models.ForeignKey(Company)

class Individual(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    address = models.ForeignKey(Address)

The reason for this slightly non-intuitive setup is that a company may have several addresses. At every address there can be several contacts.

In our view we have a companies function that needs to return a list of all the companies. If we use the built-in json serializer then for the company_type field it just puts an id number into the json. If we wanted the actual company_type then we would have to make an additional query per company.

Additionally, for this view we want to retrieve all of the addresses associated with a company and every individual associated with each address.

There is a project called wadofstuff that includes a replacement serializer. It's very easy to use: just specify the following in settings.py:

SERIALIZATION_MODULES = {
    #'json': 'djangoserializers.json'
    'json': 'wadofstuff.django.serializers.json'
}

When we import and call the Django json serializer we can now specify relations for the serializer to follow and include in the json:

from django.core import serializers
from project.app.models import Company

from django.http import HttpResponse

def company(request):
    companies = Company.objects.all()
    # serialize() takes the format name as its first argument
    json = serializers.serialize('json', companies, relations=('company_type',))
    return HttpResponse(json, mimetype="application/json")

This doesn't solve the problem of how we include the addresses and individuals information. One option would be to generate three separate lists, include them all in the json and let the client sort them out. The wadofstuff serializer does let us specify a set of extra fields (extras). Despite what the documentation says, in practice I had to implement these as methods on the model objects that could only return a string. This means I couldn't use it to return a list of model objects like I wanted.

Maybe I'm missing something obvious, which is entirely likely as I'm new to Django, but it doesn't seem like this use case is that unusual. I'm surprised that Django has no infrastructure at all to support this kind of use case.

After a bit of hunting I discovered the awesome django-piston project. We don't need an XML or YAML API, nor streaming or throttling, but it includes an awesome json serializer that I 'borrowed' and hacked around so that I could use it on its own. My final code for associating each company with the related addresses and individuals looks like this:

from project.app.models import Company
from project.modules.emitter import Emitter

from django.http import HttpResponse


def companies(request):
    companies = Company.objects.all()

    for company in companies:
        addresses = company.address_set.all()
        company.addresses = addresses
        for address in addresses:
            individuals = address.individual_set.all()
            address.individuals = individuals

    emitter = Emitter(fields=('company', 'company_type', 'address', 'addresses', 'individuals'))
    thejson = emitter.render(companies)
    return HttpResponse(thejson, mimetype="application/json")

This follows the 'company', 'company_type', and 'address' relations on model objects it serializes and also handles the addresses and individuals fields. What I get back on the Silverlight end is json representing a list of all companies. Each company has an 'addresses' field with a list of all addresses for that company and each address has an 'individuals' field. This is exactly what we need.
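The shape of that json can be illustrated without Django. The sketch below uses plain dictionaries in place of model instances and groups the rows in memory; the `nest` helper and all the sample values are hypothetical, and only the field names follow the models above:

```python
import json
from collections import defaultdict

# Hypothetical flat rows standing in for the three model tables.
company_rows = [{'id': 1, 'name': 'Example Ltd', 'company_type': 'Supplier'}]
address_rows = [
    {'id': 10, 'company': 1, 'city': 'Northampton'},
    {'id': 11, 'company': 1, 'city': 'London'},
]
individual_rows = [
    {'id': 100, 'address': 10, 'first_name': 'Ann', 'last_name': 'Example'},
    {'id': 101, 'address': 10, 'first_name': 'Bob', 'last_name': 'Sample'},
]

def nest(companies, addresses, individuals):
    # Group individuals under their address, then addresses under their company.
    people_by_address = defaultdict(list)
    for person in individuals:
        people_by_address[person['address']].append(person)

    addresses_by_company = defaultdict(list)
    for address in addresses:
        nested = dict(address, individuals=people_by_address[address['id']])
        addresses_by_company[address['company']].append(nested)

    return [dict(company, addresses=addresses_by_company[company['id']])
            for company in companies]

thejson = json.dumps(nest(company_rows, address_rows, individual_rows))
decoded = json.loads(thejson)
```

Each company in `decoded` carries an 'addresses' list, and each address carries an 'individuals' list, which is the structure the client consumes.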

Note

In the comments Doug Napoleone suggests using select_related() rather than all() as it should be more efficient given the way we are using all the relations.

Doug also suggests setting 'related_name' in the model fields which would give me nicer names than address_set and individual_set. If I taught the serializer how to handle these names then I could move the loop from my view to the serializer; but the loop would still be there, so no efficiency gain, just nicer looking code.

Posted by Fuzzyman on 2009-11-16 01:54:10


The Python Object Model Revisited (data descriptors)

A few weeks ago I demonstrated the complexity of the Python object model by fetching docstrings from objects. A while after posting it I thought of a bug - or at least a way in which it could return the wrong result when looking up an attribute on an object. It will probably come as no surprise that this is due to the descriptor protocol.

Descriptors are special types of objects that have __get__ and/or __set__ and __delete__ methods and have special behaviour when fetched, set or deleted as object attributes. They are how methods, class methods, static methods, properties and __slots__ are implemented in Python.

Descriptors that have both __get__ and __set__ are called data descriptors (properties are the canonical example), descriptors with only __get__ are non-data descriptors (methods being the canonical example). Data descriptors have interesting behaviour when they are on a class which has the same member in the instance dictionary.

Instance members are stored in the __dict__ attribute of the object. Normally if this instance dictionary has a member then fetching that member will pull it out of the dictionary. The exception is that if the class has a data-descriptor with the same name then that will be invoked instead of the object in the instance dictionary. This is easy to demonstrate:

>>> class A(object):
...     @property
...     def a(self):
...         return 'property'
...
>>> a = A()
>>> a.__dict__['a'] = 'attribute'
>>> a.a
'property'

So a data-descriptor on the class will override a member with the same name on the instance - but the 18 lines of code I wrote before for fetching docstrings from attributes will always look on the instance first.

The same is true for inherited data-descriptors:

>>> class B(A): pass
...
>>> b = B()
>>> b.__dict__['a'] = 'attribute'
>>> b.a
'property'

Non-data descriptors don't override instance attributes and data-descriptors on a base class don't override normal class attributes on a subclass.
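Both of those cases can be checked directly. A minimal sketch; the `NonData` descriptor and the attribute names are hypothetical, chosen only for the illustration:

```python
class NonData(object):
    # Only __get__ is defined, so this is a non-data descriptor.
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'

class A(object):
    nd = NonData()

    @property
    def prop(self):
        return 'property'

class B(A):
    # A plain class attribute that shadows the inherited property.
    prop = 'class attribute'

a = A()
before = a.nd
a.__dict__['nd'] = 'instance attribute'
after = a.nd                    # the instance dictionary wins

b = B()
from_class = b.prop             # the property on A is never reached
b.__dict__['prop'] = 'instance attribute'
from_instance = b.prop          # and the instance can override it again
```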

To handle this we need to check both the instance and walk the inheritance hierarchy. If we find the member we are looking for in both then we check the member from the class for a __set__ method. If the member from the class (or one of its base classes) has a __set__ member then we return that - otherwise we return the member from the instance.

Our modified full code that takes this into account has grown to 22 lines and now looks like:

import types
import inspect

def get_doc(obj, member):
    found = []
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        found.append(obj.__dict__[member])

    if isinstance(obj, (type, types.ClassType)):
        search_order = inspect.getmro(obj)
    else:
        search_order = inspect.getmro(obj.__class__)

    for entry in search_order:
        if member in entry.__dict__:
            if hasattr(entry.__dict__[member], '__set__'):
                return entry.__dict__[member].__doc__
            found.append(entry.__dict__[member])
            return found[0].__doc__


def get_docstrings(obj):
    try:
        members = dir(obj)
    except Exception:
        members = []
    return [(member, get_doc(obj, member)) for member in members]
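types.ClassType only exists on Python 2. A possible Python 3 adaptation of get_doc is sketched below. It keeps the same lookup rules but, as an assumption beyond the code above, also falls back to a member that is found only on the instance. The `Widget` class and `patched_draw` function are hypothetical:

```python
import inspect

def get_doc(obj, member):
    found = []
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        found.append(obj.__dict__[member])

    klass = obj if isinstance(obj, type) else type(obj)
    for entry in inspect.getmro(klass):
        if member in entry.__dict__:
            candidate = entry.__dict__[member]
            if hasattr(candidate, '__set__'):
                # A data descriptor on the class beats the instance.
                return candidate.__doc__
            found.append(candidate)
            break
    # The instance member (if any) was appended first, so it wins here.
    return found[0].__doc__ if found else None


class Widget(object):
    @property
    def size(self):
        """size property"""
        return 1

    def draw(self):
        """draw method"""


def patched_draw():
    """patched draw"""

w = Widget()
w.__dict__['size'] = patched_draw   # shadowed by the data descriptor
w.draw = patched_draw               # overrides the non-data descriptor

size_doc = get_doc(w, 'size')
draw_doc = get_doc(w, 'draw')
```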

Note

In practice there is another exception that we haven't handled here. Although you can override methods with instance attributes (very useful for monkey patching methods for test purposes) you can't do this with the Python protocol methods. These are the 'magic methods' whose names begin and end with double underscores. When invoked by the Python interpreter they are looked up directly on the class and not on the instance (however if you look them up directly - e.g. x.__repr__ - normal attribute lookup rules apply).
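The difference between the two kinds of lookup is easy to see with len(). A small sketch for new-style classes; the `Sized` class is hypothetical:

```python
class Sized(object):
    def __len__(self):
        return 1

    def describe(self):
        return 'original'

s = Sized()
s.describe = lambda: 'patched'   # ordinary method: instance attribute wins
s.__len__ = lambda: 99           # protocol method: ignored by len()

patched_method = s.describe()
implicit_len = len(s)            # the interpreter goes straight to the class
explicit_len = s.__len__()       # explicit lookup follows the normal rules
```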

There is a corner case (that I alluded to in my previous post): classes can define __slots__ and create a dummy __dict__ member. If this member isn't a dictionary then our code will barf horribly - but really this is such an evil corner case that I'm not going to worry about it.

I have seen one use case for __slots__ in combination with a fake __dict__ member: proxying attribute access. This is a part of the werkzeug web framework - the LocalProxy class defines __dict__ as a property which returns the __dict__ member of the object it is proxying...

Posted by Fuzzyman on 2009-06-22 23:08:08