Python Programming, news on the Voidspace Python Projects and all things techie.

Checking multiple calls with mock and coping with mutable arguments

mock has a nice API for making assertions about how your mock objects are used.

>>> from mock import Mock
>>> mock = Mock()
>>> mock.foo_bar.return_value = None
>>> mock.foo_bar('baz', spam='eggs')
>>> mock.foo_bar.assert_called_with('baz', spam='eggs')

If your mock is only expected to be called once you can use the assert_called_once_with method, which also asserts that the call_count is one.

>>> mock.foo_bar.assert_called_once_with('baz', spam='eggs')
>>> mock.foo_bar()
>>> mock.foo_bar.assert_called_once_with('baz', spam='eggs')
Traceback (most recent call last):
    ...
AssertionError: Expected to be called once. Called 2 times.

Both assert_called_with and assert_called_once_with make assertions about the most recent call. If your mock is going to be called several times, and you want to make assertions about all those calls, the API is not quite so nice.

All of the calls, in order, are stored in call_args_list as tuples of (positional args, keyword args).

>>> mock = Mock(return_value=None)
>>> mock(1, 2, 3)
>>> mock(4, 5, 6)
>>> mock()
>>> mock.call_args_list
[((1, 2, 3), {}), ((4, 5, 6), {}), ((), {})]

Because it stores positional args and keyword args, even if they are empty, the list is overly verbose which makes for ugly tests. It turns out that I do this rarely enough that I've never got around to improving it. One of the new features in 0.7.0 helps with this. The tuples of (positional, keyword) arguments are now custom objects that allow for 'soft comparisons' (implemented by Konrad Delong). This allows you to omit empty positional or keyword arguments from tuples you compare against.

>>> mock.call_args_list
[((1, 2, 3), {}), ((4, 5, 6), {}), ((), {})]
>>> expected = [((1, 2, 3),), ((4, 5, 6),), ()]
>>> mock.call_args_list == expected
True

This is an improvement, but still not as nice as assert_called_with. Here's a helper function that pops the last call off the call_args_list and decrements the call count. This lets you make your assertions as a series of calls to assert_called_with, each followed by a pop_last_call.

def pop_last_call(mock):
    """Remove the most recent call from a mock, rewinding call_args,
    call_args_list, called and call_count to the previous call."""
    if not mock.call_count:
        raise AssertionError("Cannot pop last call: call_count is 0")
    mock.call_args_list.pop()
    try:
        mock.call_args = mock.call_args_list[-1]
    except IndexError:
        # no calls left
        mock.call_args = None
        mock.called = False
    mock.call_count -= 1

>>> mock = Mock(return_value=None)
>>> mock(1, foo='bar')
>>> mock(2, foo='baz')
>>> mock(3, foo='spam')
>>> mock.assert_called_with(3, foo='spam')
>>> pop_last_call(mock)
>>> mock.assert_called_with(2, foo='baz')
>>> pop_last_call(mock)
>>> mock.assert_called_once_with(1, foo='bar')

The calls to assert_called_with are made in reverse order to the actual calls. Your final call can be a call to assert_called_once_with, which ensures there were no extra calls you weren't expecting. You could, if you wanted, extend the function to take args and kwargs and do the assert for you, as sketched below.
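
Here's a hedged sketch of such an extension (the name assert_called_with_and_pop is my invention, not part of the mock API):

def assert_called_with_and_pop(mock, *args, **kwargs):
    # assert against the most recent remaining call, then discard it
    mock.assert_called_with(*args, **kwargs)
    pop_last_call(mock)

Each expected call then becomes a single assert, still made in reverse order.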

Another situation that is rare, but can bite you, is when your mock is called with mutable arguments. call_args and call_args_list store references to the arguments. If the arguments are mutated by the code under test then you can no longer make assertions about what the values were when the mock was called.

Here's some example code that shows the problem:

def frob(val):
    pass

def grob(val):
    "First frob and then clear val"
    frob(val)
    val.clear()

>>> from mock import patch
>>> with patch('%s.frob' % __name__) as mock_frob:
...     val = set([6])
...     grob(val)
...
>>> val
set([])
>>> mock_frob.assert_called_with(set([6]))
Traceback (most recent call last):
    ...
AssertionError: Expected: ((set([6]),), {})
Called with: ((set([]),), {})

One possibility would be for mock to copy the arguments you pass in. This could then cause problems if you do assertions that rely on object identity for equality.
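Here's a quick illustration of that identity problem (Token is just a throwaway class for the example). The default equality for objects is identity, so a deep copy of an argument would never compare equal to the original:

>>> from copy import deepcopy
>>> class Token(object): pass
...
>>> t = Token()
>>> deepcopy(t) == t
False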

Here's one solution that uses the side_effect functionality. If you provide a side_effect function for a mock then side_effect will be called with the same args as the mock. This gives us an opportunity to copy the arguments and store them for later assertions. In this example I'm using another mock to store the arguments so that I can use the mock methods for doing the assertion. Again a helper function sets this up for me.

from mock import Mock, DEFAULT
from copy import deepcopy

def copy_call_args(mock):
    new_mock = Mock()
    def side_effect(*args, **kwargs):
        # record deep copies of the arguments on a second mock, so
        # later mutation by the code under test can't change them
        args = deepcopy(args)
        kwargs = deepcopy(kwargs)
        new_mock(*args, **kwargs)
        # returning DEFAULT tells mock to use its normal return value
        return DEFAULT
    mock.side_effect = side_effect
    return new_mock

>>> with patch('%s.frob' % __name__) as mock_frob:
...     new_mock = copy_call_args(mock_frob)
...     val = set([6])
...     grob(val)
...
>>> new_mock.assert_called_with(set([6]))
>>> new_mock.call_args
((set([6]),), {})

copy_call_args is called with the mock that will be called. It returns a new mock that we do the assertion on. The side_effect function makes a copy of the args and calls our new_mock with the copy.

Note

If your mock is only going to be used once there is an easier way of checking arguments at the point they are called. You can simply do the checking inside a side_effect function.

>>> def side_effect(arg):
...     assert arg == set([6])
...
>>> mock = Mock(side_effect=side_effect)
>>> mock(set([6]))
>>> mock(set())
Traceback (most recent call last):
 ...
AssertionError

Posted by Fuzzyman on 2010-11-25 19:14:05


Mocking Unbound Methods and Mocking Properties

Whilst writing tests today I needed to patch an unbound method (patching the method on the class rather than on the instance). I needed self to be passed in as the first argument because I wanted to make asserts about which objects were calling this particular method. The issue is that you can't patch with a mock for this, because if you replace an unbound method with a mock it doesn't become a bound method when fetched from the instance, and so it doesn't get self passed in. The workaround is to patch the unbound method with a real function instead. The patch decorator makes it so simple to patch out methods with a mock that having to create a real function becomes a nuisance.

Thankfully mock 0.7.0b4 introduces a new feature that allows you to mock out methods with real functions; mocksignature. (The latest - and hopefully last - beta of 0.7.0 is being downloaded around 150 times a day from PyPI! I suspect some people aren't using egg caches though...)

If you pass mocksignature=True to patch then it does the patching with a real function object. This function object has the same signature as the one it is replacing, but delegates to a mock under the hood. You still get your mock auto-created in exactly the same way as before. What it means though, is that if you use it to patch out an unbound method on a class the mocked function will be turned into a bound method if it is fetched from an instance. It will have self passed in as the first argument, which is exactly what I wanted:

>>> from mock import patch
>>>
>>> class Foo(object):
...   def foo(self):
...     pass
...
>>> with patch('%s.Foo.foo' % __name__, mocksignature=True) as mock_foo:
...   mock_foo.return_value = 'foo'
...   foo = Foo()
...   foo.foo()
...
'foo'
>>> mock_foo.assert_called_once_with(foo)

If we don't use mocksignature=True then the unbound method is patched out with a Mock instance instead, and isn't called with self.
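
Here's a rough sketch of that contrast, reusing the Foo class from above. The plain Mock replaces the attribute directly, so the call reaches the mock with no arguments at all:

>>> with patch('%s.Foo.foo' % __name__) as mock_foo:
...   foo = Foo()
...   foo.foo()
...
<mock.Mock object at 0x...>
>>> mock_foo.assert_called_once_with()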

A few people have asked about mocking properties, specifically tracking when properties are fetched from objects or even having side effects when properties are fetched.

I'm keen to not grow the mock API unless there is a really compelling use case. Despite this there are quite a few additions in mock 0.7.0. One solution I suggested for mocking properties was to subclass Mock:

from mock import Mock

mock_foo = Mock()
class MyMock(Mock):
    @property
    def foo(self):
        return mock_foo()

>>> mock = MyMock()
>>> mock.foo
<mock.Mock object at 0x...>
>>> mock_foo.assert_called_once_with()

This works fine, but requires the extra boiler-plate. It occurs to me that there is a more general approach, making a version of Mock that behaves like a property by making it a descriptor. You still have to subclass to use them, unless you are patching onto another class, so maybe there is not much benefit:

class PropertyMock(Mock):
    def __get__(self, obj, obj_type):
        return self()
    def __set__(self, obj, val):
        self(val)

>>> mock_foo = PropertyMock(return_value='foo')
>>> class MyMock(Mock):
...     foo = mock_foo
...
>>> m = MyMock()
>>> m.foo
'foo'
>>> mock_foo.assert_called_with()
>>> m.foo = 3
>>> mock_foo.assert_called_with(3)

If PropertyMock was part of the mock module then Mock could support it by allowing you to assign them to an instance and elevating them to be class members. (This is already done for magic methods so it would be very simple. Every Mock has its own class to isolate class members from other instances.) I created an issue to track descriptor mocks. They certainly won't be in 0.7.0, but for 0.8.0 who knows...
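
In the meantime, the PropertyMock sketched above can already be attached to an ordinary class under test, as long as the descriptor goes on the class rather than the instance. Wibble is an invented example here; note that the asserts are made through a direct reference to the mock, because fetching the attribute via the class would itself invoke __get__:

>>> class Wibble(object):
...   pass
...
>>> prop = PropertyMock(return_value='patched')
>>> Wibble.foo = prop
>>> w = Wibble()
>>> w.foo
'patched'
>>> prop.assert_called_once_with()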

Posted by Fuzzyman on 2010-11-25 01:18:54


Garbage collection via weak references

Python has a protocol method for finalisation of objects: __del__. If you implement __del__ it will be called when an object is garbage collected:

>>> class Something(object):
...   def __del__(self):
...     print 'called'
...
>>> s = Something()
>>> del s
called

There are lots of caveats around implementing __del__ (see the documentation). Usually it is best not to implement it at all; instead make your object a context manager, or find some other way of ensuring it is disposed of properly.
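
For example, a minimal sketch of the context manager alternative, which gives cleanup a well-defined point instead of leaving it to the whims of the collector:

>>> class Resource(object):
...   def __enter__(self):
...     print 'acquired'
...     return self
...   def __exit__(self, exc_type, exc_value, traceback):
...     # runs at the end of the with block, even if the block raises
...     print 'released'
...
>>> with Resource():
...   pass
...
acquired
released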

The worst problem with __del__ is that it can prevent objects being garbage collected altogether, and keep them alive indefinitely. One of the drawbacks with reference counting for garbage collection (which is what CPython uses) is that it doesn't handle cycles of objects. Python has a cycle collector to deal with this, but if the cycle involves objects that implement __del__ Python will choose not to arbitrarily break the cycle (because the __del__ methods may then attempt to reference objects that should be reachable but have actually been garbage collected already). This makes the cycle uncollectable and your application leaks memory.

Note

The decision of CPython not to break the cycle, just in case a __del__ method attempts to access a disposed object, is at least slightly controversial. In the same situation most other runtimes choose to break the cycle rather than leak memory. IronPython, built on the .NET runtime, has this behaviour and so does PyPy.

Also note that the examples below are more useful as an interesting look at the lifetime of Python objects and weak references. Explicit disposing of objects is a much better way of managing object life-cycles than using weak references...

There is an alternative approach to providing finalizers (or destructors - terminology in this area is confusing) for objects in Python without implementing __del__. This can be done using weak references. Weak references allow you to keep a reference to a Python object (so long as the object supports weak references) without preventing it from being garbage collected:

>>> import gc
>>> from weakref import ref
>>> class Something(object): pass
...
>>> s = Something()
>>> weak = ref(s)
>>> print weak()
<__main__.Something object at 0x50a750>
>>> del s
>>> print weak()
<__main__.Something object at 0x50a750>
>>> gc.collect()
7
>>> print weak()
None

While the object the weak reference points to is still alive calling the weak reference returns the object. Once the object has been garbage collected calling it returns None.

Weak references have another useful feature, which is what allows us to use them for cleaning up objects. You can provide a callback that will be called when the object is collected.

>>> def callback(ref):
...   print 'called'
...
>>> s = Something()
>>> weak = ref(s, callback)
>>> del s
>>> gc.collect()
called
0

So, when an object is created we could create a weak reference with a callback to clean it up. The callback will be executed when the object is garbage collected. This is tricky to do right and has almost as many caveats as implementing __del__. Importantly though, if done correctly, it doesn't have the limitation that it can stop your object from being collected at all if it is involved in a cycle.

If done incorrectly, it can stop your object being collected. One of the biggest difficulties with this approach is that your callback mustn't hold any references to the object being collected. If the callback references the object then it will keep the object alive and prevent it from being collected! You also have to keep a reference to the weak reference itself, or the weak reference will be garbage collected before the object is and the callback will never be executed.

Here's example code that has both those problems:

import weakref

class Something(object):

    def __init__(self):
        def callback(ref):
            # closing over self keeps a reference to the instance
            self.close()
        # the weak reference isn't stored anywhere, so it dies immediately
        weakref.ref(self, callback)

    def close(self):
        pass

The intention here is to have our close method called when a Something instance is collected. The callback function closes over self, keeping a reference to it in the closure. This would prevent it from being collected, but it probably doesn't matter - because without keeping the weak reference alive it isn't going to have any effect anyway...

Note also that the callback is called after your object has been collected. Any attempt to call methods on the object inside our callback is doomed, even if we could somehow do that without keeping the object alive indefinitely in the first place.

So if we mustn't reference the original object, and couldn't even if we wanted to, how is this useful? We can construct a callback that, via a closure, has access just to the parts of the object necessary for the cleanup. In the example below our 'Something' class has a 'Foobar' object attached to it. We want to ensure the 'Foobar' is 'frobbed' when we dispose of 'Something' instances. We give our callback access just to the 'Foobar', without it keeping alive the 'Something' instance itself. We also add new weak references to a list, so that we keep our cleanup refs alive until they are needed.

>>> class Foobar(object):
...   def frob(self):
...     print 'frobbed'
...
>>> class Something(object):
...   def __init__(self):
...     self.foobar = Foobar()
...     foobar = self.foobar
...     def callback(ref):
...       foobar.frob()
...     _refs.append(ref(self, callback))
...
>>> _refs = []
>>> s = Something()
>>> _refs
[<weakref at 0x500b40; to 'Something' at 0x510c50>]
>>> del s
frobbed
>>> _refs[0]
<weakref at 0x500b40; dead>

Notice how our callback references foobar rather than self.foobar. If we used self.foobar then the closure would keep self alive. The closure here is only referencing foobar, the instance member needed for the cleanup.

After deleting s, the callback is called for us and our 'Foobar' is frobbed. The weak reference we kept is then dead. Of course keeping all those dead weak references around is also a memory leak, so we could clean up that list periodically.
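
Here's a hedged sketch of that periodic cleanup. A weak reference is dead once calling it returns None, so pruning the list is a one-liner:

def prune_refs(refs):
    # keep only the weak references whose targets are still alive
    refs[:] = [r for r in refs if r() is not None]

Calling prune_refs(_refs) after the 'Foobar' has been frobbed would leave the list empty again.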

Note

Here's an example of an uncollectable cycle. Foo has a __del__ method which is called when instances are collected. If we create two Foo instances and make them refer to each other then we have a cycle. Because a has a reference to b, and vice-versa, the reference count on each is still one even when all other references to the instances have been deleted. Because of the __del__ method the instances are uncollectable and live on in gc.garbage until interpreter shutdown.

>>> import gc
>>> class Foo(object):
...   def __del__(self):
...     print 'called'
...
>>> a = Foo()
>>> del a
>>> gc.collect()
called
0
>>> a = Foo()
>>> b = Foo()
>>> a.b = b
>>> b.a = a
>>> del a, b
>>> gc.collect()
4
>>> gc.collect()
0
>>> gc.garbage
[<__main__.Foo object at 0x...>, <__main__.Foo object at 0x...>]

A practical example of using weak references for finalising objects can be seen in Benjamin Peterson's recipe for Calling C-level finalizers with ctypes. (But just because Benjamin does it doesn't mean you should...)

Posted by Fuzzyman on 2010-11-22 22:05:29


Fetching attributes without triggering code execution: getattr_static

hasattr and getattr are fundamental tools for introspection in Python, but they can both trigger the execution of code when you are looking at attributes that are generated dynamically via __getattr__ or __getattribute__, or are descriptors like properties:

>>> class Foo(object):
...   @property
...   def foo(self):
...     print 'called'
...     return 1
...
>>> getattr(Foo(), 'foo')
called
1
>>> hasattr(Foo(), 'foo')
called
True

Normally this is what you want, but there are some situations where you want to examine objects without triggering code execution or risking raising exceptions. (Actually hasattr masks exceptions, but that's another issue.) Documentation tools are one example of where being able to examine live objects without triggering code execution (passive introspection) is useful.
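
As an aside, here's what that masking looks like under Python 2, where hasattr swallows any exception raised during the lookup (Broken is a throwaway example class):

>>> class Broken(object):
...   @property
...   def foo(self):
...     raise RuntimeError('this never escapes')
...
>>> hasattr(Broken(), 'foo')
False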

This is trickier than you might imagine (or maybe not...), because the complexities of the Python object model mean that virtually any interaction with an object can trigger code execution. I blogged about some of the complexities of this a while back when looking at fetching docstrings from objects.

I've now added a function to the inspect module in Python 3.2 that makes it easier. It uses the same member lookup rules as Python and handles data descriptors correctly, and can cope with classes that lie about their __class__ and classes that have metaclasses that lie about their method resolution order.

getattr_static has the same signature as getattr() but avoids executing code when it fetches attributes.
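
For example, with the Foo class from the start of this post (and getattr_static available, whether from inspect in Python 3.2 or the implementation below), nothing is printed and the property object itself comes back instead of the computed value:

>>> getattr_static(Foo(), 'foo')
<property object at 0x...>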

The code is compatible with Python 2, but the tests only run under Python 3 because they use the Python 3 syntax for specifying the metaclass. The code doesn't work with old style classes (not an issue for Python 3 of course) as it uses type(...) to get the class of the object. Using obj.__class__ can trigger code execution.

>>> class x: pass
...
>>> type(x())
<type 'instance'>

getattr_static looks like this:

_sentinel = object()

def getattr_static(obj, attr, default=_sentinel):
    """Retrieve attributes without triggering dynamic lookup via the
       descriptor protocol, __getattr__ or __getattribute__.

       Note: this function may not be able to retrieve all attributes
       that getattr can fetch (like dynamically created attributes)
       and may find attributes that getattr can't (like descriptors
       that raise AttributeError). It can also return descriptor objects
       instead of instance members in some cases. See the
       documentation for details.
    """
    instance_result = _sentinel
    if not _is_type(obj):
        instance_result = _check_instance(obj, attr)
        klass = type(obj)
    else:
        klass = obj

    klass_result = _check_class(klass, attr)

    if instance_result is not _sentinel and klass_result is not _sentinel:
        if (_check_class(type(klass_result), '__get__') is not _sentinel and
            _check_class(type(klass_result), '__set__') is not _sentinel):
            # descriptors with both __get__ and __set__ should be returned
            # in preference to instance members
            return klass_result

    if instance_result is not _sentinel:
        # found instance member
        return instance_result
    if klass_result is not _sentinel:
        # found class member
        return klass_result

    if obj is klass:
        # for types we check the metaclass too
        for entry in _static_getmro(type(klass)):
            try:
                # a type object must have a __dict__
                # as metaclasses can't use slots
                return entry.__dict__[attr]
            except KeyError:
                pass
    if default is not _sentinel:
        return default
    raise AttributeError(attr)


def _static_getmro(klass):
    return type.__dict__['__mro__'].__get__(klass)

def _check_instance(obj, attr):
    instance_dict = {}
    try:
        instance_dict = object.__getattribute__(obj, "__dict__")
    except AttributeError:
        pass
    return instance_dict.get(attr, _sentinel)


def _check_class(klass, attr):
    for entry in _static_getmro(klass):
        try:
            # a type object must have a __dict__
            # as metaclasses can't use slots
            return entry.__dict__[attr]
        except KeyError:
            pass
    return _sentinel

def _is_type(obj):
    try:
        _static_getmro(obj)
    except TypeError:
        return False
    return True

A particularly ugly bit of the code above is how we get the method resolution order for the type without fetching the __mro__ member which may be faked by the metaclass:

type.__dict__['__mro__'].__get__(klass)

We reuse this to tell if an object is a type. It raises a TypeError if you pass in an object that isn't a type. We can't use isinstance because that checks the __class__ member, which may be faked by implementing it as a property. (This is how objects can lie to isinstance about what they are.)
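
Here's a short sketch of that lie in action (Liar is invented for the example). isinstance believes the faked __class__, while type is not fooled:

>>> class Liar(object):
...   @property
...   def __class__(self):
...     return int
...
>>> isinstance(Liar(), int)
True
>>> type(Liar()) is int
False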

Notes from the documentation:

The only known case that can cause getattr_static to trigger code execution, and cause it to return incorrect results (or even break), is where a class uses __slots__ and provides a __dict__ member using a property or descriptor. If you find other cases please report them so they can be fixed or documented.
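
For illustration, a sketch of that problem case (Tricky is invented): the fake __dict__ is a data descriptor on the class, so the object.__getattribute__(obj, "__dict__") call inside _check_instance executes the property.

class Tricky(object):
    __slots__ = ['a']
    @property
    def __dict__(self):
        # runs whenever anything fetches the instance __dict__,
        # including getattr_static itself
        return {'a': 'fake'}

Any getattr_static call on a Tricky instance runs this property, and attributes present only in the fake dict can be reported as instance members.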

getattr_static does not resolve descriptors, for example slot descriptors or getset descriptors on objects implemented in C. The descriptor object is returned instead of the underlying attribute.

You can handle these with code like the following. Note that for arbitrary getset descriptors invoking these may trigger code execution:

# example code for resolving the builtin descriptor types
class _foo(object):
    __slots__ = ['foo']

slot_descriptor = type(_foo.foo)
getset_descriptor = type(type(open(__file__)).name)
wrapper_descriptor = type(str.__dict__['__add__'])
descriptor_types = (slot_descriptor, getset_descriptor, wrapper_descriptor)

result = getattr_static(some_object, 'foo')
if type(result) in descriptor_types:
    try:
        result = result.__get__()
    except AttributeError:
        # descriptors can raise AttributeError to
        # indicate there is no underlying value
        # in which case the descriptor itself will
        # have to do
        pass

Posted by Fuzzyman on 2010-11-21 22:30:41

