Python Programming, news on the Voidspace Python Projects and all things techie.

Implementing __dir__ (and finding bugs in Pythons)

A new magic method was added in Python 2.6 to allow objects to customise the list of attributes returned by dir.

The new protocol method (I don't really like the term "magic method" but it is so entrenched both in the Python community and in my own mind) is __dir__. From the docs:

This allows objects that implement a custom __getattr__() or __getattribute__() function to customize the way dir() reports their attributes.

mock.Mock() is one such object that provides attributes dynamically and I thought it would be good if dir(mock) correctly reported the attributes it had created. Plus, if a mock is created with a spec (so that only attributes available on the original object are available on the mock), then all available attributes (even if they haven't been created yet) should be reported.

So a pretty standard way of implementing __dir__ would seem to be this: take all the standard attributes normally reported by dir and add the dynamically created ones. It turns out that isn't so easy, because if you're providing __dir__ there is no way to get the list of "standard attributes normally reported by dir". There is no object.__dir__ to call up to, and calling dir(self) from inside __dir__ causes infinite recursion (of course!).

The strategy I went for was to call dir on the type (so get all the class attributes), add anything in the instance __dict__ and finally add the dynamically created attributes. Throw the whole mix into a set to remove duplicates and then return a sorted list of the results:

def __dir__(self):
    # class attributes + instance attributes + dynamically created ones
    return sorted(set(dir(type(self)) + list(self.__dict__) +
                      self._get_dynamic_attributes()))
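To see the strategy end to end, here is a minimal, self-contained sketch. The class name LazyAttrs and the contents of _get_dynamic_attributes are made up for illustration; this is not mock's actual implementation, just the same three-part merge applied to a toy class with a custom __getattr__.

```python
class LazyAttrs(object):
    """Hypothetical class that serves some attributes dynamically."""

    def __init__(self):
        self.created = 1  # ordinary instance attribute, stored in __dict__

    def _get_dynamic_attributes(self):
        # Assumption: the names __getattr__ is able to supply on demand.
        return ['alpha', 'beta']

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in self._get_dynamic_attributes():
            return name.upper()
        raise AttributeError(name)

    def __dir__(self):
        # Class attributes + instance attributes + dynamic ones,
        # de-duplicated through a set and returned sorted.
        return sorted(set(dir(type(self)) + list(self.__dict__) +
                          self._get_dynamic_attributes()))


names = dir(LazyAttrs())
```

Here names contains 'created' (from the instance __dict__), 'alpha' and 'beta' (dynamic) and the usual class-level names such as '__init__'.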

Of course this doesn't play well with multiple inheritance and is just a touch ugly. It would be far nicer to be able to do (using the Python 3 super calling convention):

def __dir__(self):
    standard = super().__dir__()
    return standard + self._get_dynamic_attributes()

Thankfully other folk on the python-ideas list either agreed or didn't disagree (which is a rare thing!), so it might happen for Python 3.3.

Note

Since writing this, Benjamin Peterson has gone ahead and done it.
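On Python 3.3 and later, where object.__dir__ exists, the super() version can be tried directly. The sketch below uses a hypothetical Dynamic class and a made-up _get_dynamic_attributes; the list() call is only there so the concatenation works whatever kind of sequence the parent returns.

```python
class Dynamic:
    """Hypothetical class; requires Python 3.3+ for object.__dir__."""

    def _get_dynamic_attributes(self):
        # Assumption: names provided dynamically elsewhere in the class.
        return ['alpha', 'beta']

    def __dir__(self):
        # Delegate the "standard" part to the next class in the MRO.
        standard = list(super().__dir__())
        return standard + self._get_dynamic_attributes()


names = dir(Dynamic())
```

Because the call goes through super(), this cooperates with multiple inheritance, and dir() itself sorts whatever __dir__ returns.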

Whilst fiddling with implementing and testing __dir__ in mock I had an obscurely failing test on pypy. Most dir(mock) calls behaved as expected, but when using a module as a spec __dir__ wouldn't be called. For example: dir(Mock(spec=sys)).

If you create a mock with a spec then mock.__class__ returns the class of the spec object, so that isinstance(mock, SpecType) still passes. pypy implements dir (in Python) and special-cases modules using isinstance (only the contents of the module and not module attributes are reported). This means that a mock with a module as a spec doesn't have __dir__ called, because the pypy implementation of dir thinks that it is a module.
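The isinstance behaviour is easy to check. This sketch uses unittest.mock, the standard library home of mock since Python 3.3, rather than the standalone mock package discussed here, so treat it as an approximation of the behaviour at the time.

```python
import sys
import types
from unittest.mock import Mock  # stdlib version of mock (Python 3.3+)

m = Mock(spec=sys)

# __class__ reports the type of the spec object, so isinstance is satisfied...
looks_like_module = isinstance(m, types.ModuleType)

# ...even though the real type of the mock is not a module type at all.
really_a_module = issubclass(type(m), types.ModuleType)

# With a spec, names from the original object show up in dir(m).
reported = dir(m)
```

So any dir implementation that asks isinstance(obj, ModuleType) before looking for __dir__ will take the module branch for this mock.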

Although obscure this is also a problem for a subclass of ModuleType that has a custom __dir__ implementation. Benjamin Peterson happened to be online when I reported it and fixed it within minutes - by moving the check for __dir__ ahead of the check for a module.
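The ordering problem can be sketched with two toy functions. Neither is pypy's actual code; module_first_dir and dir_first are hypothetical names that only illustrate why the order of the two checks matters for a ModuleType subclass.

```python
import types


class CustomModule(types.ModuleType):
    def __dir__(self):
        return ['only_this']


mod = CustomModule('example')
mod.hidden = 1


def module_first_dir(obj):
    # Hypothetical "before" ordering: the module check wins,
    # so a custom __dir__ is never consulted.
    if isinstance(obj, types.ModuleType):
        return sorted(obj.__dict__)
    return sorted(type(obj).__dir__(obj))


def dir_first(obj):
    # Hypothetical "after" ordering: honour __dir__ first. On Python 3
    # every object has one, so the fallbacks are purely illustrative.
    dir_method = getattr(type(obj), '__dir__', None)
    if dir_method is not None:
        return sorted(dir_method(obj))
    if isinstance(obj, types.ModuleType):
        return sorted(obj.__dict__)
    return []


before = module_first_dir(mod)
after = dir_first(mod)
```

With the module check first, before lists the module's __dict__ (including 'hidden'); with __dir__ first, after is ['only_this'], which is also what the built-in dir(mod) returns on current CPython.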

Whilst looking at the source code to the pypy dir implementation Benjamin and I both noticed another issue that affects both pypy and CPython:

>>> class foo(type):
...   def __dir__(self):
...    return ['f']
...
>>> class bar(object):
...  __metaclass__ = foo
...
>>> dir(bar)
['f']
>>> dir(bar())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __dir__() takes exactly 1 argument (2 given)

(The same thing happens with Python 3 - just use class bar(metaclass=foo): instead).

A class that has a metaclass implementing __dir__ will blow up when you call dir on instances of the class. It claims that you passed two arguments to __dir__! This is because both pypy and CPython look up the __dir__ method by doing the equivalent of type(obj).__dir__(obj) (or in the case of pypy exactly that). They do the lookup on the class rather than the instance because this is how protocol methods are supposed to be fetched. Fetching the method from the class returns an unbound method (or in Python 3 a normal function - which is a rant for another day), so it takes an instance of the class as the first argument.

Unfortunately you can look up metaclass methods on a type, so if the class doesn't have a __dir__, but the metaclass does, then type(obj).__dir__ returns the metaclass method. What is more, the class is an instance of the metaclass, so the method returned is a bound method, with the class already bound as the first argument. When this method is then called with the original object as a second argument it rightly complains.

>>> bar.__dir__
<bound method foo.__dir__ of <class '__main__.bar'>>
>>> bar.__dir__()
['f']
>>> type(bar()).__dir__
<bound method foo.__dir__ of <class '__main__.bar'>>
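The same trap can be reproduced on a current Python 3 with any method name that the class itself doesn't define. The sketch below uses a made-up method called describe instead of __dir__ (since object now has its own __dir__, which would be found first), and a hypothetical lookup_on_type helper showing one way to search only the MRO of the type and ignore the metaclass.

```python
class Meta(type):
    def describe(cls):
        return ['f']


class Bar(metaclass=Meta):
    pass


obj = Bar()

# Attribute access on the class falls back to the metaclass and binds Bar.
bound = type(obj).describe

# Calling it with the instance therefore passes two arguments.
error = None
try:
    type(obj).describe(obj)
except TypeError as exc:
    error = exc


def lookup_on_type(obj, name):
    """Search the MRO of type(obj) only; never consult the metaclass.

    Returns the raw class __dict__ entry (no descriptor binding),
    or None if no class in the MRO defines the name.
    """
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    return None


found_describe = lookup_on_type(obj, 'describe')
found_dir = lookup_on_type(obj, '__dir__')
```

found_describe is None because no class in Bar's MRO defines describe, while found_dir is the entry from object, which can safely be called as found_dir(obj).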

I'm pretty sure that Benjamin has now fixed this too, in both pypy and CPython.

So the custom __dir__ will be in mock 0.8.0, which has several great new features. There'll be an alpha release soon for you to play with. Whilst working on those features I found a bug in CPython, a bug in pypy and a bug in jython.

The bug in jython was in 2.5.1, and already fixed in 2.5.2, but it's still nice to be able to say that in one day I found bugs in three implementations of Python! The specific problem was that rich comparisons of Python classes and object were returning different results from CPython. My tests for rich comparisons (for the new magic method mocking support in 0.7.0) were then failing with jython 2.5.1.

In jython 2.5.1:

>>> class Foo(object): pass
...
>>> Foo < object
True

And in CPython:

>>> class Foo(object): pass
...
>>> Foo < object
False

On to pypy. Booleans in Python subclass integer, which is sucky but that's another discussion and isn't going to change anyway. So they have integer attributes, for example True.real == 1. In pypy 1.5 True.real == True. This triggered a bug in the new mock code and caused infinite recursion - so it was a useful difference.

However, this was a more general bug in pypy. To match CPython behaviour, these int / long / float attributes on subclasses should return straight ints, longs or floats. In pypy 1.5 they return instances of the subclass instead. This was fixed by Armin Rigo within ten minutes of me reporting the issue. (The report was accompanied by Maciej complaining that the real problem is that CPython doesn't have tests for this behaviour, which is why it is different in pypy, which is a fair point.)
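The CPython behaviour is quick to confirm. This is a small sketch for Python 3 (so no long), with MyInt and MyFloat as made-up subclasses; it only records the types that come back from a few of these attributes.

```python
class MyInt(int):
    pass


class MyFloat(float):
    pass


# Collect the type returned by each attribute access.
returned_types = {
    'True.real': type(True.real),
    'MyInt(3).real': type(MyInt(3).real),
    'MyInt(3).numerator': type(MyInt(3).numerator),
    'MyFloat(1.5).real': type(MyFloat(1.5).real),
}
```

On CPython every entry is the plain base type (int or float), never bool, MyInt or MyFloat.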

And finally the CPython bug. In CPython (2.x only) re pattern objects don't have a __class__ attribute (well, technically they do, but they raise an AttributeError when you try to access it).

>>> import re
>>> re.compile('foo')
<_sre.SRE_Pattern object at 0x1043230>
>>> p = re.compile('foo')
>>> p.__class__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __class__

This is because of a broken tp_getattr slot implementation. This was fixed (for Python 2.7.2) by Benjamin Peterson about fifteen minutes after I reported it.

So the Jython guys win by fixing the bug even before I discovered it.

Posted by Fuzzyman on 2011-05-25 11:04:51
