Python Programming, news on the Voidspace Python Projects and all things techie.

Python in Fifty Words

Aahz, a long-standing member of the Python community, is manning a Python booth at the OSCON Conference. With the help of the Python Advocacy mailing list he has come up with a description of Python in fifty words. I thought it was pretty neat:

Python is a dynamic, object-oriented, and general-purpose programming language with extensive libraries. Python's clean syntax and testing frameworks encourage readable and maintainable code. Python supports major operating systems, Java, .NET, and mobile/embedded devices. Integration with C/C++ is straightforward. Adopters include Google/YouTube, Philips, HP, EVE Online, and Industrial Light and Magic.

Posted by Fuzzyman on 2009-05-21 22:14:47


Fetching docstrings from objects: easy, right? (A painful exploration of the Python object model)

Extraordinarily simple introspection is one of the features that make dynamic languages like Python such a joy to work with. If you have code dealing with arbitrary objects and you want to discover and present information about those objects, it is marvellously simple. Alongside this Python has docstrings: a simple way to associate usage documentation with just about any object. This is invaluable when working with the interactive interpreter; you can import and instantiate classes and get documentation on live objects by calling help(instance.method).

These features, listing all members, fetching them by name and looking at the docstring, can also be used in live systems or in tools that automatically build documentation for libraries.

So just how easy is it to fetch the docstring from an arbitrary member on an arbitrary object? If you are going to handle all the corner cases of the Python type system then the answer is harder than you might think. As I've ended up writing code that does this (more or less) twice in the last week, I'll walk you through the nefarious pitfalls and let's see how many lines of code we end up with.

The arbitrary goal I've set for this task is to write a function that when given an object returns a list of all members along with their docstrings. Along the way we'll learn far more about the Python object model than we wanted to know.

A naive first attempt looks like this:

def get_docstrings(obj):
    members = dir(obj)
    return [(member, getattr(obj, member).__doc__) for member in members]

dir(obj) returns a list of all the member names of obj as strings. getattr(obj, member).__doc__ fetches an individual attribute by name and gets its associated docstring.

In IronPython I can make this crash very easily:

>>> from System import ArgIterator
>>> dir(ArgIterator)
Traceback (most recent call last):
  ...
SystemError: An attempt was made to load a program with an incorrect format. (Ex
ception from HRESULT: 0x8007000B)

"Ha, bad IronPython", I can already hear you chuckling. Not so fast. Since Python 2.6 and the introduction of the __dir__ protocol method we can easily suffer the same problem:

class UnDesirable(object):
    def __dir__(self):
        raise Exception("Don't dir me bro")

>>> dir(UnDesirable())
Traceback (most recent call last):
  ...
Exception: Don't dir me bro

Ok, so we can catch this error and return an empty list.

def get_docstrings(obj):
    try:
        members = dir(obj)
    except Exception:
        members = []
    return [(member, getattr(obj, member).__doc__) for member in members]

What's next? What happens when we run this function on an instance of a class with properties?

class Proper(object):
    @property
    def something(self):
        "Here be the docstring"

>>> get_docstrings(Proper())
[... ('something', None)]

When our code called getattr(obj, 'something') it actually triggered the property instead of fetching it as an object for us to look at. Introspection that causes code execution is a bad thing (tm) - it could raise exceptions or have unpleasant side effects. More to the point we don't get the information we want.
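To make the side-effect problem concrete, here is a minimal sketch. The Noisy class and its calls list are invented for illustration, and I'm assuming Python 3 syntax. Fetching the property through an instance runs the getter, and the docstring we end up with belongs to whatever the getter returned:

```python
class Noisy(object):
    calls = []  # records every time the property getter actually runs

    @property
    def something(self):
        "Here be the docstring"
        Noisy.calls.append("getter ran")
        return 42

instance = Noisy()

# getattr on the instance executes the getter and hands back its result...
value = getattr(instance, "something")

# ...so __doc__ here is the docstring of int, not of our property.
doc_seen = value.__doc__
```

After this runs, Noisy.calls is no longer empty: merely looking at the member was enough to execute user code.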

The property descriptor for 'something' lives on the class:

>>> getattr(Proper, 'something')
<property object at 0x7e600>
>>> getattr(Proper, 'something').__doc__
'Here be the docstring'

The problem is that although this works for things like methods and properties it doesn't work for instance attributes. Even worse, in IronPython this is no better than our previous solution because static properties are very common - fetching an attribute from a class can also cause arbitrary code execution. Haha, bad IronPython again. Well, thanks to the wonders of the descriptor protocol the same thing can happen in Python:

class descriptor(object):
    def __init__(self, function):
        self.func = function
        self.__doc__ = function.__doc__

    def __get__(self, instance, owner):
        raise Exception("No one expects the descriptor protocol")

class Harumph(object):
    @descriptor
    def something(self):
        "I still haven't found what I'm looking for"

>>> getattr(Harumph, 'something')
Traceback (most recent call last):
  ...
Exception: No one expects the descriptor protocol

The reason it is so tempting to use getattr is that it neatly handles finding exactly where attributes live. Once we have decided we shouldn't use getattr we have to start understanding the way Python looks up attributes in more detail than we really wanted.

We've already seen that although instance attributes live on the instance, to safely find things like properties and methods we need to look on the class. Python objects have __dict__ attributes that act as namespaces for objects - they map the member names to the members. We could rewrite our function to use it, first checking the instance and then the class:

def get_docstrings(obj):
    try:
        members = dir(obj)
    except Exception:
        members = []
    return [(member, get_doc(obj, member)) for member in members]

def get_doc(obj, member):
    if member in obj.__dict__:
        return obj.__dict__[member].__doc__
    return type(obj).__dict__[member].__doc__

This works fine for some situations but is so full of holes it is hard to know where to start. Let's start with classes that use slots. Classes that define a __slots__ member (list of strings) don't have an instance dictionary (a memory optimisation) but have reserved slots for the instance members specified in the class. Attempting to access instance.__dict__ for objects like this will die with an attribute error.

Classes using slots will however have an entry in the class dictionary for each slot:

class Slotted(object):
    __slots__ = ['x']

>>> slotty = Slotted()
>>> slotty.__dict__
Traceback (most recent call last):
  ...
AttributeError: 'Slotted' object has no attribute '__dict__'

>>> dir(slotty)
[... 'x']
>>> slotty.x
Traceback (most recent call last):
  ...
AttributeError: x
>>> type(slotty).__dict__['x']
<member 'x' of 'Slotted' objects>

These member slots don't have a docstring but let's assume this is sufficient for our use case and handle it:

def get_docstrings(obj):
    try:
        members = dir(obj)
    except Exception:
        members = []
    return [(member, get_doc(obj, member)) for member in members]

def get_doc(obj, member):
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        return obj.__dict__[member].__doc__
    return type(obj).__dict__[member].__doc__

This gets us past one hurdle. Let's try our function on the Proper class instead of an instance:

>>> get_docstrings(Proper)
Traceback (most recent call last):
 ...
KeyError: '__class__'

Slightly obscure, but it turns out that the __class__ attribute of an object (a pointer to its type) is another descriptor and is inherited from object. We don't find it in the instance dictionary or the class dictionary. In fact our code would have the same problem with any attribute inherited from a base class.

>>> object.__dict__['__class__']
<attribute '__class__' of 'object' objects>
>>> descriptor = object.__dict__['__class__']
>>> descriptor.__get__(Proper(), Proper)
<class '__main__.Proper'>
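The same thing can be shown without __class__. In this sketch (Parent and Child are made-up names, Python 3 assumed) an inherited method is reported by dir but is missing from both the instance dictionary and the class dictionary, which are the only two places our get_doc looks:

```python
class Parent(object):
    def inherited(self):
        "Defined on the base class only"

class Child(Parent):
    pass

child = Child()

listed = "inherited" in dir(child)        # dir() does report it
in_instance = "inherited" in vars(child)  # not in the instance dictionary
in_class = "inherited" in vars(Child)     # not in the class dictionary either
in_base = "inherited" in vars(Parent)     # it lives on the base class
```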

To find an arbitrary attribute you need to look not only in the instance dictionary and the class dictionary, but also walk the inheritance hierarchy in case the attribute lives on any of the base classes. And what about multiple inheritance, oh god. Thankfully this isn't as difficult a problem as it sounds. Python classes have an __mro__ attribute (the method resolution order): a tuple which, even in the face of multiple inheritance, lists the class and all the base classes you need to search - and in the right order too (the order is important in case the method is defined on a base class and overridden in a sub-class). We can rewrite our code to use this:

def get_doc(obj, member):
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        return obj.__dict__[member].__doc__
    for entry in type(obj).__mro__:
        if member in entry.__dict__:
            return entry.__dict__[member].__doc__
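As an aside, here is a small sketch of what that search order looks like under multiple inheritance. The diamond of classes is hypothetical and I'm assuming Python 3; the point is only that walking __mro__ and taking the first class whose dictionary contains the name gives the definition normal attribute lookup would use:

```python
class Root(object):
    def who(self):
        "Root docstring"

class Left(Root):
    def who(self):
        "Left docstring"

class Right(Root):
    def who(self):
        "Right docstring"

class Bottom(Left, Right):
    pass

# The linearised search order, most specific class first.
order = [klass.__name__ for klass in Bottom.__mro__]

# The first class in that order whose dictionary has the name wins.
owner = next(klass for klass in Bottom.__mro__ if "who" in vars(klass))
doc = vars(owner)["who"].__doc__
```

Here order comes out as Bottom, Left, Right, Root, object, and the docstring found is the one from Left.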

Let's try our function on a subclass of Proper:

class SubProper(Proper):
    pass

>>> get_docstrings(SubProper())
[...  ('something', 'Here be the docstring')]
>>> get_docstrings(SubProper)
[...  ('something', None)]

This works fine for instances but fails when we use it on the class object itself. It turns out that we need to handle instances differently to classes. For instances this code is perfectly correct: for entry in type(obj).__mro__. If we pass in an instance we need to check its class object and then all base classes. The type of a class object, however, is its metaclass. This controls some of the behavior of the class but isn't where its members are defined. We need to go straight to the class and its base classes. This should do the trick:

def get_doc(obj, member):
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        return obj.__dict__[member].__doc__
    if isinstance(obj, type):
        search_order = obj.__mro__
    else:
        search_order = type(obj).__mro__

    for entry in search_order:
        if member in entry.__dict__:
            return entry.__dict__[member].__doc__


>>> get_docstrings(SubProper)
[...  ('something', 'Here be the docstring')]
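The difference between the two search orders is easy to see directly. This is just an illustrative sketch (Python 3 assumed, reusing the Proper and SubProper classes from above):

```python
class Proper(object):
    @property
    def something(self):
        "Here be the docstring"

class SubProper(Proper):
    pass

# For a class object, type() gives the metaclass, whose MRO is (type, object).
via_metaclass = [klass.__name__ for klass in type(SubProper).__mro__]

# The class's own MRO is where its members are actually defined.
via_class = [klass.__name__ for klass in SubProper.__mro__]

found_via_metaclass = any("something" in vars(k) for k in type(SubProper).__mro__)
found_via_class = any("something" in vars(k) for k in SubProper.__mro__)
```

Searching the metaclass's MRO never reaches Proper, which is why the earlier version returned None for the class.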

Hooray, let's see how it fares on a sub-class of an old style class:

class Base:
    def something(self):
        "Here be dragons"

class Sub(Base):
    pass

>>> get_docstrings(Sub)
[...  ('something', None)]

Ah, old style classes aren't instances of type. Instead they're instances of types.ClassType. Painfully, they don't have __mro__, which was introduced in Python 2.3 for new style classes. We need a different strategy for handling old style classes - and thankfully this is already implemented for us in inspect.getmro. For new style classes this already uses __mro__, so we can rewrite our code as:

import types
import inspect

def get_doc(obj, member):
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        return obj.__dict__[member].__doc__
    if isinstance(obj, (type, types.ClassType)):
        search_order = inspect.getmro(obj)
    else:
        search_order = inspect.getmro(type(obj))

    for entry in search_order:
        if member in entry.__dict__:
            return entry.__dict__[member].__doc__


>>> get_docstrings(Sub)
[... ('something', 'Here be dragons')]
>>> get_docstrings(Sub())
[... ('something', None)]

Update

The code above was my final version, but a colleague pointed out a problem (bug). It's explained below.

As you can see, we have fixed the problem for old style classes - but still have a problem with instances of old style classes. The problem is that you don't get the class of an old-style instance by calling type on it:

>>> class A:
...     pass
...
>>> a = A()
>>> type(a)
<type 'instance'>

So this line was broken for some objects: search_order = inspect.getmro(type(obj)). Instead you can access the class of any arbitrary object through its __class__ member:

>>> a.__class__
<class __main__.A at 0x01D23780>

If we make this change, then finally we have code that works for all these cases. We have the additional burden in IronPython of a few 'magic' attributes that IronPython sticks on for us (like the ReferenceEquals method) which can't be found in any of the base classes. This code defaults to returning None if it fails to find a member, so it handles these gracefully.

Our full code for finding all docstrings for all members on an arbitrary object is 18 lines of code:

import types
import inspect

def get_doc(obj, member):
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        return obj.__dict__[member].__doc__
    if isinstance(obj, (type, types.ClassType)):
        search_order = inspect.getmro(obj)
    else:
        search_order = inspect.getmro(obj.__class__)

    for entry in search_order:
        if member in entry.__dict__:
            return entry.__dict__[member].__doc__

def get_docstrings(obj):
    try:
        members = dir(obj)
    except Exception:
        members = []
    return [(member, get_doc(obj, member)) for member in members]
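For anyone reading this on Python 3, where old style classes and types.ClassType no longer exist, the same approach might be sketched as follows. This is my adaptation under that assumption rather than a tested drop-in replacement; the example classes at the bottom are only there to exercise it:

```python
import inspect

def get_doc(obj, member):
    # Instance (or class) dictionary first; slotted instances have none.
    if hasattr(obj, '__dict__') and member in obj.__dict__:
        return obj.__dict__[member].__doc__
    # On Python 3 every class is an instance of type.
    if isinstance(obj, type):
        search_order = inspect.getmro(obj)
    else:
        search_order = inspect.getmro(obj.__class__)

    for entry in search_order:
        if member in entry.__dict__:
            return entry.__dict__[member].__doc__
    return None  # explicit: the member was not found anywhere

def get_docstrings(obj):
    try:
        members = dir(obj)
    except Exception:
        members = []
    return [(member, get_doc(obj, member)) for member in members]

# Example classes, mirroring the ones used earlier in this post.
class Proper(object):
    @property
    def something(self):
        "Here be the docstring"

class SubProper(Proper):
    pass

class UnDesirable(object):
    def __dir__(self):
        raise Exception("Don't dir me bro")

from_instance = dict(get_docstrings(SubProper()))["something"]
from_class = dict(get_docstrings(SubProper))["something"]
from_undesirable = get_docstrings(UnDesirable())
```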

Even all this isn't bulletproof. A class with slots can define a __getattr__ method that returns something for every attribute access. This means that hasattr(obj, '__dict__') will return True and the code that follows may die. A class or instance can override __bases__ and lie about its base classes. It may even be possible to create a metaclass that uses __slots__ and then have classes (instances of the metaclass) without a dictionary (breaking our code that accesses Klass.__dict__). Oh well, they're obscure corner cases of obscure corner cases.
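One more elegant route, if you can assume Python 3.2 or later, is inspect.getattr_static from the standard library, which looks members up without invoking __get__, __getattr__ or __getattribute__. The sketch below is only an illustration of that idea: get_doc_static is a name I've made up, and getattr_static has documented corner cases of its own, so it isn't bulletproof either.

```python
import inspect

class Harumph(object):
    @property
    def something(self):
        "I still haven't found what I'm looking for"
        raise Exception("No one expects the descriptor protocol")

def get_doc_static(obj, member):
    # A private sentinel lets us tell "missing" apart from a None attribute.
    missing = object()
    found = inspect.getattr_static(obj, member, missing)
    if found is missing:
        return None
    return found.__doc__

# Neither call runs the property getter, so neither raises.
doc_from_instance = get_doc_static(Harumph(), "something")
doc_from_class = get_doc_static(Harumph, "something")
```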

If you can think of any cases I've missed, or ways of doing this more elegantly, then let me know in the comments!


Movable Python Goes Open Source

Movable Python is a portable distribution of Python for Windows. It is capable of running from a USB stick and includes a mobile development environment. People have successfully used it with Zope, Django and various other large Python frameworks.

Movable Python started back in 2006, and for a long time I ran it as a commercial project with a small charge. Some time ago I decided to release Movable Python as open source after making a certain number more sales; with half the money from those sales going to the Python Software Foundation (PSF) and the One Laptop Per Child (OLPC) project. Last year in anticipation of reaching the target I made the donations, which was followed by Stephen Ferg making a very generous donation to finalise the process.

After a slight delay Movable Python finally went open source, and after another delay I'm actually announcing it! The source code for building distributions, and prebuilt distributions for Python 2.2-2.5 are available for download from:

The code for the Movable Python GUI needs a bit of love. Actually it needs throwing away and rewriting, but the core code is in pretty good shape with no outstanding bugs that I'm aware of. I do need to build a new distribution for Python 2.6 and see if building one for Python 3.0 is possible.

Another interesting part of the Movable Python project is Movable IDLE. This is an entirely self-contained version of IDLE, along with the whole Python standard library. This could be particularly useful for teaching environments. It also needs updating: distributions for Python 2.5 and 2.6 need building, and the dialog that pre-dates Movable Python going open source can be removed.

Posted by Fuzzyman on 2009-05-18 14:59:03


Monkey Patching, Static Methods and the Descriptor Protocol

In the Resolver One test framework we have a fairly elaborate mock system. It emulates a .NET API from a commercial vendor which we don't have installed on our test / development machines. Most of the methods on a core part of this API are static methods.

Jonathan and I have spent a good chunk of this morning tracking down an obscure failure. It's one of those wonderful failures; some test is passing but causing other tests later on to fail. Great fun.

Some of our tests work by monkey patching parts of an API to test their interaction with other parts. This is a fairly standard testing technique in Python, but where you are patching any global state (class and module members) you need to be very careful. Even in the event of test failure you must restore the original members after the test completes. Otherwise you get the kind of errors we were seeing.

The standard way of doing this is code that looks like this:

original = module.something
module.something = something_else
try:
    do_something()

    self.assertTrue(something_happened())
finally:
    module.something = original

This gets particularly painful if you patch several things and it is easy to get wrong. To solve exactly this problem, and remove the boilerplate, we use a patch function (which is also available in my Python mock module).

It reduces the snippet above to:

@patch('module.something', something_else)
def function():
    do_something()

function()
self.assertTrue(something_happened())
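To give a feel for what such a decorator does, here is a rough sketch. It is not the implementation from the mock module: patch_attr is a hypothetical helper that takes the owner object and attribute name rather than a dotted string, the Settings class is invented, and Python 3 is assumed. Note that it saves the original with plain getattr, which is the naive approach.

```python
import functools

def patch_attr(owner, name, replacement):
    """Hypothetical helper: swap owner.name for the duration of a call."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            original = getattr(owner, name)
            setattr(owner, name, replacement)
            try:
                return function(*args, **kwargs)
            finally:
                # Runs even if the wrapped function raises.
                setattr(owner, name, original)
        return wrapper
    return decorator

class Settings(object):
    mode = "real"

@patch_attr(Settings, "mode", "fake")
def read_mode():
    return Settings.mode

seen_during_call = read_mode()
seen_afterwards = Settings.mode
```

The try / finally is the same boilerplate as before, just written once.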

Since we were already doing this, it was particularly odd that we were seeing this error. Eventually we tracked it down to the patching of a static method on our mock API. Let me illustrate the problem:

>>> class Test(object):
...   @staticmethod
...   def test():
...     print 'woozer'
...
>>> Test.test()
woozer
>>> original = Test.test
>>> original
<function test at 0x70df0>
>>> Test.test = original
>>> Test.test()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unbound method test() must be called with Test instance as first argument (got nothing instead)
>>>

Fetching the static method and then setting it directly back on the class breaks it! To understand why we need to understand how attributes are fetched in Python and a bit about the descriptor protocol.

When a method is created inside a class declaration it is a normal function stored inside the class dictionary. Python functions have a __get__ method (part of the descriptor protocol) which controls how they are fetched as attributes from classes and instances. When a method is fetched from an instance what you get is a bound method - effectively the original function bound to the instance which is passed in as the first argument (self). When a method is fetched from the class itself what you actually get is an unbound method (in Python 2.X only - unbound methods have gone in Python 3). This is often used when calling up to base classes in overridden methods:

class SubClass(BaseClass):
    def method(self):
        BaseClass.method(self)
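The binding step can be performed by hand, which makes it a little less magical. A minimal sketch, assuming Python 3 and using an invented Greeter class:

```python
class Greeter(object):
    def greet(self):
        return "hello from " + type(self).__name__

greeter = Greeter()

# The class dictionary holds the plain function, untouched by any lookup.
raw = vars(Greeter)["greet"]

# Calling __get__ by hand does what attribute lookup on an instance does
# for us: it binds the function to that instance.
bound = raw.__get__(greeter, Greeter)
result = bound()
```

The bound method remembers both halves: bound.__func__ is the original function and bound.__self__ is the instance.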

Static methods are called without self as the first argument. Functions wrapped in staticmethod exhibit different behaviour when fetched from the class - the underlying function is returned directly. So in our monkeypatching scenario, when the static method is fetched we get a function. Attaching a function back to the class (when the original is restored) turns it into an unbound method next time it is fetched. It has lost its 'staticmethod'ness.

A solution is to do this instead:

>>> class Test(object):
...   @staticmethod
...   def test():
...     print 'fritch'
...
>>> Test.test()
fritch
>>> original = Test.__dict__['test']
>>> original
<staticmethod object at 0x74710>
>>> Test.test = original
>>> Test.test()
fritch

By accessing the class dictionary we get the static method descriptor, which is safe to set back onto the class. There is a slight drawback to this approach. Instances of classes that use __slots__ don't have a dictionary - so we either need to fall back to the normal attribute lookup in this case or treat patching classes differently from patching instances. Normally this isn't an issue because unless it is a singleton it isn't so important to unpatch instances.
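The sessions above are Python 2. On Python 3, where unbound methods are gone, the breakage is subtler: after restoring the unwrapped function, calling through the class still works but calling through an instance fails. A sketch of both the failure and the fix, assuming Python 3:

```python
class Test(object):
    @staticmethod
    def test():
        return "fritch"

# Normal attribute access unwraps the staticmethod into a plain function.
unwrapped = Test.test
# The class dictionary still holds the staticmethod descriptor itself.
wrapped = vars(Test)["test"]

# "Restoring" the unwrapped function loses the staticmethod wrapper.
Test.test = unwrapped
try:
    Test().test()
    broke = False
except TypeError:
    broke = True  # the instance was passed in as an unexpected argument

# Restoring the descriptor from the class dictionary keeps it intact.
Test.test = wrapped
restored = Test().test()
```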

Posted by Fuzzyman on 2009-05-18 14:01:01


HOWTO: Using the Wing Python IDE with IronPython

A common question amongst new IronPython users is "Which IDE is best for IronPython?"

One common suggestion, which .NET developers gravitate towards naturally, is IronPython Studio. This is an example of extending Visual Studio through the VSx shell, and in my opinion not really suitable for pain-free use. You can read some of my thoughts on it in this IronPython-URLs blog entry.

As IronPython code is just Python code, any good Python IDE will do; however, they rarely feature good integration with IronPython. Fortunately there is a large range of tools that can be plugged into any extensible editor.

My favourite IDE is the Wing IDE, not least because it has the best autocomplete (intellisense) of any Python IDE I've used. It achieves this by statically analysing Python code and inferring the types. This doesn't work with .NET types because they don't have Python source code... This HOWTO shows 'how to' enable autocomplete for the .NET types in Wing, plus using the scripting API to add commands like executing the current file with IronPython:

Posted by Fuzzyman on 2009-05-17 17:13:56


Updated: Embedding IronPython in C# Silverlight Applications Article

IronPython has been designed to make it easy to embed in .NET applications. With code evaluated and executed at runtime it opens up all sorts of possibilities for user scripting of applications, storing rules as text that can be created and modified at runtime and so on. Hosting IronPython in Silverlight is slightly different from desktop .NET applications. Although the hosting API is the same there is some initial configuration that needs to be done.

I've updated my article on the topic to include the configuration needed for IronPython code to import from other Python files contained in the xap file, plus adding references to the standard .NET / Silverlight assemblies so that the hosted code doesn't need to manually call clr.AddReference. Many thanks to Jimmy Schementi for his assistance.

Posted by Fuzzyman on 2009-05-17 16:47:29

