Python Programming, news on the Voidspace Python Projects and all things techie.

itemgetter and attrgetter

I've just discovered the attrgetter and itemgetter functions from the Python standard library operator module (both functions new in Python 2.4 with added functionality in Python 2.5). I wish I'd discovered them earlier, as although they do very simple jobs they can make code cleaner and more readable.

Both of them return functions that will fetch a specified item or attribute from objects.

One place I could make use of attrgetter is in property definitions. It is a fairly common pattern to have a property with custom behaviour when set, but that merely returns an underlying instance attribute when fetched.

As an example:

class Model(object):
    def __init__(self):
        self._document = None

    def _set_document(self, document):
        self._document = document
        # do other stuff ...

    document = property(lambda self: self._document, _set_document)

The 'document' setter method is _set_document, but the getter is merely a lambda that returns self._document.

We can improve this by using attrgetter instead:

from operator import attrgetter

...

document = property(attrgetter('_document'), _set_document)

attrgetter is called with a string and returns a function that fetches that attribute. Anything that helps eliminate lambdas has to be good, right?

It is roughly the equivalent of:

def attrgetter(attribute):
    def getter(thing):
        return getattr(thing, attribute)
    return getter

In Python 2.5 you can call attrgetter with multiple attributes and the getter it returns will fetch you a tuple of all the attributes. As an added bonus, if you are using CPython, it is nice and fast.
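
As a small illustration of the multiple-attribute form, here is a sketch using a hypothetical Point class (the class and its attribute names are made up for the example, not taken from any real project):

```python
from operator import attrgetter


class Point(object):
    # Hypothetical class, used only to have some attributes to fetch
    def __init__(self, x, y):
        self.x = x
        self.y = y


# With several names, the getter returns a tuple of the attribute values
get_xy = attrgetter('x', 'y')

point = Point(3, 4)
coords = get_xy(point)
print(coords)
```

With a single name the getter returns the bare value; it only returns a tuple when you pass two or more names.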

itemgetter is very similar, but instead of fetching attributes it fetches items from sequences or mappings. One place this comes in handy is as a key function when sorting lists.

A common pattern when needing a custom sort order for lists is 'decorate-sort-undecorate' (also known as the Schwartzian Transform from its Perl origin). This involves transforming the list (decorate) into one that can be sorted using the built-in sorted (which returns a new list) or using the list sort method (which sorts in place). You then transform your newly sorted list back (the undecorate).
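
Done by hand, the pattern might look something like this sketch, which sorts a list of (first_name, last_name) tuples by last name (the variable names are just for illustration):

```python
items = [('Michael', 'Foord'), ('Menno', 'Smits'),
         ('Christian', 'Muirhead'), ('Jonathan', 'Hartley')]

# Decorate: pair each item with the value we want to sort on
decorated = [(item[1], item) for item in items]

# Sort: tuples compare element by element, so the last name is compared first
decorated.sort()

# Undecorate: throw away the sort value and keep the original item
sorted_items = [item for (last_name, item) in decorated]
print(sorted_items)
```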

This pattern is now built-in to Python. Both sorted and sort take a key function to transform each item. The list is sorted on the transformed items, saving you the effort of having to do it yourself. As an example suppose we have a list of tuples like (first_name, last_name), and we want to sort on last name.

We can achieve this by passing in a key function to sort that returns the second item of the tuple:

items = [('Michael', 'Foord'), ('Menno', 'Smits'), ('Christian', 'Muirhead'), ('Jonathan', 'Hartley')]
sorted_items = sorted(items, key=lambda item: item[1])

I'm sure you can see what's coming.

We can use itemgetter to eliminate the lambda:

from operator import itemgetter

...

sorted_items = sorted(items, key=itemgetter(1))

itemgetter is roughly the equivalent of:

def itemgetter(item):
    def getter(thing):
        return thing[item]
    return getter

As with attrgetter it is nice and fast and the Python 2.5 version can take multiple items and the getter will then return you a tuple. It doesn't just work with sequences, but can also be used with dictionaries.
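
A quick sketch of both of those uses, with made-up data (the dictionary keys and the rows list are assumptions for the example):

```python
from operator import itemgetter

# itemgetter works on mappings as well as sequences
person = {'first': 'Michael', 'last': 'Foord', 'project': 'Resolver One'}
get_name = itemgetter('first', 'last')
name = get_name(person)
print(name)

# With multiple items the getter returns a tuple, which makes a handy
# multi-level sort key: here we sort on the count, then on the letter
rows = [('b', 2), ('a', 2), ('c', 1)]
by_count_then_letter = sorted(rows, key=itemgetter(1, 0))
print(by_count_then_letter)
```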

I actually discovered this when Christian was playing with Raymond Hettinger's Named Tuple Recipe (one of the awesomenesses coming in Python 2.6) on IronPython. Its use of itemgetter triggered a very obscure bug in IronPython 2 Beta 4 that has thankfully gone away with the Beta 5 release. Named tuples really are awesome, and we will start using them in Resolver One.

Posted by Fuzzyman on 2008-09-24 12:54:48



Python: Two Phase Object Creation

I've started watching a talk by Alex Martelli on Python and design patterns.

It's very interesting. Alex points out that design patterns can't be taken out of the context of the language (the technology) in which they are being used. For example, the iterator pattern is now built-in to most high-level languages. With first class types and functions, most of the object creational patterns are effectively built-in to Python (class and function factories are trivially easy to write).

I'm only half way through the video, but I liked his explanation of object creation in Python. Python has two different 'magic-methods' (protocol methods) for object construction [1], and it is probably not until you have been programming in Python for a while (become a 'journeyman') that you need to understand this.

The method usually described as the 'constructor' is __init__. If you define a class with a 'dunder-init' [2] method, then it is called when a new object is created [3]:

class SomeClass(object):
    def __init__(self, *args, **keywargs):
        ...

When you create an instance - SomeClass(*args, **keywargs) - the arguments you use in the call are passed to the __init__ method.

Notice that __init__ is an instance method - it receives the instance self as the first argument. This means that it is really an object initialiser; the instance has already been created when it is called.

The method responsible for creating objects is a class method called __new__. This receives the class as the first argument and should return an instance:

class SomeClass(object):
    def __new__(cls, *args, **keywargs):
        ...
        return object.__new__(cls)

dunder-new can actually return anything; there is no restriction requiring it to be an instance of the class. It can return an instance of a sub-class picked by the arguments passed in, it can return None or a pre-created instance (useful for the Singleton pattern).
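
As one possible sketch of the 'pre-created instance' case, here is a minimal Singleton using a hypothetical Registry class (the class name and the _instance attribute are my own choices for the example):

```python
class Registry(object):
    # Hypothetical class: holds the single shared instance once created
    _instance = None

    def __new__(cls):
        # Only create an instance the first time the class is called
        if cls._instance is None:
            cls._instance = object.__new__(cls)
        return cls._instance


first = Registry()
second = Registry()
print(first is second)
```

Bear in mind that if Registry also defined dunder-init, it would run on every call, because the object returned is an instance of the class.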

If dunder-new does return an instance of the class then dunder-init is called with the same arguments that were passed to dunder-new - the arguments used in the call to construct the object. The vast majority of classes you write won't need a custom implementation of dunder-new.
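
To see that rule in action, here is a small sketch with two hypothetical classes; one returns a plain int from dunder-new, the other returns a proper instance:

```python
init_calls = []


class NotAnInstance(object):
    def __new__(cls, value):
        # Returns an int, which is not an instance of NotAnInstance
        return value * 2

    def __init__(self, value):
        # Never reached, because __new__ didn't return an instance
        init_calls.append(('NotAnInstance', value))


class Ordinary(object):
    def __new__(cls, value):
        return object.__new__(cls)

    def __init__(self, value):
        # Called with the same arguments that were passed to __new__
        init_calls.append(('Ordinary', value))


result = NotAnInstance(21)
ordinary = Ordinary(5)
print(result, init_calls)
```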

So when might you want to use __new__? We've already mentioned using your class as a factory that can return an instance of a subclass. Another reason is for creating immutable objects. Because dunder-init is an instance method it can be called again on an instance. If object state is set up in dunder-init then calling it again with new arguments could change the state of the object:

instance = SomeClass(*args, **keywargs)

instance.__init__(*newargs, **newkeywargs)

If you set up the object state in dunder-new instead then calling it again creates a new instance rather than mutating the instance. You'll need to do this if you sub-class the built-in immutable types like strings, numbers or tuples. Even if you only want to write a new dunder-init method that takes new arguments, you will still need to override dunder-new. Any arguments passed to object creation will also get passed to dunder-new, and the default one will barf on extra arguments.

class MyInt(int):

    def __new__(cls, value, arg):
        # ignore 'arg'
        return int.__new__(cls, value)

    def __init__(self, value, arg):
        self.arg = arg
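
Here is a sketch of that class in use, alongside a hypothetical BrokenInt that only overrides dunder-init, to show the failure you get without the custom dunder-new:

```python
class MyInt(int):

    def __new__(cls, value, arg):
        # The integer value has to be set here; 'arg' is ignored
        return int.__new__(cls, value)

    def __init__(self, value, arg):
        self.arg = arg


class BrokenInt(int):
    # Hypothetical counter-example: no __new__, so int.__new__
    # receives both arguments and rejects the second one

    def __init__(self, value, arg):
        self.arg = arg


number = MyInt(3, 'label')
total = number + 4

try:
    BrokenInt(3, 'label')
    failed = False
except TypeError:
    failed = True

print(number, number.arg, total, failed)
```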

Creating instances from classes is actually done by the metaclass (yes - they are in the language for a reason other than making things complicated).

You can make objects callable in Python by defining a __call__ method. If a class defines dunder-call, then instances of the class are callable. In fact functions and methods in Python are just examples of callable objects.
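
For instance, a minimal sketch with a hypothetical Adder class:

```python
class Adder(object):
    # Hypothetical class whose instances behave like functions
    def __init__(self, amount):
        self.amount = amount

    def __call__(self, value):
        # Runs when an instance is called: add_five(10)
        return value + self.amount


add_five = Adder(5)
result = add_five(10)
print(result, callable(add_five))
```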

Object creation is done by 'calling classes'. So if calling an object results in a call to dunder-call defined on its class, what happens when you call a class?

The same thing... the class of a class is its metaclass - and when you call a class, __call__ on the metaclass is called. For classes that inherit from object (new-style classes), the default metaclass is type. The two phase object construction [4] we have been discussing is implemented in type.__call__, and it is roughly equivalent to:

def __call__(cls, *args, **keywargs):
    instance = cls.__new__(cls, *args, **keywargs)
    if isinstance(instance, cls):
        cls.__init__(instance, *args, **keywargs)
    return instance

So you can customize what happens when a class is 'called' by implementing a custom metaclass that overrides __call__.
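
As a rough sketch, here is a hypothetical metaclass that counts how many instances each class has created. Note that this uses the Python 3 'metaclass=' keyword; in Python 2.x you would set a __metaclass__ attribute in the class body instead. The names CountingMeta, Widget and created are all invented for the example:

```python
class CountingMeta(type):
    # Hypothetical metaclass: intercepts every call to its classes

    def __call__(cls, *args, **keywargs):
        # Record the call, then defer to type.__call__ for the normal
        # two phase __new__ / __init__ construction
        cls.created = getattr(cls, 'created', 0) + 1
        return type.__call__(cls, *args, **keywargs)


class Widget(metaclass=CountingMeta):
    def __init__(self, name):
        self.name = name


first = Widget('first')
second = Widget('second')
print(Widget.created)
```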

[1] I'm typing here whilst watching the latest series of 'The Shield'. I just mistyped 'object construction' as 'obstruction'...
[2] 'dunder' being shorthand for 'double-underscore'.
[3] These methods are documented in Basic Customization.
[4] Ruby has effectively the same two-phase object creation: Class.new calls Class.allocate and then initialize on the new object.

Posted by Fuzzyman on 2008-09-21 13:21:57


