Duck Typing in Python

What Type is My Data ?

Note

If you don't care for my introductory ramble on typing, head straight for the Problem section.

Dynamic Typing

Caution!

A lot of what is written here has some exceptions - but it's basically true. It could be hedged around with all sorts of caveats - but as far as I can see none of it is actually going to mislead you.

Python is dynamically but strongly typed. The fact that the two halves of that statement fit together can confuse those who come from a statically typed language background. In Python it is perfectly legal to do this :

variable = 3
variable = 'hello'

So hasn't variable just changed type ? The answer is a resounding no. variable isn't an object at all - it's a name. In the first statement we create an integer object with the value 3 and bind the name 'variable' to it. In the second statement we create a new string object with the value 'hello', and rebind the name 'variable' to it. If there are no other names bound to the first object (or more properly no references to the object - not all references are names) then its reference count will drop to zero and it will be garbage collected.
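
A minimal sketch of the rebinding described above, written in Python 3 syntax. The second name, other_name, is introduced purely for illustration: it holds a reference to the integer, so the integer stays alive after variable is rebound.

```python
variable = 3
other_name = variable              # a second name bound to the same integer object

variable = 'hello'                 # rebinds the name; the integer itself is untouched

print(type(other_name).__name__)   # int
print(type(variable).__name__)     # str
print(other_name)                  # 3
```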

This is all fairly basic Python stuff - but it can take the newbie a while to get his (or hopefully her) head around. We say Python is dynamically typed because we pass around references and don't check the type until the last possible minute [1]. We say it is strongly typed because objects don't change type.
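
To make the "strongly typed" half concrete, here is a small sketch (Python 3 syntax, with made-up names). Python refuses to quietly turn an integer into a string; you have to ask for a conversion, and that conversion builds a new object.

```python
number = 3
text = 'hello'

try:
    result = number + text         # no implicit conversion between int and str
except TypeError as error:
    result = None
    print('TypeError:', error)

# An explicit conversion creates a new string; the integer is unchanged.
as_text = str(number) + text
print(as_text)                     # 3hello
print(type(number).__name__)       # int
```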

Duck typing

There is another concept in this typing lark that is a feature of dynamic languages. This is duck typing. The idea is that it doesn't actually matter what type my data is - just whether or not I can do what I want with it.

For example in a statically typed language we have a concept of adding. Some types of object can be added - usually only to objects of the same type. (Although most languages will let you add an integer to a floating point number - resulting in a floating point number). Try to add different types of objects together and the compiler will tell you that you're not allowed to [2].

In Python we allow the object to define what it means to be added. The expression 3 + 3 is syntactic sugar for calling the __add__ method of the integer type. It's the same as calling int.__add__(3, 3). This means that if you define an __add__ method for one of your classes you can make all sorts of things happen when you add instances of them together [3].
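
As a rough illustration, here is a hypothetical PathPart class (my own invention for this example, not the path module mentioned in the footnote) whose __add__ method joins two fragments with a separator.

```python
class PathPart:
    """Hypothetical path fragment; adding two fragments joins them with '/'."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        # Python calls this method for the expression ``self + other``.
        return PathPart(self.text.rstrip('/') + '/' + other.text.lstrip('/'))


joined = PathPart('usr/') + PathPart('local')
print(joined.text)                 # usr/local
print(int.__add__(3, 3))           # 6, the same as 3 + 3
```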

Much of Python syntax is sugar for underlying methods, especially data access. Accessing members of both sequence type objects and mapping type objects is done by using the __getitem__ method of these objects.

a = [0, 1, 2, 3]
print(a[0])       # prints 0
b = {'a': 0, 'b': 1}
print(b['a'])     # prints 0

is exactly the same as :

a = [0, 1, 2, 3]
print(list.__getitem__(a, 0))     # prints 0
b = {'a': 0, 'b': 1}
print(dict.__getitem__(b, 'a'))   # prints 0

In the first example we use normal Python syntax. In the second example we do what the first example is doing under the hood. In order to set members we would use the __setitem__ method instead of __getitem__. There are lots more examples of syntactic sugar - including comparing objects and accessing attributes. Beyond this we start getting into the realms of descriptors and meta programming. I haven't yet climbed these lofty heights - but playing with properties seems to be coming alarmingly close.
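
The same trick works for setting members, comparing and attribute access. A short sketch in Python 3 syntax; the variable names are only for illustration.

```python
a = [0, 1, 2, 3]
b = {'a': 0, 'b': 1}

a[0] = 10                          # sugar for list.__setitem__(a, 0, 10)
list.__setitem__(a, 1, 11)         # the explicit form
dict.__setitem__(b, 'c', 2)        # the same as b['c'] = 2

print(a)                           # [10, 11, 2, 3]
print(b['c'])                      # 2

# Comparison and attribute access go through methods too.
print(int.__eq__(3, 3))            # True, the same as 3 == 3
upper = str.__getattribute__('hello', 'upper')   # the same as 'hello'.upper
print(upper())                     # HELLO
```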

Duck typing happens because when we do a['member'] Python doesn't care what type of object a is. All it cares about is whether a has a __getitem__ method and whether the call to it returns anything sensible. If the object doesn't support indexing, an error will be raised - something like TypeError: unsubscriptable object.

This means you can create your own classes that have their own internal data structures - but are accessed using normal Python syntax. This is awfully convenient.

For example in my module ConfigObj [4] we read config files. These are values where each value has a name - just like a dictionary (a mapping type object). So you can do :

config = ConfigObj(filename)
value = config['member 1']
value2 = config['member 2']

and so on... no need for horrible getter and setter methods.

It also makes it easy to write things like ordered dictionaries that keep keys in insertion order and all sorts of other things.
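
Here is a minimal sketch of such an ordered mapping. OrderedMapping is a hypothetical class written for this example; it keeps values in a plain dictionary and the key order in a separate list, but from the outside it is used with ordinary indexing syntax.

```python
class OrderedMapping:
    """Hypothetical mapping that remembers the order keys were first set."""

    def __init__(self):
        self._values = {}
        self._order = []

    def __setitem__(self, key, value):
        # Only record the position the first time a key is seen.
        if key not in self._values:
            self._order.append(key)
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def keys(self):
        return list(self._order)


settings = OrderedMapping()
settings['zebra'] = 1
settings['apple'] = 2
settings['zebra'] = 3              # updating keeps the original position

print(settings['zebra'])           # 3
print(settings.keys())             # ['zebra', 'apple']
```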

Problem

The principle of duck typing says that you shouldn't care what type of object you have - just whether or not you can do the required action with your object. For this reason the built-in isinstance function is frowned upon.

isinstance(object, dict) returns True if object is a dictionary - or an instance of a subclass of dict. What this usually means is that you want to perform some action only appropriate to a mapping type object. isinstance returns False if the object isn't an instance of dict or one of its subclasses - even if it supports the mapping type protocol you really want to test for.
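
A small sketch of that behaviour. Registry and MyDict are hypothetical classes made up for this example: one is mapping-like but unrelated to dict, the other is a genuine subclass.

```python
class Registry:
    """Hypothetical mapping-like class that does not inherit from dict."""

    def __init__(self):
        self._data = {'a': 0}

    def __getitem__(self, key):
        return self._data[key]

    def keys(self):
        return self._data.keys()


class MyDict(dict):
    """A genuine subclass of dict."""


registry = Registry()
print(isinstance({}, dict))        # True
print(isinstance(MyDict(), dict))  # True
print(isinstance(registry, dict))  # False
print(registry['a'])               # 0, indexing works all the same
```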

Instead of :

if isinstance(object, dict):
    value = object[member]

it is considered more pythonic to do :

try:
    value = object[member]
except TypeError:
    # do something else
    pass

This means that anyone else using your code doesn't have to use a real dictionary or subclass (by the way - don't use the name object in your code, this is an example !) - they can use any object that implements the mapping interface.

Unfortunately in practice it's not that simple. What if member in the above example might be an integer ? Integers are immutable and hashable - so it's perfectly reasonable to use them as dictionary keys. However they are also used to index sequence type objects. If member happens to be an integer then example two (the try/except version) could let through lists and strings as well as dictionaries.
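
To see the problem in action, here is a sketch that wraps the try/except approach in a hypothetical helper called fetch and feeds it an integer member.

```python
def fetch(container, member):
    """Hypothetical helper using the try/except approach."""
    try:
        return container[member]
    except TypeError:
        return None


member = 1
print(fetch({1: 'from a dict'}, member))   # from a dict
print(fetch(['x', 'y'], member))           # y
print(fetch('hello', member))              # e
print(fetch(42, member))                   # None, an integer cannot be indexed
```

The dictionary, the list and the string all get through; only the integer is rejected.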

If we want our code to treat different types of object differently then the approach in example two fails. This isn't contrived - this is exactly the situation we found ourselves in with ConfigObj. If you are setting a new member - passing in a dictionary creates a new section. This has to be handled differently to just setting a value (which could be a string, a list, a boolean, or whatever).

It gets worse if you need to tell the difference between a string and a list (which we also needed to do). They're both sequence types - so any way of accessing a list member is a valid way of indexing a string.

You can tell the difference by looking at which methods an object has. For example a dictionary has a keys method which lists don't have. Strings have all sorts of methods that lists don't have (e.g. lower).
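
A quick sketch of checking for methods on the built-in types, in Python 3 syntax with illustrative names.

```python
mapping, sequence, text = {'a': 0}, ['a', 'b'], 'ab'

print(sequence[0] == text[0])        # True, indexing cannot tell them apart
print(hasattr(mapping, 'keys'))      # True
print(hasattr(sequence, 'keys'))     # False
print(hasattr(text, 'lower'))        # True
print(hasattr(sequence, 'lower'))    # False
```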

Our example above could become :

if hasattr(object, 'keys'):
    value = object[member]

This is still arbitrary though. It is perfectly possible to create a dictionary like object that doesn't have the keys method. What's a guy to do if he wants to detect dictionary like objects ?
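
Two hypothetical classes show how the hasattr test can go wrong in both directions. Squares can be indexed like a mapping but has no keys method, while Impostor has a keys attribute but cannot be indexed at all.

```python
class Squares:
    """Hypothetical mapping-like class with no keys method."""

    def __getitem__(self, key):
        return key * key


class Impostor:
    """Hypothetical class with a keys attribute but no __getitem__."""

    keys = None


squares = Squares()
print(hasattr(squares, 'keys'))            # False, so the test rejects it
print(squares[4])                          # 16, although indexing works
print(hasattr(Impostor(), 'keys'))         # True, so the test accepts it
print(hasattr(Impostor(), '__getitem__'))  # False, indexing it would fail
```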

The problem is exemplified in the operator module. This module has two functions, isMappingType and isSequenceType. The theory is that these functions will return True if an object you pass them is of the requisite type. For the built in types this works fine. However the mapping type and sequence type interfaces are so poorly defined (i.e. not defined at all) that both functions return True for any user defined class [5]. This makes them utterly useless for detecting mapping type and sequence type objects.

So the Python mapping type and sequence type 'interfaces' are so vague that we can't really use duck typing at all.

Subclassing the Built In Types

There's a good principle which says don't use the word new in names. It gets old pretty quickly.

Ever since Python 2.2 we've had new style classes. This removed the old type/class dichotomy (good word hey) and allowed us to subclass the built in types. Any object that has object as its ultimate base class is considered a new style class.

This means I can now create my own class of objects that are a subclass of dictionary.

My assertion is that (currently) the only sensible way of telling if an object will behave sensibly as a mapping type object or a sequence type object is by isinstance tests. That means that if an object inherits from dict you can assume it's safe to treat it as a dictionary like object (and so on).

There is an argument (isinstance considered harmful) that says you can't rely on a subclass to properly implement the interface of the parent class. This is true enough - Python won't stop you writing broken code.

In ConfigObj we were asked to allow the passing in of 'dictionary like objects' [6] to create new sections. This would have meant removing the isinstance tests and replacing them with something else.

The closest we could come up with was by relying on method signature. This meant defining our own, arbitrary, rules about what methods strings, other sequences (lists or tuples), and dictionary like objects should have. We gave up in disgust and decided that dictionary like objects should inherit from dict and sequences should inherit from the appropriate base class.
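
The inheritance-based rule can be sketched as follows. This is not ConfigObj's actual code; Section and describe are hypothetical names, and the example assumes Python 3, where str is the only built-in string type.

```python
class Section(dict):
    """Hypothetical dict subclass standing in for a config section."""


def describe(value):
    """Classify a value by what it inherits from, not by its method names."""
    if isinstance(value, dict):
        return 'section'
    if isinstance(value, (list, tuple)):
        return 'list'
    if isinstance(value, str):
        return 'string'
    return 'other'


print(describe(Section(name='x')))   # section
print(describe(['a', 'b']))          # list
print(describe('a, b'))              # string
print(describe(3))                   # other
```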

The alternative is for the Python community to define what interfaces objects should have.

I would suggest something like :

  • String like objects should inherit from str
  • Sequence objects should implement __getitem__ as a minimum
  • Dictionary like objects should implement __getitem__ and keys as a minimum

This would mean that we could actually know what we're talking about when we say a dictionary like object. It also means that the operator module's isMappingType and isSequenceType could finally do something useful after all these years [7]...

New IsType Functions

So we define a MappingType object as one that has __getitem__ and a keys method.

We define a SequenceType object as one that has __getitem__ and doesn't have a keys method.

Below are two functions that implement a basic test for this protocol. The functions return True if the object you pass them appears to support the protocol.

Both functions also take an optional argument require_set. If this is True then the object must also have a __setitem__ method.

def IsMappingType(obj, require_set=False):
    """
    Returns ``True`` if an object appears to
    support the ``MappingType`` protocol.

    If ``require_set`` is ``True`` then the
    object must support ``__setitem__`` as
    well as ``__getitem__`` and ``keys``.
    """

    if require_set and not hasattr(obj, '__setitem__'):
        return False
    return hasattr(obj, 'keys') and hasattr(obj, '__getitem__')

def IsSequenceType(obj, require_set=False):
    """
    Returns ``True`` if an object appears to
    support the ``SequenceType`` protocol.

    If ``require_set`` is ``True`` then the
    object must support ``__setitem__`` as
    well as ``__getitem__``.

    The object must not have a ``keys``
    method.
    """

    if require_set and not hasattr(obj, '__setitem__'):
        return False
    return (not hasattr(obj, 'keys')) and hasattr(obj, '__getitem__')
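
A quick usage sketch. The two functions are restated here in compact form (same logic, docstrings dropped) only so that the snippet runs on its own.

```python
def IsMappingType(obj, require_set=False):
    if require_set and not hasattr(obj, '__setitem__'):
        return False
    return hasattr(obj, 'keys') and hasattr(obj, '__getitem__')


def IsSequenceType(obj, require_set=False):
    if require_set and not hasattr(obj, '__setitem__'):
        return False
    return (not hasattr(obj, 'keys')) and hasattr(obj, '__getitem__')


print(IsMappingType({}))                         # True
print(IsSequenceType({}))                        # False
print(IsSequenceType([1, 2]))                    # True
print(IsSequenceType((1, 2), require_set=True))  # False, tuples are immutable
print(IsSequenceType(42))                        # False
```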

Note

A string is a sequence, which means that IsSequenceType(a_string) returns True.

You can't assign to members of a string though (they're immutable). This means that IsSequenceType(a_string, require_set=True) returns False.

I suggest that if you need to differentiate between strings and other sequences (without using require_set), it's reasonable to test for string like objects using isinstance(obj, str).

Since Python 2.2, it doesn't really make sense to implement a string like object that isn't a subclass of str.
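
One way to put that suggestion into practice is to test for strings first and only then fall back to the protocol checks. The classify helper and the Shout class below are hypothetical, and the sketch assumes Python 3.

```python
def classify(obj):
    """Hypothetical helper: check for strings before other sequences."""
    if isinstance(obj, str):
        return 'string'
    if hasattr(obj, 'keys') and hasattr(obj, '__getitem__'):
        return 'mapping'
    if hasattr(obj, '__getitem__'):
        return 'sequence'
    return 'other'


class Shout(str):
    """Hypothetical string like class that inherits from str."""


print(classify('hello'))                 # string
print(classify(Shout('hello')))          # string
print(classify(['h', 'i']))              # sequence
print(classify({'h': 'i'}))              # mapping
print(hasattr('hello', '__setitem__'))   # False, strings are immutable
```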


Footnotes

[1] This is as good an explanation as I can come up with.
[2] These are the sort of errors that static heads are ever so proud of not being allowed to make.
[3] The path module has a good example of this. It allows you to add paths together and automatically inserts the correct separator between the parts.
[4] Written in conjunction with Nicola Larosa.
[5] Possibly only true for old style classes - I haven't bothered to check this.
[6] Which didn't inherit from dict.
[7] It's possible that the proposed 'interfaces' could address this issue. My (limited) understanding is that you use interfaces with adapters. That means that for simple cases like these it would be well over the top to use interfaces to solve the problem.

Last edited Tue Aug 2 00:51:34 2011.
