Python Programming, news on the Voidspace Python Projects and all things techie.

Namedtuples and sets of dictionaries

emoticon:mirrormask So, I recently had a use case for comparing lists of dictionaries without worrying about the order. One way of comparing collections of objects without worrying about the order is to use sets:

>>> a = 1, 2, 3
>>> b = 3, 2, 1
>>> set(a) == set(b)

Note that sets will also remove duplicates, which may or may not be what you want:

>>> a = 1, 1, 2
>>> b = 1, 2, 2
>>> set(a) == set(b)

Another way, which doesn't remove duplicates, is to sort the collections and compare them:

>>> a = 1, 1, 2
>>> b = 1, 2, 2
>>> sorted(a)
[1, 1, 2]
>>> sorted(b)
[1, 2, 2]
>>> sorted(a) == sorted(b)

Unfortunately dictionaries are unhashable, which means they can't be used as dictionary keys and they can't be put in sets. Only immutable objects (ones that don't change) can be hashed. In Python 3 dictionaries can't be ordered as it doesn't make sense to say that one dictionary is 'greater-than' or 'less-than' another dictionary. This means that lists of dictionaries can't be sorted:

>>> l = [{'a': 3, 'b': 2}, {'a': 1, 'b': 2}]
>>> set(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> sorted(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: dict() < dict()

Sets are also mutable, but there is a little-used version of the set called the frozenset. The frozenset allows you to perform operations on sets of sets.

>>> a = {1, 2, 3}
>>> b = {4, 5, 6}
>>> set([frozenset(a), frozenset(b)])
{frozenset({1, 2, 3}), frozenset({4, 5, 6})}

Note the lovely new syntax for set literals in Python 3.

For my particular use case being able to compare sets of dictionaries would be really useful, so having some kind of frozendict would be nice. Writing a frozendict wouldn't be hard and lots of people have. However, there is an alternative solution built in to the standard library in the form of namedtuple. Named tuples are tuples (ordered immutable collections) with named fields. Names mapping to members sounds like a dictionary. Because they are sequences as well as mappings they can be sorted and compared. Named tuples are created via a factory function that takes a name and a list of keys.

You can create a named tuple instance from any dictionary, using the keys for the field names and using keyword unpacking to create the named tuple from the dictionary members:

>>> from collections import namedtuple
>>> my_dict = {'a': 1, 'b': 3}
>>> f = namedtuple('MyDict', my_dict.keys())(**my_dict)
>>> f
MyDict(a=1, b=3)

So long as your dictionaries have string keys, and all have the same keys, you can use namedtuple to create frozen (immutable) versions of dictionaries that can be compared, sorted and stored in sets:

>>> from collections import namedtuple
>>> a = {'a': 3, 'b': 2}
>>> b = {'a': 10, 'b': 5}
>>> MyDict = namedtuple('MyDict', a.keys())
>>> dicts = [MyDict(**a), MyDict(**b)]
>>> set(dicts)
{MyDict(a=3, b=2), MyDict(a=10, b=5)}
>>> sorted(dicts)
[MyDict(a=3, b=2), MyDict(a=10, b=5)]

Now, if we'd been using unittest2 we could have just used assertItemsEqual. Alas, we are using the Django TestCase. The good news is that it is looking good for unittest2 to be built in to Django 1.3.


As pointed out in the comments, an alternative and easier approach would be just to use sorted lists of dict.items(). This technique can still be useful where you need named access to the members as well as just comparing them.

Like this post? Digg it or it.

Posted by Fuzzyman on 2010-10-09 17:17:01 | |

Categories: , Tags:

Hosted by Webfaction