Mapping & Sequence Type Objects

Extracts from the Python Documentation

 

 

Introduction

Python has two generic container type objects, the mapping type and the sequence type. The archetypal examples of these are the basic datatypes the dictionary and the list.

Python uses a protocol to work with these datatypes. This protocols is based on certain methods that these objects have. The method names usually start and end with a double underscore - e.g. __getitem__ - to indicate that they are special. (Sometimes called magic methods, because their use is completely invisible to the programmer [1].)

This means that you can implement your own objects that behave like the standard container objects. This is useful because you can then use ordinary Python syntax to access the data contained in your object.

This is called duck typing. If an object walks like a dictionary, and quacks like dictionary (to coin a phrase), Python doesn't care whether it actually is a dictionary or not.

Note

This kind of duck typing isn't without problems. Sometimes you need to be able to tell the difference between types, and it's not always straightforward.

See A Rant about Duck Typing for a discussion on this, and a suggested workaround.

Python Reference

info

The documentation here is taken from the Python 2.4.2 Manual.

So sometimes you want to implement a container object, and you want to know what methods you need to implement.

One way is to examine the UserList and UserDict source code. These contain implementations of the magic methods - and you can either subclass these, or make judicious use of copy and paste. Cool

Looking up the references in the Python documentation can be difficult though, because the needed information is scattered across a few pages.

This page contains all the relevant pages, and so should be a useful reference for anyone implementing sequence or mapping objects.

The pages are taken from :

Note

You can just subclass the built-in types dict and list. Because their functionality (for all methods) is implemented in C, you'll end up needing to override all the methods anyway.

Sublassing is still useful, so that your object can pass isinstance tests for example.

3.3.5 Emulating container types

The following methods can be defined to implement container objects. Containers usually are sequences (such as lists or tuples) or mappings (like dictionaries), but can represent other containers as well. The first set of methods is used either to emulate a sequence or to emulate a mapping; the difference is that for a sequence, the allowable keys should be the integers k for which 0 <= k < N where N is the length of the sequence, or slice objects, which define a range of items. (For backwards compatibility, the method __getslice__() can also be defined to handle simple, but not extended slices.)

It is also recommended that mappings provide the methods keys(), values(), items(), has_key(), get(), clear(), setdefault(), iterkeys(), itervalues(), iteritems(), pop(), popitem(), copy(), and update() behaving similar to those for Python's standard dictionary objects.

The UserDict module provides a DictMixin class to help create those methods from a base set of __getitem__(), __setitem__(), __delitem__(), and keys().

Mutable sequences should provide methods append(), count(), index(), extend(), insert(), pop(), remove(), reverse() and sort(), like Python standard list objects.

Finally, sequence types should implement addition (meaning concatenation) and multiplication (meaning repetition) by defining the methods __add__(), __radd__(), __iadd__(), __mul__(), __rmul__() and __imul__() described below; they should not define __coerce__() or other numerical operators.

It is recommended that both mappings and sequences implement the __contains__() method to allow efficient use of the in operator; for mappings, in should be equivalent of has_key(); for sequences, it should search through the values.

It is further recommended that both mappings and sequences implement the __iter__() method to allow efficient iteration through the container; for mappings, __iter__() should be the same as iterkeys(); for sequences, it should iterate through the values.

__len__( self)

Called to implement the built-in function len(). Should return the length of the object, an integer >= 0. Also, an object that doesn't define a __nonzero__() method and whose __len__() method returns zero is considered to be false in a Boolean context.

__getitem__( self, key)

Called to implement evaluation of self[key]. For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method.

If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised.

For mapping types, if key is missing (not in the container), KeyError should be raised. Note: for loops expect that an IndexError will be raised for illegal indexes to allow proper detection of the end of the sequence.

__setitem__( self, key, value)

Called to implement assignment to self[key]. Same note as for __getitem__(). This should only be implemented for mappings if the objects support changes to the values for keys, or if new keys can be added, or for sequences if elements can be replaced. The same exceptions should be raised for improper key values as for the __getitem__() method.

__delitem__( self, key)

Called to implement deletion of self[key]. Same note as for __getitem__(). This should only be implemented for mappings if the objects support removal of keys, or for sequences if elements can be removed from the sequence. The same exceptions should be raised for improper key values as for the __getitem__() method.

__iter__( self)

This method is called when an iterator is required for a container. This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container, and should also be made available as the method iterkeys().

Iterator objects also need to implement this method; they are required to return themselves. For more information on iterator objects, see Iterator Types in the Python Library Reference.

__contains__( self, item)

Called to implement membership test operators. Should return True if item is in self, False otherwise. For mapping objects, this should consider the keys of the mapping rather than the values or the key-item pairs.

The membership test operators (in and not in) are normally implemented as an iteration through a sequence. However, container objects can supply this method with a more efficient implementation, which also does not require the object be a sequence.

2.3.8 Mapping Types

A mapping object maps immutable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary. A dictionary's keys are almost arbitrary values. Only values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers compare equal (such as 1 and 1.0) then they can be used interchangeably to index the same dictionary entry.

Dictionaries are created by placing a comma-separated list of key: value pairs within braces, for example:

{'jack': 4098, 'sjoerd': 4127} or {4098: 'jack', 4127: 'sjoerd'}

Mapping Type Operations

The following operations are defined on mappings (where a and b are mappings, k is a key, and v and x are arbitrary objects):

Operation     Result            Notes
=========     ======            =====

len(a)        the number of items in a
a[k]          the item of a with key k (1)
a[k] = v      set a[k] to v
del a[k]      remove a[k] from a (1)
a.clear()     remove all items from a
a.copy()      a (shallow) copy of a
a.has_key(k)  True if a has a key k, else False
k in a        Equivalent to a.has_key(k) (2)
k not in a    Equivalent to not a.has_key(k) (2)
a.items()     a copy of a's list of (key, value)
              pairs             (3)
a.keys()      a copy of a's list of keys (3)
a.update([b]) updates (and overwrites) key/value
              pairs from b      (9)
a.fromkeys(seq[, value]) Creates a new dictionary
              with keys from seq and values set
              to value          (7)
a.values()    a copy of a's list of values (3)
a.get(k[, x]) a[k] if k in a, else x (4)
a.setdefault(k[, x]) a[k] if k in a, else x
              (also setting it) (5)
a.pop(k[, x]) a[k] if k in a, else x
              (and remove k)    (8)
a.popitem()   remove and return an arbitrary
              (key, value) pair (6)
a.iteritems() return an iterator over
              (key, value) pairs (2), (3)
a.iterkeys()  return an iterator over the mapping's
              keys              (2), (3)
a.itervalues() return an iterator over the mapping's
              values            (2), (3)

Notes

  1. Raises a KeyError exception if k is not in the map.
  2. New in version 2.2.
  3. Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions. If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): "pairs = zip(a.values(), a.keys())". The same relationship holds for the iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(), a.iterkeys())" provides the same value for pairs. Another way to create the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".
  4. Never raises an exception if k is not in the map, instead it returns x. x is optional; when x is not provided and k is not in the map, None is returned.
  5. setdefault() is like get(), except that if k is missing, x is both returned and inserted into the dictionary as the value of k. x defaults to None.
  6. popitem() is useful to destructively iterate over a dictionary, as often used in set algorithms. If the dictionary is empty, calling popitem() raises a KeyError.
  7. fromkeys() is a class method that returns a new dictionary. value defaults to None. New in version 2.3.
  8. pop() raises a KeyError when no default value is given and the key is not found. New in version 2.3.
  9. update() accepts either another mapping object or an iterable of key/value pairs (as a tuple or other iterable of length two). If keyword arguments are specified, the mapping is then is updated with those key/value pairs: "d.update(red=1, blue=2)". Changed in version 2.4: Allowed the argument to be an iterable of key/value pairs and allowed keyword arguments.

2.3.6 Sequence Types

There are six sequence types: strings, Unicode strings, lists, tuples, buffers, and xrange objects.

String literals are written in single or double quotes: 'xyzzy', "frobozz". See chapter 2 of the Python Reference Manual for more about string literals.

Unicode strings are much like strings, but are specified in the syntax using a preceding "u" character: u'abc', u"def".

Lists are constructed with square brackets, separating items with commas: [a, b, c].

Tuples are constructed by the comma operator (not within square brackets), with or without enclosing parentheses, but an empty tuple must have the enclosing parentheses, such as a, b, c or (). A single item tuple must have a trailing comma, such as (d,).

Buffer objects are not directly supported by Python syntax, but can be created by calling the builtin function buffer(). They don't support concatenation or repetition.

Xrange objects are similar to buffers in that there is no specific syntax to create them, but they are created using the xrange() function. They don't support slicing, concatenation or repetition, and using in, not in, min() or max() on them is inefficient.

Most sequence types support the following operations. The "in" and "not in" operations have the same priorities as the comparison operations. The "+" and "*" operations have the same priority as the corresponding numeric operations. [2]

Sequence Operations

This table lists the sequence operations sorted in ascending priority (operations in the same box have the same priority). In the table, s and t are sequences of the same type; n, i and j are integers:

Operation     Result            Notes
=========     ======            =====

x in s        True if an item of s is equal to x,
              else False        (1)
x not in s    False if an item of s is equal to x,
              else True         (1)
s + t         the concatenation of s and t (6)
s * n , n * s n shallow copies of s concatenated (2)
s[i]          i'th item of s, origin 0 (3)
s[i:j]        slice of s from i to j (3), (4)
s[i:j:k]      slice of s from i to j with step k (3), (5)
len(s)        length of s
min(s)        smallest item of s
max(s)        largest item of s

Notes

  1. When s is a string or Unicode string object the in and not in operations act like a substring test. In Python versions before 2.3, x had to be a string of length 1. In Python 2.3 and beyond, x may be a string of any length.

  2. Values of n less than 0 are treated as 0 (which yields an empty sequence of the same type as s). Note also that the copies are shallow; nested structures are not copied. This often haunts new Python programmers; consider:

    >>> lists = [[]] * 3
    >>> lists
    [[], [], []]
    >>> lists[0].append(3)
    >>> lists
    [[3], [3], [3]]
    

    What has happened is that [[]] is a one-element list containing an empty list, so all three elements of [[]] * 3 are (pointers to) this single empty list. Modifying any of the elements of lists modifies this single list. You can create a list of different lists this way:

    >>> lists = [[] for i in range(3)]
    >>> lists[0].append(3)
    >>> lists[1].append(5)
    >>> lists[2].append(7)
    >>> lists
    [[3], [5], [7]]
    
  3. If i or j is negative, the index is relative to the end of the string: len(s) + i or len(s) + j is substituted. But note that -0 is still 0.

  4. The slice of s from i to j is defined as the sequence of items with index k such that i <= k < j. If i or j is greater than len(s), use len(s). If i is omitted, use 0. If j is omitted, use len(s). If i is greater than or equal to j, the slice is empty.

  5. The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n <= ((j-i)/k). In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). If i or j is greater than len(s), use len(s). If i or j are omitted then they become end values (which end depends on the sign of k). Note, k cannot be zero.

  6. If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s=s+t or s+=t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations. Changed in version 2.4: Formerly, string concatenation never occurred in-place.

Note On Comparison

Not covered here are the comparison operators. For new objects it is less confusing (but more work) to implement all the rich comparison methods and not implement __cmp__. It is then much more straightforward to know which method will be used in a comparison.

The rich comparison methods are :

__lt__( self, other)  ( a <  other )
__le__( self, other)  ( a <= other )
__eq__( self, other)  ( a == other )
__ne__( self, other)  ( a != other )
__gt__( self, other)  ( a >  other )
__ge__( self, other)  ( a >= other )

Footnotes

[1]Unless called explicitly, which you sometimes do in subclasses.
[2]They must have since the parser can't tell the type of the operands.

For buying techie books, science fiction, computer hardware or the latest gadgets: visit The Voidspace Amazon Store.

Hosted by Webfaction

Return to Top

Page rendered with rest2web the Site Builder

Last edited Tue Aug 2 00:51:34 2011.

Counter...