Data Persistence With ConfigObj

 

 

Introduction

Note

Since the introduction of the unrepr mode to ConfigObj, there is now a better way of doing data-persistence with ConfigObj.

The techniques and code discussed in this article are still useful for automatically creating a configspec, which beats writing one by hand.

ConfigObj is a pure Python module for easily reading and writing application configuration data. It uses an ini-file-like syntax, similar to that of the ConfigParser module, but is much more powerful.

ConfigObj can store nested sections. A section maps names to members (values). This is basically what the Python dictionary does, so a dictionary is used to represent each section.

Every value can be a single value or a list. Individual values are stored as strings - but using the validate module they can be transparently translated to and from floats, booleans or integers [1].

This means that ConfigObj can naturally represent Python data structures composed of dictionaries [2], lists, strings, floats, booleans and integers. These cover most of the basic datatypes.

This article discusses using ConfigObj for the common programming task of data persistence: storing and retrieving data structures based on the Python dictionary. Along the way we develop a set of tools, with a high-level interface, to do this.

Hint

You can see the final results of this article as the module ConfigPersist.py.

The Problem

There are some restrictions though. ConfigObj can't just be used to represent arbitrary data structures - even if all the members are allowed types.

  • Although dictionaries can be nested, they can't be inside lists.
  • Lists also can't be nested inside each other [3].
  • Values other than strings need a schema (a configspec) to convert them back into the right type.
  • Dictionary keys must be strings.
  • It is impossible to store a string that contains both triple single quotes (''') and triple double quotes (""").
  • List members cannot contain line breaks (single-line values only). [4]
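To make these restrictions concrete, here is a minimal sketch of a checker that walks a plain nested dictionary and reports the parts that break them. find_problems is a hypothetical helper, not part of ConfigObj; it only needs the standard library, and it deliberately ignores non-string values, because a configspec can deal with those.

```python
def find_problems(data, path=()):
    """Return (path, reason) pairs for parts of data ConfigObj can't store.

    Non-string scalar values are not reported: a configspec handles those.
    """
    problems = []
    for key, value in data.items():
        where = path + (key,)
        if not isinstance(key, str):
            problems.append((where, 'key is not a string'))
        if isinstance(value, dict):
            # nested dicts become nested sections, which is fine: recurse
            problems.extend(find_problems(value, where))
        elif isinstance(value, (list, tuple)):
            if any(isinstance(m, (dict, list, tuple)) for m in value):
                problems.append((where, 'list contains a dict or list'))
            if any(isinstance(m, str) and ('\n' in m or '\r' in m) for m in value):
                problems.append((where, 'list member contains a line break'))
        elif isinstance(value, str) and "'''" in value and '"""' in value:
            problems.append((where, 'string contains both kinds of triple quote'))
    return problems


example = {
    'name': 'ok',
    'sizes': [1, 2.0, True],
    'nested': {'grid': [[1, 2], [3, 4]]},
    3: 'numeric key',
    'notes': ['line one\nline two'],
}
for where, reason in find_problems(example):
    print(where, reason)
# ('nested', 'grid') list contains a dict or list
# (3,) key is not a string
# ('notes',) list member contains a line break
```

An empty result means the structure fits within the restrictions, although its non-string values will still need type information to come back correctly.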

ConfigObj isn't a data persistence module; the restrictions listed above make that clear. However, if you examine the typical data structures used in your programs, you may find that these restrictions aren't a problem for many of them.

Why Not Pickle ?

Why would we want to do this ? Well, the usual method for preserving data structures is the Python pickle module. This can store and retrieve a much wider range of objects - with none of the restrictions above.

However :

  • Pickles aren't human readable or writable, whereas ConfigObj files are plain text. This makes ConfigObj ideal for debugging, or where you want to modify the data by hand.
  • Pickles are unsafe - a maliciously crafted pickle can cause arbitrary code execution.
  • ConfigObj is slightly easier to use - data = ConfigObj(filename) and data.write().

Of these, the first two reasons are the most compelling.

So we've looked at the sort of data that ConfigObj can and can't store.

We still have a big problem. ConfigObj is designed for storing strings - this means that our data will have been converted to strings when we read it back in.
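You can see the problem without ConfigObj at all by using str() as a stand-in for writing a value to a file and reading it back. This is only an illustration; ConfigObj's own formatting of values may differ in detail.

```python
# str() stands in for "write the value to a text file and read it back"
original = {'count': 3, 'ratio': 0.5, 'debug': False}
round_tripped = {key: str(value) for key, value in original.items()}

print(round_tripped)  # {'count': '3', 'ratio': '0.5', 'debug': 'False'}

# Every value is now a string. Arithmetic silently changes meaning...
print(round_tripped['count'] * 2)  # 33
# ...and 'False' is a non-empty string, so it is truthy.
print(bool(round_tripped['debug']))  # True
```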

The configspec

If you know the datatype of each member then you can write a configspec. If you pass this to ConfigObj (using the configspec keyword argument) when you read the config file [5], you can then call the validate method. This uses the configspec to transform the values into the expected datatypes. It will even transform each member of a list value into the right type.
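The idea behind a configspec can be sketched in a few lines of plain Python. This is a toy stand-in, not the validate module: apply_checks, to_bool and the two converter tables are hypothetical names, only flat (un-nested) data is handled, and the accepted boolean spellings are an assumption modelled loosely on what validate accepts. The check names themselves (integer, float, boolean, string, int_list and so on) follow validate's naming.

```python
def to_bool(text):
    """Parse a boolean written as text (case-insensitive)."""
    lowered = text.lower()
    if lowered in ('true', 'yes', 'on', '1'):
        return True
    if lowered in ('false', 'no', 'off', '0'):
        return False
    raise ValueError('not a boolean: %r' % (text,))


# Scalar check name -> function converting a single string
SCALAR_CHECKS = {'integer': int, 'float': float, 'boolean': to_bool, 'string': str}
# List check name -> function applied to each member of the list
LIST_CHECKS = {'int_list': int, 'float_list': float,
               'bool_list': to_bool, 'string_list': str}


def apply_checks(values, checks):
    """Return a new dict with each string value converted by its named check."""
    result = {}
    for name, check in checks.items():
        value = values[name]
        if check in SCALAR_CHECKS:
            result[name] = SCALAR_CHECKS[check](value)
        elif check in LIST_CHECKS:
            result[name] = [LIST_CHECKS[check](member) for member in value]
        else:
            raise ValueError('unknown check %r for %r' % (check, name))
    return result


raw = {'port': '8080', 'ratio': '0.5', 'debug': 'off', 'sizes': ['1', '2', '3']}
checks = {'port': 'integer', 'ratio': 'float', 'debug': 'boolean', 'sizes': 'int_list'}
print(apply_checks(raw, checks))
# {'port': 8080, 'ratio': 0.5, 'debug': False, 'sizes': [1, 2, 3]}
```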

Note

In fact the configspec does more than just specify the type of each member. It can also be used to specify bounds or other parameters for each value.

So if your data structure is always going to have members of the same type (but possibly different values) you could write a configspec for it.

That sounds like hard work, though. Let's write a function that automatically generates a configspec for a ConfigObj.

Note

If all your values are strings, you don't need to use a configspec. Lists will automatically be converted into lists of strings without needing validation.

Creating a configspec

A configspec is a dictionary of checks for a section. In the first step we'll walk a ConfigObj and create a configspec for it.

The types we'll check for are strings, booleans, integers and floats. We'll also check for lists of these types. The check is done using an isinstance test, so subclasses are allowed (but they won't be recreated when the data is read back from the file).
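Two details of that isinstance test are easy to trip over, and a few lines of plain Python show both. Port is a hypothetical subclass used only for illustration, and int(str(...)) stands in for writing a value out and validating it back as an integer.

```python
class Port(int):
    """A hypothetical int subclass, used only to illustrate the caveat."""


# bool is a subclass of int, so an int test also matches True and False.
# That is why add_configspec tests for bool before it tests for int.
print(isinstance(True, int))  # True

# A subclass passes the isinstance test for its base type...
port = Port(8080)
print(isinstance(port, int))  # True

# ...but after a round trip through text it comes back as the base type.
restored = int(str(port))
print(type(restored).__name__, restored == port)  # int True
```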

The add_configspec function below modifies a ConfigObj in place, so it doesn't return anything. It overwrites any existing configspec.

def add_configspec(config):
    """
    A function that adds a configspec to a ConfigObj.

    Will only work for ConfigObj instances using basic datatypes :

        * floats
        * strings
        * ints
        * booleans
        * Lists of the above
    """

    config.configspec = {}
    for entry in config:
        val = config[entry]
        if isinstance(val, dict):
            # a subsection
            add_configspec(val)
        elif isinstance(val, bool):
            # bool must be tested before int, because bool subclasses int
            config.configspec[entry] = 'boolean'
        elif isinstance(val, int):
            config.configspec[entry] = 'integer'
        elif isinstance(val, float):
            config.configspec[entry] = 'float'
        elif isinstance(val, str):
            config.configspec[entry] = 'string'
        elif isinstance(val, (list, tuple)):
            list_type = None
            out_list = []
            for mem in val:
                if isinstance(mem, str):
                    this = 'string'
                elif isinstance(mem, bool):
                    this = 'boolean'
                elif isinstance(mem, int):
                    this = 'integer'
                elif isinstance(mem, float):
                    this = 'float'
                else:
                    raise TypeError('List member "%s" is an inappropriate type.' % (mem,))
                if list_type and this != list_type:
                    list_type = 'mixed'
                elif list_type is None:
                    list_type = this
                out_list.append(this)
            if list_type is None:
                # an empty list: no member types to record
                config.configspec[entry] = 'list()'
            elif list_type == 'mixed':
                # mixed_list takes the check for each member, in order
                config.configspec[entry] = 'mixed_list(%s)' % str(out_list)[1:-1]
            else:
                # int_list, bool_list etc. take optional length bounds,
                # not member types, so they are given no arguments here
                list_type = {'integer': 'int', 'boolean': 'bool',
                             'float': 'float', 'string': 'string'}[list_type]
                config.configspec[entry] = '%s_list()' % list_type
        else:
            raise TypeError('Value "%s" is an inappropriate type.' % (val,))
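If you want to experiment with the type-detection rules without ConfigObj installed, the same walk can be sketched over a plain nested dict. This is an illustrative re-implementation, not part of ConfigObj or ConfigPersist: make_spec, scalar_check and list_check are hypothetical names, and the result is a nested dict of check strings rather than a configspec attribute on each section. Homogeneous lists map to validate's int_list(), bool_list() and so on with no arguments (those checks take optional length bounds, not member types), and only mixed lists record the check for each member, via mixed_list.

```python
def scalar_check(value):
    """Return the validate check name for a single value."""
    if isinstance(value, bool):  # before int: bool subclasses int
        return 'boolean'
    if isinstance(value, int):
        return 'integer'
    if isinstance(value, float):
        return 'float'
    if isinstance(value, str):
        return 'string'
    raise TypeError('Value %r is an inappropriate type.' % (value,))


LIST_CHECKS = {'integer': 'int_list', 'boolean': 'bool_list',
               'float': 'float_list', 'string': 'string_list'}


def list_check(values):
    """Return the check for a list: a typed list, mixed_list or plain list."""
    members = [scalar_check(member) for member in values]
    if not members:
        return 'list()'
    if len(set(members)) == 1:
        # homogeneous: the typed list checks need no arguments
        return LIST_CHECKS[members[0]] + '()'
    # mixed: record the check for each member, in order
    return 'mixed_list(%s)' % ', '.join(repr(m) for m in members)


def make_spec(data):
    """Walk a nested dict and return a nested dict of check strings."""
    spec = {}
    for key, value in data.items():
        if isinstance(value, dict):
            spec[key] = make_spec(value)
        elif isinstance(value, (list, tuple)):
            spec[key] = list_check(value)
        else:
            spec[key] = scalar_check(value)
    return spec


print(make_spec({'member 1': 3, 'member 4': [3, 3.0, True], 'sub': {'names': ['a', 'b']}}))
# {'member 1': 'integer', 'member 4': "mixed_list('integer', 'float', 'boolean')", 'sub': {'names': 'string_list()'}}
```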

Having created a configspec you should then be able to call validate and have it return True :

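from configobj import ConfigObj
from validate import Validator

vtor = Validator()
# filename is a placeholder for the path to your config file
config = ConfigObj(filename)

add_configspec(config)
# validate returns True only if every check passes
assert config.validate(vtor) == True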

The next thing to do is to retrieve the configspec as a list of lines. For this we need a new function, which assumes you have already called add_configspec.

def write_configspec(config):
    """Return the configspec (of a ConfigObj) as a list of lines."""
    out = []
    # iterating a section yields its scalars before its subsections
    for entry in config:
        val = config[entry]
        if isinstance(val, dict):
            # a subsection: write its [marker], then its own checks
            # (_write_marker and _quote are private ConfigObj helpers)
            m = config.main._write_marker('', val.depth, entry, '')
            out.append(m)
            out += write_configspec(val)
        else:
            name = config.main._quote(entry, multiline=False)
            out.append("%s = %s" % (name, config.configspec[entry]))
    return out
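The layout that write_configspec produces can be sketched without ConfigObj's private helpers. This hypothetical spec_lines works on a nested dict of check strings; it assumes keys never need quoting (ConfigObj's _quote takes care of that in the real function) and writes no indentation. Scalars are emitted before subsections because, in ConfigObj's format, every line after a section marker belongs to that section.

```python
def spec_lines(spec, depth=1):
    """Turn a nested dict of check strings into configspec lines.

    Assumes keys need no quoting; section markers use one pair of
    brackets per nesting level, as ConfigObj does.
    """
    lines = []
    # scalars first: after a [section] marker, every line belongs to it
    for key, check in spec.items():
        if not isinstance(check, dict):
            lines.append('%s = %s' % (key, check))
    for key, sub in spec.items():
        if isinstance(sub, dict):
            lines.append('%s%s%s' % ('[' * depth, key, ']' * depth))
            lines.extend(spec_lines(sub, depth + 1))
    return lines


spec = {'name': 'string', 'server': {'port': 'integer', 'tls': {'on': 'boolean'}}}
for line in spec_lines(spec):
    print(line)
# name = string
# [server]
# port = integer
# [[tls]]
# on = boolean
```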

write_configspec returns a configspec that we can use to validate a ConfigObj. Validating against it also restores the type of any non-string values.

config = ConfigObj()
# set some non string values
config['member 1'] = 3
config['member 2'] = 3.0
config['member 3'] = True
config['member 4'] = [3, 3.0, True]

add_configspec(config)
configspec = write_configspec(config)

# let's create a copy of the original config
# and validate it with the configspec we made
b = ConfigObj(config.write(), configspec=configspec)
assert b.validate(vtor) == True
assert b == config

The Next Step

Great - so we now have a way of storing data structures and restoring the values with the correct type. The only problem is that we have to store the type information separately from the actual data - what a nuisance.

Wouldn't it be funky if we could store the type info in the data structure itself? Obviously we'd want to read and write it transparently.

Saving it is easy. We create a new subsection in each section, called __types__, containing a copy of that section's configspec. When we call the write method, this automatically gets saved out for us.

def add_typeinfo(config):
    """
    Turns the configspec attribute of each section into a member of the
    section. (Called ``__types__``).

    You must have already called ``add_configspec`` on the ConfigObj.
    """

    for entry in config.sections:
        add_typeinfo(config[entry])
    config['__types__'] = config.configspec

That looks like it should work. What about reading this back in ? We'll need to do the opposite of course.

def typeinfo_to_configspec(config):
    """Turns the '__types__' member of each section into a configspec."""
    for entry in config.sections:
        if entry == '__types__':
            continue
        typeinfo_to_configspec(config[entry])
    config.configspec = config['__types__']
    del config['__types__']
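The round trip can also be sketched on plain nested dicts. Unlike add_typeinfo and typeinfo_to_configspec, which modify a ConfigObj in place, these hypothetical helpers (embed_types and split_types) return new dicts. As in the functions above, each level's __types__ holds only that level's scalar checks, and a data key that is itself called __types__ would clash.

```python
TYPES_KEY = '__types__'


def embed_types(data, spec):
    """Return a copy of data with each level's scalar checks under '__types__'."""
    out, types = {}, {}
    for key, value in data.items():
        if isinstance(value, dict):
            out[key] = embed_types(value, spec[key])
        else:
            out[key] = value
            types[key] = spec[key]
    out[TYPES_KEY] = types
    return out


def split_types(stored):
    """Inverse of embed_types: return (data, spec) with '__types__' removed."""
    data, spec = {}, dict(stored.get(TYPES_KEY, {}))
    for key, value in stored.items():
        if key == TYPES_KEY:
            continue
        if isinstance(value, dict):
            data[key], spec[key] = split_types(value)
        else:
            data[key] = value
    return data, spec


data = {'count': 3, 'server': {'port': 8080}}
spec = {'count': 'integer', 'server': {'port': 'integer'}}
stored = embed_types(data, spec)
print(stored)
# {'count': 3, 'server': {'port': 8080, '__types__': {'port': 'integer'}}, '__types__': {'count': 'integer'}}
```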

Putting add_typeinfo and typeinfo_to_configspec together avoids the need for the write_configspec stage :

config = ConfigObj()
# set some non string values
config['member 1'] = 3
config['member 2'] = 3.0
config['member 3'] = True
config['member 4'] = [3, 3.0, True]

# create a copy to compare against later,
# because add_configspec and add_typeinfo modify config
orig = ConfigObj(config)

add_configspec(config)
add_typeinfo(config)

config.filename = 'test.ini'
config.write()

b = ConfigObj('test.ini')
typeinfo_to_configspec(b)
assert b.validate(vtor) == True
assert b == orig

So now we have two ways of saving and restoring data structures. Both of them involve calling add_configspec(config) first. Then we can either call write_configspec(config) or add_typeinfo(config).

write_configspec is useful where we repeatedly work with similar data structures. If they all have the same structure (or type signature) then a single configspec will work repeatedly. In this case it makes sense to store it separately.

If we just want to store and restore an arbitrary data structure (within our limitations) then we should use add_typeinfo and typeinfo_to_configspec.

In both cases the intermediate file that is saved is simple enough to be edited by hand.

Final Step

We can see in the examples above that our conversions are simple two-step processes. Like all programmers I am lazy, and would prefer a one-step process.

Let's create a few convenience functions that do them in a single step :

from configobj import ConfigObj

try:
    from validate import Validator
except ImportError:
    # only restore needs the validate module
    vtor = None
else:
    vtor = Validator()

def store(config):
    """
    Given a ConfigObj instance, add type info and save it.

    Returns the result of calling ``config.write()``.
    """

    add_configspec(config)
    add_typeinfo(config)
    return config.write()

def restore(stored):
    """
    Restore a ConfigObj saved using the ``store`` function.

    Takes a filename or list of lines, returns the ConfigObj instance.

    Uses the built-in Validator instance of this module (vtor).

    Raises an ImportError if the validate module isn't available, and a
    ValueError if the stored values don't match their type info.
    """

    # fail early, before reading anything
    if vtor is None:
        raise ImportError('Failed to import the validate module.')
    config = ConfigObj(stored)
    typeinfo_to_configspec(config)
    # validate returns True on success, otherwise False or a dict of results
    if config.validate(vtor) is not True:
        raise ValueError('Stored values failed validation.')
    return config

def save_configspec(config):
    """Creates a configspec and returns it as a list of lines."""
    add_configspec(config)
    return write_configspec(config)

These functions are all designed to work with dictionary-like data structures containing the basic datatypes. You can initialise a ConfigObj instance from a dictionary (config = ConfigObj(a_dict)), so the technique is very easy to use.

If you wanted to extend this system to work with additional data types it wouldn't be hard. You would need to add functions to your Validator instance (see the validate docs) and also amend the functions here to handle the extra types.
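The two halves of that change (detecting the new type when building the spec, and converting its text back when reading) can be sketched with a small registry. This is a hypothetical stand-in that does not use the validate module's extension mechanism, which is described in its own documentation. Using datetime.date with ISO 8601 text (via date.isoformat and date.fromisoformat, Python 3.7+) is an assumption chosen for illustration. As with bool and int in add_configspec, order matters: datetime.datetime is a subclass of datetime.date, so it would also match the date entry, and this sketch does not support it.

```python
import datetime


def _parse_bool(text):
    """Read back a boolean written with str()."""
    return {'True': True, 'False': False}[text]


# Hypothetical registry, in detection order:
# (check name, Python type, value -> text, text -> value)
REGISTRY = [
    ('boolean', bool, str, _parse_bool),  # before int: bool subclasses int
    ('integer', int, str, int),
    ('float', float, str, float),
    ('date', datetime.date, datetime.date.isoformat, datetime.date.fromisoformat),
]


def to_text(value):
    """Return (check name, text) for a value of a registered type."""
    for name, kind, dump, _ in REGISTRY:
        if isinstance(value, kind):
            return name, dump(value)
    raise TypeError('no converter registered for %r' % (value,))


def from_text(name, text):
    """Convert text back to a value using the registered check name."""
    for check, _, _, load in REGISTRY:
        if check == name:
            return load(text)
    raise ValueError('unknown check %r' % (name,))


check, text = to_text(datetime.date(2024, 1, 31))
print(check, text)  # date 2024-01-31
print(from_text(check, text) == datetime.date(2024, 1, 31))  # True
```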

You can download the functions from this article as a module called ConfigPersist.py.

Note

Perhaps the most serious restriction is that dictionary keys must be strings. It would be possible to walk a dictionary converting all keys to strings, where each string also carries type info (e.g. 3 becomes '3:int', 3.0 becomes '3.0:float', and '3' becomes '3:str').

You would need another function that walks the ConfigObj and rebuilds the dictionary, restoring the type of the keys.

I leave this as an exercise to the reader - not least because it reduces the readability of the saved config file.
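For readers who want to try the exercise, here is one possible minimal sketch. encode_key, decode_key, encode_keys and decode_keys are hypothetical names; only int, float and str keys are handled, the type tag goes after the last colon (so string keys may themselves contain colons), and boolean keys are rejected because bool is a subclass of int.

```python
def encode_key(key):
    """Turn a key into a string with a type tag, e.g. 3 -> '3:int'."""
    if isinstance(key, bool):
        raise TypeError('boolean keys are not handled by this sketch')
    for tag, kind in (('int', int), ('float', float), ('str', str)):
        if isinstance(key, kind):
            return '%s:%s' % (key, tag)
    raise TypeError('unsupported key type: %r' % (key,))


DECODERS = {'int': int, 'float': float, 'str': str}


def decode_key(text):
    """Inverse of encode_key: split on the last colon and convert."""
    value, _, tag = text.rpartition(':')
    return DECODERS[tag](value)


def encode_keys(data):
    """Recursively encode every key of a nested dict."""
    return {encode_key(key): encode_keys(value) if isinstance(value, dict) else value
            for key, value in data.items()}


def decode_keys(data):
    """Recursively restore the keys of a dict produced by encode_keys."""
    return {decode_key(key): decode_keys(value) if isinstance(value, dict) else value
            for key, value in data.items()}


data = {1: 'one', '1': 'string one', 2.5: {'a:b': 'nested'}}
print(encode_keys(data))
# {'1:int': 'one', '1:str': 'string one', '2.5:float': {'a:b:str': 'nested'}}
```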


Footnotes

[1]Or in fact any arbitrary datatype, if you write the conversion function. See the validate module for details of how to extend it.
[2]So long as the basic structure is based on the dictionary.
[3]We could remove this restriction by using the listquote module to parse ConfigObj values. The LineParser can parse a string for nested lists. You should read the config file with list_values=False and use the walk method to transform values into lists.
[4]List members also cannot contain both types of quote. We could remove these last two restrictions using the quote_unescape function from listquote, although it's a bit ungainly. Note, however, that the walk method of ConfigObj is ideal for transforming values in this way: it recursively walks the values and applies a function to each of them.
[5]The config data can come from a file (specified by filename), a list of lines, or a StringIO instance.

Last edited Tue Aug 2 00:51:34 2011.
