Data Persistence With ConfigObj
Introduction

Note: Since the introduction of the unrepr mode to ConfigObj, there is now a better way of doing data persistence with ConfigObj. The techniques and code discussed in this article are still useful for automatically creating a configspec, which beats creating them by hand.

ConfigObj is a pure Python module for the easy reading and writing of application configuration data. It uses an ini-file-like syntax, similar to the ConfigParser module, but with much greater power.

ConfigObj can store nested sections. A section maps members (values) to names. This is basically what the Python dictionary does, so we use the dictionary to represent a section. Every value can be a single value or a list. Individual values are stored as strings, but using the validate module they can be transparently translated to and from floats, booleans or integers [1].

This means that ConfigObj can naturally represent Python data structures composed of dictionaries [2], lists, strings, floats, booleans and integers. These cover most of the basic datatypes.

This article discusses using ConfigObj for the common programmer's task of data persistence: storing and retrieving data structures based on the Python dictionary. Along the way we evolve a set of tools (with a high-level interface) to do this.

Hint: You can see the final results of this article as the module ConfigPersist.py.

The Problem

There are some restrictions, though. ConfigObj can't just be used to represent arbitrary data structures, even if all the members are allowed types.
ConfigObj isn't a data persistence module; this list of restrictions tells you that much. However, if you examine the typical data structures used in your programs, you may find that these restrictions aren't a problem for many of them.

Why Not Pickle?

Why would we want to do this? Well, the usual method for preserving data structures is the Python pickle module. This can store and retrieve a much wider range of objects, with none of the restrictions above. However:
Of these, the first two reasons are the most compelling.

So we've looked at the sort of data that ConfigObj can and can't store. We still have a big problem: ConfigObj is designed for storing strings, which means our data will have been converted to strings when we read it back in.

The configspec

If you know the datatype of each member, then you can write a configspec. If you pass this into the ConfigObj when you read the config file [5], then you can call the validate method. This uses the configspec to transform the values into the expected data types. It will even transform each member of list values into the right type.

Note: In fact the configspec does more than just specify the type of each member. It can also be used to specify bounds or other parameters for each value.

So if your data structure is always going to have members of the same type (but possibly different values), you could write a configspec for it. That sounds like hard work, though.

Note: If all your values are strings, you don't need to use a configspec. Lists will automatically be converted into lists of strings without needing validation.

Creating a configspec

A configspec is a dictionary of checks for a section. In the first step we'll walk a ConfigObj and create a configspec for it. The types we'll check for are strings, booleans, integers and floats. We'll also check for lists of these types. The check is done using an isinstance test, so subclasses are allowed (but won't be recreated when read from the file). Booleans are tested before integers because bool is a subclass of int.

This function modifies a ConfigObj in place, so it doesn't return anything. It will overwrite any existing configspec.

    def add_configspec(config):
        """
        A function that adds a configspec to a ConfigObj.

        Will only work for ConfigObj instances using basic datatypes :

            * floats
            * strings
            * ints
            * booleans
            * Lists of the above
        """
        config.configspec = {}
        for entry in config:
            val = config[entry]
            if isinstance(val, dict):
                # a subsection
                add_configspec(val)
            elif isinstance(val, bool):
                # must come before the int test: bool is a subclass of int
                config.configspec[entry] = 'boolean'
            elif isinstance(val, int):
                config.configspec[entry] = 'integer'
            elif isinstance(val, float):
                config.configspec[entry] = 'float'
            elif isinstance(val, str):
                config.configspec[entry] = 'string'
            elif isinstance(val, (list, tuple)):
                list_type = None
                out_list = []
                for mem in val:
                    if isinstance(mem, str):
                        this = 'string'
                    elif isinstance(mem, bool):
                        this = 'boolean'
                    elif isinstance(mem, int):
                        this = 'integer'
                    elif isinstance(mem, float):
                        this = 'float'
                    else:
                        raise TypeError('List member "%s" is an inappropriate type.' % mem)
                    if list_type and this != list_type:
                        list_type = 'mixed'
                    elif list_type is None:
                        list_type = this
                    out_list.append(this)
                if list_type is None:
                    # an empty list: there are no member types to record
                    config.configspec[entry] = 'list()'
                elif list_type == 'mixed':
                    # mixed_list takes the type of each member as its arguments
                    config.configspec[entry] = 'mixed_list(%s)' % str(out_list)[1:-1]
                else:
                    # the arguments of int_list, string_list etc. are length
                    # bounds rather than member types, so pass none
                    list_type = {'integer': 'int', 'boolean': 'bool',
                                 'float': 'float', 'string': 'string'}[list_type]
                    config.configspec[entry] = '%s_list()' % list_type
            else:
                raise TypeError('Value "%s" is an inappropriate type.' % val)

Having created a configspec, you should then be able to call validate and have it return True:

    from configobj import ConfigObj
    from validate import Validator

    vtor = Validator()
    # filename is the path of an existing config file
    config = ConfigObj(filename)
    add_configspec(config)
    assert config.validate(vtor) == True

The next thing to do is to retrieve the configspec as a list of lines. For this we'll need a new function. This function assumes you have already called add_configspec.
    def write_configspec(config):
        """Return the configspec (of a ConfigObj) as a list of lines."""
        out = []
        for entry in config:
            val = config[entry]
            if isinstance(val, dict):
                # a subsection: write its section marker, then its own checks
                m = config.main._write_marker('', val.depth, entry, '')
                out.append(m)
                out += write_configspec(val)
            else:
                # quote the member name the same way ConfigObj would
                name = config.main._quote(entry, multiline=False)
                out.append("%s = %s" % (name, config.configspec[entry]))
        return out

This function now returns a configspec that we can use to validate a ConfigObj. It will also restore the type of any non-string values.

    config = ConfigObj()
    # set some non-string values
    config['member 1'] = 3
    config['member 2'] = 3.0
    config['member 3'] = True
    config['member 4'] = [3, 3.0, True]

    add_configspec(config)
    configspec = write_configspec(config)

    # let's create a copy of the original config
    # and validate it with the configspec we made
    b = ConfigObj(config.write(), configspec=configspec)
    assert b.validate(vtor) == True
    assert b == config

The Next Step

Great: we now have a way of storing data structures and restoring the values with the correct type. The only problem is that we have to store the type information separately from the actual data. What a nuisance. Wouldn't it be funky if we could store the type info in the data structure itself? Obviously we'd want to read and write it transparently.

Saving it is easy. We create a new subsection in each section called __types__. This contains a copy of the section's configspec. When we call the write method, it will automatically get saved out for us:

    def add_typeinfo(config):
        """
        Turns the configspec attribute of each section into a member of the
        section. (Called ``__types__``).

        You must have already called ``add_configspec`` on the ConfigObj.
        """
        for entry in config.sections:
            add_typeinfo(config[entry])
        # assigning a dictionary to a member creates a subsection
        config['__types__'] = config.configspec

That looks like it should work. What about reading this back in? We'll need to do the opposite, of course.
    def typeinfo_to_configspec(config):
        """Turns the '__types__' member of each section into a configspec."""
        for entry in config.sections:
            if entry == '__types__':
                continue
            typeinfo_to_configspec(config[entry])
        config.configspec = config['__types__']
        del config['__types__']

Putting this together avoids the need for the write_configspec stage:

    config = ConfigObj()
    # set some non-string values
    config['member 1'] = 3
    config['member 2'] = 3.0
    config['member 3'] = True
    config['member 4'] = [3, 3.0, True]

    # create a copy to test against,
    # because add_typeinfo modifies the config
    orig = ConfigObj(config)

    add_configspec(config)
    add_typeinfo(config)
    config.filename = 'test.ini'
    config.write()

    b = ConfigObj('test.ini')
    typeinfo_to_configspec(b)
    assert b.validate(vtor) == True
    assert b == orig

So now we have two ways of saving and restoring data structures. Both of them involve calling add_configspec(config) first. Then we can either call write_configspec(config) or add_typeinfo(config).

write_configspec is useful where we repeatedly work with similar data structures. If they all have the same structure (or type signature), then a single configspec will work for all of them. In this case it makes sense to store it separately.

If we just want to store and restore an arbitrary data structure (within our limitations), then we should use add_typeinfo and typeinfo_to_configspec.

In both cases the intermediate file that is saved is simple enough to be edited by hand.

Final Step

We can see in the examples above that our conversions are done with simple two-step processes. Like all programmers I am lazy, and would prefer a simple one-step process. Let's create a few convenience functions that do each job in a single step:

    from configobj import ConfigObj

    try:
        from validate import Validator
    except ImportError:
        vtor = None
    else:
        vtor = Validator()

    def store(config):
        """
        Passed a ConfigObj instance, add type info and save.

        Returns the result of calling ``config.write()``.
        """
        add_configspec(config)
        add_typeinfo(config)
        return config.write()

    def restore(stored):
        """
        Restore a ConfigObj saved using the ``store`` function.

        Takes a filename or list of lines, returns the ConfigObj instance.

        Uses the built-in Validator instance of this module (vtor).

        Raises an ImportError if the validate module isn't available.
        """
        if vtor is None:
            raise ImportError('Failed to import the validate module.')
        config = ConfigObj(stored)
        typeinfo_to_configspec(config)
        config.validate(vtor)
        return config

    def save_configspec(config):
        """Creates a configspec and returns it as a list of lines."""
        add_configspec(config)
        return write_configspec(config)

These functions are all designed to work with dictionary-like data structures containing the basic data types. You can initialise a ConfigObj instance from a dictionary (config = ConfigObj(dict)), so it's a very easy technique to use.

If you wanted to extend this system to work with additional data types, it wouldn't be hard. You would need to add functions to your Validator instance (see the validate docs) and also amend the functions here to handle the extra types.

You can download these functions as a module called ConfigPersist.py.

Note: Perhaps the most serious restriction is that dictionary keys must be strings. It would be possible to walk a dictionary converting all keys to strings, where the string contains type info (e.g. 3 becomes '3:int', 3.0 becomes '3.0:float', '3' becomes '3:str', etc.). You would need another function that walks the ConfigObj and rebuilds the dictionary, restoring the type of the keys. I leave this as an exercise for the reader, not least because it reduces the readability of the saved config file.
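The note above leaves key conversion as an exercise. One possible approach is sketched below. It assumes that only int, float and str keys need to survive the round trip. The helper names encode_keys and decode_keys are hypothetical and are not part of ConfigPersist.py. The 'value:type' format follows the examples given in the note. Both functions work on ordinary dictionaries, and ConfigObj sections are dictionaries too.

```python
# A sketch of the key-encoding idea from the note above. The helper names are
# hypothetical, and supporting only int, float and str keys is an assumption.

# Maps the type tag stored in an encoded key back to the type that rebuilds it.
KEY_TYPES = {'int': int, 'float': float, 'str': str}


def encode_keys(data):
    """Return a copy of a nested dict with every key turned into 'value:type'."""
    out = {}
    for key, value in data.items():
        # type() rather than isinstance(): True would otherwise pass as an int,
        # and a subclass could not be rebuilt from its tag anyway.
        if type(key) not in (int, float, str):
            raise TypeError('Unsupported key type: %r' % (key,))
        encoded = '%s:%s' % (key, type(key).__name__)
        if isinstance(value, dict):
            # recurse into nested dictionaries (subsections)
            value = encode_keys(value)
        out[encoded] = value
    return out


def decode_keys(data):
    """Reverse encode_keys, restoring each key to its original type."""
    out = {}
    for key, value in data.items():
        # rpartition splits on the last colon, so str keys may contain colons
        text, _, tag = key.rpartition(':')
        decoded = KEY_TYPES[tag](text)  # raises KeyError for an unknown tag
        if isinstance(value, dict):
            value = decode_keys(value)
        out[decoded] = value
    return out


# Example round trip. 3 and 3.0 compare equal in Python, so they cannot both
# be keys of the same dict; 2.5 is used for the float key instead.
data = {3: 'int key', 2.5: {'nested': True}, '3': 'str key'}
encoded = encode_keys(data)
print(encoded)
# {'3:int': 'int key', '2.5:float': {'nested:str': True}, '3:str': 'str key'}
assert decode_keys(encoded) == data
```

With the functions above, you would presumably call encode_keys before building the ConfigObj that is passed to store, and call decode_keys on the ConfigObj returned by restore. That pairing is an untested assumption. As the note warns, encoded names such as '2.5:float' make the saved file harder to read and edit by hand.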
Last edited Fri Feb 15 13:42:08 2008.