An Introduction to ConfigObj

Handling Configuration Files with ConfigObj

Author: Michael Foord

Note

This article was originally published in the May 2008 issue of the Python Magazine. Many thanks to them for allowing me to reproduce it in full here.

Introduction

ConfigObj is a powerful and flexible configuration file parser suitable for any configuration needs. This article explores the basic use of configuration files, and some of the more advanced features that make ConfigObj the right choice for your application.

Virtually every non-trivial application has to handle configuration data. There are a variety of options, ranging from using Python files for configuration to XML to storing configuration in a database. Each approach has its strengths and weaknesses, but many programmers default to plain text files in the simple "INI" format. This format has its origins in Windows 3.1, but its basic key-to-value pairing is so easy for users to work with that it has become a de-facto standard for configuration files on other platforms as well.

Naturally the Python standard library provides a library for working with INI files: ConfigParser. This article is about an alternative module for working with configuration files called ConfigObj. But why should you be interested in an alternative? ConfigParser can be slightly awkward to work with. It isn't difficult to use, but can make you do more work than necessary for what should be a simple task. More importantly, ConfigParser has some serious limitations. In this article we'll be looking at how to use ConfigObj for configuration file handling, including some of its more advanced features.

In order to follow the examples in this article you will need to have ConfigObj installed. You can either use easy_install configobj (or pip install configobj with current tooling), or download it from the ConfigObj homepage.

The Advantages of ConfigObj

ConfigObj comes out of my very first days of learning Python, where it started life as a set of functions in a project called 'Atlantibots'. Ironically if those of us involved had known that the standard library had a configuration parser, ConfigObj may never have been written.

The latest version of ConfigObj was written by Nicola Larosa and me, with contributions from many others of course. It is widely used, including by some large projects like:

  • Bazaar - a distributed version control system
  • Turbogears 1 - a web application framework
  • Chandler - a Personal Information Manager
  • IPython - an enhanced interactive Python shell
  • matplotlib - a Python 2D plotting library
  • Trac - online project management and issue tracking
  • Elisa - an open source, cross-platform media centre solution

The biggest advantage of ConfigObj is simplicity. Even for trivial configuration files, where you just need a few key-value pairs, ConfigParser requires them to be inside a 'section'. ConfigObj doesn't have this restriction, and once a config file has been read into memory, accessing members is trivially easy.
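To see the standard library restriction in practice, here is a minimal sketch that needs nothing beyond the standard library. It assumes Python 3, where the module is spelled configparser (ConfigParser is the Python 2 name used in this article). The sample text and the 'person' section name are made up for illustration.

```python
import configparser

# A sectionless file: just "key = value" pairs.
flat_text = "name = Michael Foord\nnationality = English\n"

parser = configparser.ConfigParser()
try:
    parser.read_string(flat_text)
    needs_section = False
except configparser.MissingSectionHeaderError:
    # The standard library parser refuses keys that are not inside a section.
    needs_section = True

# Wrapping the same keys in a (hypothetical) section makes the file acceptable.
parser = configparser.ConfigParser()
parser.read_string("[person]\n" + flat_text)
name = parser["person"]["name"]
```

The extra level of lookup (parser["person"]["name"]) is the small but constant overhead that ConfigObj avoids for flat files.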

Let's look at a simple example. Given a filename pointing to a configuration file containing these members:

name = Michael Foord
# this comment belongs to 'DOB'
DOB = 12th August 1974 # an inline comment
nationality = English

We access it with the following snippet of code. After initializing a ConfigObj instance with a filename, you can access members using dictionary-like syntax:

>>> from configobj import ConfigObj
>>> config = ConfigObj('config.ini')
>>> config['name']
'Michael Foord'
>>> config['DOB']
'12th August 1974'
>>> config['nationality']
'English'

Beyond simplicity, ConfigObj has some more important advantages over ConfigParser. These include:

  • Unicode support
  • List values
  • Multi-line values
  • Nested sections (subsections) to any depth
  • When writing out config files, ConfigObj preserves all comments and the order of members and sections
  • Many useful methods and options for working with configuration files (like the merge() and reload() methods)
  • An unrepr mode for persisting Python basic types
  • An integrated validation and type conversion system. This allows you to check the validity of configuration files and supply default values.

Some of these you can take advantage of without having to do anything special in your own code. Particularly useful are multi-line values (with familiar triple quotes) and comma separated list values. Let's expand our simple config file with a couple of new members:

description = """
A hairy individual.
But with many redeeming features."""

attributes = 2 arms, 2 legs, "nose, slightly large"

When we access these members programmatically they become:

>>> config['description']
'\nA hairy individual.\nBut with many redeeming features.'
>>> config['attributes']
['2 arms', '2 legs', 'nose, slightly large']

Of course ConfigObj does allow you to use sections and exposes them using the dictionary access idiom.

Sections and Writing

ConfigObj can handle config files with sections; in fact, as we noted earlier, it can handle sections nested to whatever horrific depth you desire. As the syntax for nesting sections is different from the usual (loosely defined) INI format, no introduction to ConfigObj can be complete without showing it:

name = Michael Foord
DOB = 12th August 1974
nationality = English

[Favourites]
    food = Steak & chips
    color = Vaguely Purple

    [[software]]
        ide = Wing
        os = Undecided

The 'Favourites' section has a 'software' subsection. Sections are indicated using the section name in square brackets. Subsections are made by increasing the number of matching square brackets around the section name.

Note that the whitespace is not significant in this example but, as with code, indentation visually shows the structure of the information.

As ConfigObj uses dictionary-like access (more properly it uses the 'mapping protocol'), sections are exposed as sub-dictionaries.

>>> config = ConfigObj('config.ini')
>>> config['Favourites']['food']
'Steak & chips'
>>> software = config['Favourites']['software']
>>> software['ide']
'Wing'

When you access the 'Favourites' member of the ConfigObj instance it returns a reference to a Section, a dictionary-like object. In fact, a Section is a subclass of dict and ConfigObj itself is a subclass of Section. All the dictionary methods are available on sections (including the top-level ConfigObj instance).

As well as being able to load files and access the members, you can use the same API to modify members and create new ones. If you then call the write() method of ConfigObj, the changes are written back out. When writing files, ConfigObj does its best to preserve any comments that were in the original, the order of members, and even the newline terminators used. It is also trivially easy to create entirely new config files programmatically:

config = ConfigObj()
config.filename = 'new_file.ini'
config['name'] = 'Fred Smith'
config['Favourites'] = {}
config['Favourites']['color'] = 'Mostly Red'
config['Favourites']['software'] = {'ide': 'Emacs'}
config.write()

The above example creates a fresh ConfigObj instance and then sets the filename on it. Passing in the filename when instantiating ConfigObj is just as valid. New sections are created by setting a member to be a dictionary; either an empty dictionary or with pre-existing members. Finally, the config file is written out by a call to the 'write' method.

There are lots more details you could learn about the syntax of files that ConfigObj works with and the methods available. The goal of this article is not to cover all of these exhaustively, but to look at some of the advanced uses.

Unicode Support

Support for Unicode is one of the factors driving the use of ConfigObj in several projects. The internet has shrunk the world dramatically, making it easy for people all across the world to access your projects. If your application works with text, but you aren't aware of text encodings (and aren't using Unicode) then your app is probably broken in subtle ways.

In our first example, ConfigObj read a text file and parsed it to separate the keys from values. Because we did not specify the encoding, the file was treated as a byte stream. If the file used a multi-byte encoding, like UTF8 or UTF16, it would have been parsed incorrectly because the text would be split on byte boundaries rather than character boundaries.
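The byte-boundary problem is easiest to see with UTF16. The following is a minimal standard library sketch, written for Python 3, that splits a line the naive way and then the safe way. It is an illustration of the failure mode, not ConfigObj's own parsing code, and the sample line is made up.

```python
# "name = Zoë" stored as UTF-16 (little-endian, no byte order mark).
raw = "name = Zo\u00eb".encode("utf-16-le")

# Naive byte-level parsing: split on the "=" byte.
key_bytes, value_bytes = raw.split(b"=", 1)

# In UTF-16 the "=" character is two bytes (b"=\x00"), so the split leaves a
# stray NUL byte at the front of the value and an odd number of bytes overall.
try:
    value_bytes.decode("utf-16-le")
    byte_split_ok = True
except UnicodeDecodeError:
    byte_split_ok = False

# Decoding first and then splitting on characters gives the expected result.
key, value = (part.strip() for part in raw.decode("utf-16-le").split("=", 1))
```

Here the damaged value cannot even be decoded. In less fortunate cases the bytes decode without complaint into the wrong characters, which is much harder to notice.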

The best way of handling this is to use known character encodings for your config files and have ConfigObj decode them to Unicode for you. UTF8 is of course a great encoding to use, as it can represent the whole range of Unicode characters and it is backwards compatible with ASCII -- using only one byte per character for the ASCII characters.

>>> config = ConfigObj('config.ini', encoding='UTF8')
>>> config['name']
u'Michael Foord'

This automatically decodes the keys and values into Unicode. If you later modify and write out the config file then the members will be encoded again using the encoding you specified when the ConfigObj was instantiated.

You can modify the encoding after parsing the input file by changing the encoding attribute on the ConfigObj instance. This argument/attribute pattern is used repeatedly to control ConfigObj's behavior. When you instantiate ConfigObj you specify how the file will be parsed. Most of the available options correspond to attributes that you can modify later. Changing the options changes what happens when you write out a config file by calling the 'write' method. For more details see the Options and Attributes sections of the ConfigObj documentation.

If you don't know the encoding of the text files you are using, then you can use a heuristic approach to guess. I have written about techniques for guessing encodings previously. Using Unicode in config files will be easier in Python 3, where strings are Unicode by default, but problems won't disappear altogether (you still need to know the encodings you are working with - and the default encoding can be different across platforms).
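As one possible heuristic, the sketch below looks for a byte order mark and then falls back to trying a short list of candidate encodings in order. It is a standard library sketch for Python 3, not the technique from my earlier write-up and not part of ConfigObj. The guess_decode name and the choice of fallback encodings are assumptions made for illustration.

```python
import codecs

# Byte order marks and the codec to use when one is found. The UTF-32 marks
# are listed before UTF-16 because the UTF-32 LE mark begins with the
# UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_decode(raw, fallbacks=("utf-8", "latin-1")):
    """Return (text, encoding) for raw bytes, guessing the encoding."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # These codecs consume the byte order mark while decoding.
            return raw.decode(encoding), encoding
    for encoding in fallbacks:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode the data with any candidate encoding")


text, encoding = guess_decode(codecs.BOM_UTF8 + "name = Zo\u00eb".encode("utf-8"))
legacy_text, legacy_encoding = guess_decode(b"name = Zo\xeb")
```

Because latin-1 can decode any byte sequence, putting it last makes the function always return something. That is convenient, but it also means a wrong guess goes unreported, so a known encoding is still the better option whenever you can arrange one.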

From discussing text encodings, it is a natural step to look at how we can use ConfigObj to store data other than strings.

Unrepr Mode

There are, broadly speaking, two different use cases for configuration files. The first is retrieving options set by the user; this requires mainly read-only access (unless you also provide a user interface for changing the settings). The second is persisting internal configuration data; rather than plain strings, this may include data structures. If you want this data to be human readable (and possibly human editable as well), then you need some kind of text based serialization protocol.

The Python pickle module does have a text mode, but unless your brain can do a reasonable impression of a stack based interpreter the output isn't human readable. XML serialization is another option, but only barely more readable in my opinion. Two good alternatives are YAML and JSON. Both of these have good Python support, but are specialized for storing data-structures. ConfigObj has a special unrepr mode that makes it a good third option; particularly if what you really need is a mapping of names to data.
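To make the readability comparison concrete, here is a small standard library sketch for Python 3 that serializes the same data with pickle's text protocol and with JSON. The settings dictionary is made up for illustration.

```python
import json
import pickle

settings = {"encoding": "utf-8", "allow_json": False, "ports": [8080, 8081]}

# Protocol 0 is pickle's original, text based protocol.
as_pickle = pickle.dumps(settings, protocol=0).decode("ascii")
as_json = json.dumps(settings, indent=2, sort_keys=True)

print(as_pickle)  # a stream of opcodes for pickle's stack machine
print(as_json)    # nested, labelled structure

# Both round-trip the data; only one is comfortable to read or edit by hand.
from_pickle = pickle.loads(as_pickle.encode("ascii"))
from_json = json.loads(as_json)
```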

The unrepr mode was originally implemented for Turbogears, but has also proved popular with other projects. In this mode ConfigObj uses Python syntax for values in the configuration file. Any of the following Python data-types can be stored in values, including nested containers: integers; floats; complex numbers; scalars like None, True, and False; strings; tuples; lists; and dictionaries. The consequences of using Python syntax are that strings must be quoted and you can use the Python string escaping rules. Lists and dictionaries both use their normal Python syntax.

A typical section of a configuration file using unrepr mode looks like this:

kid.encoding = "utf-8"
tg.allow_json = False
tg.mapping = {'/one': 'one', '/two': 'two'}

When this is read in with 'unrepr' mode on you can probably anticipate the result:

>>> config = ConfigObj(filename, unrepr=True)
>>> config['kid.encoding']
'utf-8'
>>> config['tg.allow_json']
False
>>> config['tg.mapping']
{'/one': 'one', '/two': 'two'}

The main departure from Python syntax in unrepr mode is the use of triple quoted strings. These will have the quotes removed first, so you can use triple quoting to spread long entries (like dictionaries or lists with many members) across multiple lines. If you write out the config file though, ConfigObj won't write out triple quotes and will put each value on a single line (using Python escaping rules for newlines). It may not surprise you to hear that repr() is used to create the string representation of values, and the operation is reversed when they are read back in again. What you can't store using unrepr mode are types and instances of user defined classes. The advantage of unrepr mode is that you get type marshalling for free. The disadvantage is that it uses a different and more rigorous syntax for configuration files.
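The round trip described above can be sketched with the standard library alone. This is an illustration of the idea for Python 3, using ast.literal_eval as the reverse of repr(); ConfigObj's own implementation may well differ in its details, and the Point class is a made-up example of something that cannot be stored.

```python
import ast

settings = {
    "kid.encoding": "utf-8",
    "tg.allow_json": False,
    "tg.mapping": {"/one": "one", "/two": "two"},
}

# Writing: each value becomes its repr() on a single "key = value" line.
lines = ["%s = %r" % (key, value) for key, value in settings.items()]

# Reading: split off the key, then evaluate the rest as a Python literal.
restored = {}
for line in lines:
    key, _, text = line.partition(" = ")
    restored[key] = ast.literal_eval(text)


# Instances of arbitrary classes have no literal form, so they cannot make
# the return journey.
class Point(object):
    pass


try:
    ast.literal_eval(repr(Point()))
    instance_round_trips = True
except (ValueError, SyntaxError):
    instance_round_trips = False
```

The strings come back as strings only because repr() quoted them on the way out, which is exactly why unrepr mode insists on quoted strings in the file.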

ConfigObj also supports a more sophisticated way of doing type marshalling called validation.

Validation

Configuration files by their very nature are text files, which means working with strings. When you work with the configuration data inside your application, only some of it will actually be strings. Even if your configuration data is all user supplied, at some point you are going to need to convert data to Python types. You may also want to check that the data supplied by the user is valid, that all the required options are present and that values fall within correct bounds or make sense.

ConfigObj provides a mechanism called validation that does all of this. It also allows you to specify default values, so that users only need to supply members that differ from the defaults.

Validation is done in conjunction with an extra module, validate, that comes in the zipfile distribution of ConfigObj, or can be downloaded separately. It isn't required to use ConfigObj; you only need it if you are using validation. To validate a configuration file, you provide a specification for what it should contain. The specification is usually from a text source, but could be built programmatically, and describes the type of value you expect for each key in your configuration file. You can also include any bounds or constraints on the value, and optionally a default value.

When you perform validation, each of the members in your specification is checked and undergoes a process that converts its value into the specified type. Missing values that have defaults will be filled in, and validation returns either True to indicate success or a dictionary with members that failed validation. The individual checks and conversions are performed by functions, and adding your own check function is very easy.

Before we look at how to use them with ConfigObj, let's take a look at the specifications themselves, which are referred to as configspecs in the ConfigObj documentation. A configspec has the same basic layout as a configuration file; in fact, it follows the same structure as the configuration file it is intended to check. Configspecs use the same 'key = value' format and the same section/subsection markers. The difference is that the value for each key is a type and constraint specification.

Here's a simple example:

name = string
age = float
attributes = string_list
likes_cheese = boolean
favourite_color = string

This is a simple mapping of keys to expected type. Validating a config file using this configspec is simple. Under the hood ConfigObj parses the configspec in the same way it does config files. This means we have various options as to how we provide the configspec. The two most obvious ways are either to pass in a filename pointing to the configspec or pass it in as a list of lines from the file (strings).

When a ConfigObj instance is created with a configspec we use a validator to do the validation. The standard Validator from the validate module has lots of useful checks built into it, so you can often use it directly. In a short while we will look at creating new checks and adding them to the validator, but first here is a basic example:

import sys
from configobj import ConfigObj
from validate import Validator

config = ConfigObj('config.ini', configspec='configspec.ini')

validator = Validator()
result = config.validate(validator)

if result != True:
    print('Config file validation failed!')
    sys.exit(1)

When this code runs and validation succeeds, we know that the 'config.ini' file contains the specified members and that they have been converted to the specified types after being read into memory. If validation fails we probably want to provide a more informative error message, which we can do using the dictionary returned by validate().

Checks with Arguments

A lot of the validation checks can also take arguments to specify bounds or constraints on the values as well as type. For example, you can specify minimum and maximum length for strings, a valid range of values for numbers, minimum and maximum lengths for lists, and so on. There is also a useful "option" check that allows us to specify that a value must be from a specific set of options.

Here's an alternative configspec that uses these arguments:

name = string(min=1, max=30)
age = float(min=0, max=200)
attributes = string_list(min=5, max=5)
likes_cheese = boolean()
favourite_color = option('red', 'green', 'blue')

These checks are starting to look suspiciously like function calls and, unsurprisingly, they do map directly to function calls that receive the arguments provided in the checks. As in Python code, the keyword arguments used above are optional if you are providing both a 'min' and a 'max', but they can be useful to make the intent of the arguments clear.

There are a couple of subtle points to notice in this example. In the second example, the boolean check includes parentheses. For checks without arguments, parentheses are optional. In the attributes string_list settings we specified a list with exactly five entries by setting both the maximum and minimum length to five.
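The way a check string maps onto a function call can be sketched with the standard library. The code below is an illustration for Python 3 and not how the validate module is implemented: it leans on Python's own parser, so it only understands arguments that are valid Python literals, and it hands them to the check functions as Python values, whereas the real validator passes them as strings. The names parse_check, run_check, check_float, check_option and CHECKS are all hypothetical.

```python
import ast


def parse_check(check):
    """Split a check such as "float(min=0, max=200)" into name, args, kwargs."""
    node = ast.parse(check, mode="eval").body
    if isinstance(node, ast.Name):
        # A bare name such as "boolean": parentheses are optional.
        return node.id, [], {}
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError("not a valid check: %r" % check)
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, args, kwargs


# The parameters are called min and max so that they match the keywords
# used in the configspec.
def check_float(value, min=None, max=None):
    number = float(value)
    if min is not None and number < min:
        raise ValueError("%r is too small" % value)
    if max is not None and number > max:
        raise ValueError("%r is too big" % value)
    return number


def check_option(value, *options):
    if value not in options:
        raise ValueError("%r is not one of %r" % (value, options))
    return value


CHECKS = {"float": check_float, "option": check_option}


def run_check(check, value):
    """Look up the named check function and call it with the spec's arguments."""
    name, args, kwargs = parse_check(check)
    return CHECKS[name](value, *args, **kwargs)


age = run_check("float(min=0, max=200)", "29")
same_age = run_check("float(0, 200)", "29")
color = run_check("option('red', 'green', 'blue')", "red")
```

Both spellings of the float check end up as the same call, which is why the keywords are optional when you supply both bounds in order.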

Default Values

Another important use of checks is to provide default values where a member is not present in a config file. This means that your end users only need to supply values that are different from the defaults. Default values are set by passing in the keyword argument 'default' as part of the check.

name = string(min=1, max=30, default=Fred)
age = float(min=0, max=200, default=29)
attributes = string_list(min=5, max=5, default=list('arms', 'legs', 'head', 'body', 'others'))
likes_cheese = boolean(default=True)
favourite_color = option('red', 'green', 'blue', default="red")

There are several points worth drawing out here. Default values can be unquoted or use single or double quotes. As the checks come from text, the default value also has to go through the check function to be converted to the right type. This means that your default must pass the validation check! If you supplied 201 as the default age (which should have a maximum of 200), then a configuration file with a missing age would fail validation even though age has a default. The exception to this rule is where you specify a default of None (without quotes). Where None comes from a default value it will always pass checks, so None as a default value can be a useful way of checking for missing values. Where the check indicates that the configuration option should be a list, the default value has to be a list. A default list can be set by using the 'list constructor' syntax in the default, as you can see in the default value for the 'attributes' member above.

If we create a fresh ConfigObj instance using the above configspec, and then validate, you can see the default values have been filled in:

>>> config = ConfigObj(configspec='configspec.ini')
>>> validator = Validator()
>>> config.validate(validator)
True
>>> config['name']
'Fred'
>>> config['age']
29.0
>>> config['attributes']
['arms', 'legs', 'head', 'body', 'others']

It is worth mentioning that if you write out a config file, any values that were supplied by default values from validation (i.e. they were missing from the original file) won't be written back out. This can actually be a problem. The code we have just used could be a great way to create fresh config files. Simply create a new ConfigObj instance with a configspec providing default values, then validate. Unfortunately calling write() will then create an empty file for us. If you want ConfigObj to include default values when writing then you need to specify copy=True when you call validate(). In the example above, that means simply calling config.validate(validator, copy=True).

So far we have only covered the cases when validation is successful. Let's see how we can make error reporting more helpful when validation fails.

Error Handling

There are basically two kinds of errors we might want to handle. The first is where the syntax used in the config file is invalid and parsing fails. This is basically a fatal error (at least as far as configuration is concerned), because invalid syntax makes it hard to know how much of the configuration file was read in correctly. The second type of error is where the config file was read in fine, but some of the values were either missing altogether or invalid in some other way.

You may not be able to do much about the first kind of error, but you probably want to be able to catch it to report to the user. When ConfigObj fails to read in a config file it raises an exception. There are various different kinds of exceptions it can raise (a DuplicateError for duplicated keys or sections, a NestingError for malformed section headers, and a ParseError for badly formed entries, to name a few). These are all subclasses of ConfigObjError, so the simplest thing to do is to instantiate the ConfigObj in a 'try...except' block that catches this error. Normally if the config file pointed to by the filename you provide doesn't even exist, ConfigObj won't complain (using ConfigObj to create new config files is perfectly valid). If you would rather this condition raised an error, you can use the file_error keyword argument. To handle both a parse failure and a missing file you can do the following:

from configobj import ConfigObj, ConfigObjError
try:
    config = ConfigObj(filename, file_error=True)
except (ConfigObjError, IOError) as e:
    print('Could not read "%s": %s' % (filename, e))

The next kind of error is when validation fails. This doesn't raise an exception, but instead the 'validate' method returns a dictionary which tells you which members failed validation. The dictionary essentially follows the structure of the config file, with sections and subsections as dictionaries keyed by their name. Each member will have the value True (for validation succeeds) or False (if validation for that member fails). If a whole section validates then it will just have True in the results dictionary instead of a sub-dictionary. It is easy to write a function that recursively walks the results dictionary and reports (or stores) the ones that failed.
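Such a recursive walk might look like the following minimal sketch for Python 3. It needs nothing from ConfigObj, because the results are plain nested dictionaries. The iter_failures name and the sample results dictionary are made up for illustration.

```python
def iter_failures(results, sections=()):
    """Yield (sections, key) for every entry that did not validate.

    results is the dictionary returned by a failed validation and sections
    is the tuple of section names leading to the current level.
    """
    for key, outcome in results.items():
        if isinstance(outcome, dict):
            # A section with at least one failure inside it: descend.
            for failure in iter_failures(outcome, sections + (key,)):
                yield failure
        elif outcome is not True:
            yield sections, key


# A hand-written example of the structure described above.
sample_results = {
    "name": True,
    "age": False,
    "Favourites": {
        "food": True,
        "software": {"ide": False, "os": True},
    },
    "sites": True,  # a whole section that validated
}

failures = list(iter_failures(sample_results))
```

Remember to check for a plain True result before calling a function like this, since a fully successful validation does not return a dictionary at all.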

To make it easier for you, ConfigObj comes with an example implementation of a function called flatten_errors() that does most of this for you. You can use this, or look at the implementation for inspiration. The function walks the results dictionary and returns a list of all the keys that failed. For each failed member, the list tells you what section/subsection it was in and the failed member name. If a whole section was missing then the member name will be None.

The following code is an example of how to use flatten_errors() to report validation errors:

from configobj import ConfigObj, flatten_errors
from validate import Validator

config = ConfigObj('config.ini', configspec='configspec.ini')
validator = Validator()
results = config.validate(validator)

if results != True:
    for (section_list, key, _) in flatten_errors(config, results):
        if key is not None:
            print('The "%s" key in the section "%s" failed validation' % (key, ', '.join(section_list)))
        else:
            print('The following section was missing: %s' % ', '.join(section_list))

Repeat Sections

One problem with validation is that it requires that you know the names of your sections. Some applications use the names of sections in config files as part of the configuration data. ConfigObj allows you to validate files like this by providing a specification that will be used for all subsections in a section (including the root section if you want).

The following configuration data is part of a fictional server configuration:

[sites]

    [[voidspace]]
        directory = '/home/voidspace/webapps/www.voidspace.org.uk/'
        domain = 'voidspace.org.uk'
        subdomains = 'www', # single member list
        active = True
        ip = '68.54.95.48'

    [[resolverhacks]]
        directory = '/home/voidspace/webapps/www.resolverhacks.net/'
        domain = 'resolverhacks.net'
        subdomains = 'www',
        active = True
        ip = '68.54.95.48'

    [[ironpython]]
        directory = '/home/voidspace/webapps/www.ironpython.info/'
        domain = 'ironpython.info'
        subdomains = 'www',
        active = True
        ip = '68.54.95.48'

We can validate the 'sites' section of the configuration file with a section called '__many__':

[sites]

    [[__many__]]
        directory = string
        domain = string
        subdomains = string_list
        active = boolean
        ip = ip_addr

When you validate, the '__many__' section in the configspec is used on every sub-section in the 'sites' section.

Writing a Custom Check

Part of the power of validation is how easy it is to write custom checks and use them in your configspecs. Checks are functions that you register with the validator. They receive the value being checked (plus any arguments passed in the spec), and should either return the converted value or raise an exception to indicate that the check failed.

Let's look at writing a custom check that uses a regular expression to test that a value is an email address. As it isn't doing type conversion, it will return the value unchanged if the value matches - or raise an exception if it doesn't.

Check functions receive the value and arguments as strings (from the config file) or possibly a list of strings for list values. ValidateError is the correct exception to raise when the input is invalid. The following function uses a basic regex to check the email address:

import re
from validate import ValidateError

# A raw string keeps the regex backslashes intact for the re module.
email_re = re.compile(r'\w+@\w+(?:\.\w+)')
def email_check(value):
    if isinstance(value, list):
        raise ValidateError('A list was passed when an email address was expected')
    if email_re.match(value) is None:
        raise ValidateError('"%s" is not an email address' % value)

    return value

(Note - I don't actually recommend checking emails using this regular expression. This is for example purposes only!) Having written the check function you pass it into the Validator constructor in a dictionary mapping the check name to the check function.

validator = Validator({'email': email_check})

You specify this check in the configspec with the check name:

contact_email = email()

Nice and simple!

Conclusion

ConfigObj is a deceptively simple tool for working with configuration files and persisting application data. Although the basic pattern can be illustrated with a handful of lines of code, even in this article we haven't covered everything.

The ConfigObj documentation is exhaustive, but here we have expanded out a few of the more advanced ways you can use it, particularly with validation and configspecs. There is plenty we haven't covered, like interpolation and some of the methods available to you like merge() and walk(). If you have any questions (or heaven forbid, find any bugs) then the mailing list is a friendly place to go for help.

Last edited Tue Aug 2 00:51:34 2011.
