Handling Lists and Quotes

Functions for Lists, Quotes and CSVs

Author: Michael Foord
Contact: fuzzyman@voidspace.org.uk
Version: 1.4.0
Date: 2005/08/28
License: BSD License [1]
Online Version: listquote online
Support: Mailing List

listquote Manual

Introduction

This module provides functions for turning lists into strings, and back again. It properly handles quoting and unquoting of elements and can even parse nested (recursive) lists.

Because a CSV (comma separated values) file is basically a list of lists, the module also includes functions for easily reading and writing CSV files.

This module is a part of the pythonutils [2] package. Many of the modules in this package can be seen at the Voidspace Modules Page.

Note

The interfaces to many of the functions have changed since version 1.3.0. See the CHANGELOG for a summary of the changes.

Downloading

As well as being included in the pythonutils package, you can download listquote directly from :

Quoting and Unquoting

These functions handle the quoting and unquoting of elements in a list. Using standard functions means that you can easily process lists without having to worry about whether values are properly quoted.

elem_quote

elem_quote(member, nonquote=True, stringify=False, encoding=None)

Simple function to add the most appropriate quotes to an element: either single quotes or double quotes.

If member contains \n it can't be quoted, so a QuoteError is raised. If you want to quote multiline values, use the quote_escape function first.

If nonquote is True (the default) and member contains none of the characters ' , " [ ] ( ) # ; (or a space), then it isn't quoted at all.

If member contains both single quotes and double quotes, a QuoteError is raised. If you want to quote values like this, use the quote_escape function first.

If stringify is True (the default is False), non-string values (anything other than a unicode or byte string) are first converted to strings using the str function. Otherwise, elem_quote raises a TypeError for non-string values.

If encoding is not None and member is a byte string, then it will be decoded into unicode using this encoding.
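The rules above can be illustrated with a minimal Python 3 sketch. elem_quote_sketch and the local QuoteError class are hypothetical stand-ins written for this illustration, not listquote's own code. The set of "safe" characters is taken from the description above, and trying double quotes before single quotes is an assumption of the sketch.

```python
import re


class QuoteError(SyntaxError):
    """Hypothetical stand-in for listquote.QuoteError (a SyntaxError subclass)."""


# A value made only of "safe" characters can be left unquoted.
_SAFE = re.compile(r'''^[^'," \[\]()#;]+$''')


def elem_quote_sketch(member, nonquote=True, stringify=False, encoding=None):
    """Quote one list element following the rules listed above."""
    if isinstance(member, bytes) and encoding is not None:
        member = member.decode(encoding)   # byte string -> text
    if not isinstance(member, str):
        if not stringify:
            raise TypeError(f'Can only quote strings: {member!r}')
        member = str(member)
    if '\n' in member:
        raise QuoteError('Multiline values must be escaped first')
    if nonquote and _SAFE.match(member):
        return member                      # nothing that needs protecting
    if '"' not in member:
        return f'"{member}"'               # double quotes first (also covers '')
    if "'" not in member:
        return f"'{member}'"
    raise QuoteError('Value contains both single and double quotes')
```

For example, elem_quote_sketch('hello world') gives "hello world" with double quotes, while a value containing a double quote falls back to single quotes.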

unquote

unquote(inline, fullquote=True, retain=False)

Unquote a value.

If the value isn't quoted, it is returned unchanged.

If the value is badly quoted it raises UnQuoteError.

If retain is True (default is False) then the quotes are left around the value (but leading or trailing whitespace will have been removed).

If fullquote is False (default is True), unquote only unquotes the quoted element at the start of inline. Anything after that element is returned as well, instead of raising an error.

In this case the return value is the tuple (value, rest).
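A minimal sketch of these unquoting rules, assuming a quoted value is a run of characters between matching single or double quotes with no escaped quotes inside. unquote_sketch and the local UnQuoteError are hypothetical names used only for this illustration.

```python
import re


class UnQuoteError(SyntaxError):
    """Hypothetical stand-in for listquote.UnQuoteError."""


# Optional leading whitespace, one quoted element, then whatever follows it.
_LEADING_QUOTED = re.compile(r'''^\s*("[^"]*"|'[^']*')(.*)$''', re.DOTALL)


def unquote_sketch(inline, fullquote=True, retain=False):
    """Remove the quotes from a value, following the rules above."""
    mat = _LEADING_QUOTED.match(inline)
    if mat is None:
        if inline.strip()[:1] in ('"', "'"):
            raise UnQuoteError(f'Value is badly quoted: {inline!r}')
        return inline                      # not quoted: returned unchanged
    quoted, rest = mat.groups()
    if fullquote and rest.strip():
        raise UnQuoteError(f'Unexpected text after quoted value: {inline!r}')
    if not retain:
        quoted = quoted[1:-1]              # drop the surrounding quotes
    if not fullquote:
        return quoted, rest
    return quoted
```

With fullquote=False, unquote_sketch('"a", b', fullquote=False) returns ('a', ', b'), leaving the caller to carry on parsing the rest.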

quote_escape

quote_escape(value, lf='&mjf-lf;', quot='&mjf-quot;')

Escape a string so that it can safely be quoted. You should use this if the value to be quoted may contain line-feeds or both single quotes and double quotes.

If the value contains \n then it will be escaped using lf. By default this is &mjf-lf;.

If the value contains single quotes and double quotes, then all double quotes will be escaped using quot. By default this is &mjf-quot;.

quote_unescape

quote_unescape(value, lf='&mjf-lf;', quot='&mjf-quot;')

Unescape a string escaped by quote_escape.

If the string was escaped using values other than the defaults for lf and quot, you must pass the same values to this function.
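Escaping and unescaping as described amount to a pair of plain string replacements, sketched below. quote_escape_sketch and quote_unescape_sketch are hypothetical restatements of the behaviour described above, not the module's source. With this approach a round trip only works if the original value doesn't already contain the lf or quot markers.

```python
LF = '&mjf-lf;'      # default line-feed marker, as documented
QUOT = '&mjf-quot;'  # default double-quote marker, as documented


def quote_escape_sketch(value, lf=LF, quot=QUOT):
    """Make value quotable: no line feeds, and not both kinds of quote."""
    if '\n' in value:
        value = value.replace('\n', lf)
    if "'" in value and '"' in value:
        value = value.replace('"', quot)   # single quotes are left alone
    return value


def quote_unescape_sketch(value, lf=LF, quot=QUOT):
    """Reverse quote_escape_sketch (pass the same lf/quot that were used)."""
    return value.replace(lf, '\n').replace(quot, '"')
```

A value containing only double quotes is left untouched by the escape step, since elem_quote can already wrap it in single quotes.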

Handling Lists

List Parsing

There has been a major change in version 1.4 of listquote. The main job of listquote is to parse lists that are represented as strings. These will usually have been read from a file, for example a CSV file, or config or logging data. The key feature is that these lists can include nested lists.

This functionality is now provided by an object called LineParser. There is also a simple function called lineparse that uses it. This provides a measure of backwards compatibility and is also convenient for single line use.

LineParser

LineParser is an object that parses lists from strings. Several options control its behaviour. It will also remove comments from the end of lines.

from listquote import LineParser
p = LineParser()    # full signature: LineParser(options=None, **keywargs)
nested_list = ''' "member 1", "member 2", ["nest 1", ("nest 2", 'nest 2b', ['nest 3', 'value'], nest 2c), nest1b]'''
p.feed(nested_list)
# returns (outvalue, comment):
# (['member 1', 'member 2', ['nest 1', ['nest 2', 'nest 2b', ['nest 3', 'value'], 'nest 2c'], 'nest1b']], '')
#
# parse many lines (list_of_lines is any list of strings), keeping only outvalue
new_list = []
for line in list_of_lines:
    new_list.append(p.feed(line)[0])

LineParser also has a reset method. This allows you to reconfigure your parser object without having to create a new one.

p.reset(new_options=None, **keywargs)
#
second_list = []
for line in second_list_of_lines:
    second_list.append(p.feed(line)[0])

Creating a new parser and calling the reset method take the same arguments: a set of options that control the behaviour of the feed method. The options can be passed in as a dictionary, or as explicit keyword arguments that override any options in the dictionary.

The options (with defaults shown) are :

'recursive': True,
'comment': True,
'retain': False,
'force_list': False,
'csv': False
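The precedence described above (defaults, then the options dictionary, then keyword arguments) can be sketched as a small merge. resolve_options is a hypothetical helper, not part of listquote. Expanding csv into the three settings listed under the csv option further down, and letting it win over conflicting explicit values, are assumptions of this sketch.

```python
DEFAULT_OPTIONS = {
    'recursive': True,
    'comment': True,
    'retain': False,
    'force_list': False,
    'csv': False,
}

# Settings implied by csv=True, as listed under the csv option.
CSV_OVERRIDES = {'recursive': False, 'force_list': True, 'comment': False}


def resolve_options(options=None, **keywargs):
    """Merge defaults, an options dict and keyword arguments (later wins)."""
    merged = dict(DEFAULT_OPTIONS)
    merged.update(options or {})
    merged.update(keywargs)
    if merged['csv']:
        # Assumption: csv=True overrides any conflicting explicit settings.
        merged.update(CSV_OVERRIDES)
    return merged
```

For instance, resolve_options({'comment': False}, comment=True) keeps comment as True, because the keyword argument overrides the dictionary.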

As the examples above show, it is the feed method that parses each line and returns the parsed list. feed uses the options set in the parser object.

The return value depends on the options set in the parser.

Note

This can be slightly confusing for first time users. By default, LineParser.feed returns a tuple for each line: (outvalue, comment). outvalue can be a single item or a list.

See the comment and force_list options for ways to alter this behaviour.

feed can parse lists, including nested lists.

If recursive is False then nested lists will cause a BadLineError.

If comment is False, feed returns just outvalue.

If comment is True it returns (outvalue, comment). (Even if the comment is just '').

If force_list is False then outvalue may be a list or a single item.

If force_list is True then outvalue will always be a list - even if it has just one member.
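The hypothetical helper shape_feed_result below shows how the comment and force_list options change the shape of what feed hands back for a line that has already been parsed. It is an illustration of the rules above, not part of listquote.

```python
def shape_feed_result(outvalue, comment_text, comment=True, force_list=False):
    """Apply the comment and force_list rules to an already-parsed line."""
    if force_list and not isinstance(outvalue, list):
        outvalue = [outvalue]          # always hand back a list
    if comment:
        return outvalue, comment_text  # (outvalue, comment), even if comment is ''
    return outvalue
```

So with the defaults, a line holding a single item comes back as ('item', ''), while comment=False with force_list=True gives ['item'].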

List syntax :

  • Comma separated lines - a, b, c, d

  • Lists can optionally be between square or ordinary brackets
    • [a, b, c, d]
    • (a, b, c, d)
  • Nested lists must be between brackets - a, [a, b, c, d], c

  • A single element list can be shown by a trailing comma - a,

  • An empty list is shown by () or []

Elements can be quoted with single or double quotes (but can't contain both).

The line can optionally end with a comment (preceded by a '#'). This depends on the comment option, e.g.

"Member 1", "Member 2", "Member 3"  # This is a comment

If the line is badly built then this method will raise one of :

CommentError, BadLineError, UnQuoteError
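To make the list syntax concrete, here is a simplified recursive-descent parser for a single line. parse_line_sketch is a hypothetical illustration, not LineParser's implementation: it always allows nesting and comments, raises only a local BadLineError, and returns the comment text with its leading '#' (an assumption; the real module may handle this differently).

```python
class BadLineError(SyntaxError):
    """Hypothetical stand-in for listquote.BadLineError."""


_CLOSERS = {'[': ']', '(': ')'}
_STOP = set('\'",[]()#')   # characters that end an unquoted element


def parse_line_sketch(line):
    """Parse one line into (outvalue, comment) using the list syntax above."""
    pos, n = 0, len(line)

    def skip_ws():
        nonlocal pos
        while pos < n and line[pos] in ' \t':
            pos += 1

    def parse_element():
        nonlocal pos
        skip_ws()
        if pos >= n:
            raise BadLineError('Missing element at end of line')
        ch = line[pos]
        if ch in _CLOSERS:                 # nested list: recurse
            pos += 1
            return parse_items(_CLOSERS[ch])
        if ch in '\'"':                    # quoted element
            end = line.find(ch, pos + 1)
            if end == -1:
                raise BadLineError(f'Unterminated quote at {pos}')
            value, pos = line[pos + 1:end], end + 1
            return value
        start = pos                        # unquoted element
        while pos < n and line[pos] not in _STOP:
            pos += 1
        value = line[start:pos].strip()
        if not value:
            raise BadLineError(f'Unexpected character at {pos}')
        return value

    def parse_items(closer):
        nonlocal pos
        items = []
        while True:
            skip_ws()
            if pos < n and line[pos] == closer:   # empty list or trailing comma
                pos += 1
                return items
            items.append(parse_element())
            skip_ws()
            if pos < n and line[pos] == ',':
                pos += 1
            elif pos < n and line[pos] == closer:
                pos += 1
                return items
            else:
                raise BadLineError(f'Expected "," or "{closer}" at {pos}')

    elements, saw_comma = [], False
    skip_ws()
    while pos < n and line[pos] != '#':
        elements.append(parse_element())
        skip_ws()
        if pos < n and line[pos] == ',':
            saw_comma = True
            pos += 1
            skip_ws()
        elif pos < n and line[pos] != '#':
            raise BadLineError(f'Expected "," at {pos}')
    comment = line[pos:]                   # '' when there is no comment
    if len(elements) == 1 and not saw_comma:
        return elements[0], comment        # a single item (or one bracketed list)
    return elements, comment
```

Fed the nested example from the LineParser section, this sketch returns the same structure shown there. A trailing comma ('a,') yields ['a'], and '()' yields an empty list.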

Using the csv option is the same as setting :

'recursive': False
'force_list': True
'comment': False

The csv setting provides an easy way of handling normal CSV files, including the proper handling of quoted elements.

lineparse

lineparse(inline, options=None, **keywargs)

A compatibility function that mimics the old lineparse.

It is also more convenient for single line use.

Note

It uses the new LineParser internally, and so takes the same options and keyword arguments.

parsed_line = lineparse(some_line) is the equivalent of :

p = LineParser()
parsed_line = p.feed(some_line)

Lists to Strings

The next two functions turn lists into strings, typically so that they can then be written out to a file.

list_stringify

list_stringify(inlist)

Recursively rebuilds a list - making sure all the members are strings.

It takes any iterable (including a sequence) as the argument and always returns a list.

Useful before writing out lists.

Used by makelist if stringify is set.

Uses the str function for stringification.

Every element will be a string or a unicode object.

Doesn't handle decoding strings into unicode objects (or vice-versa).
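A minimal sketch of the recursion, assuming only lists and tuples count as nested lists. list_stringify_sketch is a hypothetical name. In keeping with the description, text and byte strings are passed through unchanged and nothing is decoded.

```python
def list_stringify_sketch(inlist):
    """Rebuild inlist as a list, converting every non-string member with str()."""
    outlist = []
    for item in inlist:
        if isinstance(item, (list, tuple)):
            item = list_stringify_sketch(item)   # recurse into nested lists
        elif not isinstance(item, (str, bytes)):
            item = str(item)                     # e.g. 2.5 -> '2.5'
        outlist.append(item)
    return outlist
```

Because it iterates over its argument, the sketch also accepts generators and always returns a list.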

makelist

makelist(inlist, listchar='', stringify=False, escape=False, encoding=None)

Given a list, turn it into a string that represents that list (suitable for parsing by LineParser).

listchar should be '[', '(' or ''. This is the type of bracket used to enclose the list ('' meaning no bracket).

If you have nested lists and listchar is '', makelist will automatically use '[' for the nested lists.

If stringify is True (default is False) makelist will stringify the inlist first (using list_stringify).

If escape is True (default is False) makelist will call quote_escape on each element before passing them to elem_quote to be quoted.

If encoding is not None, all byte strings are decoded to unicode using the specified encoding. Each item will then be a unicode object instead of a byte string.
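A compact sketch of how makelist could build its output, covering only listchar and stringify (escape and encoding are left out). makelist_sketch and _quote are hypothetical: _quote is a cut-down stand-in for elem_quote that raises ValueError where listquote would raise QuoteError. Bracketing empty and single-item lists, so that they read back as lists rather than single items, is an assumption based on the list syntax described under LineParser.

```python
import re

_SAFE = re.compile(r'''^[^'," \[\]()#;]+$''')
_BRACKETS = {'[': '[%s]', '(': '(%s)', '': '%s'}


def _quote(value):
    """Cut-down stand-in for elem_quote."""
    if not isinstance(value, str):
        raise TypeError(f'Can only quote strings: {value!r}')
    if '\n' in value:
        raise ValueError('Multiline values must be escaped first')
    if _SAFE.match(value):
        return value
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    raise ValueError('Value contains both quote characters')


def makelist_sketch(inlist, listchar='', stringify=False):
    """Turn a (possibly nested) list into a LineParser-style string."""
    if len(inlist) < 2:
        # Assumption: bracket empty and single-item lists so they read back as lists.
        listchar = listchar or '['
    parts = []
    for item in inlist:
        if isinstance(item, (list, tuple)):
            # nested lists always need brackets; '' falls back to '['
            parts.append(makelist_sketch(item, listchar or '[', stringify))
        else:
            if stringify and not isinstance(item, str):
                item = str(item)
            parts.append(_quote(item))
    return _BRACKETS[listchar] % ', '.join(parts)
```

For example, makelist_sketch(['a', 'b c', ['d', 'e']]) produces a, "b c", [d, e], which the list syntax above reads back as the original nested list.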

CSV Functions

The canonical list format file is the CSV (comma separated values) file. LineParser, together with the other functions in listquote, provides a simple way to read and write these files (including properly handling quotes).

csvread

csvread(infile)

Given infile, an iterable of lines, return the CSV as a list of lists.

infile can be an open file object or a list of lines.

If any of the lines are badly built, a CSVError is raised. This has a csv attribute, which is a reference to the parsed CSV. Every line that couldn't be parsed will have [] as its entry.

The error also has an errors attribute, a list of all the errors raised. Each error in this list has an index attribute (the line number) and a line attribute (the actual line that caused the error).

Example of usage :

from listquote import csvread
# filename is the path of the CSV file to read
with open(filename) as handle:
    # remove the trailing '\n' from each line
    the_file = [line.rstrip('\n') for line in handle]
parsed_csv = csvread(the_file)
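The error-collecting behaviour of csvread can be sketched independently of the parsing itself. csvread_sketch takes a hypothetical parse_line callable (here a toy parser that only checks for balanced double quotes) and uses a local CSVError class. Counting index from 0 is an assumption of this sketch.

```python
class CSVError(SyntaxError):
    """Hypothetical stand-in for listquote.CSVError, carrying partial results."""

    def __init__(self, message, csv, errors):
        super().__init__(message)
        self.csv = csv        # parsed rows, with [] for each bad line
        self.errors = errors  # the individual parse errors


def csvread_sketch(infile, parse_line):
    """Parse every line, collecting failures instead of stopping at the first."""
    parsed, errors = [], []
    for index, line in enumerate(infile):
        try:
            parsed.append(parse_line(line))
        except SyntaxError as err:
            err.index = index   # which line failed (0-based here)
            err.line = line     # the offending text
            errors.append(err)
            parsed.append([])   # placeholder for the bad line
    if errors:
        raise CSVError(f'{len(errors)} bad line(s)', parsed, errors)
    return parsed


def toy_parse(line):
    """Toy line parser used only for this demonstration."""
    if line.count('"') % 2:
        raise SyntaxError('unbalanced quotes')
    return [field.strip().strip('"') for field in line.split(',')]
```

Catching SyntaxError here mirrors the fact that all listquote exceptions derive from it.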

csvwrite

csvwrite(inlist, stringify=False)

Given a list of lists, it turns each entry into a line of a CSV and returns the lines as a list of strings.

The lines will not be \n terminated.

Set stringify to True (default is False) to convert entries to strings before creating the line.

If stringify is False then any non string value will raise a TypeError.

Every member will be quoted using elem_quote, but no escaping is done.

Example of usage :

from listquote import csvwrite, quote_escape
# escape each entry in each line (optional)
the_list = [[quote_escape(val) for val in row] for row in the_list]
#
the_file = csvwrite(the_list)
# add a '\n' to each line and write the result to file
with open(filename, 'w') as handle:
    handle.writelines(line + '\n' for line in the_file)

Exceptions

These are the exceptions used by listquote.

ListQuoteError(SyntaxError)

Base class for errors raised by the listquote module. It is a subclass of SyntaxError - so you can trap for that without having to import these exceptions into your namespace.

QuoteError(ListQuoteError)

This value can't be quoted.

UnQuoteError(ListQuoteError)

The value is badly quoted.

BadLineError(ListQuoteError)

A line is badly built.

CommentError(BadLineError)

A line contains a disallowed comment. A subclass of BadLineError.

CSVError(ListQuoteError)

The CSV file contained errors. Currently only used by csvread.
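Because every exception derives from ListQuoteError, and therefore from SyntaxError, a caller can trap them all with a single except clause. The classes below are local redeclarations mirroring the hierarchy documented above, so the example runs without listquote installed; risky is a hypothetical function.

```python
class ListQuoteError(SyntaxError):
    """Base class for listquote errors."""


class QuoteError(ListQuoteError):
    """This value can't be quoted."""


class UnQuoteError(ListQuoteError):
    """The value is badly quoted."""


class BadLineError(ListQuoteError):
    """A line is badly built."""


class CommentError(BadLineError):
    """A line contains a disallowed comment."""


class CSVError(ListQuoteError):
    """The CSV file contained errors."""


def risky():
    """Hypothetical call that fails with one of the errors above."""
    raise CommentError('A line contains a disallowed comment')


try:
    risky()
except SyntaxError as err:   # no listquote names needed in this clause
    caught = type(err).__name__
```

Catching BadLineError instead would also catch CommentError, since it is a subclass.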

CHANGELOG

See the source code for full CHANGELOG and TODO/ISSUES.

Changes since version 1.3.0 :

Note

These changes are quite extensive. If any of them cause you problems then let me know. I can provide a workaround in the next release.

Footnotes

[1] Online at http://www.voidspace.org.uk/python/license.shtml
[2] Online at http://www.voidspace.org.uk/python/pythonutils.html

Note

Rendering this document with docutils also needs the textmacros module and the PySrc CSS stuff. See http://www.voidspace.org.uk/python/firedrop2/textmacros.shtml

