Package pythonutils :: Module listquote

Module listquote

Having written modules to handle turning a string representation of a list back into a list (including nested lists) and also a very simple CSV parser, I realised I needed a more solid set of functions for handling lists (comma delimited lines) and quoting/unquoting elements of lists.

The doctests in each function's documentation provide useful examples of how the functions work.



Classes
ListQuoteError Base class for errors raised by the listquote module.
QuoteError This value can't be quoted.
UnQuoteError The value is badly quoted.
BadLineError A line is badly built.
CommentError A line contains a disallowed comment.
CSVError The CSV File contained errors.
LineParser An object to parse nested lists from strings.

Functions
  elem_quote(member, nonquote=True, stringify=False, encoding=None)
Simple method to add the most appropriate quote to an element - either single quotes or double quotes.
  unquote(inline, fullquote=True, retain=False)
Unquote a value.
  quote_escape(value, lf='&mjf-lf;', quot='&mjf-quot;')
Escape a string so that it can safely be quoted.
  quote_unescape(value, lf='&mjf-lf;', quot='&mjf-quot;')
Unescape a string escaped by quote_escape.
  simplelist(inline)
Parse a string to a list.
  lineparse(inline, options=None, **keywargs)
A compatibility function that mimics the old lineparse.
  list_stringify(inlist)
Recursively rebuilds a list - making sure all the members are strings.
  makelist(inlist, listchar='', stringify=False, escape=False, encoding=None)
Given a list - turn it into a string that represents that list.
  csvread(infile)
Given an infile as an iterable, return the CSV as a list of lists.
  csvwrite(inlist, stringify=False)
Given a list of lists it turns each entry into a line in a CSV.
  _test()

Variables
basestring  
inquotes  
badchars  
paramfinder  
unquoted  

Imports: re

Function Details

elem_quote(member, nonquote=True, stringify=False, encoding=None)


Simple method to add the most appropriate quote to an element - either single quotes or double quotes.

If member contains ``\n`` a QuoteError is raised - multiline values can't be quoted by elem_quote.

If nonquote is True (the default) and member contains none of the characters ' , " [ ] ( ) # or a space, then it isn't quoted at all.

If member contains both single quotes and double quotes then all double quotes (") will be escaped as &mjf-quot; and member will then be quoted with double quotes.

If stringify is set to True (the default is False) then values that are not strings (neither unicode nor byte-string) are first converted to strings using the str function. If stringify is False, elem_quote raises a TypeError for such values.

If encoding is not None and member is a byte string, then it will be decoded into unicode using this encoding.

>>> elem_quote('hello')
'hello'
>>> elem_quote('hello', nonquote=False)
'"hello"'
>>> elem_quote('"hello"')
'\'"hello"\''
>>> elem_quote(3)
Traceback (most recent call last):
TypeError: Can only quote strings. "3"
>>> elem_quote(3, stringify=True)
'3'
>>> elem_quote('hello', encoding='ascii')
u'hello'
>>> elem_quote('\n')
Traceback (most recent call last):
QuoteError: Multiline values can't be quoted.
"
"
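
The quoting rules above can be sketched in a few lines. This is a minimal illustration in modern Python, not the module's source: sketch_elem_quote is a hypothetical name, it handles strings only (no stringify or encoding), and it raises a plain ValueError where the module raises QuoteError. The pattern is the one listed under the badchars variable.

```python
import re

# Pattern copied from the module's `badchars` variable: it matches a value
# made up only of characters that never need quoting.
NO_QUOTE_NEEDED = re.compile(r"""^[^'," \[\]\(\)#]+$""")


def sketch_elem_quote(member, nonquote=True):
    """Hypothetical re-statement of the documented quoting rules."""
    if '\n' in member:
        # The module raises QuoteError here; ValueError stands in for it.
        raise ValueError("Multiline values can't be quoted.")
    if nonquote and NO_QUOTE_NEEDED.match(member):
        return member                      # nothing awkward, leave it bare
    if '"' not in member:
        return '"%s"' % member             # double quotes are safe
    if "'" not in member:
        return "'%s'" % member             # single quotes are safe
    # Both quote styles present: escape the double quotes, then wrap.
    return '"%s"' % member.replace('"', '&mjf-quot;')


print(sketch_elem_quote('hello'))
print(sketch_elem_quote('"hello"'))
print(sketch_elem_quote('I can\'t say "no"'))
```

The order of the checks matters: the bare form is tried first, then each quote style, and escaping is only the last resort.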

unquote(inline, fullquote=True, retain=False)


Unquote a value.

If the value isn't quoted it returns the value.

If the value is badly quoted it raises UnQuoteError.

If retain is True (default is False) then the quotes are left around the value (but leading or trailing whitespace will have been removed).

If fullquote is False (default is True) then unquote will only unquote the first part of the inline. If there is anything after the quoted element, this will be returned as well (instead of raising an error).

In this case the return value is (value, rest).

>>> unquote('hello')
'hello'
>>> unquote('"hello"')
'hello'
>>> unquote('"hello')
Traceback (most recent call last):
UnQuoteError: Value is badly quoted: ""hello"
>>> unquote('"hello" fish')
Traceback (most recent call last):
UnQuoteError: Value is badly quoted: ""hello" fish"
>>> unquote("'hello'", retain=True)
"'hello'"
>>> unquote('"hello" fish', fullquote=False)
('hello', ' fish')
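
The effect of fullquote and retain on a quoted value can be sketched with the pattern listed under the inquotes variable. This is an illustration under assumptions, not the module's code: sketch_unquote is a hypothetical name, ValueError stands in for UnQuoteError, and the module's handling of unquoted values when fullquote is False is not modelled.

```python
import re

# Pattern copied from the module's `inquotes` variable: optional leading
# whitespace, one quoted element, then whatever follows it.
INQUOTES = re.compile(r"""\s*(".*?"|'.*?')(.*)""")


def sketch_unquote(inline, fullquote=True, retain=False):
    """Hypothetical sketch of the documented behaviour for quoted values."""
    match = INQUOTES.match(inline)
    if match is None:
        if inline.strip()[:1] in ('"', "'"):
            # Opens with a quote but never closes it.
            raise ValueError('Value is badly quoted: "%s"' % inline)
        return inline                      # not quoted: returned unchanged
    value, rest = match.groups()
    if not retain:
        value = value[1:-1]                # drop the surrounding quotes
    if not fullquote:
        return value, rest                 # hand back the remainder too
    if rest.strip():
        # Anything after the closing quote is an error when fullquote is set.
        raise ValueError('Value is badly quoted: "%s"' % inline)
    return value


print(sketch_unquote('"hello" fish', fullquote=False))
```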

quote_escape(value, lf='&mjf-lf;', quot='&mjf-quot;')


Escape a string so that it can safely be quoted. You should use this if the value to be quoted may contain line-feeds or both single quotes and double quotes.

If the value contains ``\n`` then it will be escaped using lf. By default this is &mjf-lf;.

If the value contains single quotes and double quotes, then all double quotes will be escaped using quot. By default this is &mjf-quot;.

>>> quote_escape('hello')
'hello'
>>> quote_escape('hello\n')
'hello&mjf-lf;'
>>> quote_escape('hello"')
'hello"'
>>> quote_escape('hello"\'')
"hello&mjf-quot;'"
>>> quote_escape('hello"\'\n', '&fish;', '&wobble;')
"hello&wobble;'&fish;"
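
The two escaping rules can be restated as a short function. This is a sketch consistent with the doctests above, written for illustration; sketch_quote_escape is a hypothetical name and the module's own implementation may differ.

```python
def sketch_quote_escape(value, lf='&mjf-lf;', quot='&mjf-quot;'):
    """Hypothetical restatement of the documented escaping rules."""
    # Line feeds are always replaced by the lf marker.
    value = value.replace('\n', lf)
    # Double quotes are replaced only when single quotes are present too;
    # otherwise one quote style is still free to wrap the value.
    if "'" in value and '"' in value:
        value = value.replace('"', quot)
    return value


print(sketch_quote_escape('hello"'))     # only one quote style: unchanged
print(sketch_quote_escape('hello"\''))   # both styles: double quotes escaped
```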

quote_unescape(value, lf='&mjf-lf;', quot='&mjf-quot;')


Unescape a string escaped by quote_escape.

If it was escaped using anything other than the defaults for lf and quot you must pass them to this function.

>>> quote_unescape("hello&wobble;'&fish;",  '&fish;', '&wobble;')
'hello"\'\n'
>>> quote_unescape('hello')
'hello'
>>> quote_unescape('hello&mjf-lf;')
'hello\n'
>>> quote_unescape("'hello'")
"'hello'"
>>> quote_unescape('hello"')
'hello"'
>>> quote_unescape("hello&mjf-quot;'")
'hello"\''
>>> quote_unescape("hello&wobble;'&fish;",  '&fish;', '&wobble;')
'hello"\'\n'
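
Why the same lf and quot values must be passed back is easiest to see in a sketch. The function below is a hypothetical stand-in that simply reverses the two substitutions; it is not taken from the module's source.

```python
def sketch_quote_unescape(value, lf='&mjf-lf;', quot='&mjf-quot;'):
    """Hypothetical inverse of the escaping step: restore both markers."""
    return value.replace(lf, '\n').replace(quot, '"')


escaped = "hello&wobble;'&fish;"

# With the markers that were used for escaping, the original value returns.
print(repr(sketch_quote_unescape(escaped, '&fish;', '&wobble;')))

# With the default markers nothing matches, so the text comes back untouched.
print(repr(sketch_quote_unescape(escaped)))
```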

simplelist(inline)


Parse a string to a list.

A simple regex that extracts quoted items from a list.

It retains the quotes around elements, so call unquote on each element afterwards.

>>> simplelist('''hello, goodbye, 'title', "name", "I can't"''')
['hello', 'goodbye', "'title'", '"name"', '"I can\'t"']

FIXME: This doesn't work fully (allows some badly formed lists), e.g.

>>> simplelist('hello, fish, "wobble" bottom hooray')
['hello', 'fish', '"wobble"', 'bottom hooray']
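
Both results above can be reproduced by applying the pattern listed under the paramfinder variable with findall. That simplelist works exactly this way is an assumption made for illustration; sketch_simplelist is a hypothetical name.

```python
import re

# Pattern copied from the module's `paramfinder` variable: a single-quoted
# item, a double-quoted item, or an unquoted run up to the next comma.
PARAMFINDER = re.compile(r"""(?:'.*?')|(?:".*?")|(?:[^'",\s][^,]*)""")


def sketch_simplelist(inline):
    """Hypothetical sketch: collect every match, quotes included."""
    return PARAMFINDER.findall(inline)


print(sketch_simplelist('''hello, goodbye, 'title', "name", "I can't"'''))

# The badly formed input is accepted because the text after the closing
# quote is simply picked up as another unquoted item.
print(sketch_simplelist('hello, fish, "wobble" bottom hooray'))
```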

lineparse(inline, options=None, **keywargs)


A compatibility function that mimics the old lineparse.

Also more convenient for single line use.

Note: It still uses the new LineParser - and so takes the same keyword arguments as that.

>>> lineparse('''"hello", 'goodbye', "I can't do that", 'You "can" !' # a comment''')
(['hello', 'goodbye', "I can't do that", 'You "can" !'], '# a comment')
>>> lineparse('''"hello", 'goodbye', "I can't do that", 'You "can" !' # a comment''', comment=False)
Traceback (most recent call last):
CommentError: Comment not allowed :
"hello", 'goodbye', "I can't do that", 'You "can" !' # a comment
>>> lineparse('''"hello", 'goodbye', "I can't do that", 'You "can" !' # a comment''', recursive=False)
(['hello', 'goodbye', "I can't do that", 'You "can" !'], '# a comment')
>>> lineparse('''"hello", 'goodbye', "I can't do that", 'You "can" !' # a comment''', csv=True)
Traceback (most recent call last):
CommentError: Comment not allowed :
"hello", 'goodbye', "I can't do that", 'You "can" !' # a comment
>>> lineparse('''"hello", 'goodbye', "I can't do that", 'You "can" !' ''', comment=False)
['hello', 'goodbye', "I can't do that", 'You "can" !']
>>> lineparse('')
('', '')
>>> lineparse('', force_list=True)
([], '')
>>> lineparse('[]')
([], '')
>>> lineparse('()')
([], '')
>>> lineparse('()', force_list=True)
([], '')
>>> lineparse('1,')
(['1'], '')
>>> lineparse('"Yo"')
('Yo', '')
>>> lineparse('"Yo"', force_list=True)
(['Yo'], '')
>>> lineparse('''h, i, j, (h, i, ['hello', "f"], [], ([]),), k''')
(['h', 'i', 'j', ['h', 'i', ['hello', 'f'], [], [[]]], 'k'], '')
>>> lineparse('''h, i, j, (h, i, ['hello', "f"], [], ([]),), k''', recursive=False)
Traceback (most recent call last):
BadLineError: Line is badly built :
h, i, j, (h, i, ['hello', "f"], [], ([]),), k
>>> lineparse('fish#dog')
('fish', '#dog')
>>> lineparse('"fish"#dog')
('fish', '#dog')
>>> lineparse('(((())))')
([[[[]]]], '')
>>> lineparse('((((,))))')
Traceback (most recent call last):
BadLineError: Line is badly built :
((((,))))
>>> lineparse('hi, ()')
(['hi', []], '')
>>> lineparse('"hello", "",')
(['hello', ''], '')
>>> lineparse('"hello", ,')
Traceback (most recent call last):
BadLineError: Line is badly built :
"hello", ,
>>> lineparse('"hello", ["hi", ""], ""')
(['hello', ['hi', ''], ''], '')
>>> lineparse('''"member 1", "member 2", ["nest 1", ("nest 2", 'nest 2b', ['nest 3', 'value'], nest 2c), nest1b]''')
(['member 1', 'member 2', ['nest 1', ['nest 2', 'nest 2b', ['nest 3', 'value'], 'nest 2c'], 'nest1b']], '')
>>> lineparse('''"member 1", "member 2", ["nest 1", ("nest 2", 'nest 2b', ['nest 3', 'value'], nest 2c), nest1b]]''')
Traceback (most recent call last):
BadLineError: Line is badly built :
"member 1", "member 2", ["nest 1", ("nest 2", 'nest 2b', ['nest 3', 'value'], nest 2c), nest1b]]

list_stringify(inlist)


Recursively rebuilds a list - making sure all the members are strings.

Can take any iterable or a sequence as the argument and always returns a list.

Useful before writing out lists.

Used by makelist if stringify is set.

Uses the str function for stringification.

Every element will be a string or a unicode object.

Doesn't handle decoding strings into unicode objects (or vice-versa).

>>> list_stringify([2, 2, 2, 2, (3, 3, 2.9)])
['2', '2', '2', '2', ['3', '3', '2.9']]
>>> list_stringify(None)
Traceback (most recent call last):
TypeError: iteration over non-sequence
>>> list_stringify([])
[]

FIXME: can receive any iterable - e.g. a sequence

>>> list_stringify('')
[]
>>> list_stringify('Hello There')
['H', 'e', 'l', 'l', 'o', ' ', 'T', 'h', 'e', 'r', 'e']
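
The recursive rebuild, including the string-as-iterable quirk noted in the FIXME, can be sketched as follows. This is an illustrative version in modern Python (a single str type, no unicode handling); sketch_list_stringify is a hypothetical name and not the module's source.

```python
def sketch_list_stringify(inlist):
    """Hypothetical sketch: rebuild a nested list with string members."""
    outlist = []
    for item in inlist:                    # raises TypeError for None
        if isinstance(item, (list, tuple)):
            # Nested sequences are rebuilt recursively and become lists.
            item = sketch_list_stringify(item)
        elif not isinstance(item, str):
            item = str(item)
        outlist.append(item)
    return outlist


print(sketch_list_stringify([2, 2, 2, 2, (3, 3, 2.9)]))

# A plain string is iterable, so it is split into single characters.
print(sketch_list_stringify('Hello'))
```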

makelist(inlist, listchar='', stringify=False, escape=False, encoding=None)


Given a list - turn it into a string that represents that list. (Suitable for parsing by LineParser).

listchar should be '[', '(' or ''. This is the type of bracket used to enclose the list. ('' meaning no bracket of course).

If you have nested lists and listchar is '', makelist will automatically use '[' for the nested lists.

If stringify is True (default is False) makelist will stringify the inlist first (using list_stringify).

If escape is True (default is False) makelist will call quote_escape on each element before passing them to elem_quote to be quoted.

If encoding keyword is not None, all strings are decoded to unicode with the specified encoding. Each item will then be a unicode object instead of a string.

>>> makelist([])
'[]'
>>> makelist(['a', 'b', 'I can\'t do it', 'Yes you "can" !'])
'a, b, "I can\'t do it", \'Yes you "can" !\''
>>> makelist([3, 4, 5, [6, 7, 8]], stringify=True)
'3, 4, 5, [6, 7, 8]'
>>> makelist([3, 4, 5, [6, 7, 8]])
Traceback (most recent call last):
TypeError: Can only quote strings. "3"
>>> makelist(['a', 'b', 'c', ('d', 'e'), ('f', 'g')], listchar='(')
'(a, b, c, (d, e), (f, g))'
>>> makelist(['hi\n', 'Quote "heck\''], escape=True)
'hi&mjf-lf;, "Quote &mjf-quot;heck\'"'
>>> makelist(['a', 'b', 'c', ('d', 'e'), ('f', 'g')], encoding='UTF8')
u'a, b, c, [d, e], [f, g]'
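
The bracket selection for nested lists can be sketched on its own. The version below is a simplified illustration: sketch_makelist is a hypothetical name, elements are emitted as they are (quoting, escaping, stringify and encoding are deliberately left out), and how the module writes a single-element list is not modelled.

```python
CLOSERS = {'': '', '[': ']', '(': ')'}


def sketch_makelist(inlist, listchar=''):
    """Hypothetical sketch of bracket handling only; no quoting is done."""
    if listchar not in CLOSERS:
        raise ValueError("listchar should be '[', '(' or ''")
    if not inlist and not listchar:
        return '[]'                        # an empty list is still written out
    # Nested lists reuse listchar, falling back to '[' when it is ''.
    nested_char = listchar or '['
    parts = []
    for item in inlist:
        if isinstance(item, (list, tuple)):
            parts.append(sketch_makelist(item, nested_char))
        else:
            parts.append(item)
    return listchar + ', '.join(parts) + CLOSERS[listchar]


data = ['a', 'b', 'c', ('d', 'e'), ('f', 'g')]
print(sketch_makelist(data))
print(sketch_makelist(data, listchar='('))
```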

csvread(infile)


Given an infile as an iterable, return the CSV as a list of lists.

infile can be an open file object or a list of lines.

If any of the lines are badly built then a CSVError will be raised. This has a csv attribute - which is a reference to the parsed CSV. Every line that couldn't be parsed will have [] for its entry.

The error also has an errors attribute. This is a list of all the errors raised. Each error in this list has an index attribute, which is the line number, and a line attribute - which is the actual line that caused the error.

Example of usage :

    handle = open(filename)
    # remove the trailing '\n' from each line
    the_file = [line.rstrip('\n') for line in handle.readlines()]

    csv = csvread(the_file)

>>> a = '''"object 1", 'object 2', object 3
...     test 1 , "test 2" ,'test 3'
...     'obj 1',obj 2,"obj 3"'''
>>> csvread(a.splitlines())
[['object 1', 'object 2', 'object 3'], ['test 1', 'test 2', 'test 3'], ['obj 1', 'obj 2', 'obj 3']]
>>> csvread(['object 1,'])
[['object 1']]
>>> try:
...     csvread(['object 1, "hello', 'object 1, # a comment in a csv ?'])
... except CSVError, e:
...     for entry in e.errors:
...         print entry.index, entry
0 Value is badly quoted: ""hello"
1 Comment not allowed :
object 1, # a comment in a csv ?
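
The error-collecting behaviour described above (parse every line, record failures, raise once at the end) can be sketched like this. Everything here is hypothetical and written in modern Python for illustration: SketchCSVError, sketch_csvread and naive_parse are stand-ins, and naive_parse is a deliberately crude line parser, not the module's LineParser.

```python
class SketchCSVError(Exception):
    """Stand-in for CSVError: carries the partial result and the errors."""

    def __init__(self, csv, errors):
        super().__init__('The CSV File contained errors.')
        self.csv = csv
        self.errors = errors


def naive_parse(line):
    # Crude placeholder parser: split on commas, reject comments.
    if '#' in line:
        raise ValueError('Comment not allowed : ' + line)
    return [field.strip() for field in line.split(',')]


def sketch_csvread(infile, parse_line=naive_parse):
    outlist, errors = [], []
    for index, line in enumerate(infile):
        try:
            outlist.append(parse_line(line))
        except ValueError as error:
            error.index = index            # line number of the failure
            error.line = line              # the offending line itself
            errors.append(error)
            outlist.append([])             # placeholder for the bad line
    if errors:
        raise SketchCSVError(outlist, errors)
    return outlist


try:
    sketch_csvread(['a, b', 'c, # oops', 'e'])
except SketchCSVError as exc:
    print(exc.csv)
    print([(err.index, err.line) for err in exc.errors])
```

Collecting the errors instead of stopping at the first one lets a caller report every bad line in a single pass.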

csvwrite(inlist, stringify=False)


Given a list of lists it turns each entry into a line in a CSV. (Given a list of lists it returns a list of strings).

The lines will not be ``\n`` terminated.

Set stringify to True (default is False) to convert entries to strings before creating the line.

If stringify is False then any non string value will raise a TypeError.

Every member will be quoted using elem_quote, but no escaping is done.

Example of usage :

    # escape each entry in each line (optional)
    for index in range(len(the_list)):
        the_list[index] = [quote_escape(val) for val in the_list[index]]
    #
    the_file = csvwrite(the_list)
    # add a '\n' to each line - ready to write to file
    the_file = [line + '\n' for line in the_file]

>>> csvwrite([['object 1', 'object 2', 'object 3'], ['test 1', 'test 2', 'test 3'], ['obj 1', 'obj 2', 'obj 3']])
['"object 1", "object 2", "object 3"', '"test 1", "test 2", "test 3"', '"obj 1", "obj 2", "obj 3"']
>>> csvwrite([[3, 3, 3]])
Traceback (most recent call last):
TypeError: Can only quote strings. "3"
>>> csvwrite([[3, 3, 3]], True)
['3, 3, 3']

_test()


Variables Details

basestring

Value:
str,unicode                                                            
      

inquotes

Value:
\s*(".*?"|'.*?')(.*)                                                   
      

badchars

Value:
^[^'," \[\]\(\)#]+$                                                    
      

paramfinder

Value:
(?:'.*?')|(?:".*?")|(?:[^'",\s][^,]*)                                  
      

unquoted

Value:
([^#,"'\(\)\[\]][^#,\]\)]*)\s*([#,\)\]].*)?$
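
What the unquoted pattern captures can be checked directly with the re module. The snippet only shows which text the two groups pick up; how the module applies the pattern internally is not shown on this page, so nothing here should be read as its implementation. UNQUOTED is a name chosen for this sketch.

```python
import re

# Pattern copied from the `unquoted` variable above: an unquoted value,
# then optionally a delimiter (# , ) or ]) and the rest of the line.
UNQUOTED = re.compile(r"""([^#,"'\(\)\[\]][^#,\]\)]*)\s*([#,\)\]].*)?$""")

# A value followed by a comment: the value and the comment are split apart.
print(UNQUOTED.match('fish#dog').groups())

# A value followed by a closing bracket: the remainder starts at the bracket.
print(UNQUOTED.match('nest 2c), nest1b]').groups())

# A value that starts with a quote is not an unquoted value at all.
print(UNQUOTED.match('"fish"'))
```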