===========================
 Handling Lists and Quotes
===========================
--------------------------------------
 Functions for Lists, Quotes and CSVs
--------------------------------------

:Author: Michael Foord
:Contact: fuzzyman@voidspace.org.uk
:Version: 1.4.0
:Date: 2005/08/28
:License: `BSD License`_ [#]_
:Online Version: `listquote online`_
:Support: `Mailing List`_

.. _Mailing List: http://groups.google.com/group/pythonutils/
.. _`listquote online`: http://www.voidspace.org.uk/python/listquote.html
.. _BSD License: BSD-LICENSE.txt

.. contents:: listquote Manual

Introduction
============

This module provides functions for turning lists into strings - and back again. It properly handles quoting and unquoting of elements and can even parse recursive lists.

Because a CSV (comma separated value) file is basically a list - the module includes functions for easily reading and writing CSVs.

This module is a part of the pythonutils_ [#]_ package. Many of the modules in this package can be seen at the `Voidspace Modules Page`_.

.. note::

    The interfaces to many of the functions have changed since version 1.3.0. See the CHANGELOG_ for a summary of the changes.

.. _pythonutils: pythonutils.html
.. _Voidspace Modules Page: http://www.voidspace.org.uk/python/modules.shtml

Downloading
-----------

As well as being included in the pythonutils_ package, you can download **listquote** directly from :

* `listquote.py (30k) `_

Quoting and Unquoting
=====================

These functions handle the quoting and unquoting of elements in a list. Using standard functions means that you can easily process lists without having to worry about whether values are properly quoted.

elem_quote
----------

::

    elem_quote(member, nonquote=True, stringify=False, encoding=None)

Simple method to add the most appropriate quote to an element - either single quotes or double quotes.

If member contains ``\n`` it can't be quoted, so a ``QuoteError`` is raised.
If you want to quote a multiline value, use the quote_escape_ function.

If ``nonquote`` is ``True`` (the default) and member contains none of ``'," []()#;``, it isn't quoted at all.

If member contains both single quotes *and* double quotes then a ``QuoteError`` is raised. If you want to quote values like this, use the quote_escape_ function.

If ``stringify`` is set to ``True`` (the default is ``False``) then values that are not strings (unicode *or* byte-string) are first converted to strings using the ``str`` function. Otherwise elem_quote raises a ``TypeError``.

If ``encoding`` is not ``None`` and member is a byte string, then it will be decoded into unicode using this encoding.

unquote
-------

::

    unquote(inline, fullquote=True, retain=False)

Unquote a value.

If the value isn't quoted it returns the value.

If the value is badly quoted it raises ``UnQuoteError``.

If retain is ``True`` (default is ``False``) then the quotes are left around the value (but leading or trailing whitespace will have been removed).

If fullquote is ``False`` (default is ``True``) then unquote will only unquote the first part of the ``inline``. If there is anything after the quoted element, this will be returned as well (instead of raising an error).

In this case the return value is ``(value, rest)``.

quote_escape
------------

::

    quote_escape(value, lf='&mjf-lf;', quot='&mjf-quot;')

Escape a string so that it can safely be quoted. You should use this if the value to be quoted *may* contain line-feeds or both single quotes and double quotes.

If the value contains ``\n`` then it will be escaped using ``lf``. By default this is ``&mjf-lf;``.

If the value contains single quotes *and* double quotes, then all double quotes will be escaped using ``quot``. By default this is ``&mjf-quot;``.

quote_unescape
--------------

::

    quote_unescape(value, lf='&mjf-lf;', quot='&mjf-quot;')

Unescape a string escaped by ``quote_escape``.
If it was escaped using anything other than the defaults for ``lf`` and ``quot``, you must pass them to this function.

Handling Lists
==============

List Parsing
------------

There has been a major change in version 1.4 of *listquote*.

The main job of *listquote* is to parse lists that are represented as strings. These will usually have been read from a file - for example a CSV file, or config or logging data. A key feature is that these lists can include *nested lists*.

This functionality is now provided by an object called ``LineParser``. There is also a simple function called ``lineparse`` that uses it. This provides a measure of backwards compatibility and is also convenient for single line use.

LineParser
~~~~~~~~~~

``LineParser`` is an object that parses lists from strings. There are several options that control its behaviour. It will also remove comments from the end of lines.

.. raw:: html

    {+coloring}

    p = LineParser(options=None, **keywargs)
    nested_list = ''' "member 1", "member 2", ["nest 1", ("nest 2", 'nest 2b', ['nest 3', 'value'], nest 2c), nest1b]'''
    p.feed(nested_list)
    (['member 1', 'member 2', ['nest 1', ['nest 2', 'nest 2b', ['nest 3', 'value'], 'nest 2c'], 'nest1b']], '')
    #
    new_list = []
    for line in list_of_lines:
        new_list.append(p.feed(line)[0])

    {-coloring}

``LineParser`` also has a ``reset`` method. This allows you to reconfigure your parser object without having to create a new one.

.. raw:: html

    {+coloring}

    p.reset(new_options=None, **keywargs)
    #
    second_list = []
    for line in second_list_of_lines:
        second_list.append(p.feed(line)[0])

    {-coloring}

Both creating a new parser, and the reset method, take the same arguments. These are a set of options that control the behaviour of the ``feed`` method.

The options can be passed in as a dictionary - or as explicit keyword arguments that will override any options in the dictionary.
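
As an illustration of that precedence rule, here is a minimal sketch in modern Python. It is not the module's own code: ``resolve_options`` and ``DEFAULTS`` are hypothetical names, and the sketch assumes that any option you don't supply falls back to its documented default.

```python
# Hypothetical sketch only - not taken from listquote itself.
# DEFAULTS mirrors the option defaults documented in this section.
DEFAULTS = {
    'recursive': True,
    'comment': True,
    'retain': False,
    'force_list': False,
    'csv': False,
}


def resolve_options(options=None, **keywargs):
    """Merge defaults, an options dictionary and keyword arguments.

    Keyword arguments win over the dictionary, which wins over
    the defaults (the fallback to defaults is an assumption).
    """
    resolved = dict(DEFAULTS)
    if options is not None:
        resolved.update(options)
    resolved.update(keywargs)
    return resolved


# The keyword argument overrides the value given in the dictionary.
settings = resolve_options({'comment': False, 'retain': True}, comment=True)
print(settings['comment'], settings['retain'])
```

The same merge would apply whether the options are given when the parser is created or passed to ``reset``.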
The options (with defaults shown) are : ::

    'recursive': True,
    'comment': True,
    'retain': False,
    'force_list': False,
    'csv': False

As you can see from the above examples, it is the ``feed`` method that parses each line - and returns the parsed list. ``feed`` uses the options set in the parser object. The return value depends on the options set in the parser.

.. note::

    This can be *slightly* confusing for first time users.

    The *default* is for ``LineParser.feed`` to return a *tuple* for each line. This contains ``(outvalue, comment)``. ``outvalue`` can be a single item or a list. See the ``comment`` and ``force_list`` options for ways to alter this behaviour.

``feed`` can parse lists - including nested lists.

If ``recursive`` is ``False`` then nested lists will cause a ``BadLineError``.

If ``comment`` is ``False``, feed returns ``outvalue``.

If ``comment`` is ``True`` it returns ``(outvalue, comment)``. (Even if the comment is just ``''``).

If ``force_list`` is ``False`` then ``outvalue`` may be a list or a single item.

If ``force_list`` is ``True`` then ``outvalue`` will always be a list - even if it has just one member.

List syntax :

* Comma separated lines ``a, b, c, d``
* Lists can optionally be between square or ordinary brackets

  - ``[a, b, c, d]``
  - ``(a, b, c, d)``

* Nested lists *must* be between brackets - ``a, [a, b, c, d], c``
* A single element list can be shown by a trailing comma - ``a,``
* An empty list is shown by ``()`` or ``[]``

Elements can be quoted with single or double quotes (but can't contain both).

The line can optionally end with a comment (preceded by a '#'). This depends on the ``comment`` attribute.

e.g. ::

    "Member 1", "Member 2", "Member 3" # This is a comment

If the line is badly built then this method will raise one of : ::

    CommentError, BadLineError, UnQuoteError

Using the ``csv`` option is the same as setting : ::

    'recursive': False
    'force_list': True
    'comment': False

The csv setting provides an easy way of handling normal csv files - including the proper handling of quoted elements.

lineparse
~~~~~~~~~

::

    lineparse(inline, options=None, **keywargs)

A compatibility function that mimics the old lineparse. Also more convenient for single line use.

.. note::

    It still uses the new ``LineParser`` - and so takes the same keyword arguments as that.

``parsed_line = lineparse(some_line)`` is the equivalent of :

.. raw:: html

    {+coloring}

    p = LineParser()
    parsed_line = p.feed(some_line)

    {-coloring}

Lists to Strings
----------------

The next two functions are for turning lists into strings, typically so that they can then be written out to a file.

list_stringify
~~~~~~~~~~~~~~

::

    list_stringify(inlist)

Recursively rebuilds a list - making sure all the members are strings.

Can take any iterable or a sequence as the argument and always returns a list.

Useful before writing out lists.

Used by makelist if stringify is set.

Uses the ``str`` function for stringification.

Every element will be a string or a unicode object.

Doesn't handle decoding strings into unicode objects (or vice-versa).

makelist
~~~~~~~~

::

    makelist(inlist, listchar='', stringify=False, escape=False, encoding=None)

Given a list - turn it into a string that represents that list. (Suitable for parsing by ``LineParser``).

listchar should be ``'['``, ``'('`` or ``''``. This is the type of bracket used to enclose the list. (``''`` meaning no bracket of course).

If you have nested lists and listchar is ``''``, makelist will automatically use ``'['`` for the nested lists.

If stringify is ``True`` (default is ``False``) makelist will stringify the inlist first (using ``list_stringify``).
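
As a rough picture of what that recursive stringification step amounts to, here is a minimal sketch in modern Python. It is not the module's actual code: ``stringify_nested`` is a hypothetical name, and treating only lists and tuples as nested members is an assumption made for the example.

```python
# Hypothetical sketch only - not the implementation of list_stringify.
def stringify_nested(inlist):
    """Return a new list in which every non-list member is a string."""
    outlist = []
    for item in inlist:
        if isinstance(item, (list, tuple)):
            # Nested lists (and tuples) are rebuilt recursively as lists.
            outlist.append(stringify_nested(item))
        elif isinstance(item, str):
            # Strings are passed through untouched.
            outlist.append(item)
        else:
            # Everything else is converted with str().
            outlist.append(str(item))
    return outlist


print(stringify_nested([1, "two", (3.5, [None])]))
# prints ['1', 'two', ['3.5', ['None']]]
```

Note that the sketch builds a new list rather than modifying its argument, so the original values remain available to the caller.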

If ``escape`` is ``True`` (default is ``False``) makelist will call ``quote_escape`` on each element before passing them to ``elem_quote`` to be quoted.

If the ``encoding`` keyword is not ``None``, all strings are decoded to unicode with the specified encoding. Each item will then be a unicode object instead of a string.

CSV Functions
=============

The canonical list format file is the CSV - the comma separated values file. ``LineParser`` (and the other functions in listquote) provides a simple way to read and write these files (including properly handling quotes).

csvread
-------

::

    csvread(infile)

Given an infile as an iterable, return the CSV as a list of lists.

infile can be an open file object or a list of lines.

If any of the lines are badly built then a ``CSVError`` will be raised. This has a ``csv`` attribute - which is a reference to the parsed CSV. Every line that couldn't be parsed will have ``[]`` for its entry.

The error *also* has an ``errors`` attribute. This is a list of all the errors raised. Each error in this list has an ``index`` attribute, which is the line number, and a ``line`` attribute - which is the actual line that caused the error.

Example of usage :

.. raw:: html

    {+coloring}

    handle = open(filename)
    # remove the trailing '\n' from each line
    the_file = [line.rstrip('\n') for line in handle.readlines()]
    csv = csvread(the_file)

    {-coloring}

csvwrite
--------

::

    csvwrite(inlist, stringify=False)

Given a list of lists it turns each entry into a line in a CSV. (Given a list of lists it returns a list of strings).

The lines will *not* be ``\n`` terminated.

Set stringify to ``True`` (default is ``False``) to convert entries to strings before creating the line.

If stringify is ``False`` then any non string value will raise a ``TypeError``.

Every member will be quoted using ``elem_quote``, but no escaping is done.

Example of usage :

.. raw:: html

    {+coloring}

    # escape each entry in each line (optional)
    for index in xrange(len(the_list)):
        the_list[index] = [quote_escape(val) for val in the_list[index]]
    #
    the_file = csvwrite(the_list)
    # add a '\n' to each line - ready to write to file
    the_file = [line + '\n' for line in the_file]
    open(filename, 'w').writelines(the_file)

    {-coloring}

Exceptions
==========

These are the exceptions used by listquote.

::

    ListQuoteError(SyntaxError)

Base class for errors raised by the listquote module. It is a subclass of ``SyntaxError`` - so you can trap for that without *having* to import these exceptions into your namespace.

::

    QuoteError(ListQuoteError)

This value can't be quoted.

::

    UnQuoteError(ListQuoteError)

The value is badly quoted.

::

    BadLineError(ListQuoteError)

A line is badly built.

::

    CommentError(BadLineError)

A line contains a disallowed comment. A subclass of ``BadLineError``.

::

    CSVError(ListQuoteError)

The CSV File contained errors. Currently only used by csvread_.

CHANGELOG
=========

See the source code for full CHANGELOG and TODO/ISSUES.

Changes since version 1.3.0 :

* Greater use of regular expressions for added speed
* Re-implemented ``lineparse`` as the ``LineParser`` object
* Added doctests
* Custom exceptions
* Changed the behaviour of ``csvread`` and ``csvwrite``
* Removed the CSV ``compare`` function and the ``uncomment`` function
* Only ``'#'`` allowed for comments
* ``elem_quote`` raises exceptions
* Changed behaviour of ``unquote``
* Added ``quote_escape`` and ``quote_unescape``
* Removed the ``uni_conv`` option in the CSV functions

.. note::

    These changes are quite extensive. If any of them cause you problems then let me know. I can provide a workaround in the next release.

Footnotes
=========

.. [#] Online at http://www.voidspace.org.uk/python/license.shtml
.. [#] Online at http://www.voidspace.org.uk/python/pythonutils.html

.. note::

    Rendering this document with docutils also needs the textmacros module and the PySrc CSS stuff.
    See http://www.voidspace.org.uk/python/firedrop2/textmacros.shtml