Python Programming, news on the Voidspace Python Projects and all things techie.

rest2web & Another terrible Idea

As I mentioned, I've been hacking on rest2web a bit.

I've started to add the functionality that will make Next and Previous links possible for pages.

Some of what I'd like to do, specifically auto-building contents pages that include the directories below the current one, would require changing rest2web to a two pass build system.

The first pass would accumulate all the data and metadata for the pages; the second pass would then build them all.
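As a rough sketch of what two passes could look like (this is not rest2web code; the function names, the dictionary layout and the use of sorted paths as the page order are all assumptions made for illustration):

```python
def first_pass(sources):
    """Pass one: gather metadata for every page, render nothing yet."""
    site = {}
    for path, text in sources.items():
        lines = text.splitlines()
        # Assumption: the first line of a page is its title.
        site[path] = {'title': lines[0] if lines else path, 'body': text}
    return site


def second_pass(site):
    """Pass two: build each page with the whole site's metadata to hand."""
    order = sorted(site)  # assumption: page order is simply sorted paths
    built = {}
    for i, path in enumerate(order):
        built[path] = {
            'title': site[path]['title'],
            'previous': order[i - 1] if i > 0 else None,
            'next': order[i + 1] if i + 1 < len(order) else None,
        }
    return built


# Hypothetical input: source path -> page text.
sources = {'b.txt': 'Second page', 'a.txt': 'First page', 'c.txt': ''}
pages = second_pass(first_pass(sources))
```

Because the second pass sees every page, Next and Previous links fall out of a simple index lookup.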

It occurs to me that this would be an opportunity to savagely refactor. Currently rest2web builds data structures that represent the directories, sections and site. These are all built from nested lists and dictionaries.

When generating the HTML output it has to regenerate these data structures virtually for every page. This is because each page can have a target location that is actually outside of the directory being built. rest2web needs to recalculate relative links from the current page to every other location. (It also handles making sure that the information about each page is available in the output encoding of the page being handled. This means you can use several different output encodings throughout your site.)

This has the great advantage that you aren't tied to the directory structure that rest2web uses to represent sites. Voidspace uses this facility. It allows you to include, in indexes, pages that are actually somewhere else in the site. For example the Library Pages are included in the Cyberpunk section. The template code for the index page is trivially simple.

Recalculating these each time is expensive, and makes the code very fiddly (i.e. difficult to understand). It would be better to generate a tree structure for the site, with custom objects representing the pages and indexes. It would make the code simpler, but it would mean replacing the indextree data structure with something more useful. Hmm...

This of course is a terrible idea. Making the code more readable, and more useful, is good - but it's a lot of work.

Note

In practice it may not have that much effect. The relative paths would still need to be calculated. A custom object could possibly do this conversion on demand though.
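A minimal sketch of a page object that works out relative links on demand, assuming each page knows its target location relative to the site root. The Page class and its attributes are hypothetical and not part of rest2web:

```python
import posixpath


class Page:
    """Hypothetical page node; not rest2web's real data structure."""

    def __init__(self, title, target):
        self.title = title
        # Output location relative to the site root, using '/' separators.
        self.target = target

    def link_to(self, other):
        """Relative URL from this page to another, computed when asked for."""
        start = posixpath.dirname(self.target) or '.'
        return posixpath.relpath(other.target, start)


home = Page('Home', 'index.html')
cyberpunk = Page('Cyberpunk', 'cyberpunk/index.html')
library = Page('Library', 'library/books.html')

link = cyberpunk.link_to(library)
```

Nothing is precomputed here: the cost moves to the moment a template asks for a link, and a page whose target lies outside its source directory needs no special handling.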

Posted by Fuzzyman on 2006-01-27 11:12:01


ConfigObj & a Terrible Idea

ConfigObj has also been spotted in the wild in a few interesting situations. Open Source projects that I've seen using it now include :

  • Bazaar.

    Bazaar is the Python distributed VCS.

    ConfigObj is used to read bazaar.conf and branches.conf.

  • Planet Plus

    A new web application version of Planet, the web aggregator.

  • NeuroImaging in Python

    BrainSTAT is a project with the ultimate goal to produce a platform-independent python environment for the analysis of brain imaging data.

  • Gruik

    Gruik is a free software network packet sniffer.

This is cool because ConfigObj is my favourite project. Not only was it a lot of work, but I genuinely think it provides a simple interface to a lot of functionality. Some of these projects are also using the more advanced features of ConfigObj - like validation and the walk method.

It has occurred to me that the ConfigObj API and syntax are basically compatible with ConfigParser, which is in the Python standard library.

Hence the terrible idea.

It would be a lot of work, but completely feasible, to implement a ConfigParser compatibility layer on top of ConfigObj. This could retain backwards compatibility with ConfigParser, but add all the ConfigObj features like nested-sections and list values. This would address a lot of the issues raised in the ConfigParser Shootout.
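To give a feel for the shape such a layer might take, here is a small sketch. A plain nested dictionary stands in for a ConfigObj instance, and the ConfigParserView class is entirely hypothetical; only the method names (sections, has_section, options, get) are borrowed from ConfigParser:

```python
class ConfigParserView:
    """Hypothetical adapter: ConfigParser-style calls over nested dicts.

    The nested dict stands in for a ConfigObj instance. This is an
    illustration of the idea, not ConfigObj's API.
    """

    def __init__(self, data):
        self.data = data

    def sections(self):
        # Top-level keys that hold dicts are sections; the rest are
        # values in the root section.
        return [k for k, v in self.data.items() if isinstance(v, dict)]

    def has_section(self, section):
        return isinstance(self.data.get(section), dict)

    def options(self, section):
        # Nested sub-sections are not options, so leave them out.
        return [k for k, v in self.data[section].items()
                if not isinstance(v, dict)]

    def get(self, section, option):
        return self.data[section][option]


config = {
    'key': 'value',
    'section': {
        'key': 'value',
        'names': ['a', 'b'],                 # a list value
        'sub-section': {'key': 'value'},     # a nested section
    },
}
view = ConfigParserView(config)
```

Old code could keep calling view.get('section', 'key'), while new code reaches list values and nested sections through the underlying object.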

So I've donned my flame-proof pants (foolhardy chap that I am) and suggested it on python-dev.

I said something like :

In the past there has been some discussion about a new module to replace ConfigParser. Most notably at http://wiki.python.org/moin/ConfigParserShootout

Specific issues that could be addressed include :

  • Simpler API
  • Nested subsections
  • List values
  • Storing/converting datatypes
  • Config file schema
  • Keeps track of order of values

Plus other issues.

I'm the (co-)author of ConfigObj - http://www.voidspace.org.uk/python/configobj.html

This is a reasonably mature project (now in its fourth incarnation), and is being used in projects like Bazaar and Planet Plus.

It occurs to me that the ConfigObj API and syntax are almost fully compatible with ConfigParser.

It would be possible to extend the ConfigObj API to be backwards compatible with ConfigParser. This would bring the added benefits of ConfigObj, without needing to add an extra module to the standard library.

Well nearly. ConfigObj supports config file schema with (optional) type conversion, through a companion module called validate. This could be included or left as an added option.

Anyway. If this stands a chance of acceptance, I'll write the PEP (and if accepted, do the work - which is not inconsiderable).

Summary of ConfigObj

ConfigObj is a Python 2.2 compatible config file parser. Its major feature is simplicity of use.

It reads (and writes) INI-style config files and presents the members using a dictionary interface.

The order of keys/sections is preserved, and it allows nested subsections to any level :

e.g.

key = value
    [section]
    key = value
      [[sub-section]]
      key = value

It is fully documented with a barrage of doctests.

All comments in the config file are also preserved as attributes of the object, and will be written back out. This can be useful for including comments in programmatically generated config files.

It is integrated with a powerful validation system.

Difficulties & Differences

A ConfigObj instance is a sub-class of the dictionary datatype. This means that the get method of ConfigParser clashes.

ConfigObj allows values in the root section (why not ?).

ConfigObj doesn't support line continuations (it does allow multi-line values through the use of triple quoted strings).

ConfigObj currently only allows '=' as a valid divider.

Creating ConfigParser (and related classes) compatibility is a big job.

Solution

All of these problems (if deemed necessary) can be resolved. Either through options, or just by extending the ConfigObj API. I'm happy to put the work in.

Comments ?

Note

I've had no reply to this suggestion on the Python-Dev list. However I have had some feedback from the Bazaar team on improvements I could make in this direction.

This will probably include proper unicode support.
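The get clash listed under Difficulties & Differences is easy to see with the standard library alone. This sketch uses the Python 3 spelling of the module name (configparser), and a plain dictionary stands in for a ConfigObj instance:

```python
import configparser  # spelled ConfigParser in Python 2

parser = configparser.ConfigParser()
parser.read_string("[section]\nkey = value\n")

# ConfigParser: get(section, option) returns the option's value.
from_parser = parser.get('section', 'key')

# A dictionary: get(key, default) returns the whole section,
# and silently treats the second argument as a default.
as_dict = {'section': {'key': 'value'}}
from_dict = as_dict.get('section', 'key')
missing = as_dict.get('no-such-section', 'key')
```

Here from_parser is the string value, from_dict is the section dictionary, and missing is just the "default" that was passed in. The same two-argument call means something different on each object, so a compatibility layer has to pick one meaning or choose between them with an option.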

Posted by Fuzzyman on 2006-01-26 20:53:21


rest2web in Action

rest2web has a new user, Andrew Ittner.

Thanks to his suggestions I'm adding a couple of new ways to display the time/date a page was modified. More exciting changes to follow soon, I promise.

I'm also updating the docs. I have compiled a list of sites (all the ones I'm aware of) that use rest2web :

Quote

I just converted my website from using pyblosxom (even though it's not a weblog) to your rest2web. Great package, thanks a lot for sharing it!

-- Christian of Projectpipe

I also use it to build the docs for :

Now does anyone want to take pity on me and create a better HTML template for rest2web, Firedrop2 and Movable Python ?

If you fancy the task, the template is in the rest2web distribution (template.txt in the docs folder).

Posted by Fuzzyman on 2006-01-26 10:41:13


Akismet Module

It looks like my Akismet Module is being used in a couple of interesting places.

Note

Akismet is a web service that helps prevent comment spam.

akismet.py is a Python interface to the API.

  • Trac Sandbox Spam Filter

    Okay, so it's still in the sandbox, but Trac is a major project.

    Trac is an enhanced wiki and issue tracking system for software development projects.

    It provides a web interface to Subversion repositories.

  • Akizmet

    This provides a wrapper round akismet.py, so that it can be used as a Zope product.

It's always nice to see my code being used.

Posted by Fuzzyman on 2006-01-24 12:25:47


Trapping print From a Single Namespace

I've recently had cause to want to trap print statements made from an external module.

You can replace sys.stdout with a custom object that does just this - trap the output of print statements. This is the approach StandOut takes. It traps print statements and logs them to a file.

The problem is that there is only one sys.stdout. If you replace it with a custom object then all print statements are caught, and I (well... we) wanted to just trap them from within a single module.

We could have executed the code in another process, but this proved very slow.

It wasn't our code, so we didn't want to have to go through it and remove all the use of print.

At first I wondered if we could shadow the sys module from within that namespace. (Rebind the name sys to a different module in that namespace.) Because sys is a builtin module (compiled into CPython and always available), it didn't work. Calling sys.stdout.write directly would use the new object; using print still used the normal sys.stdout.

I finally found a solution. Because sys.stdout.write is a method call, the namespace it is called from can always be discovered in the globals of the previous stack frame.

If our custom stdout object checks what module that is, it can filter calls to print. Great.

The custom output object needs to look something like :

import sys

class StdOut:
    """Replaces a stream with a custom version."""
    def __init__(self, stream, modulename):
        self.stdout = stream
        self.modulename = modulename

    def __getattr__(self, attribute):
        # Only called when normal lookup fails, so pass everything
        # we don't define ourselves through to the real stream.
        return getattr(self.stdout, attribute)

    def write(self, inline):
        if sys._getframe(1).f_globals.get('__name__') == self.modulename:
            pass  # log or whatever
        else:
            self.stdout.write(inline)

    def writelines(self, inlines):
        # Check the caller here: from inside self.write the previous
        # frame would be this method, not the code that called us.
        if sys._getframe(1).f_globals.get('__name__') == self.modulename:
            pass  # log or whatever
        else:
            self.stdout.writelines(inlines)

The magic is done in the first line of the write method :

sys._getframe(1).f_globals.get('__name__')

If you wanted to trap everything from, for example, a module called module, you could do something like :

sys.stdout = StdOut(sys.stdout, 'module')
sys.stderr = StdOut(sys.stderr, 'module')

Anything now sent to sys.stdout or sys.stderr from the module namespace will be handled by whatever code you put in the write method of StdOut.
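Here is a self-contained way to try the idea out, written for Python 3 where print is a function. The FilteringStream class is a hypothetical cut-down variant of StdOut that keeps the trapped text in a list, the module named noisy is built on the fly purely for the demonstration, and a StringIO stands in for the real stdout so the result can be inspected:

```python
import io
import sys
import types


class FilteringStream:
    """Hypothetical cut-down StdOut that records what it traps."""

    def __init__(self, stream, modulename):
        self.stream = stream
        self.modulename = modulename
        self.trapped = []

    def write(self, text):
        # The previous frame belongs to whoever called print or write.
        if sys._getframe(1).f_globals.get('__name__') == self.modulename:
            self.trapped.append(text)
        else:
            self.stream.write(text)


# Build a throwaway module whose __name__ is 'noisy'.
noisy = types.ModuleType('noisy')
exec("def shout():\n    print('from noisy')\n", noisy.__dict__)

passed_through = io.StringIO()   # stands in for the real stdout
filtered = FilteringStream(passed_through, 'noisy')

original = sys.stdout
sys.stdout = filtered
try:
    noisy.shout()                # trapped
    print('from main')           # passed through
finally:
    sys.stdout = original
```

After this runs, the text printed inside noisy sits in filtered.trapped and only the line printed from the calling code has reached passed_through.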

The only thing that worries me is that the Python documentation on sys._getframe says :

This function should be used for internal and specialized purposes only.

I wonder if this solution is robust and thread safe ?

Update

A slightly more stable way of doing this might be to call inspect.currentframe().

This returns (guess what) the current frame. You can then use the f_back attribute to get a reference to the previous frame.
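A short sketch of that variant. The helper name caller_module_name is made up for this example, and the throwaway module called other exists only to show the lookup working:

```python
import inspect
import types


def caller_module_name():
    """Return the __name__ of the module that called this function."""
    frame = inspect.currentframe()
    # currentframe() can return None on implementations without
    # Python stack frame support, so guard before following f_back.
    if frame is None or frame.f_back is None:
        return None
    return frame.f_back.f_globals.get('__name__')


# Demonstration: call the helper from inside a module named 'other'.
other = types.ModuleType('other')
other.caller_module_name = caller_module_name
exec("def who():\n    return caller_module_name()\n", other.__dict__)

name = other.who()
```

The comparison in the write method would then check the looked-up name against self.modulename, exactly as before.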

Posted by Fuzzyman on 2006-01-23 16:23:01


A Big Change

Over the weekend something very big has happened in the lives of Delia & me.

We have moved out of community, and in with a couple of friends in Kingsthorpe, Northampton. If you want to know more of the details then you can read the entry in my Personal Blog.

We still want to remain committed members of the Church. It does mean I should have more time for some of the projects I've been stalling on, and I also have a permanent internet connection. This should make lots of things easier. Ho hum.

Posted by Fuzzyman on 2006-01-23 16:10:45

