Python Programming, news on the Voidspace Python Projects and all things techie.

Python Quirks: Booleans and Regular Expressions

emoticon:info The best euphemism I've found for describing Resolver is that it is a 'numerical analysis tool'. This deliberately doesn't actually tell you what it is, and I'm still not allowed to.

Thankfully Resolver is 'kind-of-almost-nearly-sort-of' in the hands of our first private beta customer, and we have started to write the marketing-type descriptions of Resolver, so it shouldn't be long before I can reveal all [1].

In the last couple of days at work I have bumped into two Python quirks.

The first is with booleans. When processing certain data-sets we want to treat numbers specially.

I casually replaced the following code:

if type(value) in (int, long, float):

with (just because it looked neater and is slightly more flexible):

if isinstance(value, (int, long, float)):

Can you spot the bug here (the title of this entry is a clue)?

Of course, in Python isinstance(value, int) returns True for booleans. To our disgrace, none of the unit tests picked up this breakage, but luckily a functional test did, and so we have reverted the change.
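
For the record, here it is at the interactive prompt, along with one way you could keep the isinstance check whilst still excluding booleans (not what we did - we just reverted):

>>> isinstance(True, int)
True
>>> issubclass(bool, int)
True
>>> value = True
>>> isinstance(value, (int, long, float)) and not isinstance(value, bool)
False
>>> value = 3
>>> isinstance(value, (int, long, float)) and not isinstance(value, bool)
True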

The reason that booleans subclass int is that True and False were initially implemented simply as 1 and 0. Interestingly, on Python-Dev recently, Guido rejected a suggestion that this historical accident be removed in Python 3.0. Too many people use 'clever' tricks, like indexing lists with a boolean. To my mind this is a great reason to change it - people would have to write less obscure code instead. Smile
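
For anyone who hasn't met it, the trick in question looks something like this:

>>> results = ['failed', 'passed']
>>> results[2 + 2 == 4]
'passed'
>>> results[2 + 2 == 5]
'failed'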

We could always make the int constructor do the right thing when passed a boolean so that someList[someBool] could always be rewritten someList[int(someBool)]. Anyway, rant over.

The next bump in the smooth highway of coding was when creating a very simple regular expression to match European style dates. We wanted to match occurrences of values that look a bit like 23/03/07. Despite the fact that writing regular expressions can be quite easy, I always have a sense of dark foreboding whenever I start writing them - like I am somehow entering a twisty maze that I may never emerge from. I suppose if I had ever written any Perl I would be used to that feeling. Cool

After firing up the good old Python manual, we decided that the \d{m,n} syntax for matching groups of digits looked good, and that we needed something like the following:

DATE_RE = re.compile(r'\d{1, 2}/\d{1, 2}/\d{1, 4}')

Needless to say it didn't work, and it took a few minutes to work out why.

Obviously the answer is that regular expressions are whitespace sensitive [2]: {m, n} is not the same as {m,n} - with the space in there, the braces are just matched as literal characters. (The {m,n} form is totally in violation of PEP 8 of course, and just looks wrong to me.) Still, you learn something new every day, no matter how hard you try not to.
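
For completeness, here is the version without the stray spaces, along with a quick demonstration that the spaced version really doesn't match:

>>> import re
>>> DATE_RE = re.compile(r'\d{1,2}/\d{1,2}/\d{1,4}')
>>> bool(DATE_RE.match('23/03/07'))
True
>>> bool(re.match(r'\d{1, 2}/\d{1, 2}/\d{1, 4}', '23/03/07'))
False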

[1]And won't that be exciting. Embarassed
[2]And yes, I do know about re.VERBOSE just to forestall all you keyboard happy commenters.

Posted by Fuzzyman on 2007-03-23 14:13:52


Upgrading to Visual Studio Express

emoticon:black_hat At Resolver we are in the process of upgrading all our machines from Visual Studio Professional to Visual Studio C# Express (the free edition).

Until recently we all had licenses for the professional version, by virtue of our membership in the Microsoft Partnership program and the cheap accompanying MSDN membership. One of the conditions of our continued membership in the partnership program was that we hand over a copy of our application to a Microsoft approved 'tester' for scrutiny. This didn't seem like such a hot idea to us, so we bowed out gracefully. Razz

The express edition does just about everything we need, and renewed licenses for Visual Studio Professional would have cost us around two and a half grand, or six grand for full MSDN membership. The only real downside is that the Visual Studio IronPython integration doesn't work with the express edition. Apart from being fun [1], we weren't actually using this, so it was no big loss.

We did have a few bumps in the transition, but I think we've got it nailed now. We don't actually have a great deal of C# in our codebase: we have our executable, a few dialogs created with the forms designer, and part of our test framework, which needs to use unmanaged code that we have to drop down into C# to access. We deliberately chose to generate C# for our dialogs, because obfuscating them in this way means there is much less temptation for anyone to manually tweak the generated code. Wink

The first hurdle we hit in the changeover was that Visual Studio Express doesn't include rc.exe, the resource compiler (which we trigger as part of our build process). This is for compiling resources (like toolbar icons) into an executable or DLL. You are supposed to be able to do this from the Visual Studio interface, but we couldn't get it to work. Perhaps with a bit more swearing we could have fixed this, but rc.exe is included in Visual Studio C++ Express, and the license permits you to install both on a single machine: so that's what we did.

Uuidgen.exe is also not included. Writing a Python replacement for it was trivial.
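
Something like this is all it takes - a minimal sketch using the standard library uuid module (new in Python 2.5), not necessarily our exact script:

# uuidgen.py - a stand-in for uuidgen.exe
import uuid

if __name__ == '__main__':
    # Print a freshly generated GUID in the usual 8-4-4-4-12 hex format
    print str(uuid.uuid4())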

The most fun hurdle we found was when I tried to uninstall Visual Studio and the SDK. For some reason both Christian and I uninstalled Visual Studio first, and then tried to uninstall the SDK. When you try to uninstall the SDK it informs you that the SDK requires Visual Studio, and the uninstaller exits. Great dependency tracking there, Microsoft. I'm afraid the only solution we could find was to reinstall Visual Studio Professional, uninstall the SDK and then uninstall Visual Studio again. sigh

[1]You can design GUIs with the Forms Designer and Visual Studio will generate IronPython code for you.

Posted by Fuzzyman on 2007-03-23 13:51:40


Three Screens McProgrammer

emoticon:podcast Developing with two monitors makes an incredible difference to the programming experience. Not having to shuffle windows around whilst working is a great productivity boost, and 19" monitors also provide a real screen real estate boost over 17" ones. A while ago I bought a 19" monitor to go along with my 17" one. It's a Samsung Syncmaster, and it's great (it's still amazing how quickly the price drops on these things).

The only problem is that I like to watch a movie whilst I write or develop [1]. sigh, back to one monitor for developing on.

Last week I bought a new 20" wide-screen for the 'movie screen' (another Samsung as it goes, and again - no complaints). My graphics card will only support two monitors, though.

I've fallen in love with the Matrox Triple Head, which fools your graphics card into thinking you have one super-wide monitor - you plug three 19" monitors into the back of it. It's a bit pricey though, and I'm not ready to invest that much in 19" monitors anyway. In a few weeks they'll be obsolete. Smile

Instead I bought a different piece of Matrox kit from eBay: the Millennium G200. This is a 16MB PCI graphics card, which in its day was the cutting edge of graphics technology. It cost me £2.98. Smile

With this graphics card in as well (Windows was good enough to install the driver straightforwardly) I have three monitors, two for developing and one for entertainment. Smile

Three monitors on my desk

Note

I've also just discovered a really useful utility for working with multiple monitors on Windows: Ultramon.

The smart taskbar it provides is great, and it can also take care of proportionally resizing windows that you move between screens with different resolutions.

Shame it costs $40, but it seems worth it.

A bit over the top, but still useful. happy sigh

[1]And how come these 42" plasma screens have a lower resolution than a 19" TFT monitor? (Not that I can afford one.)

Posted by Fuzzyman on 2007-03-19 23:21:19


Resolver Internships

emoticon:key There are two internship places available at Resolver this summer.

Here is the skinny:

We are a software company, building a new kind of numerical analysis tool, using Python and the .NET framework in an Agile development environment. We are offering two internships for the summer of 2007. Successful candidates will be fully-fledged members of the development team for the duration of their stay - we follow the XP practice of pair programming, so mentoring will be continuous and intense, giving an excellent start in professional software engineering for those who want to make it their career.

Experience in Python and .NET isn't necessary, but could be useful. If you are interested then please see the Resolver Jobs Page. You'll need to send us a CV with some background on your programming experience, including any personal projects you have worked on. We're based in London by the way.

We're great to work with, and you'll get paid for hacking on an awesome project. Smile

Posted by Fuzzyman on 2007-03-19 20:58:08


A Read Only Proxy Class and Proxying Magic Method Access

emoticon:computer Proxying attribute access with __getattr__ can be a useful technique at times. One example usage is to restrict access to the private API of an object and only allow access to the public API, or to make all attributes on an object read only.

With old style classes this is straightforward:

class ReadOnlyProxy:
    def __init__(self, obj):
        # Write straight into the instance dict to bypass our own __setattr__
        self.__dict__[None] = obj

    def __getattr__(self, name):
        # Forward all attribute fetches to the proxied object
        return getattr(self.__dict__[None], name)

    def __setattr__(self, name, value):
        raise AttributeError("Attributes can't be set on this object")

>>> r = ReadOnlyProxy(3)
>>> r == 3
True
>>>

The original object is still available in the __dict__ of our proxy object, but not directly.
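
In other words, you can still dig it out by hand if you know where to look:

>>> r.__dict__[None]
3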

We can hide the original object completely by using a closure (and a factory function) [1]:

def GetReadOnlyProxy(obj):
    class ReadOnlyProxy:
        def __getattr__(self, name):
            return getattr(obj, name)

        def __setattr__(self, name, value):
            raise AttributeError("Attributes can't be set on this object")
    return ReadOnlyProxy()

>>> r = GetReadOnlyProxy(3)
>>> r == 3
True
>>>

Instances returned by GetReadOnlyProxy have access to the object they are proxying through the closure, but given the proxy object you can't get back to the original to modify it.

However, there is a problem. If we make ReadOnlyProxy a new-style class this approach is suddenly partly broken. On new-style classes, magic methods (like __eq__ for equality comparison) are looked up on the class rather than the instance, so they don't go through __getattr__. object has a default definition for most of the magic methods, and so those defaults are used rather than being proxied.

def GetReadOnlyProxy(obj):
    class ReadOnlyProxy(object):
        def __getattr__(self, name):
            return getattr(obj, name)

        def __setattr__(self, name, value):
            raise AttributeError("Attributes can't be set on this object")
    return ReadOnlyProxy()

>>> r = GetReadOnlyProxy(3)
>>> r == 3
False
>>>

Unfortunately there is no simple way round this. Overriding __getattr__ on the metaclass doesn't help (I've tried!): __getattr__ is only consulted when normal lookup fails, and the default magic methods are found on object by the normal lookup. The situation is the same with __getattribute__, which is like __getattr__ except that it intercepts all attribute access (it is also where the descriptor protocol is implemented).

As old-style classes are soon to disappear, finding a solution would be nice. Smile

One alternative is to provide a generic proxy function and attach these methods to the proxy class where necessary. Functions attached to a class become methods on instances. We can loop over all the magic methods (attributes whose names start and end with a double underscore), remembering to skip a few of them, like '__class__' and '__bases__'. What's worse, the name we are patching is a loop variable, so to capture it we need an inner closure! It looks ungainly, but it works fine:

def GetReadOnlyProxy(obj):
    class ReadOnlyProxy(object):
        def __getattr__(self, name):
            return getattr(obj, name)

        def __setattr__(self, name, value):
            raise AttributeError("Attributes can't be set on this object")

    for name in dir(obj):
        # Only interested in magic methods: names of the form __xxx__
        if not (name[:2] == '__' == name[-2:]):
            continue
        if name in ('__new__', '__init__', '__class__', '__bases__'):
            continue
        if not callable(getattr(obj, name)):
            continue

        # The inner closure captures the current value of the loop variable 'name'
        def GetProxyMethod(name):
            def ProxyMethod(self, *args, **keywargs):
                return getattr(obj, name)(*args, **keywargs)
            return ProxyMethod

        setattr(ReadOnlyProxy, name, GetProxyMethod(name))

    return ReadOnlyProxy()

>>> r = GetReadOnlyProxy(3)
>>> r == 3
True
>>>
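
And it isn't limited to integers - proxying a list, for example, forwards the sequence magic methods too:

>>> p = GetReadOnlyProxy([1, 2, 3])
>>> p[0]
1
>>> len(p)
3
>>> p == [1, 2, 3]
True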

If you can think of a more elegant solution that is as effective, then let me know. Smile

[1]Thanks to Ka Ping Yee for showing me this technique at PyCon.

Posted by Fuzzyman on 2007-03-17 22:48:42


Threading with Queue

emoticon:cross I'm currently working on rest2web to add caching. It will persist the data structures it generates, and so be able to process new (and modified) files without having to rebuild the whole site. This will dramatically increase the speed with which rest2web can process site updates.

The change is going well, but is quite fiddly. In the meantime I thought I had another good idea for improving the speed.

rest2web doesn't write out pages which haven't changed, but it uses a very naive algorithm to work out whether the page has changed:

if (thepage != open(targetfile, 'r').read()):
    open(targetfile, 'w').write(thepage)

If only a few files have changed, reading and comparing is still a lot quicker than just writing every file (tested on the Voidspace site).

We currently need to process every page (which generates the page), because processing it builds the data structures that allow other pages (including index pages) to have all the correct links in the sidebars and breadcrumbs.

Both reading and writing the file are relatively expensive blocking operations. This chunk of code needs access to no other state, so it could easily be spun off onto another thread. There is a cost involved in the context switch between threads (only one runs at a time), but the global interpreter lock is released around blocking operations like reads and writes.

A common way of doing this sort of thing in Python is to use Queue. I've not written a lot of threaded code for Python, so my colleague Christian walked me through writing it.

The snippet below defines a class called WorkerQueue. When you instantiate it, it creates a pool of ten worker threads. You override the method _do to perform the task, and call the method do to put a new job on the queue. Jobs are pulled off the queue one by one as threads finish their previous tasks. When you need to exit, call stop, which blocks until all the outstanding threads have finished.

from threading import Thread
from Queue import Queue


# Sentinel placed on the queue to tell the worker threads to exit
STOP = object()


class WorkerQueue(object):

    def __init__(self):
        self.queue = Queue()
        self.pool = []

        # Create and start a pool of ten worker threads
        for _ in range(10):
            self.pool.append(Thread(target=self.threadloop))

        for thread in self.pool:
            thread.start()

    def do(self, *args, **kwArgs):
        # Put a new job on the queue for the next free worker
        self.queue.put((args, kwArgs))

    def stop(self):
        # Tell the workers to finish, and block until they all have
        self.queue.put(STOP)
        for thread in self.pool:
            thread.join()

    def threadloop(self):
        while True:
            args = self.queue.get()
            if args is STOP:
                # Put STOP back so the other workers see it too
                self.queue.put(STOP)
                break
            self._do(*args[0], **args[1])

    def _do(self, *args, **kwArgs):
        # Override this method to perform the actual task
        pass
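
As a rough sketch of how it could be hooked in (not the code I actually tried in rest2web), the compare-and-write step from earlier becomes the _do of a subclass; pages_to_write here just stands in for whatever sequence of (path, contents) pairs the site builder produces:

class PageWriterQueue(WorkerQueue):
    def _do(self, targetfile, thepage):
        # Skip the write if the page on disk is already identical
        try:
            unchanged = (thepage == open(targetfile, 'r').read())
        except IOError:
            unchanged = False
        if not unchanged:
            open(targetfile, 'w').write(thepage)

writer = PageWriterQueue()
for targetfile, thepage in pages_to_write:
    writer.do(targetfile, thepage)
writer.stop()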

When I added this to rest2web it was around 12% faster building the docs. However, when building Voidspace with only a few changed files it made almost no difference at all. This is because writing is the most expensive operation, and when only a few files have changed there aren't many writes, so there isn't much benefit to offset the cost of context switching.

I didn't leave the code in rest2web; I figured the small benefit wasn't worth going multithreaded. Smile

This is still a nice little example of using Queue and a thread pool though.

Posted by Fuzzyman on 2007-03-17 01:08:32


The Resolver Whitepaper

emoticon:html At Resolver we have been working with some interesting technologies. Resolver (the application) goes out to our first live customers next week, and our patent application is now in [1].

One of the great things that has come out of our work is a white paper. My colleague Jonathan Hartley is hosting it:

Many thanks to all who contributed, especially SCIgen.

[1]Apparently you can patent software in the UK, but the rules are different to US software patents. Our patents are purely defensive in any case (they provide ammunition in case anyone decides to try and accuse us of patent violations).

Posted by Fuzzyman on 2007-03-17 00:40:01


Fun With Metaclasses: Indexing Classes

emoticon:ir_scope .NET provides a list-like class called the array. As you might expect on a statically typed platform, arrays are typed, so you need to specify the type of object that the array will contain.

IronPython allows you to do this by overloading normal Python syntax:

from System import Array

values = (1, 2, 3, 4)
intArray = Array[int](values)

This creates an integer array, using a syntax that looks as if you are indexing the class.

You can also do interesting things with class indexing from CPython. Indexing, for both the mapping and sequence protocols, is implemented with the magic method __getitem__.

Of course, if you define __getitem__ on your class it applies to instances, not to indexing the class itself. However, magic methods are looked up not on the instance but on the type. Classes, of course, are instances of their metaclass. This means that if we define a metaclass (inheriting from type) with a __getitem__, then indexing the class will call __getitem__ on the metaclass. Smile

>>> class meta(type):
...     def __getitem__(self, name):
...         print name
...
>>> class Test(object):
...      __metaclass__ = meta
...
>>> Test['hello']
hello
>>> Test['goodbye']
goodbye
>>>

Not immediately useful, perhaps, but a good trick to have up your sleeve.
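
It can be put to work, though. As a purely hypothetical sketch (TypedListMeta and TypedList are my own names, nothing to do with IronPython), here is the same trick used in CPython to get something vaguely reminiscent of the Array[int] syntax - a type-checked list factory:

class TypedListMeta(type):
    def __getitem__(cls, itemType):
        # Indexing the class (e.g. TypedList[int]) ends up here, because
        # the class is an instance of this metaclass
        def typedList(values):
            values = list(values)
            for value in values:
                if not isinstance(value, itemType):
                    raise TypeError('%r is not of type %s' % (value, itemType.__name__))
            return values
        return typedList


class TypedList(object):
    __metaclass__ = TypedListMeta

>>> TypedList[int]((1, 2, 3, 4))
[1, 2, 3, 4]
>>> TypedList[int]((1, 'two', 3))
Traceback (most recent call last):
  ...
TypeError: 'two' is not of type int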

Posted by Fuzzyman on 2007-03-17 00:32:55

