Total Ordering: Class Decorator for Filling in Rich Comparison Methods When Only One is Implemented
Raymond Hettinger managed to provide several of the highlights at PyCon UK. During a couple of his talks he extolled the virtues of class decorators which are new to Python 2.6 & 3.0. He also talked about how Python 3 fixes the Python 2.X wart of objects of arbitrary types being comparable and sortable.
In Python 3, ordering comparisons between arbitrary types raise a TypeError. Raymond challenged the audience to come up with a class decorator that takes a class with any one of the rich comparison methods (except for __eq__ or __ne__) and automatically fills in all the others.
Christian Muirhead and Menno Smits took him up on the challenge and created one - and it's very cool. Christian rarely blogs so he gave it to me. I've posted it to the Python Cookbook where you can post comments, corrections and suggestions.
UPDATE: I've modified the recipe to provide two decorators and remove the need to pass in an argument. I've modified the description below to reflect this change.
total_ordering and force_total_ordering are class decorators for Python 2.6 & Python 3.
They provide all the rich comparison methods on a class that defines any one of '__lt__', '__gt__', '__le__', '__ge__'.
They assume that objects will only be compared with objects of the same type (true for Python 3 in ordering / sorting operations).
total_ordering fills in all unimplemented rich comparison methods, assuming at least one is implemented. __lt__ is taken as the base comparison method on which the others are built, but if that is not available it will be constructed from the first one.
force_total_ordering does the same, but having taken a comparison method as the base it fills in all the others - this overwrites additional comparison methods that may be implemented, guaranteeing consistent comparison semantics.
@total_ordering
class Something(object):
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other.value
It also works with Python 2.5, but you need to do the wrapping yourself:
class Something(object):
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other.value
Something = total_ordering(Something)
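To give a feel for what such a decorator has to do, here is a minimal sketch. It is not the recipe from the Cookbook: the name simple_total_ordering is made up for illustration, it only handles the case where __lt__ is defined directly on the class, and it assumes objects are only compared with objects of the same type and that the ordering is total.

```python
def simple_total_ordering(cls):
    """Illustrative only: derive __gt__, __le__ and __ge__ from __lt__."""
    if '__lt__' not in cls.__dict__:
        raise ValueError('this sketch requires __lt__ to be defined on the class')

    # a > b  is the same as  b < a
    cls.__gt__ = lambda self, other: other < self
    # a <= b  is the same as  not (b < a), assuming a total order
    cls.__le__ = lambda self, other: not (other < self)
    # a >= b  is the same as  not (a < b), assuming a total order
    cls.__ge__ = lambda self, other: not (self < other)
    return cls


@simple_total_ordering
class Something(object):
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value < other.value
```

Every derived method is expressed purely in terms of the one base comparison, which is the same idea force_total_ordering relies on to guarantee consistent semantics.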
One of the nice things about class decorators is that they can be used for things that you would otherwise use a metaclass for. It would be easy to modify this recipe to make it a metaclass for Python 2.X where X < 6.
I did actually have to tinker with the code that Christian sent to me. It worked fine with Python 2.5, but blew up horribly with Python 2.6 and Python 3. After some digging around I worked out that the reason it was failing is that it was checking if a class defines a comparison method with hasattr(cls, method_name). This works fine with Python 2.5, but in Python 2.6 / 3.0 object provides explicit default implementations of the rich comparison methods. The test was always returning True...
As far as I can tell (and I may be missing something obvious), other than by parsing the repr, there is no way to tell if a method fetched from a class is really the implementation from object:
Python 2.6 (trunk:66714:66715M, Oct 1 2008, 18:36:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(object): pass
...
>>> X.__lt__
<method-wrapper '__lt__' of type object at 0x5baea0>
>>> X.__lt__ == object.__lt__
False
(This actually returns True for Python 3 which is nice.)
The answer took me longer than it should have done to track down. The recipe includes a _has_method function that takes a class and searches the method resolution order (the class plus its base classes) to see if the class defines or inherits a method (so long as the method isn't inherited from object).
import sys as _sys

if _sys.version_info[0] == 3:
    def _has_method(cls, name):
        # walk the class and its bases, ignoring object itself
        for B in cls.__mro__:
            if B is object:
                continue
            if name in B.__dict__:
                return True
        return False
else:
    def _has_method(cls, name):
        # same search, using the mro() method
        for B in cls.mro():
            if B is object:
                continue
            if name in B.__dict__:
                return True
        return False
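As a quick illustration of why hasattr isn't enough, the sketch below compares it with an MRO search on Python 3. The helper has_method and the classes Plain, Ordered and Child are names invented for this example; the helper is a condensed variant of the idea above, not the recipe's own function.

```python
def has_method(cls, name):
    """Return True if `name` is defined by cls or a base class other than object."""
    return any(name in base.__dict__
               for base in cls.__mro__ if base is not object)


class Plain(object):
    pass


class Ordered(object):
    def __lt__(self, other):
        return NotImplemented


class Child(Ordered):
    pass


# hasattr is fooled by the default implementation inherited from object
plain_hasattr = hasattr(Plain, '__lt__')
# the MRO search only reports methods defined below object
plain_defined = has_method(Plain, '__lt__')
child_defined = has_method(Child, '__lt__')
```

Here plain_hasattr is True even though Plain defines nothing, while the MRO search distinguishes Plain from Child, which inherits a real __lt__ from Ordered.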
Posted by Fuzzyman on 2008-10-05 21:48:44
Categories: Python, Hacking Tags: decorators, comparison, sorting
Dynamic Languages MVP, Blog Template, Conferences, Concurrency and Other Stuff
Last week I received an email from Microsoft:
Congratulations! We are pleased to present you with the 2009 Microsoft MVP Award! This award is given to exceptional technical community leaders who actively share their high quality, real world expertise with others.
MVP stands for 'Most Valuable Professional' and is an award given to community contributors. I'm the first MVP for the Microsoft dynamic languages team - although there isn't a dynamic languages division for MVPs so technically I'm a Visual C# (*) MVP. That asterisk is very important!
It wasn't all good news last week though. Menno Smits has been a colleague with us at Resolver Systems for almost a year, but he has just passed away. Well, he received an 'offer too good to refuse' and will now be working with BATS Trading in London. They are a trading platform largely implemented in Python and have been so successful in the states that they have become a stock exchange in their own right. Now they are setting up shop in Europe. Congratulations to Menno and we wish him all the best for the future.
If you're anything more than an occasional visitor to this blog you will probably have noticed the change in template. Largely on the insistence of Christian Muirhead (my colleague and co-author of IronPython in Action) I've chopped out almost everything above the entries - which means less to scroll down through, and less Javascript should mean faster loading pages. I'm still intending to do (or at least get Justin to do) a full site redesign once 'the book is finished'. This change will have to do in the meantime.
PyCon UK is done so it's time to look at conferences for next year. I've submitted talks for PyCon 2009 (in Chicago) and ACCU 2009 (in Oxford).
ACCU is a UK community conference (I went for the first time last year and it was great fun). It has talks on a wide variety of topics - last year there were very few Python talks but several on Haskell and Erlang. In the past there have been many more Python talks, but last year a significant proportion of those attending were .NET or C++ developers.
I've proposed two talks (I'm not sure which they will prefer):
- Creating Rich Internet Applications with IronPython & Silverlight 2
- Embedding IronPython and the Dynamic Language Runtime in .NET Applications
I've put forward a talk and a tutorial proposal for PyCon US. My talk was submission number 1 (!) and is on Functional Testing of Desktop Applications. It's a relatively niche subject (not much focus in the Python community on creating desktop applications) - but still an important one, so it will be interesting to see if it is accepted.
The tutorial is on Developing with IronPython. It is based on the tutorial that Menno, Christian and I gave at PyCon UK - except this time it is Jonathan Hartley and I who will be giving it. We had a great time giving the tutorial in Birmingham and learned a great deal doing it. We had seventeen people attending (out of a total of eighty attending the tutorials - so nearly a quarter which isn't bad), and I think that the attendees enjoyed it. We should have got people using IronPython earlier in the tutorial, and we had too much practical stuff - meaning we spent too long in the user interface part of the tutorial. I've been revising the handout notes (I'll post them up here sometime as they are a great introduction to IronPython) based on what we learned.
Interesting advice for new programmers from Anders Hejlsberg (the architect of the C# programming language) in an interview with Computer World:
Go look at dynamic languages and meta-programming: those are really interesting concepts. Once you get an understanding of these different kinds of programming and the philosophies that underlie them, you can get a much more coherent picture of what's going on and the different styles of programming that might be more appropriate for you with what you're doing right now.
Anyone programming today should check out functional programming and meta-programming as they are very important trends going forward.
Before his passing away Menno ported part of his website to rest2web. He also posted a new article on his experiences installing Linux on his shiny new Sony Vaio VGN-BZ11XN notebook. rest2web is a tool for maintaining websites (static HTML) in reStructured Text. It is particularly good for programmers as the templating system is straight Python without requiring you to learn a custom templating language.
I use rest2web to maintain pretty much all the websites I run [1], but haven't done any work on rest2web itself beyond maintenance for the last two years. It simply does everything I need it to. Despite this new sites built on rest2web pop-up regularly. Another recent one is by Andrew Straw. I can always tell a rest2web site, because even with a custom template most people leave in the Page last modified ... timestamp from my default template. This uses the <% modtime %> templating variable. Unfortunately it doesn't play well with a Subversion bug if you keep your website sources under source code control. Andrew has overcome this bug by using the SVN commit time instead and posted notes about how he did it under About this Website.
Final piece of news: both Ted Leung and Mark Shuttleworth talked about the future of Python in their keynote speeches at PyCon UK. They both noted that concurrency is becoming more important and is one of the areas where CPython is lacking because of its poor support for threads. Neither IronPython nor Jython has a Global Interpreter Lock (the GIL), so these are both platforms where threads can be used for concurrency with Python.
Michael Sparks is one of the organizers of PyCon UK, and also the author of Kamaelia, a generator based concurrency library for Python. Kamaelia is capable enough to stream video and audio, but last time Michael tried to use it with IronPython a few bugs (in IronPython) prevented it from working. A lot has changed since then, and with the latest version of IronPython 2 most of Kamaelia 'just works'.
Although Kamaelia presents a very simple API for concurrency oriented programming (usually no need to explicitly work with threads or locking), it does use threads in several key parts under the hood. This means it hits limitations in CPython (Michael Sparks' words not mine), and IronPython doesn't suffer from the same restrictions. So far Michael is impressed with IronPython...
| [1] | This one, IronPython in Action, The Other Delia and Resolver Hacks. Exceptions are The IronPython Cookbook which is a MediaWiki wiki and IronPython-URLs Blog which is a blogspot blog (originally started by Mark Rees). |
Posted by Fuzzyman on 2008-10-05 16:25:47
Categories: IronPython, Python, Website, Blog on Blogging, Projects, Work Tags: conferences, pycon, rest2web, concurrency, mvp, kamaelia, threading
itemgetter and attrgetter
I've just discovered the attrgetter and itemgetter functions from the Python standard library operator module (both functions new in Python 2.4 with added functionality in Python 2.5). I wish I'd discovered them earlier as although they do very simple jobs they can make code cleaner and more readable.
Both of them return functions that will fetch a specified item or attribute from objects.
One place I could make use of attrgetter is in property definitions. It is a fairly common pattern to have a property with custom behaviour when set, but that merely returns an underlying instance attribute when fetched.
As an example:
class Editor(object):  # hypothetical class name; the listing starts at __init__
    def __init__(self):
        self._document = None
    def _set_document(self, document):
        self._document = document
        # do other stuff ...
    document = property(lambda self: self._document, _set_document)
The 'document' setter method is _set_document, but the getter is merely a lambda that returns self._document.
We can improve this by using attrgetter instead:
from operator import attrgetter
...
document = property(attrgetter('_document'), _set_document)
attrgetter is called with a string and returns a function that fetches that attribute. Anything that helps eliminate lambdas has to be good right?
It is roughly the equivalent of:
def attrgetter(attribute):
    def getter(thing):
        return getattr(thing, attribute)
    return getter
In Python 2.5 you can call attrgetter with multiple attributes and the getter it returns will fetch you a tuple of all the attributes. As an added bonus, if you are using CPython, it is nice and fast.
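A short sketch of both forms follows. The Point class and its attribute names are invented for the example; attrgetter itself is the real function from the operator module.

```python
from operator import attrgetter


class Point(object):  # hypothetical class, just something with attributes
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(3, 4)

get_x = attrgetter('x')        # single attribute: returns the value
get_xy = attrgetter('x', 'y')  # several attributes: returns a tuple

x_value = get_x(p)
xy_value = get_xy(p)
```

With a single name the getter returns the attribute itself, and with several names it returns a tuple in the order the names were given.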
itemgetter is very similar, but instead of fetching attributes it fetches items from sequences or mappings. One place this comes in handy is as a key function when sorting lists.
A common pattern when needing a custom sort order for lists is 'decorate-sort-undecorate' (also known as the Schwartzian Transform from its Perl origin). This involves transforming the list (decorate) into one that can be sorted using the built-in sorted (which returns a new list) or using the list sort method (which sorts in place). You then transform your newly sorted list back (the undecorate).
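Done by hand, the three steps might look like the sketch below. The list of names is sample data made up for the example, sorted on the second element of each tuple.

```python
# sample data: (first_name, last_name) tuples
names = [('Tim', 'Peters'), ('Alex', 'Martelli'), ('Raymond', 'Hettinger')]

# decorate: pair each item with the value we want to sort on
decorated = [(name[1], name) for name in names]

# sort: tuples compare element by element, so the sort key comes first
decorated.sort()

# undecorate: throw the sort key away again
sorted_names = [name for (_, name) in decorated]
```

One wrinkle with the manual version is that ties on the sort key fall through to comparing the original items, which may not be what you want (or may not even be comparable).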
This pattern is now built-in to Python. Both sorted and sort take a key function to transform each item. The list is sorted on the transformed items, saving you the effort of having to do it yourself. As an example suppose we have a list of tuples like (first_name, last_name), and we want to sort on last name.
We can achieve this by passing in a key function to sort that returns the second item of the tuple:
sorted_items = sorted(items, key=lambda item: item[1])
I'm sure you can see what's coming.
We can use itemgetter to eliminate the lambda:
from operator import itemgetter
...
sorted_items = sorted(items, key=itemgetter(1))
itemgetter is roughly the equivalent of:
def itemgetter(item):
    def getter(thing):
        return thing[item]
    return getter
As with attrgetter it is nice and fast and the Python 2.5 version can take multiple items and the getter will then return you a tuple. It doesn't just work with sequences, but can also be used with dictionaries.
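Here is a small sketch of those two extras: itemgetter used on a dictionary, and itemgetter with multiple items used as a sort key. The record and rows values are invented sample data.

```python
from operator import itemgetter

# works with mappings as well as sequences
record = {'first': 'Alex', 'last': 'Martelli'}
get_last = itemgetter('last')
get_name = itemgetter('last', 'first')   # multiple items -> tuple

last = get_last(record)
name = get_name(record)

# multiple items make a handy compound sort key:
# sort on the count first, then on the letter
rows = [('b', 2), ('a', 2), ('c', 1)]
by_count_then_letter = sorted(rows, key=itemgetter(1, 0))
```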
I actually discovered this when Christian was playing with Raymond Hettinger's Named Tuple Recipe (one of the awesomenesses coming in Python 2.6) on IronPython. Its use of itemgetter triggered a very obscure bug in IronPython 2 Beta 4 that has thankfully gone away with the Beta 5 release. Named tuples really are awesome, and we will start using them in Resolver One.
Posted by Fuzzyman on 2008-09-24 12:54:48
Categories: Python, Hacking Tags: standard library, operator module
Python: Two Phase Object Creation
I've started watching a talk by Alex Martelli on Python and design patterns:
It's very interesting. Alex points out that design patterns can't be taken out of the context of the language (the technology) in which they are being used. For example, the iterator pattern is now built-in to most high-level languages. With first class types and functions, most of the object creational patterns are effectively built-in to Python (class and function factories are trivially easy to write).
I'm only half way through the video, but I liked his explanation of object creation in Python. Python has two different 'magic-methods' (protocol methods) for object construction [1], and it is probably not until you have been programming in Python for a while (become a 'journeyman') that you need to understand this.
The method usually described as the 'constructor' is __init__. If you define a class with a 'dunder-init' [2] method, then it is called when a new object is created [3]:
class SomeClass(object):
    def __init__(self, *args, **keywargs):
        ...
When you create an instance - SomeClass(*args, **keywargs) - the arguments you use in the call are passed to the __init__ method.
Notice that __init__ is an instance method - it receives the instance self as the first argument. This means that it is really an object initialiser; the instance has already been created when it is called.
The method responsible for creating objects is __new__. It is a static method (special-cased so that you don't need to declare it as one) that receives the class as the first argument and should return an instance:
class SomeClass(object):
    def __new__(cls, *args, **keywargs):
        ...
        return object.__new__(cls)
dunder-new can actually return anything; there is no restriction requiring it to be an instance of the class. It can return an instance of a sub-class picked by the arguments passed in, it can return None or a pre-created instance (useful for the Singleton pattern).
If dunder-new does return an instance of the class then dunder-init is called with the same arguments that were passed to dunder-new - the arguments used in the call to construct the object. The vast majority of classes you write won't need a custom implementation of dunder-new.
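The sketch below shows both behaviours under Python 3. The class names Cached and NotAnInstance, and the init_calls counters, are invented for the example: one dunder-new hands back a pre-created instance (so dunder-init runs again on it), the other returns None (so dunder-init is never called).

```python
class Cached(object):
    """__new__ always returns the same pre-created instance."""
    _instance = None
    init_calls = 0

    def __new__(cls, *args, **keywargs):
        if cls._instance is None:
            cls._instance = object.__new__(cls)
        return cls._instance

    def __init__(self, value):
        # runs on every call, because __new__ returned an instance of cls
        Cached.init_calls += 1
        self.value = value


class NotAnInstance(object):
    """__new__ returns something that is not an instance of the class."""
    init_calls = 0

    def __new__(cls, *args, **keywargs):
        return None

    def __init__(self):
        # never reached: None is not an instance of NotAnInstance
        NotAnInstance.init_calls += 1


first = Cached(1)
second = Cached(2)
nothing = NotAnInstance()
```

Note that the second call to Cached re-initialises the shared instance, which is one reason naive Singleton implementations built this way can surprise you.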
So when might you want to use __new__? We've already mentioned using your class as a factory that can return an instance of a subclass. Another reason is for creating immutable objects. Because dunder-init is an instance method it can be called again on an instance. If object state is set up in dunder-init then calling it again with new arguments could change the state of the object:
instance.__init__(*newargs, **newkeywargs)
If you set up the object state in dunder-new instead then calling it again creates a new instance rather than mutating the instance. You'll need to do this if you sub-class the built-in immutable types like strings, numbers or tuples. Even if you only want to write a new dunder-init method that takes new arguments, you will still need to override dunder-new. Any arguments passed to object creation will also get passed to dunder-new, and the default one will barf on extra arguments.
class Something(int):  # hypothetical name for an int subclass
    def __new__(cls, value, arg):
        # ignore 'arg'
        return int.__new__(cls, value)
    def __init__(self, value, arg):
        self.arg = arg
Creating instances from classes is actually done by the metaclass (yes - they are in the language for a reason other than making things complicated).
You can make objects callable in Python by defining a __call__ method. If a class defines dunder-call, then instances of the class are callable. In fact functions and methods in Python are just examples of callable objects.
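As a minimal illustration (the Multiplier class is invented for the example), an instance of a class with dunder-call can be used exactly like a function:

```python
class Multiplier(object):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # invoked when the instance itself is called
        return self.factor * value


double = Multiplier(2)
result = double(21)            # same as Multiplier.__call__(double, 21)
is_callable = callable(double)
```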
Object creation is done by 'calling classes'. So if calling an object results in a call to dunder-call defined on their class, what happens when you call a class?
The same thing... the class of a class is its metaclass - and when you call a class, __call__ on the metaclass is called. For classes that inherit from object (new-style classes), the default metaclass is type. The two phase object construction [4] we have been discussing is implemented in type.__call__, and it is roughly equivalent to:
def __call__(cls, *args, **keywargs):
    instance = cls.__new__(cls, *args, **keywargs)
    if isinstance(instance, cls):
        cls.__init__(instance, *args, **keywargs)
    return instance
So you can customize what happens when a class is 'called' by implementing a custom metaclass that overrides __call__.
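A small sketch of that idea, written with Python 3 syntax (in Python 2 you would set __metaclass__ in the class body instead of using the metaclass keyword). CountingMeta, Widget and the instances_created attribute are names made up for this example.

```python
class CountingMeta(type):
    """Metaclass that counts how many times each of its classes is called."""

    def __call__(cls, *args, **keywargs):
        cls.instances_created = getattr(cls, 'instances_created', 0) + 1
        # delegate to type.__call__, which runs __new__ and then __init__
        return super().__call__(*args, **keywargs)


class Widget(metaclass=CountingMeta):
    def __init__(self, name):
        self.name = name


a = Widget('a')
b = Widget('b')
```

Because the override still delegates to type.__call__, the normal two phase construction is untouched; the metaclass just gets to run code around it.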
| [1] | I'm typing here whilst watching the latest series of 'The Shield'. I just mistyped 'object construction' as 'obstruction'... |
| [2] | 'dunder' being shorthand for 'double-underscore'. |
| [3] | These methods are documented in Basic Customization. |
| [4] | Ruby has effectively the same two-phase object creation: Class.new maps to Class.allocate followed by a call to initialize on the new object. |
Posted by Fuzzyman on 2008-09-21 13:21:57
Categories: Python, Hacking Tags: metaclasses, oop, objects
Functional Testing of Desktop Applications
At PyCon UK I gave a talk on functionally testing GUI applications. Functional testing involves interacting with an application in the same way as the user. After performing actions you assert that the application behaves as expected.
There has been a lot of talk around the testing of websites, and many useful tools including Selenium, Twill, and Windmill. There isn't so much discussion of testing desktop applications - I guess writing desktop applications isn't in fashion.
Since 2006 I've been working at Resolver Systems creating a desktop spreadsheet development environment called Resolver One. My talk is based on my experiences of testing Resolver One, and I've put it online in article form.
You can find the article(s) at:
The article includes a downloadable project with test framework - including example unit tests and functional tests. There is also a two minute video of the Resolver One automated test suite running.
It covers:
- The why and how of functional testing - including the processes and infrastructure you need around them
- Basic principles of practical testing
- Common problems and ways to overcome them
The test framework used is unittest and the example application is written with IronPython and Windows Forms. The principles discussed apply whichever frameworks you are using.
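To show the general shape of such a test with unittest, here is a deliberately tiny sketch. FakeEditor is a hypothetical stand-in invented for this example; a real functional test would drive the actual application's user interface rather than an in-memory object, and none of this code comes from the article or from Resolver One.

```python
import unittest


class FakeEditor(object):
    """Hypothetical stand-in for the GUI application under test."""

    def __init__(self):
        self.tabs = ['Untitled']

    def click_new_tab(self):
        self.tabs.append('Untitled')

    def rename_tab(self, index, name):
        self.tabs[index] = name


class FunctionalTest(unittest.TestCase):

    def setUp(self):
        # start the 'application' fresh for every test
        self.app = FakeEditor()

    def test_user_can_add_and_rename_a_tab(self):
        # perform the actions a user would perform...
        self.app.click_new_tab()
        self.app.rename_tab(1, 'Notes')
        # ...then assert that the application ended up in the expected state
        self.assertEqual(self.app.tabs, ['Untitled', 'Notes'])


def run_suite():
    suite = unittest.TestLoader().loadTestsFromTestCase(FunctionalTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The test reads as a user story: act through the same entry points the user has, then check the visible result.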
Posted by Fuzzyman on 2008-09-18 21:24:12
Categories: General Programming, Python, Writing, Work, IronPython Tags: testing, article, unittest, resolverone, GUI
PyCon UK and Metaclasses in Five Minutes
PyCon UK 2008 is now over. It was exhausting but totally rocked!
On the Saturday I gave a lightning talk: Metaclasses in Five Minutes. I've put it online as an article:
Metaclasses have a reputation for being 'deep-black-magic' in Python. The cases where you need them are genuinely rare (unless you program with Zope...), but the basic principles are surprisingly easy to understand.
It was interesting that both Mark Shuttleworth and Ted Leung spoke about the need for Python to adapt to the programming challenges of the future. They both highlighted distributed computing (the cloud) and parallel processing (multi-core) as important. Interestingly Ted Leung also highlighted desktop applications - there are surprisingly few major desktop applications written in Python. (Adobe Lightroom is 40% Lua!)
Another interesting talk was one by Christian Tismer on the work that he and Raymond Hettinger have done recently on Psyco the CPython JIT. The work was sponsored by 'a Los Angeles firm' (probably Fattoc) and is not quite complete - but close enough for a release.
The new work adds newer features (like generators) and more builtin functions to the Python constructs that Psyco is able to speed up. Interestingly, the alternatives for speeding up some of the builtins were to either write custom C versions - or just rewrite in Pure Python and let Psyco JIT as usual...
Oh, and in case you haven't checked out Reddit recently - the Python Sub-Reddit is Python.org themed.
Posted by Fuzzyman on 2008-09-16 16:26:58
Categories: Life, Python, Writing Tags: pycon, metaclasses, article, psyco
Static Typing, Dynamic Language Performance, V8 and Tracing JITs
There has been a lot of good buzz recently about dynamic language performance, including Steve Yegge's Dynamic Languages Strike Back essay [1] and the more concrete demonstration in the impressive V8 Javascript Engine inside the Google Chrome browser.
Two of my favourite quotes on programming languages and 'the typing issue' come from Gilad Bracha [2]:
"Only a subset of all possible programs can be written with statically typed languages. For some people that is enough."
"For optimisation, more is known about a program written in a dynamically typed language at runtime than is known about programs in statically typed languages at compile time"
The first quote makes an interesting point. In a statically typed language the compiler must know (either by inference or through explicit declarations) the type of every object everywhere you use it. This places restrictions on what you can do, but because of these restrictions the compiler is able to provide you with extra information (is the program type safe) and make performance optimisations (no need for dynamic dispatch or member lookup - they can all be statically bound).
So it is a trade-off: there are things you just can't do in a statically typed language that you can do in a dynamically typed language - but you make this trade-off for the sake of the compiler.
Note
Those who use statically typed languages don't do it merely because they "haven't yet discovered dynamically typed languages". When you become fluent in any complex system you start to think within its terms - and those who are proficient with statically typed languages think in terms of its types and leverage the information and feedback from the compiler.
This means that when presented with a dynamically typed language you have taken away the framework within which they think. Exactly the same is true for those who have only programmed with dynamically typed languages, they simply can't think within a statically typed framework. This is why the two sides get on so well...
The point of this blog entry is not that dynamic typing is better, but that it is a trade-off (on both sides). However performance needn't be one of those trade-offs.
Of course, statically typed language implementations are typically able to be faster than dynamically typed ones, where a lot of extra work has to be done at runtime.
Back in the late eighties a bunch of incredibly smart guys did a lot of research around improving the performance of dynamic languages like Smalltalk. They were convinced that dynamic languages could even outperform statically typed languages through runtime optimisations. One of the results was a paper An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes along with the Strongtalk VM. Unfortunately (to quote Gilad again) this is rocket science, and when Sun snapped them up to work on the HotSpot JIT compiler for the Java Virtual Machine not many people continued their work.
(At PyCon the Microsoft guys I spoke to acknowledged that Hotspot - a dynamic JIT - was superior to the JIT in the .NET framework, but they claimed that their garbage collection implementation is better. In a managed VM garbage collection has a big performance impact and is one of the areas where PyPy is already faster than CPython.)
This work has resurfaced in an interesting place. The paper on Self was referenced from one of the design documents on the Javascript V8 engine. Like Self, Javascript is a prototype based language - and their description of some of the optimisations is surprisingly readable.
From Design Elements of the Google V8 Javascript Engine: Fast Property Access:
"JavaScript is a dynamic programming language: properties can be added to, and deleted from, objects on the fly. This means an object's properties are likely to change. Most JavaScript engines use a dictionary-like data structure as storage for object properties - each property access requires a dynamic lookup to resolve the property's location in memory. This approach makes accessing properties in JavaScript typically much slower than accessing instance variables in programming languages like Java and Smalltalk. In these languages, instance variables are located at fixed offsets determined by the compiler due to the fixed object layout defined by the object's class. Access is simply a matter of a memory load or store, often requiring only a single instruction."
"To reduce the time required to access JavaScript properties, V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes. This basic idea is not new - the prototype-based programming language Self used maps to do something similar. (See for example, An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes). In V8, an object changes its hidden class when a new property is added."
Of course immediately after the release of Chrome a whole raft of benchmarks came out comparing it to the JIT that Mozilla are working on integrating into Firefox to improve Javascript performance: Tracemonkey.
- Benchmarks from Google showing V8 outperforming Firefox
- Benchmarks from Mozilla (Brendan Eich) showing Tracemonkey outperforming V8
- Benchmarks showing a mixed bag
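Returning for a moment to the hidden classes described in the V8 design document quoted above, the idea can be modelled in a few lines of Python. This is a toy sketch only, not how V8 is implemented: HiddenClass, ToyObject and EMPTY are names invented here. Objects that gain the same properties in the same order end up sharing one hidden class, so a property name maps to a fixed slot offset instead of needing a per-object dictionary lookup.

```python
class HiddenClass(object):
    """Toy model: maps property names to fixed slot offsets."""

    def __init__(self, offsets=None):
        self.offsets = dict(offsets or {})
        self.transitions = {}  # property name -> next HiddenClass

    def with_property(self, name):
        # reuse the transition if another object already added `name` here
        if name not in self.transitions:
            offsets = dict(self.offsets)
            offsets[name] = len(offsets)   # next free slot
            self.transitions[name] = HiddenClass(offsets)
        return self.transitions[name]


EMPTY = HiddenClass()  # every object starts with the same empty hidden class


class ToyObject(object):
    def __init__(self):
        self.hidden_class = EMPTY
        self.slots = []

    def set(self, name, value):
        offsets = self.hidden_class.offsets
        if name in offsets:
            self.slots[offsets[name]] = value
        else:
            # adding a property changes the object's hidden class
            self.hidden_class = self.hidden_class.with_property(name)
            self.slots.append(value)

    def get(self, name):
        return self.slots[self.hidden_class.offsets[name]]


p1 = ToyObject()
p1.set('x', 1)
p1.set('y', 2)

p2 = ToyObject()
p2.set('x', 10)
p2.set('y', 20)

p3 = ToyObject()
p3.set('x', 5)
```

In this model p1 and p2 share a hidden class because they were built the same way, while p3 (which only has x) sits one transition earlier in the chain.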
Tracemonkey (based on Tamarin from Adobe) is quite a different Just-in-Time compiler from the one used in V8. Both compile higher-level languages to native machine code (although interestingly V8 has no intermediate bytecode step). V8 (like the .NET JIT) compiles 'up-front', whilst Tracemonkey, as its name implies, is a tracing compiler. This is the same technology being implemented in the PyPy project.
Note
One of the fantastical things about PyPy is that it is more than just 'Python-in-Python' - it is a whole interpreter compiler toolchain.
Interpreters are 'described' in RPython (a static subset of Python) and then compiled into interpreters capable of running standalone (using the C backend) or on the CLR and JVM.
The PyPy tracing JIT is (for a limited range of types currently) capable of emitting machine code (or .NET/Java bytecode) optimised for specific operations (e.g. typed bytecode that only adds numbers if that is what your program is doing). On the .NET / JVM backends this bytecode will be translated into machine code by the JIT of the underlying platform - so your Python code under the PyPy interpreter will be 'double-jitted' on these platforms.
So what is a tracing JIT? Rather than compiling up-front, a tracing JIT analyses the flow of types through your program and can compile specialised 'paths' for the frequently used parts. If you have a function that is called with integers and adds them, then machine code that performs this operation will be generated. The function is protected with a guard so that if it is ever called with different types then new code can be generated or the normal language mechanisms used.
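The control flow of 'specialise the hot path, protect it with a guard' can be mimicked in plain Python. This is only a toy model: nothing here generates machine code, and specialise_for_ints, THRESHOLD and the counters are all invented for the illustration rather than taken from Tracemonkey or PyPy.

```python
def specialise_for_ints(generic):
    """Toy 'trace': after a few int-only calls, switch to a guarded fast path."""
    state = {'int_calls': 0, 'fast_path_hits': 0}
    THRESHOLD = 3  # arbitrary 'hotness' threshold chosen for this sketch

    def fast_int_add(a, b):
        # stands in for the specialised code a real JIT would emit
        return a + b

    def wrapper(a, b):
        # the guard: the fast path is only valid for two ints
        is_ints = type(a) is int and type(b) is int
        if is_ints and state['int_calls'] >= THRESHOLD:
            state['fast_path_hits'] += 1
            return fast_int_add(a, b)
        if is_ints:
            state['int_calls'] += 1   # still 'recording'
        # guard failed or not hot yet: use the normal mechanism
        return generic(a, b)

    wrapper.state = state
    return wrapper


@specialise_for_ints
def add(a, b):
    return a + b


for i in range(5):
    add(i, i)

text = add('a', 'b')   # different types: the guard sends this down the slow path
```

The first three integer calls go through the generic function while the wrapper 'records', the next two hit the specialised path, and the string call falls back without disturbing it.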
In theory this approach is capable of offering more optimisations than compilers that operate 'up-front' (like the .NET and V8 JITs). By analysing the flow of types through the program a tracing JIT is capable of making much smarter decisions about what can be inlined for example.
So whilst both V8 and Tracemonkey have plenty of room to get faster (I'm sure), Tracemonkey has the most room ahead of it.
In terms of Python, the CPython Virtual Machine is written in C, with several design decisions 'hard-coded' throughout the source code. These include garbage collection by reference counting and the Global Interpreter Lock [3]. Changing this would be very painful (although integrating Tracemonkey involved Mozilla in a move away from reference counting to garbage collection - and they managed to automate the process of changing a lot of the source code). This was what motivated the creation of the PyPy project. It makes it much easier to experiment with new implementation strategies - and if the JIT lives up to its promises then Python may get faster, a lot faster.
| [1] | Rhino and Tigers has some great comments on typing as well. |
| [2] | I don't have a written source for these quotes (he made them at an SFI conference in Poland that I attended earlier this year), so I have probably mangled the precise wording. |
| [3] | Adam Olsen has created an incredible project called safethread that basically patches the CPython source to remove the Global Interpreter Lock. It does have a significant cost for single threaded code however. |
Posted by Fuzzyman on 2008-09-10 23:06:36
Categories: General Programming, Python Tags: javascript, chrome, firefox, tracemonkey, JVM, JIT, .NET, compilers, performance, dynamic languages, static typing
Twatter and MultiDoc on Mono Windows Forms (Mac OS X)
At Resolver Systems we've been busy preparing for PyCon UK.
As well as sponsoring the drinks for the Saturday evening dinner there are several of us speaking:
- Giles Thomas (the boss!) on Python Inside and Outside the Spreadsheet Grid
- Jonathan Hartley with Stretching Pyglet's Wings
- Menno Smits on Python on the OpenMoko Freerunner
- Me on Functionally Testing GUI Applications
Plus of course Menno, Christian and I will be giving our Developing with IronPython tutorial on the Friday. We finished and 'handed in' the tutorial handouts, which are a great twenty-odd-page introduction to IronPython, so sometime I'll put them online with the example code.
The example code is a simple Twitter client called Twatter. It uses Windows Forms and as we intend to support it on Windows, Linux and the Apple Mac I've been testing it with Mono 1.9 on Mac OS X 10.5 (Leopard).

Not what you would call beautiful, but acceptable.
(It is currently hardwired to just use my image - and I didn't write that code - so it isn't a bug. Twatter is still a 'work in progress'.)
I've also been playing with MultiDoc, which is a tabbed text-editor and the example application for chapters 3-6 of IronPython in Action.

The interesting thing about this is that the 'Name Tab' dialog was actually created by the Visual Studio forms designer (generating C#) - and works fine from Mono.
The chapter on testing in IronPython has a functional test for MultiDoc, and I've expanded on this (and added more tests) to illustrate some of the principles behind functional testing for my talk.
UPDATE: I've just tried these apps with Mono 2.0 Preview 2. There are some tiny aesthetic improvements, but some bugs that I was going to have to track down have gone - which is nice (unhandled exception on exit for example).
Posted by Fuzzyman on 2008-09-08 13:35:03
Categories: Python, IronPython, Life, Work Tags: pycon, conference, tutorial, mono, mac, leopard, resolversystems, windows forms
Reasons to Love Python
I've been dabbling in C# recently, and I'm afraid I like it. As statically typed languages go it's a good one. However, on coming back to Python I remember how much I like it. It is not just that it is dynamically typed, but it is all the language features that make it concise and expressive.
A programming language is a medium of expression.
—Paul Graham
Here are a few of my favourites (in no particular order):
Generator Expressions
Python generators are very cool, and have been around since Python 2.2 (2001). They behave very similarly to the yield return statement introduced in C# 2.0 in 2005.
List comprehensions are older still: they arrived in Python 2.0 (2000). Like LINQ to Objects, introduced in .NET 3.5 in 2007, they allow you to combine a loop and a filter in a single expression:
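A minimal example of that form, assuming `f` and `iterable` are placeholders (here a doubling function and a short list of numbers):

```python
def f(value):
    # Placeholder transformation, chosen only for illustration.
    return value * 2

iterable = [3, -1, 4, -5, 9]

# Loop over iterable, keep the positive values, and apply f to each one.
result = [f(value) for value in iterable if value > 0]
```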
The expression in the square brackets is evaluated immediately and is identical to:
result = []
for value in iterable:
    if value > 0:
        result.append(f(value))
Python 2.4 (2004) introduced a novel extension to list comprehensions: generator expressions. Generator expressions are similar to list comprehensions, but instead of square brackets they use parentheses.
The major difference is that instead of being evaluated eagerly they are evaluated lazily. The generator expression returns a generator object that is not executed until you consume it by iterating over it. This means they can be more memory efficient as you can consume individual items from the generator without creating the whole list up front.
One side effect is that they look nicer as the arguments to a function that takes an iterable:
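For example, in a sketch where `values` is an assumed list of numbers, the generator expression needs no extra brackets when it is the sole argument:

```python
values = [3, -1, 4, -5, 9]

# The generator expression is the only argument, so no extra parentheses are needed.
total = sum(value * value for value in values if value > 0)

# Assigned to a name, it is a lazy generator object: nothing runs until it is consumed.
squares = (value * value for value in values)
first = next(squares)      # computes only the first item
remaining = list(squares)  # consumes the rest
```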
Generators are naturally first class objects that you can pass around your code. David Beazley has an excellent set of slides [1] on using generators for systems programming (and will be giving a tutorial on them at PyCon UK): Generator Tricks for Systems Programmers
One of the examples he gives is a pipeline of generator expressions for parsing Apache log files and summing the amount of data served:
wwwlog = open("access-log")  # the Apache log file; the name is illustrative
bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
bytes = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)
None of the generators are consumed until the final call to sum. As it iterates a line at a time (not keeping the log file in memory) it can handle huge log files - and as a bonus it runs faster than a typical solution with loops!
Everything is an Object
Everything in Python is an object (even None) - functions, classes, methods and modules are all first class objects.
This allows for wonderful programming techniques including higher order functions, factory functions (for classes and functions), creating modules at runtime from databases, meta-programming and so on. It makes testing massively easier and is particularly useful when combined with Python's powerful introspection (inspect any object at runtime to determine its capabilities).
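A few of these techniques in miniature. This is only a sketch: `make_multiplier`, `Point` and the `generated` module are made-up names, and the source string stands in for code that might be loaded from a database.

```python
import types


def make_multiplier(factor):
    """Factory function: builds and returns a new function."""
    def multiply(value):
        return value * factor
    return multiply


double = make_multiplier(2)

# Classes can be created at runtime with type(name, bases, namespace).
Point = type("Point", (object,), {"x": 0, "y": 0})

# Modules can be created at runtime too; here the source is a plain string.
source = "def greet(name):\n    return 'Hello ' + name\n"
generated = types.ModuleType("generated")
exec(source, generated.__dict__)

# Introspection: ask objects about their capabilities at runtime.
can_greet = callable(getattr(generated, "greet", None))
none_is_object = isinstance(None, object)
```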
Tuples, Tuple Unpacking and Multiple Return Values
Tuples are one of the built-in container types. Although experts may 'decry' the description, they are effectively immutable lists.
The syntax is straightforward: (1, 2) is a tuple containing two numbers. They can be indexed or iterated over like lists and are a fantastic way of grouping values where otherwise you might create a custom class just to hold a pair of values! They are not unlike the new anonymous types that were introduced along with LINQ in C# - but those you can only use inside the scope where you create them.
Tuples in Python turn up implicitly in a few places, and one of these is where a function or method returns multiple values:
def function(a, b):
    return a * b, a - b
The return value from function is a tuple of two values. Being able to return multiple values is enormously useful - especially if you use it with tuple unpacking. Python can 'unpack' tuples in assignment statements:
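A small sketch of unpacking, assuming a two-argument `function` as described above and arbitrary sample values:

```python
def function(a, b):
    # The comma builds a tuple, so this returns (a * b, a - b).
    return a * b, a - b


result = function(6, 3)               # a single tuple
product, difference = function(6, 3)  # unpacked into two names
```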
It is fair to say that out parameters are only needed in C# because it can't return multiple values (again the alternative is creating a custom class and returning an instance of that instead).
You can also unpack tuples into function calls:
product, difference = function(*parameters)
And the converse is collecting all positional arguments into a tuple (similar to the C# params method signature):
def function(*args):
    a, b = args
    return a * b, a - b
Keyword Arguments
Another small language feature, but invaluable for creating flexible and usable APIs: keyword arguments with default values.
...
function(arg1=2)
function(arg1=2, arg3='nothing')
function(2, None, 'something')
Arguments with default values can be called positionally (as normal) or by keyword. This allows you to call a function or method only overriding values that differ from the default.
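To make the calls above concrete, here is a minimal sketch. The signature and its default values are assumptions chosen for illustration; only the parameter names come from the calls shown.

```python
def function(arg1, arg2=None, arg3='default'):
    # Return the arguments so the effect of each call style is visible.
    return arg1, arg2, arg3


only_first = function(arg1=2)                   # arg2 and arg3 keep their defaults
skip_middle = function(arg1=2, arg3='nothing')  # override arg3, leave arg2 alone
positional = function(2, None, 'something')     # everything passed by position
```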
In fact it is so useful that support for it has been built into IronPython for working with .NET objects. Keyword arguments used in a constructor are equivalent to constructing the instance and then setting properties afterwards:
Decorators
One consequence of first class functions is that 'wrapping' functions (one example of higher order functions) becomes possible. Decorators were introduced to provide a convenient syntax for doing that. I didn't follow the debate as I wasn't interested in the feature and I didn't think I would use it. I was wrong - they turn out to be massively convenient for all sorts of things.
A simple example of wrapping a function (without decorator syntax) for exception handling and logging:
def wrapper(function):
    def inner(*args, **keywargs):
        try:
            return function(*args, **keywargs)
        except Exception as e:
            # Log the failure; the exception is swallowed and None is returned
            logger.log("Exception occurred in function '%s': %s" %
                       (function.__name__, e))
    return inner

wrapped = wrapper(function)
The wrapper function takes a function as an argument. It defines an inner function that calls the original function (keeping a reference to it through a closure) with exception handling and logging. It returns the inner function that can be used in the place of the original. (The *args, **keywargs syntax captures all the positional and keyword arguments that inner is called with and calls function with the same arguments.)
Having created the wrapper function we can actually do the wrapping with the decorator syntax:
@wrapper
def function(a, b):
    ...
We use this all the time at Resolver Systems - for profiling, mocking out names and 'auto-unmocking' them within the scope of a single function, invoking methods onto control threads and so on.
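A self-contained sketch of the same pattern, assuming the standard `logging` module stands in for the logger (the `divide` function is a made-up example). It also uses `functools.wraps`, which is not in the snippet above, so that the wrapped function keeps its original name:

```python
import functools
import logging

logger = logging.getLogger("wrapper_example")


def wrapper(function):
    @functools.wraps(function)  # copy __name__ and __doc__ onto inner
    def inner(*args, **keywargs):
        try:
            return function(*args, **keywargs)
        except Exception as e:
            # Log and swallow the exception; inner then returns None.
            logger.error("Exception occurred in function '%s': %s",
                         function.__name__, e)
    return inner


@wrapper
def divide(a, b):
    return a / b


ok = divide(6, 3)      # normal result passes straight through
failed = divide(1, 0)  # ZeroDivisionError is logged, None is returned
```

Swallowing the exception mirrors the original snippet; in real code you may prefer to re-raise after logging.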
The Interactive Interpreter
No discussion of useful Python features can be complete without a mention of Python's REPL: the interactive interpreter.
You can explore new libraries or check out language features in a matter of seconds. You can even do real work from it - Tim Golden (a Python DBA and WMI guru) says that when working with databases he often uses the interactive interpreter to "slurp the data in, transform it, push it back out and walk away".
I've saved the most important (and most controversial) two for almost last:
Explicit Self
Explicit self certainly stirs up some debates. I like it - it makes Python scoping very explicit. When declaring an instance method in a class body, you declare the instance as the first argument to be passed in to the method - and by convention you name this argument self.
def instance_method(self, arg1, arg2):
    ...
The useful thing about this is that you can see at a glance which instance attributes your method is using - they are all prefixed with self.
It also makes calling up to base class methods straightforward and consistent without requiring additional syntax:
def instance_method(self, arg1, arg2):
    BaseClass.instance_method(self, arg1, arg2)
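Putting the two fragments together, a minimal runnable sketch (the `SubClass` name, the method bodies and the `calls` list are assumptions added for illustration):

```python
class BaseClass(object):
    def __init__(self):
        self.calls = []

    def instance_method(self, arg1, arg2):
        # Every instance attribute access is visibly prefixed with self.
        self.calls.append(("base", arg1, arg2))
        return arg1 + arg2


class SubClass(BaseClass):
    def instance_method(self, arg1, arg2):
        # Call up to the base class by passing the instance explicitly.
        result = BaseClass.instance_method(self, arg1, arg2)
        self.calls.append(("sub", result))
        return result * 2


obj = SubClass()
value = obj.instance_method(1, 2)
```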
Of course if you really can't cope with the explicit self then you can always use my Selfless Metaclass that uses bytecode hackery to remove the need to declare it.
Indentation Based Block Structure
Some people really don't like this, which to be honest baffles me a bit. Here's what I wrote for the Why Separate Sections by Indentation Instead of by Brackets or End question on the Python wiki:
In order to separate blocks of code (like for loops, if blocks and function definitions) the compiler / interpreter needs something to tell it when a block ends. Curly braces and end statements are perfectly valid ways of providing this information for the compiler.
For a human to be able to read the code indentation is a much better way of providing the visual cues about block structure. As indentation also contains all the information for the compiler, to use both would be redundant. As indentation is better for humans, it makes sense to use that for the compiler too.
It has the advantage that Python programs tend to be uniformly and consistently indented, removing one hurdle to understanding other people's code. Python does not mandate how you indent (two spaces or four, tabs or spaces - but not both), just that you do it consistently. Those that get used to the Python way of doing things tend to start seeing curly braces as unnecessary line noise that clutters code.
On the other hand, 'the whitespace thing' is possibly the single biggest reason why some developers refuse to even try Python.
It is interesting to note that both Haskell and F# also allow you to delimit block structure by indentation (as do Python-influenced languages like Boo and Cobra).
I haven't even mentioned language features that come as a consequence of being a dynamically typed language: duck typing (protocols instead of interfaces), heterogeneous container types (no need for generics) and many more...
| [1] | And unlike other talk slides these are very readable even without David explaining them. |
Posted by Fuzzyman on 2008-09-03 22:33:11
Categories: General Programming, Python, IronPython
PyCon UK: IronPython Tutorial, Socials and Volunteers Needed
The PyCon UK conference (national UK Python conference) draws inexorably nearer - just over a week to go now. We've currently got over 200 people registered and about 80 for the tutorials so it should be a great conference.
If you're coming, don't forget to register for the social events on the Thursday and Friday evenings:
Today Christian, Menno and I have been working on our IronPython Tutorial. The handout notes and example code are basically done, just a little polishing off to do.
The tutorial is based around a simple Twitter client, which Menno has named Twatter! We've been testing Twatter on Windows, Linux and the Mac (it doesn't look bad on the Mac).
At each step we will introduce a new topic and explain the principles. We'll give attendees the skeleton of the code and then help them add new functionality. It is a very simple application, but we will manage to cover topics including (all using .NET APIs from IronPython):
- Windows Forms
- Databases
- Web services and network access
- Handling XML
- Threading
Preparations for the conference itself are going well. We have great sponsorship this year from quite a few companies. Resolver Systems is paying for the drinks on the Saturday night dinner!
What we still need is more volunteers to help with the practical arrangements. We particularly need volunteer Session Chairs who will introduce the speakers and make sure they finish on time! We also need help putting up the signs on the Thursday and Friday. If you can help please sign up on the volunteer wiki pages or email us using the address from the main PyCon UK website.
Posted by Fuzzyman on 2008-09-03 19:15:21
Categories: Python, IronPython, Life Tags: conference, pycon, tutorial, social event
Django on Jython, Python Implementations and Performance
Django now runs on Jython, which is great news. Jeff Hardy is also making progress running Django on IronPython. As usual the news sparked a plague of comments on Reddit. There seems to be a lot of confusion about the different implementations of Python, and which bits of CPython act as the reference implementation. (Even Ruby is getting a language specification...)
CPython is the reference implementation but several aspects have been explicitly described as implementation details. These include:
- Stack frames
- Bytecode instructions
- The Global Interpreter Lock
- Reference counting for garbage collection
Jython and PyPy do use Python stack frames, and so tend to have fewer issues than IronPython when running Python applications that depend on certain obscure implementation details. (IronPython doesn't and is faster in consequence.)
PyPy has implemented the GIL (mainly as a matter of convenience) - Jython and IronPython don't have a GIL and can scale multi-threaded code across several CPU cores.
None of PyPy, Jython and IronPython use reference counting for garbage collection. This means faster garbage collection but non-deterministic calling of destructors, which in CPython are normally called as soon as the reference count drops to zero. It also means no uncollectable cycles, which can happen in CPython when you have cycles involving destructors.
IronPython uses native .NET strings, and so all strings are Unicode. In my experience this has made working with strings much more pleasant in IronPython - roll on Python 3. This also used to be the case with Jython, but I believe that Jython now has byte strings. This makes it easier to get Django running, as Django 1.0 uses the difference between byte-strings and Unicode strings to determine whether it is serving text or binary data.
IronPython does a lot of magic to allow you to store binary data in strings (it can still be a cause of bugs - but they are bugs and should be reported to the IronPython team), but you can't dispatch on type. This makes it questionable whether an unpatched Django will ever run on IronPython without some other flag (or way of patching in a compatible 'bytes' type implementation). Jeff certainly seems to be making good progress though.
A new page popped up recently on the Python wiki (relevant I promise):
This is my answer to the question Why is Python slower than xxx Language ?:
Python as a language is a set of rules (its syntax and semantics) and so doesn't have a 'speed'. Only a specific language implementation can have a measurable speed, and then we can only compare performance with a specific implementation of another language. In general you can't compare the speed of one language to another - you can only compare implementations.
Having said that, as a dynamic language Python will typically perform slower for specific benchmarks than standard implementations of some other languages (although it is faster than plenty of others). As a dynamic language a lot of information about the program can only be determined at runtime. This means that a lot of common compiler tricks, which rely on knowing the type of objects at compile time, can't work. Despite this there are a lot of things that can be done to improve the performance of dynamic languages (beyond the performance of statically typed languages, many believe). Several of these have been done before in virtual machines like Strongtalk and are being explored for Python in the PyPy tracing JIT compiler.
Posted by Fuzzyman on 2008-08-14 21:56:09
Categories: Python, IronPython, General Programming Tags: pypy, jython, performance, django