Python Programming, News on the Voidspace Python Projects and All Things Techie.
For my more personal blog, go to the Voidspace Blog. This also has links to the old Techie Blog, God rest its soul.

Python: Values and References

An important step for anyone learning Python is understanding that instead of variables with values Python has names referring to objects (often described as names bound to objects). This is slightly different to other languages, but only slightly. Most modern languages (like Java and C#) have a concept of variables as references to objects - but they also have variables that hold values directly.

There was a long discussion recently on the Python newsgroup as to whether Python uses "call by reference" or "call by value". The discussion was mainly an attempt to persuade one individual that call by value isn't a sensible description of the way that Python passes parameters into function / method calls. Despite the apparent pointlessness of the topic, a lot of interesting matters were discussed along the way.

Wikipedia defines Call by Value as:

In call-by-value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function (frequently by copying the value into a new memory region). If the function or procedure is able to assign values to its parameters, only the local copy is assigned - that is, anything passed into a function call is unchanged in the caller's scope when the function returns.

As you can see, this only partly applies to Python. Local assignments inside a function don't affect the calling scope, but values aren't 'copied' into a function call and changes to mutable objects are seen by the caller.

Wikipedia (same page) defines call by reference as:

In call-by-reference evaluation, a function receives an implicit reference to the argument, rather than a copy of its value. This means that the function can modify the argument, what will be seen by its caller.

This seems to fit Python very well, but it is often taken to mean that assignments to parameters passed in by reference will also affect the calling scope. Although in Python mutable objects can be modified inside a function / method call - and those changes will be seen by the caller - rebinding the name to a new object will not affect the calling scope. (Except variables explicitly declared as global, or in Python 3 where you use the nonlocal keyword).
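A minimal sketch of that distinction, written for Python 3 (the function and variable names are made up for illustration). Mutating the object a parameter refers to is visible to the caller; rebinding the parameter name is not.

```python
def mutate(items):
    # Mutates the object the caller passed in; the caller sees the change.
    items.append(4)


def rebind(items):
    # Rebinds the local name only; the caller's name still refers to the
    # original list.
    items = ["something", "else"]
    return items


numbers = [1, 2, 3]
mutate(numbers)
rebound = rebind(numbers)

print(numbers)   # [1, 2, 3, 4]
print(rebound)   # ['something', 'else']
```

Neither call copies the list: both functions receive a reference to the same object, and only what they do with it differs.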

A more common understanding of these terms is that in call by value values are copied when passed as parameters, whereas in call by reference a pointer (reference) to the object is passed. By these definitions Python is clearly call by reference and this definition also matches the distinction that languages like C# make between value types and reference types (I've not programmed in Java, but my understanding is that it makes the same distinction).

In C#, value types (all numeric types, structs, enumerations etc) actually live on the stack, whereas reference types (strings, instances of classes etc) live on the heap and have a pointer on the stack.

It is complicated by the fact that .NET perpetuates the myth that the primitives (the value types) inherit from System.Object (a reference type). The runtime does auto-boxing for you where this matters.

This means that when you pass a value type into a method the value is copied. Structs can be arbitrarily big, so this can be a performance problem. It is often a performance win for working with the numeric types as you remove a level of indirection.

It isn't without problems though - and some of these can be seen if you develop with IronPython.

System.Drawing.Point is a struct (a value type), but it is also mutable (despite the fact that mutable value types are against the .NET design guidelines). Because it is a value type, when you access it you are often accessing a copy - and if you mutate a copy then your changes can be lost.

The following code illustrates the problem:

>>> r = Rectangle(0, 100, 20, 40)
>>> r.Location.X
0
>>> r.Location.X = 20
>>> r.Location.X
0

Because r.Location returns a value type (r.Location is a Point), the update to it is lost. This happens in C# as well as in IronPython, but it is more unexpected in Python because all objects are expected to behave like reference types.
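The behaviour can be modelled in pure Python, without .NET. The sketch below is only a model: the Point and Rectangle classes are stand-ins written for this example (not the System.Drawing types), and the Location property hands back a copy to imitate what reading a value-type property does.

```python
import copy


class Point:
    def __init__(self, x, y):
        self.X = x
        self.Y = y


class Rectangle:
    def __init__(self, x, y, width, height):
        self._location = Point(x, y)
        self.Width = width
        self.Height = height

    @property
    def Location(self):
        # Hand back a copy, imitating a read of a value-type property.
        return copy.copy(self._location)

    @Location.setter
    def Location(self, point):
        self._location = copy.copy(point)


r = Rectangle(0, 100, 20, 40)
r.Location.X = 20          # mutates a temporary copy, which is then discarded
lost = r.Location.X        # still 0

location = r.Location      # take a copy...
location.X = 20            # ...change it...
r.Location = location      # ...and assign the whole value back
kept = r.Location.X        # now 20
```

The second half shows the usual way round the problem in this model: modify a local copy and assign the whole value back to the property.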

In .NET you can specify that a parameter to a method is passed by reference ('ref' or 'out' in C#, ByRef in VB.NET). If you pass in a value type as a reference parameter then the method receives a reference to the caller's variable rather than a copy of its value.

This means that we can write code like the following:

int x = 3;
SomeMethod(out x);

x can now be mutated by 'SomeMethod' and have a different value after the method call.

In a 'way' immutable Python types behave a bit like .NET value types, and mutable types like reference types. This analogy breaks down when we push it though.
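A short sketch of where the analogy holds and where it breaks (Python 3, names chosen for illustration). Augmented assignment on a tuple rebinds the name, so it looks like value semantics, but plain assignment never copies the tuple in the first place.

```python
# Immutable object: "changing" it really rebinds the name to a new object.
a = (1, 2)
b = a
shared_before = b is a   # True: assignment shares the object, no copy is made
b += (3,)                # builds a new tuple and rebinds b; a is untouched

# Mutable object: += on a list mutates in place, so both names see the change.
x = [1, 2]
y = x
y += [3]                 # x is now [1, 2, 3] as well
```

So an immutable object is still passed around by reference; it only behaves like a value because nothing can change it.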

As a final note, Aaron Brady and others point out that much better descriptions of the Python calling convention are "call by object" or "call by sharing":

From Barbara Liskov's paper, 1992:

"...arguments are passed "by object"; the (pointer to the) object resulting from evaluating the actual argument expression is assigned to the formal."

However a Google search for "call by object", in quotes, returns mostly Python-related links.

The search for call by sharing was more successful:

"Call by Sharing The caller and called routine communicate only through the argument and result objects; routines do not have access to any variables of the caller.... if a routine assigns an object to a formal argument variable, there is no effect on the caller."

Posted by Fuzzyman on 2008-11-04 13:54:05


Vista UAC and Registry Fun

A few days ago I was trying to create a Windows MSI installer that set a registry entry computed at install time. Naturally, in order to minimise intrusion and permissions difficulties, I decided to use HKEY_CURRENT_USER (the per user registry hive) rather than HKEY_LOCAL_MACHINE (the machine wide hive that needs admin privileges to write to [1]).

I was using Wix to build the MSI. Wix involves plenty of XML joy to use, but once you have it set up (with most of the XML generated from Python scripts) then it isn't too bad. You can embed an executable to be run at install time as a CustomAction, which is ideal for something needing to be computed at install time.

The magic in your wxs template to create and execute a custom action looks something like:

<Binary Id="MyExecutable" SourceFile="MyExecutable.exe"/>

<CustomAction Id="MyExecutable"
          BinaryKey="MyExecutable"
          ExeCommand="command line arguments"
          Return="ignore"
          HideTarget="yes"
          Impersonate="yes" />

<InstallExecuteSequence>
    <Custom Action="MyExecutable" />
</InstallExecuteSequence>

You can use a scripting engine, with either VB or JScript, instead of embedding an executable. Everything on the intarwebz I read suggested that this was a bad idea. Many anti-virus programs decide that scripts inside an installer are dangerous and disable them (whilst running an executable is somehow fine), plus it depends on the user not having a broken scripting engine on their machine.

For this to work straightforwardly you need the Impersonate="yes" line in the CustomAction element. Installers don't run as the user who launched the installer - but as some magic user called 'Local System' instead. Impersonate makes the executable run as the user.

All is fine if the user running the installer has admin privileges. Installing anything into C:\Program Files requires admin privileges, so if the user has UAC switched on (Vista only) then it will warn them - and then install fine, writing the registry entry correctly.

If they don't have admin privileges, but work in (say) a corporate environment, then the install will require an administrator to come and enter admin credentials. (Unless they are running as a non-admin user with UAC switched off - in which case they are stuffed. Vista won't prompt and won't install.) When the administrator enters their credentials, instead of temporarily elevating the original user's permissions for the install (which Windows is well capable of), Vista switches the installer to run as the admin user.

As a consequence the registry entry is set for the admin user and not for the user who launched the installer in the first place. As far as I can tell there is no way round this, and the only reliable thing to do is to write a machine wide registry entry instead. sigh

To ensure that this works you need to tell Wix that your embedded executable requires elevated privileges. You do this by setting Impersonate="no" along with Execute="deferred" (don't ask me why [2]). This makes the Custom Action entry in the wxs XML look like this:

<Binary Id="MyExecutable" SourceFile="MyExecutable.exe"/>

<CustomAction Id="MyExecutable"
          BinaryKey="MyExecutable"
          ExeCommand="command line arguments"
          Return="ignore"
          HideTarget="yes"
          Impersonate="no"
          Execute="deferred"/>

<InstallExecuteSequence>
    <Custom Action="MyExecutable" Before="InstallFinalize" />
</InstallExecuteSequence>
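Since most of the XML can be generated from Python, here is a rough sketch of producing the elements above with the standard library's xml.etree.ElementTree (Python 3). The function name custom_action_fragment is made up for this example and it is not the script actually used; the attribute names and values are simply copied from the snippet above.

```python
import xml.etree.ElementTree as ET


def custom_action_fragment(action_id, exe_path, arguments):
    """Return the Binary, CustomAction and InstallExecuteSequence elements."""
    binary = ET.Element("Binary", Id=action_id, SourceFile=exe_path)
    action = ET.Element("CustomAction", {
        "Id": action_id,
        "BinaryKey": action_id,
        "ExeCommand": arguments,
        "Return": "ignore",
        "HideTarget": "yes",
        "Impersonate": "no",      # run with the installer's elevated privileges
        "Execute": "deferred",
    })
    sequence = ET.Element("InstallExecuteSequence")
    ET.SubElement(sequence, "Custom",
                  Action=action_id, Before="InstallFinalize")
    return [binary, action, sequence]


fragment = custom_action_fragment(
    "MyExecutable", "MyExecutable.exe", "command line arguments")
for element in fragment:
    print(ET.tostring(element, encoding="unicode"))
```

The printed elements would still need to be placed inside the rest of the wxs template.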

This works reliably, but if anyone has any better techniques for achieving the same goal then I would be glad to hear them.

[1]Actually, if you attempt to write to it without the right privileges I think Vista will virtualise (except on 64bit systems) - meaning that attempting to access the key later from another process will fail.
[2]Thanks to this blog entry for setting me on the right track.

Posted by Fuzzyman on 2008-11-03 14:36:53


Functional Testing of Desktop Applications

At PyCon UK I gave a talk on functionally testing GUI applications. Functional testing involves interacting with an application in the same way as the user. After performing actions you assert that the application behaves as expected.

There has been a lot of talk around the testing of websites, and many useful tools including Selenium, Twill, and Windmill. There isn't so much discussion of testing desktop applications - I guess writing desktop applications isn't in fashion.

Since 2006 I've been working at Resolver Systems creating a desktop spreadsheet development environment called Resolver One. My talk is based on my experiences of testing Resolver One, and I've put it online in article form.

You can find the article(s) at:

The article includes a downloadable project with test framework - including example unit tests and functional tests. There is also a two minute video of the Resolver One automated test suite running.

It covers:

  • The why and how of functional testing - including the processes and infrastructure you need around them
  • Basic principles of practical testing
  • Common problems and ways to overcome them

The test framework used is unittest and the example application is written with IronPython and Windows Forms. The principles discussed apply whichever frameworks you are using.

Posted by Fuzzyman on 2008-09-18 21:24:12


Static Typing, Dynamic Language Performance, V8 and Tracing JITs

There has been a lot of good buzz recently about dynamic language performance, including Steve Yegge's Dynamic Languages Strike Back essay [1] and the more concrete demonstration in the impressive V8 Javascript Engine inside the Google Chrome browser.

Two of my favourite quotes on programming languages and 'the typing issue' come from Gilad Bracha [2]:

"Only a subset of all possible programs can be written with statically typed languages. For some people that is enough."

"For optimisation, more is known about a program written in a dynamically typed language at runtime than is known about programs in statically typed languages at compile time"

The first quote makes an interesting point. In a statically typed language the compiler must know (either by inference or through explicit declarations) the type of every object everywhere you use it. This places restrictions on what you can do, but because of these restrictions the compiler is able to provide you with extra information (whether the program is type safe) and make performance optimisations (no need for dynamic dispatch or member lookup - they can all be statically bound).

So it is a trade-off. There are things you just can't do in a statically typed language that you can do in a dynamically typed language - but you make this trade-off for the sake of the compiler.

Note

Those who use statically typed languages don't do it merely because they "haven't yet discovered dynamically typed languages". When you become fluent in any complex system you start to think within its terms - and those who are proficient with statically typed languages think in terms of its types and leverage the information and feedback from the compiler.

This means that when presented with a dynamically typed language you have taken away the framework within which they think. Exactly the same is true for those who have only programmed with dynamically typed languages, they simply can't think within a statically typed framework. This is why the two sides get on so well...

The point of this blog entry is not that dynamic typing is better, but that it is a trade-off (on both sides). However performance needn't be one of those trade-offs.

Of course, implementations of statically typed languages are typically able to run faster than implementations of dynamically typed languages, where a lot of extra work has to be done at runtime.

Back in the late eighties a bunch of incredibly smart guys did a lot of research around improving the performance of dynamic languages like Smalltalk. They were convinced that dynamic languages could even outperform statically typed languages through runtime optimisations. One of the results was the paper An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes, along with the Strongtalk VM. Unfortunately (to quote Gilad again) this is rocket science, and when Sun snapped them up to work on the HotSpot JIT compiler for the Java Virtual Machine not many people continued their work.

(At PyCon the Microsoft guys I spoke to acknowledged that HotSpot - a dynamic JIT - was superior to the JIT in the .NET framework, but they claimed that their garbage collection implementation is better. In a managed VM garbage collection has a big performance impact and is one of the areas where PyPy is already faster than CPython.)

This work has resurfaced in an interesting place. The paper on Self was referenced from one of the design documents on the Javascript V8 engine. Like Self, Javascript is a prototype based language - and their description of some of the optimisations is surprisingly readable.

From Design Elements of the Google V8 Javascript Engine: Fast Property Access:

"JavaScript is a dynamic programming language: properties can be added to, and deleted from, objects on the fly. This means an object's properties are likely to change. Most JavaScript engines use a dictionary-like data structure as storage for object properties - each property access requires a dynamic lookup to resolve the property's location in memory. This approach makes accessing properties in JavaScript typically much slower than accessing instance variables in programming languages like Java and Smalltalk. In these languages, instance variables are located at fixed offsets determined by the compiler due to the fixed object layout defined by the object's class. Access is simply a matter of a memory load or store, often requiring only a single instruction."

"To reduce the time required to access JavaScript properties, V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes. This basic idea is not new - the prototype-based programming language Self used maps to do something similar. (See for example, An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes). In V8, an object changes its hidden class when a new property is added."
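The idea in that quote can be sketched as a toy model in Python. This is not how V8 is implemented; HiddenClass and ToyObject are names invented for this example, and the only point is to show objects that gain the same properties in the same order ending up with the same shared layout.

```python
class HiddenClass:
    """Maps property names to fixed slot offsets."""

    def __init__(self, offsets=None):
        self.offsets = dict(offsets or {})
        self.transitions = {}   # property name -> next HiddenClass

    def with_property(self, name):
        # Reuse the transition if another object already added this property
        # starting from the same layout.
        if name not in self.transitions:
            offsets = dict(self.offsets)
            offsets[name] = len(offsets)
            self.transitions[name] = HiddenClass(offsets)
        return self.transitions[name]


EMPTY = HiddenClass()


class ToyObject:
    def __init__(self):
        self.hidden_class = EMPTY
        self.slots = []

    def set(self, name, value):
        offsets = self.hidden_class.offsets
        if name not in offsets:
            # Adding a property moves the object to a new hidden class.
            self.hidden_class = self.hidden_class.with_property(name)
            self.slots.append(value)
        else:
            self.slots[offsets[name]] = value

    def get(self, name):
        # The offset comes from the shared hidden class; a real engine would
        # cache it so that repeated access becomes a plain indexed load.
        return self.slots[self.hidden_class.offsets[name]]


p = ToyObject()
p.set("x", 1)
p.set("y", 2)

q = ToyObject()
q.set("x", 10)
q.set("y", 20)

shares_layout = p.hidden_class is q.hidden_class   # same properties, same order
```

An object that adds y before x would follow a different chain of transitions and end up with a different hidden class.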

Of course immediately after the release of Chrome a whole raft of benchmarks came out comparing it to the JIT that Mozilla are working on integrating into Firefox to improve Javascript performance: Tracemonkey.

Tracemonkey (based on Tamarin from Adobe) is a quite different Just-in-Time compiler from the one used in V8. Both compile higher-level languages to native machine code (although interestingly V8 has no intermediate bytecode step). V8 (like the .NET JIT) compiles 'up-front', whilst Tracemonkey, as its name implies, is a tracing compiler. This is the same technology being implemented in the PyPy project.

Note

One of the fantastical things about PyPy is that it is more than just 'Python-in-Python' - it is a whole interpreter compiler toolchain.

Interpreters are 'described' in RPython (a static subset of Python) and then compiled into interpreters capable of running standalone (using the C backend) or on the CLR and JVM.

The PyPy tracing JIT is (for a limited range of types currently) capable of emitting machine code (or .NET/Java bytecode) optimised for specific operations (e.g. typed bytecode that only adds numbers if that is what your program is doing). On the .NET / JVM backends this bytecode will be translated into machine code by the JIT of the underlying platform - so your Python code under the PyPy interpreter will be 'double-jitted' on these platforms.

So what is a tracing JIT? Rather than compiling up-front, a tracing JIT analyses the flow of types through your program and can compile specialised 'paths' for the frequently used parts. If you have a function that is called with integers and adds them, then machine code that performs this operation will be generated. The function is protected with a guard so that if it is ever called with different types then new code can be generated or the normal language mechanisms used.
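The guard idea can be illustrated with a toy in plain Python. This is emphatically not a tracing JIT (nothing is traced and no machine code is generated); GuardedAdd, its threshold and the two add functions are invented for this sketch, which only shows a specialised path being installed once the code is 'hot' and a guard falling back when the types change.

```python
def generic_add(a, b):
    # Stands in for the full dynamic dispatch of the language.
    return a + b


def int_add(a, b):
    # Stands in for machine code specialised for integers.
    return a + b


class GuardedAdd:
    HOT_THRESHOLD = 3   # arbitrary number of calls before specialising

    def __init__(self):
        self.int_calls = 0
        self.fast_path = None
        self.log = []

    def __call__(self, a, b):
        if self.fast_path is not None:
            # Guard: only use the specialised code if the types still match.
            if type(a) is int and type(b) is int:
                self.log.append("fast")
                return self.fast_path(a, b)
            self.log.append("guard failed")
            return generic_add(a, b)
        if type(a) is int and type(b) is int:
            self.int_calls += 1
            if self.int_calls >= self.HOT_THRESHOLD:
                self.fast_path = int_add   # "compile" the hot path
        self.log.append("generic")
        return generic_add(a, b)


add = GuardedAdd()
results = [add(1, 2), add(3, 4), add(5, 6), add(7, 8), add("a", "b")]
```

The first three calls take the generic route, the fourth uses the specialised path, and the string call fails the guard and falls back to the generic code.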

In theory this approach is capable of offering more optimisations than compilers that operate 'up-front' (like the .NET and V8 JITs). By analysing the flow of types through the program a tracing JIT is capable of making much smarter decisions about what can be inlined for example.

So whilst both V8 and Tracemonkey have plenty of room to get faster (I'm sure), Tracemonkey has the most room ahead of it.

In terms of Python, the CPython Virtual Machine is written in C, with several design decisions 'hard-coded' throughout the source code. These include garbage collection by reference counting and the Global Interpreter Lock [3]. Changing this would be very painful (although integrating Tracemonkey involved Mozilla in a move away from reference counting to garbage collection - and they managed to automate the process of changing a lot of the source code). This was what motivated the creation of the PyPy project. It makes it much easier to experiment with new implementation strategies - and if the JIT lives up to its promises then Python may get faster, a lot faster.

[1]Rhinos and Tigers has some great comments on typing as well.
[2]I don't have a written source for these quotes (he made them at an SFI conference in Poland that I attended earlier this year), so I have probably mangled the precise wording.
[3]Adam Olsen has created an incredible project called safethread that basically patches the CPython source to remove the Global Interpreter Lock. It does have a significant cost for single threaded code however.

Posted by Fuzzyman on 2008-09-10 23:06:36


Reasons to Love Python

I've been dabbling in C# recently, and I'm afraid I like it. As statically typed languages go it's a good one. However, on coming back to Python I remember how much I like it. It is not just that it is dynamically typed, but it is all the language features that make it concise and expressive.

A programming language is a medium of expression.

—Paul Graham

Here are a few of my favourites (in no particular order):

Generator Expressions

Python generators are very cool, and have been around since Python 2.2 (released at the end of 2001). They behave very similarly to the yield return introduced in C# 2.0 in 2005.

Even older are list comprehensions, which arrived in Python 2.0 in 2000 (similar to LINQ to Objects introduced in .NET 3.5 in 2007). They allow you to combine a loop and filter in a single expression:

result = [f(value) for value in iterable if value > 0]

The expression in the square brackets is evaluated immediately and is identical to:

result = []
for value in iterable:
    if value > 0:
        result.append(f(value))

Python 2.4 (2004) introduced a novel extension to list comprehensions: generator expressions. Generator expressions are similar to list comprehensions, but instead of square brackets they use parentheses.

generator = (f(value) for value in iterable if value > 0)

The major difference is that instead of being evaluated eagerly they are evaluated lazily. The generator expression returns a generator object that is not executed until you consume it by iterating over it. This means they can be more memory efficient as you can consume individual items from the generator without creating the whole list up front.
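A small sketch of the difference (Python 3). The function f and the calls list are made up here purely to record when f actually runs.

```python
calls = []


def f(value):
    calls.append(value)   # record when f actually runs
    return value * 2


iterable = [3, -1, 4]

eager = [f(value) for value in iterable if value > 0]
calls_after_list = list(calls)        # f has already run for 3 and 4

del calls[:]
lazy = (f(value) for value in iterable if value > 0)
calls_after_genexp = list(calls)      # nothing has run yet

first = next(lazy)                    # runs f for the first match only
calls_after_next = list(calls)
```

Creating the generator expression calls f zero times; each item is computed only when something asks for it.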

One side effect is that they look nicer as the arguments to a function that takes an iterable:

sum(f(value) for value in iterable if value > 0)

Generators are naturally first class objects that you can pass around your code. David Beazley has an excellent set of slides [1] on using generators for systems programming (and will be giving a tutorial on them at PyCon UK): Generator Tricks for Systems Programmers

One of the examples he gives is a pipeline of generator expressions for parsing Apache log files and summing the amount of data served:

wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)

None of the generators are consumed until the final call to sum. As it iterates a line at a time (not keeping the log file in memory) it can handle huge log files - and as a bonus it runs faster than a typical solution with loops!
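To try the pipeline without a real log file, here is a self-contained variation for Python 3. The three log lines are invented sample data, io.StringIO stands in for the open file, and the last generator is renamed to sizes so that it does not shadow the built-in bytes type.

```python
import io

# Made-up sample data in Apache common log format; the last field is the
# number of bytes sent, or '-' when no body was returned.
sample_log = io.StringIO(
    '127.0.0.1 - - [03/Sep/2008:22:00:00 +0000] "GET /a HTTP/1.1" 200 512\n'
    '127.0.0.1 - - [03/Sep/2008:22:00:01 +0000] "GET /b HTTP/1.1" 304 -\n'
    '127.0.0.1 - - [03/Sep/2008:22:00:02 +0000] "GET /c HTTP/1.1" 200 1024\n'
)

bytecolumn = (line.rsplit(None, 1)[1] for line in sample_log)
sizes = (int(x) for x in bytecolumn if x != '-')
total = sum(sizes)   # only here are the lines actually read and parsed
print("Total", total)
```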

Everything is an Object

Everything in Python is an object (even None) - functions, classes, methods and modules are all first class objects.

This allows for wonderful programming techniques including higher order functions, factory functions (for classes and functions), creating modules at runtime from databases, meta-programming and so on. It makes testing massively easier and is particularly useful when combined with Python's powerful introspection (inspect any object at runtime to determine its capabilities).
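A few of these techniques in miniature (Python 3). Every name in the sketch is invented for illustration: functions stored in a dictionary, a class built by a factory function, a module created at runtime and a simple piece of introspection.

```python
import types


def shout(text):
    return text.upper() + "!"


def whisper(text):
    return text.lower() + "..."


# Functions stored in a dictionary and chosen at runtime.
styles = {"loud": shout, "quiet": whisper}
loud = styles["loud"]("hello")


def make_class(name, greeting):
    # A class factory: type() builds a new class at runtime.
    return type(name, (object,), {"greet": lambda self: greeting})


Friendly = make_class("Friendly", "hi there")

# A module created at runtime rather than loaded from a file.
module = types.ModuleType("runtime_module")
module.answer = 42

# Introspection: ask an object what it can do.
can_greet = callable(getattr(Friendly(), "greet", None))
```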

Tuples, Tuple Unpacking and Multiple Return Values

Tuples are one of the built-in container types. Although experts may 'decry' the description, they are effectively immutable lists.

The syntax is straightforward: (1, 2) is a tuple containing two numbers. They can be indexed or iterated over like lists and are a fantastic way of grouping values where otherwise you might create a custom class just to hold a pair of values! They are not unlike the new anonymous types that were introduced along with LINQ in C# - but those you can only use inside the scope where you create them.

Tuples in Python turn up implicitly in a few places, and one of these is where a function or method returns multiple values:

def function(a, b):
    return a * b, a - b

The return value from function is a tuple of two values. Being able to return multiple values is enormously useful - especially if you use it with tuple unpacking. Python can 'unpack' tuples in assignment statements:

product, difference = function(a, b)

It is fair to say that out parameters are only needed in C# because it can't return multiple values (again the alternative is creating a custom class and returning an instance of that instead).

You can also unpack tuples into function calls:

parameters = (a, b)
product, difference = function(*parameters)

And the converse is collecting all positional arguments into a tuple (similar to the C# params method signature):

def function(*args):
    a, b = args
    return a * b, a - b

Keyword Arguments

Another small language feature, but invaluable for creating flexible and usable APIs: keyword arguments with default values.

def function(arg1=3, arg2=None, arg3='fish'):
    ...

function(arg1=2)
function(arg1=2, arg3='nothing')
function(2, None, 'something')

Arguments with default values can be called positionally (as normal) or by keyword. This allows you to call a function or method only overriding values that differ from the default.

In fact it is so useful that support for it has been built into IronPython for working with .NET objects. Keyword arguments used in a constructor are the equivalent to constructing the instance and then setting properties afterwards:

form = Form(Text="Form Title")

Decorators

One consequence of first class functions is that 'wrapping' functions (one example of higher order functions) becomes possible. Decorators were introduced to provide a convenient syntax for doing that. I didn't follow the debate as I wasn't interested in the feature and I didn't think I would use it. I was wrong - they turn out to be massively convenient for all sorts of things.

A simple example of wrapping a function (without decorator syntax) for exception handling and logging:

def wrapper(function):
    def inner(*args, **keywargs):
        try:
            return function(*args, **keywargs)
        except Exception, e:
            logger.log("Exception occurred in function '%s': %s" %
                       (function.__name__, e))
    return inner

wrapped = wrapper(function)

The wrapper function takes a function as an argument. It defines an inner function that calls the original function (keeping a reference to it through a closure) with exception handling and logging. It returns the inner function that can be used in the place of the original. (The *args, **keywargs syntax captures all the positional and keyword arguments that inner is called with and calls function with the same arguments.)

Having created the wrapper function we can actually do the wrapping with the decorator syntax:

@wrapper
def function(a, b):
    ...

We use this all the time at Resolver Systems - for profiling, mocking out names and 'auto-unmocking' them within the scope of a single function, invoking methods onto control threads and so on.
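For anyone who wants to run the example, here is a sketch of the same wrapper for Python 3. It makes a few assumptions of its own: the standard library logging module provides the logger (the logger name is made up), functools.wraps is added so the wrapped function keeps its name, and divide is an invented function to decorate.

```python
import functools
import logging

logger = logging.getLogger("wrapper_example")   # hypothetical logger name


def wrapper(function):
    @functools.wraps(function)   # keep the original name and docstring
    def inner(*args, **keywargs):
        try:
            return function(*args, **keywargs)
        except Exception as e:
            logger.error("Exception occurred in function '%s': %s",
                         function.__name__, e)
    return inner


@wrapper
def divide(a, b):
    return a / b


ok = divide(6, 3)        # the normal return value passes straight through
failed = divide(1, 0)    # the exception is logged and inner returns None
```

Swallowing the exception and returning None matches the original example; real code would often re-raise after logging.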

The Interactive Interpreter

No discussion of useful Python features can be complete without a mention of Python's REPL; the interactive interpreter.

You can explore new libraries or check out language features in a matter of seconds. You can even do real work from it - Tim Golden (a Python DBA and WMI guru) says that when working with databases he often uses the interactive interpreter to "slurp the data in, transform it, push it back out and walk away".

I've saved the most important (and most controversial) two for almost last:

Explicit Self

Explicit self certainly stirs up some debates. I like it - it makes Python scoping very explicit. When declaring an instance method in a class body, you declare the instance as the first argument to be passed in to the method - and by convention you name this argument self.

class SomeClass(object):

    def instance_method(self, arg1, arg2):
        ...

The useful thing about this is that you can see at a glance which instance attributes your method is using - they are all prefixed with self.

It also makes calling up to base class methods straightforward and consistent without requiring additional syntax:

class SomeClass(BaseClass):

    def instance_method(self, arg1, arg2):
        BaseClass.instance_method(self, arg1, arg2)
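A runnable version of the same pattern (Python 3; the method bodies are made up so that there is something to call). It also shows that a method looked up on the class is just a function that takes the instance as its first argument.

```python
class BaseClass(object):

    def instance_method(self, arg1, arg2):
        return arg1 + arg2


class SomeClass(BaseClass):

    def instance_method(self, arg1, arg2):
        # Call the base class implementation, passing the instance explicitly.
        base_result = BaseClass.instance_method(self, arg1, arg2)
        return base_result * 2


instance = SomeClass()
result = instance.instance_method(1, 2)

# The same call spelled out: look the method up on the class and pass the
# instance in by hand.
same = SomeClass.instance_method(instance, 1, 2)
```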

Of course if you really can't cope with the explicit self then you can always use my Selfless Metaclass that uses bytecode hackery to remove the need to declare it.

Indentation Based Block Structure

Some people really don't like this, which to be honest baffles me a bit. Here's what I wrote for the Why Separate Sections by Indentation Instead of by Brackets or End question on the Python wiki:

In order to separate blocks of code (like for loops, if blocks and function definitions) the compiler / interpreter needs something to tell it when a block ends. Curly braces and end statements are perfectly valid ways of providing this information for the compiler.

For a human to be able to read the code indentation is a much better way of providing the visual cues about block structure. As indentation also contains all the information for the compiler, to use both would be redundant. As indentation is better for humans, it makes sense to use that for the compiler too.

It has the advantage that Python programs tend to be uniformly and consistently indented, removing one hurdle to understanding other people's code. Python does not mandate how you indent (two spaces or four, tabs or spaces - but not both), just that you do it consistently. Those that get used to the Python way of doing things tend to start seeing curly braces as unnecessary line noise that clutters code.

On the other hand, 'the whitespace thing' is possibly the single biggest reason why some developers refuse to even try Python.

Interesting to note that both Haskell and F# also allow you to delimit block structure by indentation (plus Python influenced languages like Boo and Cobra).

I haven't even mentioned language features that come as a consequence of being a dynamically typed language: duck typing (protocols instead of interfaces), heterogeneous container types (no need for generics) and many more...

[1]And unlike other talk slides these are very readable even without David explaining them.

Posted by Fuzzyman on 2008-09-03 22:33:11


Django on Jython, Python Implementations and Performance

Django now runs on Jython which is great news. Jeff Hardy is also making progress running Django on IronPython. As usual the news sparked a plague of comments on Reddit. There seems to be a lot of confusion about the different implementations of Python, and which bits of CPython act as the reference implementation. (Even Ruby is getting a language specification...)

CPython is the reference implementation but several aspects have been explicitly described as implementation details. These include:

  • Stack frames
  • Bytecode instructions
  • The Global Interpreter Lock
  • Reference counting for garbage collection

Jython and PyPy do use Python stack frames, and so tend to have fewer issues than IronPython when running Python applications that depend on certain obscure implementation details. (IronPython doesn't and is faster in consequence.)

PyPy has implemented the GIL (mainly as a matter of convenience) - Jython and IronPython don't have a GIL and can scale multi-threaded code across several CPU cores.

None of PyPy, Jython and IronPython use reference counting for garbage collection. This means faster garbage collection but non-deterministic calling of destructors, which in CPython would normally be called as soon as the reference count drops to zero. (It also means no uncollectable cycles, which can happen in CPython when you have cycles involving destructors.)
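One practical consequence is that code relying on a destructor for prompt cleanup may behave differently across implementations. A common way to stay portable, sketched below with an invented Resource class (Python 3 syntax), is to do the cleanup explicitly with the with statement rather than waiting for the object to be collected.

```python
class Resource:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block exits, whichever garbage collector the
        # implementation uses.
        self.close()
        return False   # do not swallow exceptions


with Resource() as resource:
    open_inside = not resource.closed

closed_after = resource.closed
```

The cleanup happens at a known point in the program, so it does not depend on when (or whether) the object is garbage collected.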

IronPython uses native .NET strings, and so all strings are Unicode. In my experience this has made working with strings much more pleasant in IronPython - roll on Python 3. This also used to be the case with Jython, but I believe that Jython now has byte strings. This makes it easier to get Django running, as Django 1.0 uses the difference between byte-strings and Unicode strings to determine whether it is serving text or binary data.

IronPython does a lot of magic to allow you to store binary data in strings (it can still be a cause of bugs - but they are bugs and should be reported to the IronPython team), but you can't dispatch on type. This makes it questionable whether an unpatched Django will ever run on IronPython without some other flag (or way of patching in a compatible 'bytes' type implementation). Jeff certainly seems to be making good progress though.
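To make 'dispatch on type' concrete, here is a minimal sketch in Python 3 syntax, where the two string types are always distinct. The describe_payload helper is hypothetical and is not Django's actual code; it only illustrates the kind of check that can't work when every string is the same Unicode type.

```python
def describe_payload(data):
    """Hypothetical helper: decide how to serve a value based on its type."""
    if isinstance(data, bytes):
        return "binary"
    if isinstance(data, str):
        return "text"
    raise TypeError("expected bytes or str, got %s" % type(data).__name__)


text_kind = describe_payload(u"hello")
binary_kind = describe_payload(b"\x89PNG")
```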

A new page popped up recently on the Python wiki (relevant I promise):

This is my answer to the question Why is Python slower than xxx Language ?:

Python as a language is a set of rules (its syntax and semantics) and so doesn't have a 'speed'. Only a specific language implementation can have a measurable speed, and then we can only compare performance with a specific implementation of another language. In general you can't compare the speed of one language to another - you can only compare implementations.

Having said that, as a dynamic language Python will typically perform slower for specific benchmarks than standard implementations of some other languages (although it is faster than plenty of others). As a dynamic language a lot of information about the program can only be determined at runtime. This means that a lot of common compiler tricks, that rely on knowing the type of objects at compile time, can't work. Despite this there are a lot of things that can be done to improve the performance of dynamic languages (beyond the performance of statically typed languages many believe), several of which have been done before in virtual machines like Strongtalk and are being explored for Python in the PyPy JIT tracing compiler.

Posted by Fuzzyman on 2008-08-14 21:56:09


Generators, finally and Iterator Finalization

Raymond Chen has run a series of posts on the implementation of iterators (generators in Python speak) in C#. The C# compiler creates an inner class that acts as a state machine, which is nothing like as elegant as the Python implementation of course.

Today was part 3 in the series:

This entry concentrates on an additional place (to the expected) that a finally block can be entered: inside the finalizer of an iterator. I was intrigued, and discovered that the same is true in Python. If you have a generator with a finally block, and the iterator is garbage collected before the generator is exhausted, then the finally block is executed:

Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def f():
...   try:
...     for i in range(5):
...       yield i
...   finally:
...     print 'done'
...
>>> it = f()
>>> it.next()
0
>>> it.next()
1
>>> del it
done

This is the right decision of course. Something that is the wrong decision (in my opinion) is that if a finally block is entered because of an exception, and there is a return in the finally block then the exception is swallowed instead of being raised. Either a return in a finally should be disallowed (as it is in C#) or the exception should be raised.
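A minimal sketch of the swallowing behaviour, with a made-up function name. Newer Python versions may emit a warning for a return inside finally, but the code still runs.

```python
def swallow():
    try:
        raise ValueError("this never reaches the caller")
    finally:
        # Returning here discards the exception that is in flight.
        return "from finally"


result = swallow()
```

The caller gets the return value and never sees the ValueError.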

The Python implementation of generators is particularly elegant because of the way functions / stack frames are implemented. As an overview... a stack frame has a code object associated with it. The code object has the bytecode sequence (as a byte-string), and the frame keeps a counter that points to the current bytecode instruction. Every time a new bytecode is executed the counter is advanced. When the function returns, nothing holds a reference to the stack frame anymore and it is garbage collected (actually they are expensive to create - so a pool of zombie stack frames is kept for reuse).

When you create a generator it holds a reference to the stack frame, and every time you call next execution continues at the next bytecode - until a yield or return is hit. The stack frame is kept alive until the generator is garbage collected.
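You can watch this from Python itself. The sketch below uses the generator's gi_frame attribute, which is a CPython detail, and calls close() instead of relying on garbage collection so that the finally block runs at a known point. The names count_up and events are made up for the example.

```python
events = []


def count_up(limit):
    try:
        for i in range(limit):
            yield i
    finally:
        events.append("done")


gen = count_up(5)
first = next(gen)
second = next(gen)

# The paused frame still holds the generator's local variables.
saved_i = gen.gi_frame.f_locals["i"]

# close() raises GeneratorExit at the paused yield, so the finally block runs.
gen.close()

# Once the generator has finished, CPython drops the frame.
frame_after_close = gen.gi_frame
```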

Posted by Fuzzyman on 2008-08-14 21:09:59


Paths with Spaces on the Mac

Part of the power of systems that inherit a legacy from UNIX is their rich and flexible scripting system. Many desktop applications use this somewhere under the hood - either providing wrapper scripts as part of the application or even to implement core functionality.

A disadvantage of this is that it is much easier to make your application sensitive to (i.e. broken for) paths that contain spaces - simply forget to put quotes in the right places.
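The failure is easy to reproduce from Python with the standard shlex module, which splits a command line the way a POSIX shell would. This is only a sketch, and the path is a made-up example.

```python
import shlex

# Hypothetical path on a drive whose name contains a space.
path = "/Volumes/Second Drive/Users/me/notes.txt"

# Naive string concatenation: the shell sees two separate arguments.
unquoted_args = shlex.split("cat " + path)

# shlex.quote wraps the path so that it survives as a single argument.
quoted_args = shlex.split("cat " + shlex.quote(path))
```

When calling out from Python, passing the arguments as a list to the subprocess module avoids the shell, and therefore the quoting problem, altogether.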

This is a problem that afflicts Windows less, simply because a Windows programmer is less likely to implement core functionality by delegating to the shell. Additionally the default home directory and install path for applications both have spaces in them [1], which apart from being annoying when you do use the command line also means applications are more likely to be aware of spaces in paths.

Of course 'shelling out' is a technique that can work cross platform, and the only time [that I can remember...] I had a problem caused by quoting errors in shell scripts on Windoze was the PyLint shell script [2].

The core of Mac OS X is based on BSD, and so it is a modern UNIX 'variant', which makes it much easier to develop cross platform applications that run on Linux and Apple Macs.

The X11 server that comes with Mac OS X is developed separately as the XQuartz project. I recently tried to use some application that depended on a more recent version of X11, so I installed XQuartz. Which promptly broke X11 completely for me...

As you can doubtless guess, the reason XQuartz broke is that my user profile is on a drive with a space in its name. It seemed completely obvious to me that when I added a second drive (and moved a bunch of stuff onto it) it should be called 'Second Drive'...

It took a while to work out the problem (the error log showing the broken path was a good clue) and the solution was to use an updated version of the startX script that will be fixed in the next release.

I had a nice email from one of the developers suggesting that having paths with spaces in was a bad idea. This seems odd to me - if your application can't handle spaces in paths then your application is broken and it isn't the user's fault. All programmers on UNIX type systems should work from paths with spaces in them to make sure their applications can handle them... (Maybe only as a punishment if shell quoting bugs are found in their scripts / programs.)

Still, I assured him that the only program I had ever had a problem with was XQuartz and I was worried about potential problems caused by renaming the drive.

Of course I was wrong, XQuartz wasn't the only program I've had problems with due to spaces in the paths. A couple of weeks ago I reported that I couldn't get FileZilla to work on my machine, and that no-one else seemed to have the same problem. Guess what it was actually caused by...

Naturally by the time I worked it out and reported it, it was already fixed...

Oh, and I do use the command line quite a lot on the Mac and no, the space doesn't tend to annoy me - shell completion is nice.

[1]Good old 'Documents and Settings' and 'Program Files'...
[2]When I started at Resolver Systems my boss installed Python into 'Program Files'.

Posted by Fuzzyman on 2008-08-04 11:28:58


Dependency Injection and Mock(ing) for Testing

This blog entry is about changes to Mock - A Lightweight Mocking Framework for Testing, one of the (thankfully small) projects that I will be working on once 'the book' is finally out of the way.

Suppose you have the following code, how do you test some_method:

class SomeClass(object):

    def some_method(self, arg):
        self.other = OtherClass(arg)

Well, we could just call some_method, but what if OtherClass is expensive to construct (or establishes a database connection) and we'd really rather not do that in a unittest?

We can't straightforwardly mock, because the construction of the class is internal to the method. One technique is to use dependency injection, where the dependency class is passed in. We can even make it the default value of a keyword argument:

class SomeClass(object):

    def some_method(self, arg, Dependency=OtherClass):
        self.other = Dependency(arg)

Now it is trivial to test some_method. We can pass in a mock object as the 'Dependency' and test that the body of the method acts on it as it should. The problem I have with this approach is that we have now added code that sets up the default dependency (the default value of the keyword argument), effectively another layer, that we should test as well. (If we only test with an explicitly passed-in dependency then we test under different conditions to how the code is called in production.)

This approach has other advantages however. The dependency on OtherClass is no longer 'hardwired' and we could use another class or callable at runtime if we wanted to change the behaviour. Of course this could be catering to a future and purely imagined use case, and YAGNI applies. There is a lot more to dependency injection than this simple example, and it can be an extremely useful technique, if you need it.

At Resolver Systems we take an alternative approach to testing this kind of code, generally mocking instead of using dependency injection. When I discussed with one of my fellow developers whether we should use more dependency injection to reduce the amount of mocking needed in our tests, his response surprised me:

As far as I can tell, dependency injection just screws with your interface.

And it's true. In this example, the first version represented the code we wanted to write. We only modified it, changing the interface in the process, to make it easier to test. Not only does reading the code now give us a distorted view of its use, but tools that use introspection (like automated documentation tools) will also pick up on the changed API.
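Both sides of the trade-off fit in one short sketch. It assumes the standard library's unittest.mock (the descendant of Mock) and a stand-in OtherClass. The injected version is easy to test, but introspection reports the extra parameter as part of the public signature.

```python
import inspect
from unittest.mock import Mock, sentinel


class OtherClass(object):
    """Stand-in for the real dependency."""

    def __init__(self, arg):
        self.arg = arg


class SomeClass(object):

    def some_method(self, arg, Dependency=OtherClass):
        self.other = Dependency(arg)


# Testing is trivial: pass a mock in place of the dependency.
fake_dependency = Mock(return_value=sentinel.Instance)
something = SomeClass()
something.some_method(sentinel.Arg, Dependency=fake_dependency)
fake_dependency.assert_called_once_with(sentinel.Arg)

# ...but the test-only parameter now shows up in the method's signature.
parameter_names = list(inspect.signature(SomeClass.some_method).parameters)
```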

A standard Python way of testing the first version of the code is to mock out OtherClass at the module level. Because names are resolved dynamically, instead of being statically bound, if we patch the class for the duration of the test then our mock will be used when we call some_method. Unfortunately the code to do this requires a bit of ugly and tedious boilerplate:

import SomeModule
from mock import Mock

original = SomeModule.OtherClass
SomeModule.OtherClass = Mock()
try:
    something = SomeModule.SomeClass()
    something.some_method(arg)
    # test our Mock is used in the correct way
finally:
    # always restore the real class, even if the test fails
    SomeModule.OtherClass = original

Most mocking frameworks are based on the 'record -> verification' pattern for testing. Not only do I find this backwards (I prefer the 'action -> assertion' pattern) but it means you often have to record every step in a test, when you are only interested in testing a corner case - one part of the unit behaviour.

Mock is a very lightweight mocking framework for unit testing. It also includes a decorator that makes it trivial to mock out classes in a module without the boilerplate.

@patch('SomeModule.OtherClass')
def test_some_method(self, MockOtherClass):
    MockOtherClass.return_value = sentinel.Instance

    something = SomeClass()
    something.some_method(sentinel.Arg)

    MockOtherClass.assert_called_with(sentinel.Arg)
    self.assertEquals(something.other, sentinel.Instance)

The patch decorator monkey patches SomeModule.OtherClass only within the scope of the test and passes in the mock it creates to the test method. Actually the example above uses a couple of nice features of Mock (and its patch decorator) that aren't quite available yet. The code doesn't look very different with the current version though.

Of course in Python, modules are first class objects and are effectively singletons. This is one reason why the singleton pattern is so rarely useful in Python. It also means that changing module level attributes for tests is mutating global state. Forgetting, or botching, restoring state after your test will leave your patch in place for subsequent tests and cause you some very weird problems. This is reason enough for some people to say that it should never be done. One real consequence is that you can't run tests in parallel - but whilst running tests in parallel is fine, would you really do it using several threads rather than several processes? That's a recipe for disaster all of its own... Using patch ensures that the patching is undone whether the test passes or fails.
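Here is a self-contained sketch of the 'undone whether the test passes or fails' guarantee, using unittest.mock from the standard library (where Mock later ended up). Everything lives in one file, so it patches the module's own global namespace with patch.dict. In a real project you would patch 'SomeModule.OtherClass' by name as shown earlier. The classes are stand-ins.

```python
from unittest.mock import MagicMock, patch, sentinel


class OtherClass(object):
    """Stand-in for a dependency that is too expensive for a unit test."""

    def __init__(self, arg):
        raise RuntimeError("too expensive to build in a unit test")


class SomeClass(object):

    def some_method(self, arg):
        self.other = OtherClass(arg)


def run_patched():
    mock_class = MagicMock(return_value=sentinel.Instance)
    # OtherClass is looked up in the module globals at call time, so
    # swapping the global for the duration of the block is enough.
    with patch.dict(globals(), {"OtherClass": mock_class}):
        something = SomeClass()
        something.some_method(sentinel.Arg)
    mock_class.assert_called_once_with(sentinel.Arg)
    return something.other


def run_failing():
    # Even if the body raises, the patch is removed on the way out.
    try:
        with patch.dict(globals(), {"OtherClass": MagicMock()}):
            raise AssertionError("simulated test failure")
    except AssertionError:
        pass
    return isinstance(OtherClass, type)


patched_result = run_patched()
restored_after_failure = run_failing()
```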

I originally created the Mock module to make some common testing patterns we use at Resolver Systems simpler. Since then we've made several improvements to Mock that have never been synced back in (plus I've had patches and feature requests from other users of Mock):

  • A patch (1 line!) from Kevin Dangoor for nose compatibility (preserving line numbers in patched methods)
  • Add the assert_called_with method
  • Add a side_effect attribute - a callable used every time a Mock is called. This allows you to do nice things like raise an exception if a mock is called, or return members of a sequence for repeated calls rather than just a fixed value.
  • Make the default return value a Mock (unless None is explicitly specified)
  • Modify patch to allow a single string argument specifying what should be patched
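The side_effect and default return value changes in the list above can be sketched as follows. This uses unittest.mock from the standard library, so details may differ from the Mock release described here, and the names sensor and broken are made up.

```python
from unittest.mock import Mock

# A callable side_effect runs on every call, so repeated calls can
# return successive members of a sequence.
readings = iter([10, 20])
sensor = Mock(side_effect=lambda: next(readings))
first = sensor()
second = sensor()


# It can also raise, to simulate a failing collaborator.
def fail(*args, **kwargs):
    raise OSError("connection refused")


broken = Mock(side_effect=fail)
try:
    broken("db")
    raised = False
except OSError:
    raised = True

# With no return value configured, calling a Mock returns another Mock.
default_is_mock = isinstance(Mock()(), Mock)
```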

Apparently this is my 1000th blog entry (on this blog). The first was on January 23rd 2005, which makes for an average of more than five entries a week over three and a half years! [1]

Another project I omitted from my list of projects I'm interested in was Tweezer. Tweezer is a very nice looking Twitter client by Ed Leafe, built on the Dabo framework. It's still missing some features, but I'm looking forward to hacking on it...

[1]Over the last year I've also averaged more than five entries per week on the IronPython URLs blog, but as that is a link and news aggregator they tend to be a lot shorter.

Posted by Fuzzyman on 2008-08-01 11:26:12


Last Chapter of IronPython in Action Finished

The last chapter of IronPython in Action is finally done. There is still work to do before the book is complete (appendices, index, updating the first chapter in light of the last year in IronPython), but chapters 12 (Databases and Web Services written by Christian Muirhead) through to 15 are now ready to go to the final review phase. This will take a couple of weeks and after making any necessary changes it will then go through for publishing - which could take anything up to three months!

Chapter 15 has been one of the most fun chapters to write. It shows how to use IronPython as a scripting engine in .NET applications (from C# or VB.NET). The chapter table of contents is:

15 Embedding the IronPython Engine

15.1 Creating a custom Executable

15.1.1 The IronPython Engine
15.1.2 Executing a Python File

15.2 IronPython as a Scripting Engine

15.2.1 Setting and Fetching Variables from a Scope
15.2.2 Providing Modules and Assemblies for the Engine
15.2.3 Python Code as an Embedded Resource

15.3 Python Plugins for .NET Applications

15.3.1 A Plugin Class and Registry
15.3.2 Auto-discovery of User Plugins
15.3.3 Calling the User Plugins

15.4 Using DLR Objects from Other .NET Languages

15.4.1 Expressions, Functions and Python Types
15.4.2 Dynamic Operations with ObjectOperations
15.4.3 The Builtin Python Functions and Modules

15.5 Summary

The chapter works through four examples:

  • Creating a custom executable that launches a Python application
  • Using IronPython as an embedded scripting engine - setting up the execution environment and setting and fetching variables in the execution scope
  • Python for 'user plugins' in an application - including error handling, auto-discovery of plugins, exposing an API to the hosted Python engine and so on
  • Interacting with dynamic objects from C# and VB.NET - creating instances of Python classes, calling functions and methods etc

Posted by Fuzzyman on 2008-07-29 22:09:53


Feature Freeze for Resolver One 1.2 Release

The first major update to Resolver One after our public release in January was version 1.1 in June. The focus of the 1.1 release was performance improvements. Overall we achieved a 30% performance improvement, but for some specific bottlenecks we made massive improvements [1]. As a result of this focus the list of new features in 1.1 wasn't very impressive.

The same is definitely not true of the list of improvements in the forthcoming 1.2 release. As you might expect, the list is a combination of implementing standard spreadsheet features, bugfixes and new features that build on what is unique about Resolver One.

The list doesn't include the major new feature of 1.2. It's a bit special (if you're into spreadsheets), and in my opinion is game-changing for developing spreadsheet applications. We're keeping it under wraps until the release.

Other than this there are still some impressive new features:

Major Features

  • Create dropdown lists in cells from cellranges (or any iterables) in user code
  • Paste special: paste formulae/values/formatting/comments
  • Drag down (or up or left or...) from bottom right of cell to fill a larger range with auto-incrementing values (effectively fill down for numbers and dates etc)
  • Percentage formatting for displaying value - numbers are multiplied by 100 and get a "%" at the end.
  • Percentages become legal syntax in formulae or as constants (direct values) in cells.
  • Persistent worksheets stored across recalcs
  • Support referencing cells in other Resolver One spreadsheets

Minor Features

  • Worksheet header lookups now by value, allowing integer (etc) headers

  • Support slicing on .Rows and .Cols iterators for Worksheets and CellRanges

    For example: worksheet.Rows[3:12] will return rows 3 through 11 inclusive. You can also slice with headers rather than indices.

  • Wait cursor when running click handlers for users' buttons

  • Ability to make visible changes to the grid during a button handler

  • Custom Numeric Types right aligned in grid and used by SUM (recognised because they define __abs__)

  • Performance boost - due to an upgrade to PLY 2.5

  • A whole bunch of standard spreadsheet functions
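The slicing mentioned in the list above follows Python's usual half-open convention. A minimal sketch of how a collection can support it is below. The Rows class is a hypothetical stand-in and says nothing about how Resolver One implements its worksheets.

```python
class Rows(object):
    """Hypothetical stand-in for a worksheet's .Rows collection."""

    def __init__(self, count):
        self._numbers = list(range(count))

    def __getitem__(self, index):
        # For rows[3:12] Python passes a slice object; the end is exclusive.
        return self._numbers[index]


rows = Rows(20)
selected = rows[3:12]
```

Here selected holds the entries numbered 3 through 11, matching the 'inclusive' description in the feature list.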

There are a couple of other features already underway that will hopefully make it in as well.

You can see almost the full list of new features, including bugfixes and improvements in the Financial Edition of Resolver One, in the entry on Resolver Systems News Blog:

The new version should be available within the next two weeks (hopefully sooner but you know how it goes). As always the desktop version of Resolver One is free for non-commercial and Open Source use.

Once the release is out of the way I'm hoping to find time to write up some new articles on Resolver Hacks. One article I'd like to write is an example CRUD application using Resolver One, showing some 'best practices' [2] for developing applications with Resolver One. The persistent worksheets, which are kept between recalcs, are ideal for this.

While we finish off version 1.2 we're already planning 1.3, and William is making massive progress with Ironclad...

[1]Indexing worksheets with headers went from being n-squared to O(1) for example.
[2]I know, I know. Better practises?

Posted by Fuzzyman on 2008-07-28 22:38:49

