Python Programming, news on the Voidspace Python Projects and all things techie.

Static Typing, Dynamic Language Performance, V8 and Tracing JITs

emoticon:cat There has been a lot of good buzz recently about dynamic language performance; including Steve Yegge's Dynamic Languages Strike Back essay [1] and the more concrete demonstration in the impressive V8 Javascript Engine inside the google Chrome browser.

Two of my favourite quotes on programming languages and 'the typing issue' come from Gilad Bracha [2]:

"Only a subset of all possible programs can be written with statically typed languages. For some people that is enough."

"For optimisation, more is known about a program written in a dynamically typed language at runtime than is known about programs in statically typed languages at compile time"

The first quote makes an interesting point. In a statically typed language the compiler must know (either by inferencing or through explicit declarations) the type of every object everywhere you use it. This places restrictions on what you can do, but because of these restrictions the compiler is able to provide you with extra information (is the program type safe) and make performance optimisations (no need for dynamic dispatch or member lookup - they can all be statically bound).

So it is a trade-off, there are things you just can't do in a statically typed language that you can do in a dynamically typed language - but you make this trade-off for the sake of the compiler.

Note

Those who use statically typed languages don't do it merely because they "haven't yet discovered dynamically typed languages". When you become fluent in any complex system you start to think within its terms - and those who are proficient with statically typed languages think in terms of its types and leverage the information and feedback from the compiler.

This means that when presented with a dynamically typed language you have taken away the framework within which they think. Exactly the same is true for those who have only programmed with dynamically typed languages, they simply can't think within a statically typed framework. This is why the two sides get on so well...

The point of this blog entry is not that dynamic typing is better, but that it is a trade-off (on both sides). However performance needn't be one of those trade-offs.

Of course typically statically typed language implementations are able to be faster than dynamically typed languages where a lot of extra work has to be done at runtime.

Back in the late eighties a bunch of incredibly smart guys did a lot of research around improving the performance of dynamic languages like Smalltalk. They were convinced that dynamic languages could even outperform statically typed languages through runtime optimisations. One of the results was a paper An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes along with the Strongtalk VM. Unfortunately (to quote Gilad again) this is rocket science, and when Sun snapped them up to work on the Hostspot JIT compiler for the Java Virtual Machine not many people continued their work.

(At PyCon the Microsoft guys I spoke to acknowledged that Hotspot - a dynamic JIT - was superior to the JIT in the .NET framework, but they claimed that their garbage collection implementation is better. In a managed VM garbage collection has a big performance impact and is one of the areas where PyPy is already faster than CPython.)

This work has resurfaced in an interesting place. The paper on Self was referenced from one of the design documents on the Javascript V8 engine. Like Self, Javascript is a prototype based language - and their description of some of the optimisations are surprisingly readable.

From Design Elements of the Google V8 Javascript Engine: Fast Property Access:

"JavaScript is a dynamic programming language: properties can be added to, and deleted from, objects on the fly. This means an object's properties are likely to change. Most JavaScript engines use a dictionary-like data structure as storage for object properties - each property access requires a dynamic lookup to resolve the property's location in memory. This approach makes accessing properties in JavaScript typically much slower than accessing instance variables in programming languages like Java and Smalltalk. In these languages, instance variables are located at fixed offsets determined by the compiler due to the fixed object layout defined by the object's class. Access is simply a matter of a memory load or store, often requiring only a single instruction."

"To reduce the time required to access JavaScript properties, V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes. This basic idea is not new - the prototype-based programming language Self used maps to do something similar. (See for example, An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes). In V8, an object changes its hidden class when a new property is added."

Of course immediately after the release of Chrome a whole raft of benchmarks came out comparing it to the JIT that Mozilla are working on integrating into Firefox to improve Javascript performance: Tracemonkey.

Tracemonkey (based on Tamarin from Adobe) is a quite different Just-in-Time compiler than the one used in V8. Both compile higher-level languages to native machine code (although interestingly V8 has no intermediate bytecode step). V8 (like the .NET JIT) compiles 'up-front', whilst Tracemonkey - as its name implies is a tracing compiler. This is the same technology being implemented in the PyPy project.

Note

One of the fantastical things about PyPy is that it is more than just 'Python-in-Python' - it is a whole interpreter compiler toolchain.

Interpreters are 'described' in RPython (a static subset of Python) and then compiled into interpreters capable of running standalone (using the C backend) or on the CLR and JVM.

The PyPy tracing JIT is (for a limited range of types currently) capable of emitting machine code (or .NET/Java bytecode) optimised for specific operations (e.g. typed bytecode that only adds numbers if that is what your program is doing). On the .NET / JVM backends this bytecode will be translated into machine code by the JIT of the underlying platform - so your Python code under the PyPy interpreter will be 'double-jitted' on these platforms.

So what is a tracing JIT? Rather than compiling up-front, a tracing JIT analyses the flow of types through your program and can compiled specialised 'paths' for the frequently used parts. If you have a function that is called with integers and adds them, then machine code that performs this operation will be generated. The function is protected with a guard so that if it is ever called with different types then new code can be generated or the normal language mechanisms used.

In theory this approach is capable of offering more optimisations than compilers that operate 'up-front' (like the .NET and V8 JITs). By analysing the flow of types through the program a tracing JIT is capable of making much smarter decisions about what can be inlined for example.

So whilst both V8 and Tracemonkey have plenty of room to get faster (I'm sure), Tracemonkey has the most room ahead of it.

In terms of Python, the CPython Virtual Machine is written in C, with several design decisions 'hard-coded' throughout the source code. These include garbage collection by reference counting and the Global Interpreter Lock [3]. Changing this would be very painful (although integrating Tracemonkey involved Mozilla in a move away from reference counting to garbage collection - and they managed to automate the process of changing a lot of the source code). This was what motivated the creation of the PyPy project. It makes it much easier to experiment with new implementation strategies - and if the JIT lives up to its promises then Python may get faster, a lot faster.

[1]Rhino and Tigers has some great comments on typing as well.
[2]I don't have a written source for these quotes (he made them at an SFI conference in Poland that I attended earlier this year), so I have probably mangled the precise wording.
[3]Adam Olsen has created an incredible project called safethread that basically patches the CPython source to remove the Global Interpreter Lock. It does have a significant cost for single threaded code however.

Twatter and MultiDoc on Mono Windows Forms (Mac OS X)

emoticon:info At Resolver Systems we've been busy preparing for PyCon UK.

As well as sponsoring the drinks for the Saturday evening dinner there are several of us speaking:

Plus of course Menno, Christian and I will be giving our Developing with IronPython tutorial on the Friday. We finished and 'handed in' the tutorial handouts, which are a great twenty-odd-page introduction to IronPython, so sometime I'll put them online with the example code.

The example code is a simple Twitter client called Twatter. It uses Windows Forms and as we intend to support it on Windows, Linux and the Apple Mac I've been testing it with Mono 1.9 on Mac OS X 10.5 (Leopard).

Twatter running on Mono on an Apple Macbook Pro

Not what you would call beautiful, but acceptable. Smile

(It is currently hardwired to just use my image - and I didn't write that code - so it isn't a bug. Twatter is still a 'work in progress'.)

I've also been playing with MultiDoc, which is a tabbed text-editor and the example application for chapters 3-6 of IronPython in Action.

MultiDoc running on Mono on an Apple MacbookPro

The interesting thing about this is that the 'Name Tab' dialog was actually created by the Visual Studio forms designer (generating C#) - and works fine from Mono.

The chapter on testing in IronPython has a functional test for MultiDoc, and I've expanded on this (and added more tests) to illustrate some of the principles behind functional testing for my talk.

UPDATE: I've just tried these apps with Mono 2.0 Preview 2. There are some tiny aesthetic improvements, but some bugs that I was going to have to track down have gone - which is nice (unhandled exception on exit for example).


Hosted by Webfaction

Counter...