Python Programming, news on the Voidspace Python Projects and all things techie.
CPython Extensions from IronPython: cext, jumpy and now makestub!
William has also made some progress with an alternative approach to using CPython extensions from IronPython, and he has posted some code. This isn't yet a usable solution, but it is a proof-of-concept showing that the technique works.
The code (along with a version for Mono) is available from the files area associated with the 'C Extensions for IronPython Mailing List':
I now have a proof-of-concept that actually works: code is in the Files section (zipped VS2005 solution). My last post should be a reasonably accurate guide to its contents, but I think I've fixed the parameter issues. Other notes:
- It should build out of the box, and output everything into ".\build".
- If you have numpy installed in the default location under C:\Python24, then the file ".\build\import-multiarray.bat" should successfully call initmultiarray and redirect the calls it makes into FakePython24. It probably won't crash.
- I have no idea whether it will 'work' with other .pyd modules; but it might, if they contain a code path which only calls Py_InitModule, PyErr_Occurred, and PyErr_SetString before returning.
- There's a problem with ".\python24\python24.def": I can't export both _PyErr_BadInternalCall and PyErr_BadInternalCall at the same time. The underscore-free version hasn't been needed yet, but it bothers me. Does anyone know what the problem might be?
The next step is probably to try to make sense of the array_module_methods pointer passed into Py_InitModule, and see if I can somehow hook its contents up to a real IronPython module.
Seo Sanghyeon has created a version that works with gcc, for Mono. Unfortunately this isn't cross platform either:
William's proof of concept for API interception was impressive. And jumpy is a great name!
As __declspec(naked) is unavailable in GCC, I tried to generate equivalent assembly code. The result is now uploaded to Google Groups as makestub.zip.
Running generate.py should produce stub.so under the build directory, which impersonates the underlying library.
There are two versions. v1 is the code I initially wrote. It uses objdump to read the symbol table, generates C code calling dlopen and dlsym, and generates NASM code to use the jumptable. v2 is my attempt to port v1 to Windows, but I haven't succeeded. It introduces dlport.h for portable runtime linking, and can use pexports to read the symbol table on Windows. But I couldn't figure out how to link NASM's win32 output to produce a DLL.
My code generates jumptable entries for function symbols only. It doesn't make much sense to jump to PyExc_Exception, for example.
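The 'function symbols only' rule can be sketched in a few lines of Python. This is only an illustration: the symbol records, the make_stub_source helper and the assembly text it emits are assumptions made for the example, not the contents of makestub.zip.

```python
# Illustrative sketch only: the symbol list and the emitted assembly layout
# are assumptions, not what makestub actually generates.

def make_stub_source(symbols, pointer_size=4):
    """Return (function names, NASM-style text) with one stub per function.

    `symbols` is a list of (name, kind) pairs, where kind is "func" or
    "data". Data symbols such as PyExc_Exception get no stub, because
    jumping to a variable makes no sense.
    """
    functions = [name for name, kind in symbols if kind == "func"]
    lines = ["extern jumptable", "section .text"]
    for index, name in enumerate(functions):
        lines.append("global %s" % name)
        lines.append("%s:" % name)
        # Each stub jumps through its own slot in the table, which the C
        # side would fill in at load time (for example with dlsym results).
        lines.append("    jmp [jumptable + %d]" % (index * pointer_size))
    return functions, "\n".join(lines)


symbols = [
    ("PyErr_Occurred", "func"),
    ("PyErr_SetString", "func"),
    ("PyExc_Exception", "data"),
]
functions, source = make_stub_source(symbols)
print(source)
```

The data symbol is simply filtered out before any stub text is produced, so the table only ever holds slots for things that can be called.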
I never thought that working at Resolver would involve assembly language.
Thanks to Kwang Yul Seo for reporting this.
IronPython Demos: Sho and Silverlight
Mahesh demoed a cool new sample as part of his IronPython and Silverlight talk. There was also another project that he showed me, but didn't have time to demo.
The Microsoft Silverlight team have created a new demo application that will appear as a sample sometime. (Unfortunately I forgot to get screenshots before he left - sorry). It is a Silverlight IDE that runs in the browser. It is the first Silverlight demo I have seen that looks like a proper application. It has menus, a console, project browser and code editing area. It is for editing IronPython Silverlight applications and can load projects from the web. You can edit files and step through the code, including setting breakpoints. You have a console with live access to the objects in whatever context you are stepping through. Very cool. It puts my Silverlight Web IDE to shame.
The next demo was a Microsoft research project called Sho. This is an interactive scientific computing project and is built on top of IronPython. It provides useful array types and has interactive graphing and charting abilities. Unfortunately it isn't available at all yet, and Microsoft are working out how / whether to productise it. I guess it is their answer to things like Mathematica and Matlab.
You can see a brief demo of it in Mahesh's presentation at Orlando TechEd - http://techedmsftwm.fplive.net/techedmsft/2007/DEV315.wmv (unfortunately a streaming video, I couldn't find a downloadable version).
It looks very cool, and like it would be a good fit for Resolver.
I also learned from his talk that MySpace are a big user of IronPython, using it to manage their servers!
The Dynamic Language Runtime (and a new IronPython Release)
There is a new release of IronPython out: IronPython 2.0alpha6. This fixes a lot of boring bugs, adds relative/absolute imports and also includes an example language implemented using the Dynamic Language Runtime: ToyScript. (It is currently part of the IronPython distribution.)
Martin Maly is the developer who (other than Jim Hugunin) has been on the IronPython team longer than anyone else. Today at TechEd he did a talk on the dynamic language runtime. Here he walked through parts of the ToyScript example, which is around 3000 lines of code and pretty clear.
The DLR basically takes care of your type system and compilation. What you need to do is provide a parser and transform your Abstract Syntax Tree into a 'DLR Tree'. You also provide a set of rules that implement your language semantics. You only need to provide rules where they differ from the default CLR (.NET) behaviour, so you don't need to implement integer addition for example. A very interesting talk and ToyScript looks like a great place to start experimenting if you have a yearning for language design.
The highlight of the talk was Martin showing off his lolcode implementation. The whole thing took him only 14 hours (whilst travelling) and is a pretty complete lolcode implementation built using the DLR. One advantage of implementing a language on the DLR is the Visual Studio support, so he showed setting breakpoints in lolcode and stepping through it. Unfortunately he needs Redmond's permission before he can release it, but he says that will probably come soon...
One of the great things that this conference has confirmed is how much use Microsoft are making of IronPython and the DLR. They are building it into an enormous number of their products, and the new version of Visual Basic (Visual Basic 10) will use the DLR. This means that IronPython looks secure for the foreseeable future.
(Calling the bugs boring is of course unfair. The bug fixes do include some Python socket module fixes, but Seo says that unfortunately there are still some issues remaining.)
TechEd Podcasts and Interviews
I've had a great time at TechEd (except for problems with the truly awful Hilton Hotel here). One of the things I have been up to is doing a few podcasts and video interviews.
This is a podcast by some rather odd (but fun) guys, about IronPython and Resolver needless to say. It has a transcript, which is great if (like me) you prefer reading to listening.
A five minute interview with the 'TechEd fishbowl' on developing with IronPython. This is an edited version of a fifteen minute long interview that will appear on Virtual TechEd some time soon.
I did a video interview with Mahesh for a French IT site. The video hasn't appeared yet, and although the intro will be in French the interview was in English (of course).
IronRuby, Web app testing, Language Popularity and Stuff
Another pot-pourri of interesting links:
IronRuby in Action
A friend of mine, Ivan Carrero, is writing an IronRuby book for Manning.
The Ruby benevolent dictator secretly likes Python
Actually it's not such a secret. Here's a photo of Matz wearing a Python T-shirt to the recent Rubyconf.
A new website dedicated to tracking language popularity. The results are as valid as all language popularity metrics, but their technique is interesting.
Web application testing with Windmill
A new (Python based) browser testing tool, Windmill, was just announced on the Testing in Python Mailing List. It comes out of the Chandler Project and claims to implement a larger set of browser testability features than Selenium.
Windmill implements cross browser testing, in-browser recording and playback, and functionality for fast accurate debugging and test environment integration. Support for Firefox, IE6/7, Safari on Windows, Linux and Mac OS 10.4 and 10.5.
TechEd Barcelona: Before the IronPython Talk
I'm not exactly live blogging, I'm sitting in my hotel room typing this for upload later. The hotel wifi prices are beyond extortionate.
Tomorrow, at 10.45am I'm doing a demo of Resolver as part of the IronPython talk by Mahesh and Martin Maly. They're both great guys and it's particularly good to meet Martin who, aside from Jim, has been on the IronPython team longer than anyone else. Martin is doing a talk on implementing languages with the DLR (on Thursday) which will be great.
I'm looking forward to the talk. I went through some Microsoft speaker training and did most of the demo (which I think will be ok). The training was very useful for giving demos:
- Speak louder than you feel comfortable with (unless you're naturally a very loud person)
- When demoing keep your hand off the mouse unless you are actually using it
- Try not to talk and type (which means breaking up the typing by talking in between lines if possible)
- Don't look at the screen (if the display vanishes someone will yell!)
- Moving your head screws up the mic (related to the point above)
I have a tendency to move around with nervous energy though, which will be hard to combat.
Chatting to Mahesh I've learned some great things about the future of the DLR, which unfortunately I'm not allowed to blog.
As well as the robotics demo and general IronPython background, Mahesh will be showing a Microsoft research project built on IronPython. It looks really good and could be a good match for Resolver - but no word on when it will escape from the research labs. More details after the talk.
Oh, and I did a podcast with the NxtGenUG guys, and later took part in the 'Swagily Fortunes' gameshow that they ran. My side was rubbish and we lost horribly, except I won the random star prize of an HTC Touch Phone! It makes a nice companion for the iPhone I suppose.
cext: Further Developments and Accessing Lists (etc) with cext
In my last blog entry I posted an example of accessing CPython extensions from IronPython with cext.
Meanwhile, William (a colleague from Resolver Systems) has been experimenting with what I think is likely to be a better general solution. He recently posted the following message to the C Extensions for IronPython Mailing List:
I've been looking into some of the other approaches that were discussed earlier, and I can report limited success at hooking CPython API calls and redirecting them to CLR code (on Windows only, as yet). Specifically, I can invoke a .pyd's initmodule function from CLR code, and redirect the calls it makes to python24 back into managed code. However, it is only the barest sketch of a solution, and has a number of problems.
Specifically, my 'solution' consists of:
- FakePython24, a managed class which contains a few similar functions to python24, and has some extra gubbins for keeping track of function pointers.
- python24.dll, an unmanaged library which makes a reasonable stab at impersonating the real python24.dll.
- CPyImporter, a managed class which imports .pyd modules and makes their initmodule functions accessible.
When we create a FakePython24, it loads the fake python24.dll (which itself loads the real python24), and calls InitFakePython24(), passing in an unmanaged function pointer gleaned from a delegate pointing to its own GetFunctionPointer function.
InitFakePython24 creates an array of function pointers, where possible pointing to patched functions implemented in FakePython24 but otherwise pointing to the real python24 functions. The fake python24 contains implementations for each function which do nothing but call the associated function pointer; once the array has been initialised the redirection is in place.
Independently, we can create a CPyImporter, and call GetModuleImporter with the path to the .pyd file. This creates an instance of an object with one method, init(); when we call this it tries to call the .pyd's initmodule function.
And... as long as the redirection is in place before you call init, it kinda works. I certainly don't marshal the parameters correctly, and my assembler calls work more by luck than judgement: that luck reliably runs out by the 3rd API call. Nonetheless, this is by far the 1337est thing I've ever done, so I consider it a decent proof of concept.
I submit 'jumpy' as a name for this sub-project.
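The redirection William describes (a table of function pointers, patched where possible and otherwise falling through to the real library) can be modelled in plain Python. This is a toy sketch of the idea only: RealLibrary, Patches, build_table and FakeLibrary are stand-ins invented for the example, and none of it touches real DLLs or the CLR.

```python
# Sketch of the redirection idea: every API name gets a slot in a table.
# Patched slots point at our own code; the rest fall through to the "real"
# library. All names below are stand-ins invented for this example.

class RealLibrary(object):
    """Stands in for the real python24.dll."""
    def PyErr_Occurred(self):
        return "real PyErr_Occurred"

    def PyErr_SetString(self, exc, message):
        return "real PyErr_SetString"


class Patches(object):
    """Stands in for the managed FakePython24 class."""
    def PyErr_SetString(self, exc, message):
        return "patched PyErr_SetString(%s)" % message


def build_table(names, real, patches):
    """Prefer a patched function where one exists, else use the real one."""
    table = {}
    for name in names:
        table[name] = getattr(patches, name, None) or getattr(real, name)
    return table


class FakeLibrary(object):
    """Every exported function does nothing but call its table entry."""
    def __init__(self, table):
        self._table = table

    def __getattr__(self, name):
        try:
            return self._table[name]
        except KeyError:
            raise AttributeError(name)


names = ["PyErr_Occurred", "PyErr_SetString"]
fake = FakeLibrary(build_table(names, RealLibrary(), Patches()))
print(fake.PyErr_Occurred())               # real PyErr_Occurred
print(fake.PyErr_SetString(None, "boom"))  # patched PyErr_SetString(boom)
```

As in the description, nothing is redirected until the table has been built, which is why the redirection has to be in place before init is called.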
Meanwhile, there is a problem with the example code I posted in my last entry (which I have just modified). The code was importing the sys module from the hosted interpreter, and attempting to append to sys.path.
Unfortunately this has no effect! (Although the path seems to be set correctly for my computer anyway - and it isn't just due to the PYTHONPATH but is probably because python24.dll reads registry entries when loaded.) The reason it doesn't work highlights something to be aware of if you want to use the cext module.
In that code, the sys module is successfully imported as a proxy object. When you access sys.path, this proxy object recognises that you are accessing a Python list (on the CPython side) and copies it across to IronPython for you. This means that the append is executed on the copy, not on the original. D'oh!
The solution would be either not to copy the list and to proxy access to it as well, or to provide functions on the CPython side allowing you to manipulate sys.path. Probably in the short term providing an alternative access pattern (that always returns a proxied object rather than a copy) would be useful. If the list were proxied then the append call would be proxied 'into' CPython correctly.
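The difference between copying and proxying can be shown with a toy model. Everything here is an assumption for illustration: remote_path plays the part of sys.path inside the hosted CPython interpreter, and the wrapper classes are invented for the example rather than taken from cext.

```python
# Toy model of the problem. `remote_path` plays the part of sys.path inside
# the hosted CPython interpreter; both wrapper classes are invented for
# this example and are not cext's real classes.

remote_path = ["C:\\Python24\\Lib"]


class CopyingModule(object):
    """Attribute access returns a copy of the remote list."""
    @property
    def path(self):
        return list(remote_path)


class ListProxy(object):
    """Forwards operations to the remote list instead of copying it."""
    def __init__(self, target):
        self._target = target

    def append(self, item):
        self._target.append(item)

    def __len__(self):
        return len(self._target)


class ProxyingModule(object):
    """Attribute access returns a proxy for the remote list."""
    @property
    def path(self):
        return ListProxy(remote_path)


CopyingModule().path.append("C:\\extra")   # lands on the copy and is lost
length_after_copy = len(remote_path)

ProxyingModule().path.append("C:\\extra")  # reaches the original list
length_after_proxy = len(remote_path)

print(length_after_copy, length_after_proxy)  # 1 2
```

The copying version is convenient for reading values, which is presumably why it was chosen, but any mutating call needs the proxied form to have an effect on the CPython side.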
Seamlessly Import CPython Modules from IronPython
There is a new release of the "CPython Extensions for IronPython" module (now called cext) I announced recently. Version 0.1.3 includes several small improvements, but also an import hook (created by Seo Sanghyeon) that allows you to import CPython binary extensions from IronPython using normal import statements!
There are two ways of using this module. The first way is with the Import function from the embedding module.
from embedding import Import
pylab = Import('pylab')
pylab.plot([1, 1, 1.5, 2.5, 3, 3, 3.1])
Which generates a simple plot. As you can see, the pylab module imported from CPython behaves in (apparently) the same way as it does when running directly in CPython.
The second way is to use the import hook. Installing the import hook allows you to import Python binary extensions using normal import statements! Python binary extensions are .pyd files on Windows and .so files on other platforms. To install the import hook, execute the following code:
You can then do things like import cElementTree.
The goal is that eventually this will be built into FePy and enabled by an option, so that you can import CPython modules without having to take any special steps.
Software Estimation and Planning Poker
Software estimation has been getting a lot of blog love recently, particularly because of these two articles:
Planning poker is a way of estimating the cost of features and is part of eXtreme Programming. It is an important part of iterative development, and we use it at Resolver Systems (although we don't have any fancy poker cards).
In eXtreme Programming, features are prioritised by the customer (or a customer representative) but estimates are owned by the developers. Iterative development allows you to get a 'shippable' product as early as possible in the development process and keep it shippable. Reliable estimates help the customer to prioritise features based on how expensive (in terms of developer time) they are.
Having estimates owned by the developers makes it impossible for Giles (uhm... I mean 'da management') to force unrealistic deadlines on the development team.
For this process to be useful, the estimates need to be accurate. This is where Joel's article is interesting. He does a good job of explaining why, on average, estimates will be well below the amount of time actually taken:
- There are lots of reasons why a feature may take more time than estimated, but only a few reasons for them to take less time
- A feature estimated to take two days can easily take three days more than estimated, but there is no way it can take three days less
In XP there is a way of tracking the accuracy of estimates: the velocity. You record both your estimates and how long each feature actually takes. Your velocity is the estimated total divided by the actual total. Unforeseen activity, like absences, emergency build fixes and spikes all come out of the velocity (they are included in the total time taken). This is because unforeseen activities happen all the time, and impinge on development times.
At Resolver we have a velocity of about 0.5 (the same figure that Joel comes up with): a feature estimated at three days actually takes six days. This is consistent; our total velocity and our three month running velocity are about the same. This makes it a useful tool when planning our iterations - if we have twelve pair days in an iteration then we know we can allow for six days' worth of estimates in that iteration.
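The arithmetic is simple enough to write down. This is a minimal sketch using made-up figures; the function names velocity and plannable_estimate are invented for the example.

```python
def velocity(estimated_days, actual_days):
    """Velocity as described above: estimated total / actual total."""
    return sum(estimated_days) / float(sum(actual_days))


def plannable_estimate(available_days, current_velocity):
    """How many days' worth of estimates fit into the available time."""
    return available_days * current_velocity


# Made-up figures for three features: the estimates, and the time actually
# taken (including absences, build fixes and other interruptions).
estimates = [3, 2, 1]
actuals = [6, 4, 2]

v = velocity(estimates, actuals)
print(v)                          # 0.5
print(plannable_estimate(12, v))  # 6.0
```

Because the interruptions are counted in the actual totals, the velocity already accounts for them and no separate contingency needs adding when planning.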
(The jibe about Giles is totally unfair, as it was Giles who brought the XP practices to Resolver.)
Spiking: The Answer to the Most Common Objection to Test Driven Development
TDD means writing tests before code, being driven to write code by the failing tests that you write. For a good summary of adding a feature to an application using TDD see Andrzej on Test Driven Development.
Having used TDD for about eighteen months I'm a thorough convert.
The great advantage of TDD is not that you end up with a well tested application, but that it changes the way you develop code. By writing unit tests before the code it is testing, you have to write modular decoupled code. It forces you to think about what API you want, instead of just banging out the code and ending up with whatever API results. Inevitably writing testable code results in better code. Writing the tests becomes part of the way you approach problems and think about API design.
Having great test coverage and an incredible framework for refactoring is just a great side effect of TDD. (A comprehensive test framework takes a lot of the pain out of refactoring - you can change core classes and know when the refactoring is complete because all your tests pass again.)
When I talk to other developers about TDD, most of them are in favour of testing (not many of them do a lot of testing though - but who is going to admit to being against testing!), but many of them come up with the same objection to TDD. It goes something like this:
A lot of the time when I'm coding I don't know how to solve a particular problem. To have to write tests for code that I know I'm going to throw away, or worse for code that I don't even know if it will work, is just a waste of time and would really slow me down.
Of course the folk who enshrined the XP theology are well aware of this, and the solution is built into TDD. The solution is the spike.
Spiking is for when you have a problem that you don't know the answer to. Before you can write the tests for the code, you need to know what code you're going to write. This is where the spike comes in: it is the one you build to throw away. Often most of the code you need to write is straightforward, but there are one or two difficult questions you need answers to. You spike a solution to just these problems, and can then design the code. It is common to have to spike several times during a user story. These are usually just small diversions.
Sometimes a whole user story (feature) can't be estimated (in terms of difficulty and how long it will take) until we know how to implement it. In these cases a single developer will spend time spiking the solution so that it can later be paired on.
Resolver Goes Multinational, Real World Problems with Resolver, IronPython Localization Issues and an Important Announcement
Next week Resolver Systems is going multinational. I'll be in Barcelona, talking about Resolver at the Microsoft TechEd conference. The boss will be in New York meeting with potential customers and our head of sales will be doing the same in Paris.
Meanwhile we've been gradually extending the Resolver beta program. We currently have about 100 beta users and are working towards a full public beta in two or three weeks' time. We have about eight hundred people signed up for the beta program, which is great.
Although we run Resolver on a variety of hardware, and run the test suite continuously, having Resolver in the hands of so many users has (unsurprisingly) shown up some real world problems.
One of these was a bizarre problem from a user in Turkey. Resolver imports the Python decimal module, so that it can recognise decimals as numeric (numbers are displayed right aligned in the grid). This module isn't essential to Resolver, but when the import fails Resolver refuses to start. Obviously this doesn't happen for us, decimal imports fine.
We were getting a KeyError in the following lines (around line 2171 of the decimal module):
#name is like _round_half_even, goes to the global ROUND_HALF_EVEN value.
globalname = name[1:].upper()
val = globals()[globalname] # <---- This line raises the exception
Decimal._pick_rounding_function[val] = name
Aside from the fact that this is odd code anyway, it would fail in Resolver on a machine with a Turkish locale. After inserting some diagnostic code, we discovered the cause of the problem. In Turkish, the uppercase of "i" is "İ", with a dot on top of it, and the lowercase of "I" is "ı", without a dot. When the name round_ceiling is uppercased in Turkish it turns into ROUND_CEİLİNG, which can't be found...
This happens because strings in IronPython are unicode, and the upper method (which translates to ToUpper for .NET strings) is locale aware. (Apparently Turkish is the only locale in which an unaccented Latin character has an uppercase value that's different from what its uppercase value would be in en-US.) It is therefore a very difficult problem to solve in IronPython itself.
One simple solution for IronPython is for decimal to use the ToUpperInvariant string method, which would mean we have to patch the Python standard library module that we distribute with Resolver.
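The failure is easy to reproduce in miniature. CPython's own upper method is not locale aware, so this sketch imitates the Turkish rule by hand: turkish_upper, invariant_upper and the tiny constants dict are all inventions for the example, standing in for the .NET ToUpper / ToUpperInvariant behaviour and for globals() in the decimal module.

```python
# CPython's str.upper() ignores the locale, so the Turkish rule is imitated
# here with a hand-written helper. `turkish_upper` and the tiny `constants`
# dict are inventions for this example.

def turkish_upper(text):
    """Uppercase the way a Turkish-locale ToUpper would: i -> dotted I."""
    return text.replace("i", "\u0130").upper()


def invariant_upper(text):
    """Locale-independent uppercase, in the spirit of ToUpperInvariant."""
    return text.upper()


constants = {"ROUND_CEILING": "ROUND_CEILING"}  # stands in for globals()
name = "_round_ceiling"

print(invariant_upper(name[1:]) in constants)  # True
print(turkish_upper(name[1:]) in constants)    # False
```

The second lookup is the one that raised the exception for our Turkish user: the key being looked up contains two dotted capital letters that never appear in the module's global names.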
In response to this problem, Dino Viehland (a lead developer of IronPython in the Microsoft team) made this suggestion:
One option might be a non-technical solution: Instead of you redistributing the library (or modified library) we distribute it w/ IronPython - and then you're just including the combined package. There's other reasons why it'd be good for us to do this (help, encodings, warnings, etc...).
So, in line with Microsoft's softening approach to Open Source, they are offering to distribute the standard library with IronPython! This is great news.
They have previously been reluctant to do this. In the past they have been sued for distributing software that they didn't have the rights to, even when they had paid for the rights from another firm (who it was subsequently discovered didn't own the rights - but of course suing Microsoft was much more attractive).
The only way we discovered this issue was by having a user run Resolver on a machine with a Turkish locale (many thanks to Çağatay Tengiz for his help and patience and Curt Hagenlocher for his suggestions). I wonder if Python 3 will have the same issue with similar code, since the string type will be unicode.
The next problem we have to sort out is a user running Resolver on two laptops with 512MB of memory. On one machine it works fine, and on the other it crashes when starting up with an out of memory error trying to allocate a GDI handle. Very odd.
(It's only eighteen months since I said that the beta would be available in about six months. This time we mean it.)
This work is licensed under a Creative Commons Attribution-Share Alike 2.0 License.