Python Programming, news on the Voidspace Python Projects and all things techie.
A Classic Tale of Optimisation
Yesterday we did some performance optimisations on our application. A key step in our processing was taking around 11 seconds (for a moderately, but not excessively, complex data set).
This meant a decidedly sluggish user experience. Our target was to get this below a second. Having no idea how long it would take, we set aside one pair day in this iteration to examine the issue.
First off, we suspected that the sequence of events might be triggering the processing twice. It was, so bang: half an hour of work and Resolver runs twice as fast.
We thought that losing the next five seconds could prove a lot harder.
The key section was our function from which the processing was done. This triggers a recursive function, which walks a graph calculating dependencies as it goes. As we needed to profile inside these methods, we used the crude but effective method of timing 'blocks' of code in the functions, factoring out the recursion, and printing the results.
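Timing blocks by hand like this can be wrapped in a small context manager. The following is a minimal sketch in modern Python, not the code we actually used: `time.perf_counter` didn't exist in the Python of this era, and the names `timed_block`, `timings` and `report` are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical accumulator: total seconds spent in each named block.
timings = defaultdict(float)


@contextmanager
def timed_block(name):
    """Add the wall-clock time spent inside the with-block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises.
        timings[name] += time.perf_counter() - start


def report():
    """Print each block's total time, most expensive first."""
    for name, seconds in sorted(timings.items(), key=lambda item: item[1], reverse=True):
        print("%-20s %.4fs" % (name, seconds))
```

To factor out the recursion, wrap only the statements around the recursive call, not the call itself; otherwise time spent in nested calls is counted once per level of recursion.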
The recursive function first checked if the current node had already been visited. This was just two lines of code, checking whether the node was already in self.visitedNodes.
We were pretty sure this wasn't the problem, but for completeness we profiled it as well.
To our surprise, the code was spending most of its time here: about 3.5 of the roughly 6 seconds total.
self.visitedNodes was a list, so every membership test had to scan it, and as it grew each test took progressively longer. Changing it to a set, which has constant-time membership tests on average, saved us around three seconds of our total calculation time.
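The same pattern in a self-contained form: a recursive depth-first walk that records visited nodes in a set. This is a sketch, not Resolver's code; `walk` and the dict-of-lists graph representation are assumptions made for illustration.

```python
import timeit


def walk(graph, start):
    """Depth-first walk of graph (dict: node -> list of neighbours),
    returning nodes in the order they were first visited."""
    visited = set()  # O(1) average membership test
    order = []       # a list is still the right tool for recording order

    def visit(node):
        if node in visited:  # with a list here, this check is O(n)
            return
        visited.add(node)
        order.append(node)
        for neighbour in graph.get(node, ()):
            visit(neighbour)

    visit(start)
    return order


if __name__ == "__main__":
    nodes = list(range(10000))
    as_set = set(nodes)
    # Worst case for the list: the item we look for is at the end.
    print("list:", timeit.timeit(lambda: 9999 in nodes, number=1000))
    print("set: ", timeit.timeit(lambda: 9999 in as_set, number=1000))
```

Keeping a separate `order` list means switching to a set doesn't lose the visit order, which a set alone wouldn't preserve.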
Our next optimisation involved caching one of our generated objects (one per node). As these are first generated during the initial load, this ought to speed up the processing of minor changes.
This it did, but we were in for another surprise. Not only did we save the time spent creating the objects, but also time spent using them. The code path for using the objects is virtually identical to the one in the initial load phase, and the time spent in this code dropped to nearly zero when we cached the objects. We can only conclude that this is a 'free' benefit from the .NET Just In Time compiler.
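A per-node cache of this kind can be sketched as a dict keyed by node. This is a hypothetical illustration: `NodeObjectCache`, the `factory` callable and the `invalidate` policy are assumptions, and the sketch says nothing about the JIT effect described above.

```python
class NodeObjectCache(object):
    """Cache one generated object per node, building it only on first request."""

    def __init__(self, factory):
        # Hypothetical: factory(node) builds the expensive object for a node.
        self._factory = factory
        self._cache = {}

    def get(self, node):
        try:
            return self._cache[node]
        except KeyError:
            obj = self._cache[node] = self._factory(node)
            return obj

    def invalidate(self, node):
        # Drop a stale entry, e.g. when a minor change touches this node.
        self._cache.pop(node, None)
```

Objects built during the initial load stay in the cache, so processing a minor change only rebuilds the entries that were invalidated.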
So in the one day we'd set aside to investigate optimisation, we achieved our target and got processing time down to about 800 milliseconds. If you discount the GUI overhead, that's a processing speed improvement of around twenty times. Very satisfying.
|||At Resolver Systems.|
Starting with IronPython
I previously blogged that IronPython didn't have an os module. In fact, if you add the Python standard library to sys.path, you can use os.
My first attempt added the standard library directory as a package, which failed.
If you download IP and use it on a machine that also has Python 2.4 installed, then (after running IronPythonConsole.exe) appending the Python 2.4 standard library directory to sys.path should work.
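A minimal sketch of that step, assuming Python 2.4 is installed to its default location, C:\Python24. `STDLIB_PATH` is a name chosen for illustration; adjust the path for your machine.

```python
import sys

# Assumption: CPython 2.4 installed to its default location. Adjust as needed.
STDLIB_PATH = r"C:\Python24\Lib"

# Append (rather than insert at the front) so that anything already on
# sys.path keeps priority over the CPython standard library.
if STDLIB_PATH not in sys.path:
    sys.path.append(STDLIB_PATH)

import os  # on IronPython this now comes from the CPython standard library

print(os.getcwd())
```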
I'm pretty sure you can set the environment variable IRONPYTHONPATH to make this step unnecessary.
Dino Viehland has confirmed that imp.new_module returning None is a bug. However, IP doesn't use Python byte-code at all. Code objects are .NET code objects (presumably compiled to IL byte-code?). This means that it is unlikely that IP will ever be able to marshal or unmarshal code objects.
Another consequence of running on the CLR is that Python stack-frame objects aren't used. This means (for example) that sys._getframe isn't implemented.
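Code that uses sys._getframe can degrade gracefully by probing for it first. A hedged sketch: `current_function_name` is a hypothetical helper, and since it's not clear exactly how an unimplemented `_getframe` behaved under IronPython, it handles both a missing attribute and `NotImplementedError`.

```python
import sys


def current_function_name():
    """Return the name of the function that called this one, or None if the
    interpreter doesn't provide Python stack frames."""
    getframe = getattr(sys, "_getframe", None)
    if getframe is None:
        return None
    try:
        # Frame 0 is this function; frame 1 is its caller.
        return getframe(1).f_code.co_name
    except (NotImplementedError, ValueError):
        return None
```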
Today I hacked around (a little) with C#. We used a high-performance timer for profiling. It wasn't so scary.
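A clock's effective resolution can be estimated empirically by spinning until its value changes. This is a sketch in modern Python: `estimate_resolution` is a hypothetical helper, and `time.clock` is no longer available (it was removed in Python 3.8), so the example compares `time.time` with `time.perf_counter`.

```python
import time


def estimate_resolution(clock=time.time, samples=5):
    """Estimate a clock's resolution: spin until its value changes and keep
    the smallest step seen (larger steps suggest the process was preempted)."""
    smallest = None
    for _ in range(samples):
        start = clock()
        now = clock()
        while now == start:  # busy-wait for the next tick
            now = clock()
        step = now - start
        if smallest is None or step < smallest:
            smallest = step
    return smallest


for name in ("time", "perf_counter"):
    print(name, estimate_resolution(getattr(time, name)))
```

On modern Python, `time.get_clock_info(name).resolution` reports the resolution the platform claims, which is worth comparing with the measured figure.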
More on our profiling experiences in the next post.
When I've learned a bit more I hope to put together an article on creating a basic application with IP. .NET may be Micro$oft, but it's not bad.
|||time.time and time.clock only seem to have a resolution of tenths of seconds. The .NET DateTime.Now seemed to have a resolution of around 15ms.|
If you would like a general introduction to rest2web, and you speak German, this could be very useful.
He also has some good ideas about turning rest2web into a more useful tool for making HTML presentations.
The Python Academy also sounds like an interesting place. Mike recently did an interview on the Python 411 Podcast Series that I haven't yet had a chance to listen to.
Test Version of rest2web 0.5.0 alpha
There is now a new test version of rest2web available.
This has lots of new features available, but because of internet problems I haven't been able to update the subversion repository yet. There is still a lot of work to do (including documenting the features), but some of the new ideas need further development and I'm sure there will be bugs to fix.
Most of these features have been added in response to user requests. The main change is that rest2web can now build sites without needing restindexes, templates or index files. This allows you to generate a site just from a bunch of ReST files.
The new features already added (since 0.4.0 alpha) include:
- Can build a site with no index pages, no template and no restindexes. (The force option.)
- Allow passing of global uservalues at the command line and in the config file. (Command line uservalues override config file ones.)
- A --template-file= command line option. This overrides the top level template specified in the restindex and allows you to have different templates for the same site, e.g. for building different online versions and documentation versions.
- The final_encoding should never be utf8 (not recognised by browsers); it should be utf-8 instead. (Now handled automatically.)
- Added and documented initialheaderlevel.
- file keyword: should only copy if the file has changed (checks timestamp and size, copies timestamp with file).
- File keyword was broken if used outside an index file.
- Gallery chokes on thumb.db and animated jpgs. (Now skips all non-image files and any image files it can't handle.)
- Deleted urlpath from rest2web because it is now in pythonutils.
- Implemented levels of verbosity. These now work.
- A global set of uservalues from the config file, overridable in individual pages (including the __encoding__ special value).
- Fixed rendering of uservalues from ReST to HTML. Uservalues in ReST format should now use <* ... *> and <$ ... $> in pages instead of <% ... %> and <# ... #>.
- Added uservalues to the namespace.
- Added formattime standard function.
- Made the namespace/uservalues available in the macros.
- Made namespace/uservalues values available to the functions. This isn't yet used but will allow for some more interesting standard functions.
- Removed the two <br /> from listend in the standard function minibar.
- Added wrapper_class to print_details.
- Added os and sys to namespace.
- The default crumb for index pages (if no page-title is specified) is the filename, minus the extension, in title case.
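The "only copy if the file has changed" rule for the file keyword can be sketched as a size and timestamp comparison. This is not rest2web's actual code; `copy_if_changed` is a hypothetical name, and comparing whole seconds is an assumption meant to tolerate filesystems that store coarser timestamps.

```python
import os
import shutil


def copy_if_changed(source, target):
    """Copy source to target unless target already exists with the same size
    and modification time. Returns True if a copy was made."""
    if os.path.exists(target):
        src, dst = os.stat(source), os.stat(target)
        # Compare whole seconds to tolerate coarser filesystem timestamps.
        if src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime):
            return False
    # copy2 copies the data plus metadata, including the modification time,
    # so the next run sees matching timestamps and skips the copy.
    shutil.copy2(source, target)
    return True
```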
There are still various things to add before 0.5.0 becomes final, but rest2web is shaping up well.
You can see which command line options have been added by typing r2w.py -h at the command line.
Movable Python 1.0.2 Beta
There is a slightly updated version of Movable Python available. This is Movable Python 1.0.2 Beta. It's marked as beta because I haven't yet updated the docs, and I would like a few people to test it before I mark it as final.
The version for Python 2.4 is available for download by anyone who has a license for Movable Python for Python 2.4.
First off, the following modules/components have been updated:
- Python 2.4.3
- win32 version 208
- wxPython 188.8.131.52
- Firedrop 0.2.1
- ConfigObj 4.3.0 in pythonutils
The biggest change, though, is an internal refactoring that you won't see much evidence of. The code has been cleaned up a great deal and separated out into several modules; this will make the big changes I have planned much easier. It is also why I want it tested.
Other changes since 1.0.1 include:
- Verified that __future__ statements are handled correctly.
- Scripts (and customize.py) are now executed in a specific namespace, no more movpy cruft.
- When entering interactive mode (movpy -), any additional command line arguments are passed to IPython.
- imp.find_module has been fixed to work with modules contained in the zipfile. This fix doesn't write any temporary files, but imp.load_module has been patched to accept a StringIO instance.
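Loading a module from a zipfile without writing temporary files boils down to reading the source from the archive and executing it in a fresh module object. The sketch below illustrates that idea in modern Python; it is not Movable Python's patch to imp, `load_module_from_zip` is a hypothetical name, and it only handles top-level .py modules. In everyday use the standard library's zipimport does this for you.

```python
import sys
import types
import zipfile


def load_module_from_zip(zip_path, module_name):
    """Import a top-level pure-Python module straight from a zip archive,
    without writing any temporary files."""
    with zipfile.ZipFile(zip_path) as archive:
        source = archive.read(module_name + ".py")
    module = types.ModuleType(module_name)
    module.__file__ = zip_path + "/" + module_name + ".py"
    code = compile(source, module.__file__, "exec")
    # Register before executing, as the import system does.
    sys.modules[module_name] = module
    try:
        exec(code, module.__dict__)
    except Exception:
        del sys.modules[module_name]  # don't leave a half-initialised module
        raise
    return module
```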
I have also added built-in support for matplotlib. If you have the matplotlib files installed in your lib/ directory (you can grab them from here), then you can run the following command at the command line:
movpy.exe - -pylab
This should drop you straight into an IPython session, with pylab enabled.
IronPython First Impressions
I'm very impressed with IronPython, which is fortunate as I'll be working with it for a while.
It "just works". 99% of the Resolver code-base is in Python, with just a smidgen of C# in the test code. We're using the .NET framework, but entirely from the Python side. That means that although I'm using .NET objects, I've had to learn very little about .NET semantics so far.
The Resolver team tell me that they have only occasionally found bugs in IP. When something is confirmed as a bug, they (we) work around it and put in a regression test so we can tell when it is fixed. Meanwhile, the IP team are on beta 6 (and swear they will get to 1.0 final before they hit beta 10) and fixing bugs at a rapid pace.
There isn't an os module with IronPython. Instead there is an nt module with the following members:
['O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', '_exit', 'chdir', 'close', 'environ', 'error', 'fdopen', 'getcwd', 'listdir', 'mkdir', 'open', 'remove', 'rmdir', 'spawnl', 'startfile', 'stat', 'stat_result', 'unlink', 'waitpid']
Despite this, great swathes of the standard library do work, including PyUnit, which we're not yet using for our unit tests.
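Code that has to run with or without the CPython standard library on the path can fall back to nt when os is missing. A minimal sketch of that common pattern; the fallback branch is an assumption about such an environment, and it only helps for the functions nt actually provides (the member list above has no path, for example).

```python
try:
    import os
except ImportError:
    # Assumed fallback for an interpreter that only ships the low-level nt
    # module. nt offers a subset of os: there is no nt.path, for example.
    import nt as os

print(os.getcwd())
```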
Unfortunately, for some of the experiments I've been trying, I have found the following restrictions:
- marshal.loads can't unmarshal a code object contained in a Python 2.4 byte-code file.
- marshal.dumps can't marshal a code object.
- imp.new_module returns None
As imp is a builtin module, this last one was particularly surprising.
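Whether an interpreter can round-trip code objects through marshal is easy to probe at runtime. A sketch: `can_round_trip_code_objects` is a hypothetical helper, and the exceptions it catches are a guess at how an unsupported interpreter might fail (CPython raises ValueError for types marshal can't handle).

```python
import marshal


def can_round_trip_code_objects():
    """Return True if this interpreter can marshal and unmarshal a code
    object and then execute the result."""
    code = compile("result = 6 * 7", "<probe>", "exec")
    try:
        restored = marshal.loads(marshal.dumps(code))
    except (ValueError, TypeError, NotImplementedError):
        return False
    namespace = {}
    exec(restored, namespace)
    return namespace.get("result") == 42
```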
It is, however, baffling that the IP homepage www.IronPython.com is so out of date.
|||Which is also surprisingly good to work with.|
|||UPDATE: If you add the Python 2.4 standard libraries to sys.path then you do get access to the normal Python os module.|
In one of my projects I've been looking at creating a parser for parts of the Python language. This is the first time I've done anything serious with parsers, other than handcoding my own. It's also involved grappling with grammars and attempting to understand the cryptic Python Language Reference.
As always, it gets easier after a while, and even nearly makes sense. I've been using the excellent PLY to create the parser from grammar rules. By the way, PLY works with IronPython, which is nice. The Python grammar rules don't always easily translate into rules acceptable to PLY: much head scratching ensues. I still can't get precedence working in any understandable way.
Along the way I've encountered a couple of language oddities.
Firstly, string conversion with backticks is defined as:
"`" expression_list "`"
The definition for an expression_list allows a trailing comma. However, this doesn't work in string conversion (thankfully):
>>> `(1 + 2),`
  File "<input>", line 1
    `(1 + 2),`
             ^
SyntaxError: invalid syntax
This makes writing the grammar rules easier, but it looks like the language definition or the implementation is wrong. (This expression won't even parse into an ast.)
Secondly, as far as I can tell, list comprehensions are defined by the list_maker rules, specifically list_for:
"for" expression_list "in" testlist [list_iter]
The expression_list in there suggests that, as far as the grammar is concerned, the following is syntactically correct :
[1 for 1 in iterable]
Obviously it isn't (you can't assign to a literal), but I wonder why the grammar is defined in this way?
If you use the built-in parser module to parse the expression to an ast, it does indeed parse correctly. The compiler throws the SyntaxError when you try to compile it:
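import parser
expr = '[1 for 1 in n]\n'
ast = parser.expr(expr)           # parses without complaint
parsetree = parser.ast2list(ast)
print parsetree
parser.compilest(ast)             # compiling is what raises the SyntaxError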
[258, [320, [298, [299, [300, [301, [303, [304, [305,
[306, [307, [308, [309, [310, [311, [9, '['], [312, [298,
[299, [300, [301, [303, [304, [305, [306, [307, [308, [309,
[310, [311, [2, '1']]]]]]]]]]]]]], [327, [1, 'for'], [319,
[303, [304, [305, [306, [307, [308, [309, [310, [311,
[2, '1']]]]]]]]]]], [1, 'in'], [321, [298, [299, [300,
[301, [303, [304, [305, [306, [307, [308, [309, [310, [311,
[1, 'n']]]]]]]]]]]]]]]]], [10, ']']]]]]]]]]]]]]]], [4, ''],
[4, ''], [0, '']]
Traceback (most recent call last):
File "ast_example.py", line 6, in ?
SyntaxError: can't assign to literal
I guess it's always possible to construct valid parse trees that can't compile, but wouldn't a new node identifier_list have been more appropriate here? (Since it is only identifiers and not expressions that can be the target of the assignment in a list comprehension.)
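For comparison, the parser module is gone from modern CPython (it was removed in Python 3.10), and the ast module rejects a literal target while building the tree, so the error now surfaces at parse time rather than at compile time. A quick check, assuming Python 3; `parses` is a hypothetical helper name.

```python
import ast


def parses(source):
    """Return True if ast.parse accepts source as an expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


print(parses("[x for x in n]"))  # → True
print(parses("[1 for 1 in n]"))  # → False
```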
New Job, New House, No Internet
A hurried entry before work. I'm still offline at home. Yes, great surprise, NTL managed to screw up switching the internet connection on. I even paid for a "managed installation" so that this wouldn't happen, but when the guy came yesterday he fitted the cable and just left a CD; no managed installation. Obviously it doesn't work.
The other members of the team are all intelligent but friendly, with no trace of arrogance. This isn't that uncommon in the world of geekdom, but is refreshing nonetheless. I'm joining about six months into the project, and about six months before the first beta; so this is definitely not vapourware (a lot of functionality is already implemented), but nor can I tell you what we're working on.
The particular aspects of XP that we use include pair programming, test driven development (we have a 3:1 test code to application code ratio) and user stories. It's extremely refreshing to be able to be 99% certain that any changes you have made don't break anything else.
We work in two-week iterations. Every fortnight our customer representative chooses which user stories are the highest priority for the coming iteration. Each story costs our estimate of how long it will take, and he can spend the number of pair days we have in the current iteration multiplied by our velocity (the accumulated average of estimates/reality). Got all that?
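The budget rule is easier to follow with some arithmetic. This sketch reads "accumulated average of estimates/reality" as total estimated days divided by total actual days; an average of per-task ratios would be another reasonable reading. The function names and all the figures are hypothetical.

```python
def velocity(history):
    """Total estimated effort divided by total actual effort, accumulated
    over completed tasks. history is a list of (estimated_days, actual_days)."""
    estimated = sum(est for est, _ in history)
    actual = sum(act for _, act in history)
    return float(estimated) / actual


def iteration_budget(pair_days, history):
    """Estimated days the customer can 'spend' in the coming iteration."""
    return pair_days * velocity(history)


# Hypothetical figures: two pairs working a ten-day iteration.
history = [(2, 3), (1, 2), (3, 5)]
print(velocity(history))                  # 6 estimated / 10 actual = 0.6
print(iteration_budget(2 * 10, history))  # 20 pair days * 0.6 = 12.0
```

So with a velocity of 0.6, twenty pair days buys twelve days' worth of estimated stories.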
It's a very good way of working, and we also apply other XP concepts like YAGNI. So far the whole experience has been stretching and positive. It is great to finally be able to talk to other people about Python issues and be understood.
This work is licensed under a Creative Commons Attribution-Share Alike 2.0 License.