Python Programming, news on the Voidspace Python Projects and all things techie.
Python Style Guide
My Python style guide was originally posted as a blog entry in April 2006, just before I lost my amateur status and became a professional programmer. I recently saw it referred to in a newsgroup posting and revisited (and updated) it. After nearly two years my programming style tastes have changed surprisingly little.
Python-System: Implementing .NET Libraries for CPython
This is an early announcement of a new project: Python-System, an implementation of the BCL and other .NET libraries in pure Python. Before you decide I'm completely mad, let me explain my motivation.
The goal here is to provide libraries (the BCL and other .NET libraries) to aid the porting of IronPython code to run on CPython. Specifically, I would like to get Resolver One spreadsheets, exported as code, to run under CPython. I'm opening it up because it may also be useful to other people, and maybe you'll help me.
I will be starting with partial implementations of Array, DateTime, Color, Point and friends (the 'low hanging fruit'). The goal is for 'compatible but not necessarily complete' implementations of the classes (etc) that I need. Code that is useful for other people will happily be added of course.
All that is checked in so far is a test framework and an empty System namespace package. The nice thing about this project is that I can make worthwhile steps in half an hour of coding and work on it as and when I have time.
Eventually you will be able to embed Resolver spreadsheets in other Python applications as well as .NET applications.
Garbage Collection, Strings and Newlines
Today I was working with Christian on an optimisation and memory use story. Those are usually fun as they mean poking around inside the guts of Resolver and thinking about data-structures and algorithms. One of the things we looked at was an out of memory error with a very large spreadsheet. The cause was this innocent enough looking code to normalise newlines in some generated code:
Obviously this code ensures that all newlines in the text use '\r\n' rather than '\n' (as is required for properly displaying in the Windows code editor control that is part of Resolver One). It is called on all the code sections that make up a Resolver spreadsheet. This includes the constants and formatting section, which in a large spreadsheet can be substantial.
.NET (and therefore IronPython) uses garbage collection rather than reference counting. Strings are immutable, so every step in the double replace above creates a new string, meaning that this line temporarily triples the memory usage of the string. I don't believe this would happen in CPython, because reference counting would cause the intermediate strings to be freed immediately within the operation. On the .NET framework they aren't freed until garbage collection runs, probably when the current frame of execution exits. This was the cause of out of memory errors in large spreadsheets.
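The snippet itself is missing from this copy of the entry. As a minimal sketch of the kind of double replace described, assuming it looked roughly like this (the function name is hypothetical and this is not the original Resolver One code):

```python
def normalise_newlines_double_replace(text):
    # Hypothetical reconstruction of the double replace described above.
    # Step 1 builds a full copy with every '\r\n' collapsed to '\n'.
    unix_style = text.replace('\r\n', '\n')
    # Step 2 builds another full copy with every '\n' expanded to '\r\n'.
    # Until the intermediate copy is reclaimed, three strings of roughly
    # the same size exist at once: the input, unix_style and the result.
    return unix_style.replace('\n', '\r\n')


sample = 'a = 1\nb = 2\r\nc = 3\n\n'
normalised = normalise_newlines_double_replace(sample)
```

Splitting the one-liner into two steps only makes the intermediate string visible; chaining the two calls on one line allocates exactly the same copies.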
We replaced it with the following code using regular expressions. It uses the regex r'(^|[^\r])\n' to match 'lonely newlines'. Unfortunately re.sub won't find overlapping matches, which means it will miss multiple newlines next to each other. To get round this we pass a replacement function to re.sub instead of a replacement string:
import re

LONELY_NEWLINES = re.compile(r'(^|[^\r])\n')

def ReplaceLonelyNewlines(match):
    # Replacement is slightly complicated by the fact that re.sub only finds
    # *non-overlapping* matches. To solve this, we add an extra \r if there's
    # another lonely \n directly after this one.
    firstChar = match.group(1)
    if match.end() < len(text) and text[match.end()] == "\n":
        return firstChar + "\r\n\r"
    return firstChar + "\r\n"

# 'text' is the string being normalised. Only pay for the regex if there
# are lonely newlines to fix.
if text.count('\n') != text.count('\r\n'):
    text = LONELY_NEWLINES.sub(ReplaceLonelyNewlines, text)
It's an interesting tale, but it seems like a complex solution to a simple problem. It is also slightly slower than a simple double replace, but we can get a performance win by only invoking it if we know that there are lonely newlines in the text.
If you have an idea for a better way, then let me know.
Orestis (who will be joining us at Resolver in about two weeks) provided a much nicer solution using 'lookbehind' regular expression syntax.
>>> re.sub(r, '\r\n', input_string)
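The definition of r is missing from the line above. A minimal sketch, assuming it was a negative lookbehind along these lines (the pattern and the names below are assumptions, not necessarily Orestis's exact code):

```python
import re

# Assumed pattern: match a '\n' that is NOT immediately preceded by '\r'.
# The lookbehind consumes no characters, so every newline in a run of
# adjacent newlines is matched and no replacement function is needed.
LONELY_NEWLINE = re.compile(r'(?<!\r)\n')


def normalise_newlines(text):
    return LONELY_NEWLINE.sub('\r\n', text)


result = normalise_newlines('one\n\n\ntwo\r\nthree\n')
```

Because the lookbehind inspects the original string rather than the partially replaced output, the newlines that already follow a '\r' are left alone and the rest are each expanded once.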
Dino has also confirmed that .NET garbage collection can run at any time (including in the middle of operations), but in our case it clearly wasn't.
Before you get too excited about having another reason to use CPython rather than IronPython, consider the following fact that I wasn't aware of until recently. The disadvantage of reference counting is that cycles are hard to free. Python has a cycle detector, but it is unable to free cycles where more than one of the objects implements __del__ - since the cycle detector doesn't know what order to call the destructors in to break the cycle. This means that you shouldn't implement __del__ unless you can guarantee that your objects won't be involved in cycles (or you are happy to leak memory).
ConfigObj 4.5.0 and validate 0.3.0 Release Candidate
It's been nearly a year since the last release of ConfigObj and validate. In that time ConfigObj has become more widely used. The latest large project to use ConfigObj is IPython. Fernando Perez has developed a module called 'TConfig' that combines ConfigObj with Enthought Traits: tconfig.
Additionally there have been a few bug reports and suggestions for feature enhancements. The result is this release candidate for ConfigObj 4.5.0 and validate 0.3.0:
As this release will fix several bugs, and has efficiency improvements, it will be a recommended release. Please check it out and report any problems.
Changes in ConfigObj 4.5.0:
- ConfigObj will now guarantee that files will be written terminated with a newline.
- ConfigObj will no longer attempt to import the validate module, until/unless you call ConfigObj.validate with preserve_errors=True. This makes it faster to import.
- New methods restore_default and restore_defaults. restore_default resets an entry to its default value (and returns that value). restore_defaults resets all entries to their default value. It doesn't modify entries without a default value. You must have validated a ConfigObj (which populates the default_values dictionary) before calling these methods. (Thanks to Arve Knudson)
- BUGFIX: Proper quoting of keys, values and list values that contain hashes (when writing). When list_values=False, values containing hashes are triple quoted.
- Added the reload method. This reloads a ConfigObj from file. If the filename attribute is not set then a ReloadError (a new exception inheriting from IOError) is raised.
- BUGFIX: Files are read in with 'rb' mode, so that native/non-native line endings work!
- Minor efficiency improvement in unrepr mode.
- Added missing docstrings for some overridden dictionary methods.
- Added the reset method. This restores a ConfigObj to a freshly created state.
- Removed old CHANGELOG file.
Changes in validate 0.3.0:
- Improved performance with a parse cache.
- New get_default_value method. Given a check it returns the default value (converted to the correct type) or raises a KeyError if the check doesn't specify a default.
- BUGFIX: A quoted 'None' as a default value is no longer treated as None, but as the string 'None'.
- BUGFIX: we weren't unquoting keyword arguments of length two, so an empty string didn't work as a default.
- BUGFIX: Strings no longer pass the 'is_list' check. Additionally, the list checks always return lists.
- Added 'tuple' check and corresponding 'is_tuple' function (which always returns a tuple).
- A couple of documentation bug fixes.
- Removed CHANGELOG from module.
Thanks to those who gave bug reports and made suggestions. These changes are not all properly documented yet. If no problems are found with this release candidate then I will update the docs and do a proper release.
Test Driven Development: New References
There have been some great new blog entries and projects on TDD recently:
A great blog entry, with pretty pictures, explaining why test first is such a good idea.
Behavior Driven Development Framework for Python (presumably based on RSpec for Ruby). The examples on the CodePlex page don't look very compelling but there aren't many people talking about BDD and Python.
A selection of some classic anti-patterns whilst practising TDD. We've seen several of these come up in our test framework, and had to deal with them.
This is notable, not so much because of its conclusions, but because of how little research into the effectiveness of TDD has been done. The research this blog entry analyses basically concludes that there is a correlation between the number of tests written and the quality of the result, and that test first tends to lead to more tests.
Nothing to do with TDD, but there is now a website (kind of...) for the Developer Day in Galway, May 3rd: Techprechaun. This will be a .NET developer event, held in a great location, and is organised by the irrepressible Mick Lohan.
A great quote from Patrick on the ongoing debate on dynamic languages. Deeper Dynamics: Face it. The history of programming is one of gradual adoption of dynamic mechanisms.
Constructing Integers from Floats
What do you expect the following code to do?
No, that's not a trick question. Would it surprise you if I said that in Python 3 it could raise an exception?
The problem is that there are many different ways of creating an integer from a float: trunc, round, floor, ceil and friends. None of these are 'specified' by passing a float to the integer constructor, but int has long behaved the same as trunc. The proposal is to make trunc (truncate) a built-in, and make that the officially blessed way of creating an integer from a floating point number. Guido says:
There is actually quite an important signal to the reader that is present when you see trunc(x) but absent when you see int(x): with trunc(x), the implication is that x is a (Real) number. With int(x), you can make no such assumption -- x could be a string, or it could be a totally different type that happens to define __int__, perhaps a custom date/time type.
Personally I see int used to create integers from floats a great deal, and I have never heard of anyone confused by this behaviour. On the other hand it seems very unintuitive that you can't create an integer from a float, when the relationship seems so natural and obvious. It won't be a problem for me if this goes in, I'll just have to change my coding style, but it seems like a mistake to me.
It looks like there is something approaching agreement that int(<float>) will now defer to trunc without having to be deprecated. This means that the behaviour will now officially be 'defined' (whereas before it was officially undefined) without causing code breakage. This is a good decision.
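To make the distinction concrete, here is a small sketch of the different float-to-integer conversions, assuming Python 3 (where math.floor and math.ceil return integers). The variable names are only for illustration:

```python
import math

values = [2.7, -2.7]

# int() on a float truncates toward zero, the same result as math.trunc().
truncated = [math.trunc(v) for v in values]
via_int = [int(v) for v in values]

# The other conversions disagree with truncation, most visibly for negatives.
floored = [math.floor(v) for v in values]
ceiled = [math.ceil(v) for v in values]
rounded = [round(v) for v in values]

# Guido's point: int() accepts things that are not numbers at all, such as
# strings, whereas math.trunc() only works on types that define __trunc__.
parsed = int("42")
try:
    math.trunc("42")
    trunc_accepts_str = True
except TypeError:
    trunc_accepts_str = False
```

For 2.7 and -2.7, truncation gives 2 and -2, floor gives 2 and -3, ceil gives 3 and -2, and round gives 3 and -3, so the choice only becomes visible once negative or fractional values are involved.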
There is also news of a forthcoming bugfix release of Python 2.5 - 2.5.2. Martin v. Löwis will be the release manager (with Anthony Baxter missing in action?): "For 2.5.2, I would like to release a candidate on February 14, and the final version on February 21".
PyCon Schedule Up - Difficult Choices Ahead
Well the PyCon 2008 Schedule is up. My talk is at 12.10pm on Sunday, which means I will spend the first two days worrying about my talk and probably miss the other talks on Sunday in panicked last minute preparation.
My talk is at the same time as "Pylons and TurboGears: Working together on the web (Mark Ramm)" and "Introducing Agile Testing Techniques to the OLPC Project (Dr. Titus Brown)", both of which I would have liked to see. With a schedule this packed there are bound to be some clashes, but this year there are some real doozies!
Saturday 3.20pm: Getting started with test-driven development. (Mr. Jonathan Hartley), IronPython: The Road Ahead (Mr. Jim Hugunin), Don't call us, we'll call you: callback patterns and idioms in Python (Alex Martelli)
Ouch. It will be a shame to miss Alex's talk.
Saturday 11.35am: Core Python Containers -- Under the Hood (Mr. Raymond D Hettinger), The State of PyPy (Maciej Fijalkowski, Laura A Creighton)
PyPy is in a really interesting place right now, but if Raymond is going to talk about under the hood with Python containers then I want to hear it.
Friday 3.20pm: Jython on the Joint Strike Fighter (Mr. George F Rice), The State of Django (Adrian Holovaty)
I don't normally bother with 'case study' talks, but the Joint Strike Fighter sounds too good to miss. On the other hand Django is important to Python and I have never heard Adrian Holovaty talk.
Friday 2.45pm: Using .NET Libraries in CPython (Mr. Feihong Hsu), Rich UI Webapps with TurboGears 2 and Dojo (Mr. Kevin Dangoor)
I'm a big fan of Kevin and TurboGears, but Python.NET is a project that deserves more attention and I obviously have an interest in the use of .NET libraries with Python...
There are lots more interesting talks, and as usual the best part of the conference is the stuff that happens in the hallways. I look forward to seeing you there...
And I'm sure it will be subject to change, yada yada...
IronPython Interview from the TechEd Virtual Fish Bowl
I previously linked to a five minute version of this interview, but finally they have posted a longer ten minute version:
- Commercial Development with IronPython
- MP4 (iPod) Version
- WMV Low Quality Direct Link
- WMV High Quality Direct Link
The interview is about the Resolver One spreadsheet and developing with IronPython.
Resolver and Resolver Hacks Updates
Well I did it, a whole week without blogging! This is mainly because I have just moved house. Because of the move I didn't get into work until Thursday. I like going into work, not just because I enjoy my job (which I do), but so that I can grab a bleeding-edge version of Resolver One. We always keep subversion head 'releasable', so it usually has features not yet released. We're working towards a 1.0.1 release in about three weeks. This will mainly be a bugfix release (focussing on usability), but we already have at least one new feature - a dialog for setting the back colour of cells. We're also planning the new features for version 1.1 which should follow about a month after 1.0.1.
I've just added two new articles to Resolver Hacks:
Two example spreadsheets showing how to save and load values to the disk, to be reloaded with a spreadsheet. This is useful for storing the results of calculations that take a long time, so that they only need to be done once rather than every recalc or every time the spreadsheet is loaded. The first example shows how to persist individual cells and the second extends this to storing every value in a worksheet.
Virtual worksheets are worksheets that only exist in code, and are not shown in the grid. This technique allows you to create 'scratchpad' worksheets, for storing intermediate values, without exposing them to the user. Once created we can use a cellrange and the CopyRange function to copy data back to a visible worksheet.
In the meantime, several useful new pages have been added to the online Resolver One documentation:
Giles Thomas has also created another one minute screencast, this time on using the Resolver One Web Server:
My understanding is that a trial version of the server will be available for you to experiment with soon...
Next week Giles will be presenting on Resolver One at the Lang.NET Symposium.
This work is licensed under a Creative Commons Attribution-Share Alike 2.0 License.