Python Programming, news on the Voidspace Python Projects and all things techie.
A Little Bit of Python Episode 4: A Pre-PyCon Special
A Little Bit of Python is an occasional podcast on Python related topics with myself, Brett Cannon, Jesse Noller, Steve Holden and Andrew Kuchling.
The website is in progress and apparently nearly ready, thanks to Jesse and various other people who we will thank as soon as it is done. In the meantime, episode 4 is out. PyCon 2010 is only ten days away and it is the highlight of the year for many of us in the Python community. This episode is a pre-PyCon special where we discuss some of the things that will be happening at the conference and how to get the best out of it.
General links for the podcast feeds and a webpage with an embedded flash player:
- A Little Bit of Python mp3 rss feed
- A Little Bit of Python m4a rss feed
- Podcast homepage (currently redirecting to a temporary home)
If you have feedback, insults or suggestions for new topics you can email us on: all@bitofpython.com. We don't yet have the podcast listed on iTunes; we'll set that up once our permanent online home goes live.
We do have a twitter account, so for news on new episodes follow @bitofpython. A Little Bit of Python is also syndicated on Hacker Public Radio (although they've only released episode one so far).
Posted by Fuzzyman on 2010-02-08 00:13:48
Categories: Python, Fun Tags: podcast, pycon, bitofpython, conference
ConfigObj 4.7.1 (and how to test warnings)
I hate doing releases. I haven't managed to automate the whole process (I should probably work on that), although setup.py sdist upload certainly helps. Anyway, the short version of the story (and the real reason I hate releases) is that there was a bug in ConfigObj 4.7.0. 4.7.1 is a brown paper bag release to fix the bug.
The bug was an error in the way I had set up the deprecation warning for the obsolete options dictionary in the ConfigObj constructor. The bug only affects you if you were still using the options dictionary to configure ConfigObj instances.
If you've never heard of ConfigObj, now is an ideal time to try it out. ConfigObj is a simple but powerful config file reader and writer: an ini file round tripper. Its main feature is that it is very easy to use, with a straightforward programmer's interface and a simple syntax for config files. It has lots of other features though:
Nested sections (subsections), to any level
List values
Multiple line values
Full Unicode support
String interpolation (substitution)
Integrated with a powerful validation system
- including automatic type checking/conversion
- and allowing default values
- repeated sections
All comments in the file are preserved
The order of keys/sections is preserved
Powerful unrepr mode for storing/retrieving Python data-types
The bug happened because I didn't even try the deprecation warning to make sure it worked, let alone add a test for it. It turns out that adding a test for warnings is easy using the catch_warnings context manager, new in Python 2.6.
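from warnings import catch_warnings, simplefilter

# (inside a unittest.TestCase test method)
with catch_warnings(record=True) as log:
    # make sure the warning is recorded even if a filter would otherwise hide it
    simplefilter('always')
    ConfigObj(options={})

# unpack the only member of log
warning, = log
self.assertEqual(warning.category, DeprecationWarning)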
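For anyone who wants to try the technique without ConfigObj installed, here is a minimal self-contained sketch. The old_api function is hypothetical and simply stands in for any deprecated code path; it is not part of ConfigObj.

```python
import unittest
import warnings


def old_api():
    # Hypothetical function standing in for a deprecated code path.
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


class DeprecationTest(unittest.TestCase):
    def test_old_api_warns(self):
        with warnings.catch_warnings(record=True) as log:
            # Record every warning, whatever filters are active outside.
            warnings.simplefilter("always")
            result = old_api()
        self.assertEqual(result, 42)
        # Unpacking fails unless exactly one warning was recorded.
        warning, = log
        self.assertTrue(issubclass(warning.category, DeprecationWarning))
        self.assertIn("deprecated", str(warning.message))
```

Unpacking with "warning, = log" doubles as an assertion that exactly one warning was raised, which is why the example keeps that idiom.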
Posted by Fuzzyman on 2010-02-07 23:52:41
Categories: Python, Projects Tags: release, configobj, configuration, ini
Discover 0.3.2 and the load_tests protocol
discover is a test discovery module for the standard library unittest test framework. Test discovery is built into unittest in Python 2.7 and 3.2. The discover module is a back-port of test discovery to work with Python 2.4 - 2.6 and Python 3.0 / 3.1 [1].
0.3.2 is a minor bugfix release. Test discovery also includes a new protocol called load_tests. In previous versions the standard tests would be passed in as a list instead of a TestSuite instance. This bug is now fixed both in unittest and in discover.
discover can be installed with pip or easy_install. After installing switch the current directory to the top level directory of your project and run:
python -m discover
python discover.py
This will discover all tests (with certain restrictions) from the current directory. The discover module has several options to control its behavior (full usage options are displayed with python -m discover -h). See the documentation on the PyPI homepage for details.
The load_tests protocol is interesting. It allows you to customize how tests are loaded from a module by defining a load_tests function. load_tests takes three arguments: the test loader, the standard set of tests for that module (allowing you to just add tests to the standard set if you want), and a third argument that is only used for load_tests functions in the __init__.py of test packages.
Here's an example of a test module with two test classes. The load_tests function returns a test suite that only uses one of the test classes:
import unittest

class FirstTest(unittest.TestCase):
    def testFoo(self):
        self.fail()

class SecondTest(unittest.TestCase):
    def testFoo(self):
        pass

def load_tests(loader, tests, _):
    return loader.loadTestsFromTestCase(SecondTest)
When the test module is loaded by discover, or unittest from Python 2.7 / 3.2, then load_tests will be called to create the test suite for the module.
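The other use mentioned above, adding to the standard set rather than replacing it, might look like the following sketch. StandardTest and check_upper are made-up names for illustration; FunctionTestCase is the standard unittest wrapper for plain functions.

```python
import unittest


class StandardTest(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def check_upper():
    # A plain function: the loader would not collect this on its own.
    assert "abc".upper() == "ABC"


def load_tests(loader, tests, pattern):
    suite = unittest.TestSuite()
    # Keep everything the loader found by default...
    suite.addTests(tests)
    # ...and add one extra test wrapped as a test case.
    suite.addTest(unittest.FunctionTestCase(check_upper))
    return suite
```

Returning a fresh TestSuite that starts from the tests argument keeps the default behaviour intact while still letting the module contribute extra tests.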
[1] discover.py is only about 300 lines of Python. Supporting 2.x and 3.x in a single code-base is easy with small modules but I wouldn't recommend it for larger projects.
Posted by Fuzzyman on 2010-02-07 23:25:47
Categories: Python, Projects, Tools Tags: release, testing, unittest
Interview with Pardus Linux
I recently did an interview on Python with the Pardus Linux magazine. Pardus Linux is a distribution developed in Turkey (by the Turkish National Research Institute of Electronics and Cryptology) with the goal of being usable by "normal" people rather than just geeks.
Pardus are great supporters and users of Python. A while ago they chose Python as their standard language for custom package and configuration management tools.
The magazine is Turkish, so here's the original in English. It covers topics like why use Python?, choosing a GUI toolkit, the move from Python 2 to 3 and whether there will ever be a Python compiler. The Turkish version is a very nicely produced PDF with waaay too many pictures of me, and even one of my wife.
Who is Michael Foord?
I've been developing with Python since 2002. I've written many articles on Python and maintain several open source projects. In 2006 I started working with a firm called Resolver Systems developing a highly programmable spreadsheet application with the then young IronPython. I wrote the book IronPython in Action with a colleague and was made the first Microsoft MVP (Most Valuable Professional) for dynamic languages.
In 2009 I became a core Python developer, Python Software Foundation (PSF) member and part of the Python.org webmaster team. I've also been involved in helping to organise both PyCon in the US and PyCon UK (which in 2009 and 2010 is hosting EuroPython).
I'm now a freelance developer currently working with a German firm called Comsulting. We're developing web applications with Django, deployed on Linux of course, but with the client side written in IronPython on Silverlight.
I blog and write a great deal about Python and other technical topics.
In the real world I live in Northampton, in the UK, with my wife Delia.
Which area of Python development do you work on?
I mainly help maintain the standard library, and in particular maintain the testing framework unittest. I've also helped with making parts of the standard library compatible with IronPython.
Why Python? What's the main idea of development of Python?
Python is a widely used programming language with a large(ish) code base. Parts of the code, in particular parts of the standard library are quite old. There are generally good tests, which are getting better, but we still get a lot of feature requests and bug reports. There is always more than enough work, and for me it is a privilege to be able to contribute back to the open source project that has given so much to me.
On a more self-interested note I am a big fan of the unittest style of testing. However, other testing libraries like nose and py.test have advanced the state of the art of testing in Python a great deal whilst unittest has stagnated. It has been great to get some of the tried and tested features from other testing libraries, like test discovery, into the standard library.
In the last few years Python has evolved a great deal, in particular the release of Python 3 which deliberately broke backwards compatibility with the Python 2 series to fix a number of longstanding 'warts' with the language. Python 3 is a great release, and comes with the full Python standard library, but third party libraries and the community are only gradually migrating to Python 3.
The current versions of Python are 2.6 and 3.1, with both Python 2.7 and 3.2 releases imminent.
Even without Python 3 a number of powerful new language features have been added in recent releases. From decorators to generators (and in particular the expansion of generators to provide coroutine functionality in Python 2.5) to conditional expressions, the with statement, abstract base classes and the introduction of new libraries like multiprocessing.
All of this means that the Python community and developers need a chance to adapt to the new features. This is the reason for the Python Language Moratorium (PEP 3003) that mandates no new language features for the next two years. For the developers this means a chance to focus on the standard library rather than adding new language features.
Why did you choose Python?
I kind of got into Python by accident. I had done some programming, including assembly language on the Amiga, up to my time at university but I didn't really use computers for a while after that. A bunch of friends and I were playing a "Play By Email" strategy game called Atlantis and wanted to write a program to automate turns.
Our one constraint on programming language was that there be an implementation for the PocketPC, which was the only computer I had at home at the time. We had more or less decided on Squeak (which would have been a fine choice), but at the last minute someone suggested Python and we went with it.
I really enjoyed programming, and in particular Python, and before long had given up the game but got seriously hooked on Python.
Why should people choose Python?
As it happened Python was a fortuitous choice for me. Python is easy to learn for people new to the language, but is a fully fledged programming language suitable for large application development or small tasks like scripting.
Python is particularly flexible because of the range of programming paradigms and styles it supports. The interactive interpreter (the REPL) makes experimentation easy and fun, and many people do serious work at the interpreter.
New programmers can start with simple scripts, not even needing to write functions if they aren't necessary, moving up through procedural programming to full blown object oriented programming, functional programming and even metaprogramming. With first class namespaces, types and functions all these different programming paradigms are available to the Python programmer.
Python has a rich standard library (it comes "batteries included") that makes it ready to use for a wide range of tasks straight out of the box. There is also a thriving community with a healthy ecosystem of third party libraries and extensions.
The Python language and community extol readability and clarity as the highest virtues in programming. The indentation based block structure, which some people hate but which most people come to appreciate very quickly, is just one way that Python encourages an easy to read programming style.
Compared with other programming languages, what advantages and disadvantages does Python have?
Python is a highly expressive, dynamically typed language. It is much more productive to use Python than a lower level systems programming language. As one example the codebase of Mercurial (a distributed version control system written in Python) is about 1/10th the size of the git codebase (with comparable features but written in C, Perl and shell scripts). Even when compared to statically typed languages like Java and C#, Python's type system and language features mean that development will typically be faster and result in a smaller code base.
The trade-off is that dynamically typed languages tend to be slower than statically typed languages. On the other hand if you complete the initial implementation faster then you have more time to work on performance optimisation. This really has been my experience - there is much more to be gained in optimising your algorithms than there is from using a different language.
Those who are used to statically typed languages may mourn the passing of type safety in dynamically typed languages, but really type safety is an extremely thin layer of safety. Just because a program compiles doesn't tell you anything about the correctness of the application logic for example. (Otherwise there would never be any bugs in applications written in statically typed languages.) The only way to be serious about the correctness of code is extensive testing (personally I'm an advocate of Test Driven Development, whichever language you are developing in). It just so happens that dynamically typed languages are particularly easy to test because you can create and modify types at runtime and don't have to fight against the compiler. There are many powerful testing libraries and tools available for Python.
How would you comment on how widely Python is used?
Python is very widely used, and has risen in popularity a great deal in the last few years. I think in part the growth of Python recently can be attributed to the publicity around Ruby and Ruby on Rails. Suddenly dynamic languages have become not just acceptable but actually trendy and exciting.
However Python use has been growing in many different areas, not just web development.
Since I first started using Python in 2002 it has become virtually the de-facto administration language for Linux. Pardus is one example of this, Ubuntu is another, where both distributions do everything they can with Python. The Gentoo packaging system, portage, is written in Python and there are plenty of other examples.
Python gets used for science and research, both because it is easy for scientists (who are not primarily programmers) to learn and use and also because of powerful libraries (often wrapping libraries written in C, C++ or Fortran) like Numpy, Scipy, BioPython, matplotlib, NLTK (Natural Language Toolkit) and so on.
Python is also the standard embedded language in both the GIS (Geographic Information Systems) and CGI (Computer Generated Imagery) worlds. Large animation houses like Pixar, Imageworks and Industrial Light & Magic all use Python. Applications like Blender and Maya embed Python for scripting.
Python is used for hardware automation, testing systems, application development, web development and just about anything else you can think of. Python is a great general purpose programming language.
Have you ever thought about developing an official compiler to create binary files? Or does Python already have one?
Well, Python is a highly dynamic language so needs the runtime. It is already a bytecode compiled language but is generally described as an interpreted language because the runtime interprets the bytecode.
Other bytecode compiled languages, like Java and C#, are usually described as compiled languages because they have a Just-In-Time compiler (JIT) that generates and executes native code instead of interpreting bytecode.
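The bytecode that CPython interprets is easy to look at with the standard dis module. The sketch below is only an illustration; the add function and the "<example>" filename are made up for the purpose.

```python
import dis


def add(a, b):
    return a + b


# The compiled bytecode lives on the function's code object.
bytecode = add.__code__.co_code
print(type(bytecode), len(bytecode))

# Human-readable listing of the instructions the interpreter executes.
dis.dis(add)

# Source text can also be compiled to a code object and then run explicitly.
code = compile("result = add(2, 3)", "<example>", "exec")
namespace = {"add": add}
exec(code, namespace)
print(namespace["result"])
```

The exact instructions printed vary between Python versions, which is one reason bytecode is treated as an implementation detail.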
There are a few projects to bring a JIT to Python. The most ambitious one is PyPy, an interpreter compiler toolchain that can compile dynamic language interpreters from source code in a static subset of Python called RPython (Restricted Python). The project also includes a full implementation of the Python language in RPython (plus Smalltalk and Prolog implementations). It has native, IL (Intermediate Language - for execution on the .NET and Mono implementations of the Common Language Infrastructure specification by Microsoft) and JVM (Java Virtual Machine) backends - so that a single implementation of a language can be used on all three platforms.
The exciting, and slightly crazy, part of the project is that it also includes a JIT compiler. The PyPy toolchain generates a tracing JIT for language interpreters it compiles. For a long while this was hypothetical and experimental but after several years work there are now lots of benchmarks that run faster on PyPy than they do on CPython (CPython is the standard implementation of Python written in C).
Note
This interview was written before the announcement that Unladen Swallow is aiming to merge with Python 3.
Another project is Unladen Swallow, sponsored by Google who use Python a lot and have a commercial interest in seeing it run faster. Unladen Swallow uses the LLVM (Low Level Virtual Machine) project to add a JIT to Python. The project includes at least two Python core developers and they see it as a branch rather than a fork; their goal is to merge the performance enhancements back into core Python. The latest releases of Unladen Swallow do show speed improvements over CPython. LLVM is written in C++, and is not brilliantly tested on Windows, both of which could be 'issues' for merging back into Python - so we'll see...
There is also Psyco, which is a specializing compiler that can be bolted-on to CPython. For numeric code in particular it offers massive speed improvements. That project stagnated for a long time, just about maintained but not being developed, but has recently seen a lot of improvements thanks to the work of Christian Tismer.
There are other approaches to creating binary distributions of applications. These typically involve bundling the Python runtime and all the libraries you use together. On Windows a tool called py2exe will do this for you and on the Mac there is py2app. For Linux there is a tool called cx_Freeze and at least one other whose name escapes me at the moment.
Another approach to creating binary applications written in Python is to use a hybrid language like Cython, Pyrex or Shedskin. Shedskin is a static subset of Python that compiles to C++. If you can live without some of the dynamic features of Python you get a compatible syntax, a big performance boost, and compiled binaries. Cython and Pyrex are more often used for writing Python extensions. They are a combination of the Python language with C. You can use Python syntax and semantics mixed and matched with C where you need the speed.
What advantages does Python 3 have compared with Python 2?
The big advantage is that strings and bytes are fully disentangled. In Python 2 we have two string types, the bytestring and Unicode strings. Although it is possible to handle text 'properly' with Python 2 you have to be strict about decoding and encoding at the application boundaries and it is much easier to do-the-wrong-thing and treat text as binary data. If you mix the approaches in Python 2 (sometimes using Unicode and sometimes treating text as binary data) then you're in for a world of hurt (encoding errors in odd places). Python 3 cleans all this up and all strings are Unicode. There is a new bytes type for when you are dealing with binary data. There is a new IO layer in Python 3 to support this.
This alone makes it worth switching to Python 3 in my opinion, and this change couldn't be made without breaking backwards compatibility as it is such a fundamental change.
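A small Python 3 sketch of the "decode at the boundary" discipline described above. The sample text and the choice of UTF-8 are assumptions for illustration only.

```python
# Bytes as they might arrive from a file or socket (UTF-8 encoded here).
raw = "na\u00efve caf\u00e9".encode("utf-8")

# Decode once at the boundary; work with text (str) inside the program.
text = raw.decode("utf-8")

char_count = len(text)   # counts characters
byte_count = len(raw)    # counts bytes; each accented letter takes two

# Mixing the two types is an error rather than a silent conversion.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Encode again only when the data leaves the program.
outgoing = text.upper().encode("utf-8")
```

Here the character count and byte count differ, which is exactly the kind of mismatch that goes unnoticed when text is treated as binary data.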
There are lots of other language changes: there is now only one integer type (unification of the int and long types), comparisons and sorting can only be done with compatible types (the arbitrary ordering of different types in Python 2 didn't really make sense), lots of built-in methods return iterators instead of lists and so on.
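Each of those changes can be seen in a few lines of Python 3. This is just a quick sketch with made-up values, not an exhaustive list.

```python
# One integer type: values grow past the machine word size transparently.
big = 2 ** 100
same_type = type(big) is type(1)

# Ordering unrelated types raises instead of using an arbitrary order.
try:
    sorted([1, "a", 2.5])
    ordered = True
except TypeError:
    ordered = False

# Many built-ins are lazy: zip and map return iterators, not lists.
pairs = zip("ab", [1, 2])
is_list = isinstance(pairs, list)
materialised = list(pairs)
```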
The new nonlocal statement is one of my favourite additions in Python 3. It adds true lexical scoping to Python (for changing variables in outer scopes - Python 2 has had lexical scoping for accessing values for a long time). There are also interesting features like function annotations that allow for attaching extra information to function / method signatures. This could be used for enforcing type contracts for example. Programmers are only just beginning to explore some of the things Python 3 makes possible.
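As a rough illustration of both features, here is a closure that rebinds a variable in its enclosing scope with nonlocal, and a function carrying annotations. The names make_counter and scale are invented for this sketch.

```python
def make_counter():
    count = 0

    def increment() -> int:
        # Without nonlocal, "count += 1" would create a new local variable
        # and fail with UnboundLocalError.
        nonlocal count
        count += 1
        return count

    return increment


def scale(value: float, factor: float = 2.0) -> float:
    # Annotations are stored on the function; Python itself does not
    # enforce them.
    return value * factor


counter = make_counter()
first = counter()
second = counter()
annotations = scale.__annotations__
```

Anything that wants to enforce a type contract would read scale.__annotations__ and do the checking itself, for example in a decorator.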
The main advantage Python 3 has is that it is the future. Python 2 won't be around for ever and new features and libraries will be added to Python 3 that are never available in Python 2.
It's known that Python 3 doesn't support some Python 2 code. Is that really necessary?
Unfortunately yes. Some of the changes, particularly the string / bytes changes, just couldn't be introduced in a backwards compatible way. Honestly if it had been possible it would have been done.
What important changes await us in future versions of Python 2 and Python 3?
My favourite changes are all the improvements to the unittest library that come in 2.7 / 3.2.
In Python 2.7 we are trying to restrict new features to ones that are already present in Python 3. This will make the transition between Python 2 and Python 3 easier. In Python 2.7 this means, for example, the addition of the ordered dictionary to the standard library that was added in Python 3.1.
Because of the language moratorium there are very few language changes coming in Python 3.2. There are plenty of other improvements though, including a big improvement to the way floating point numbers are represented (by Mark Dickinson), substantial performance improvements to the Global Interpreter Lock (by Antoine Pitrou), the with statement works with multiple context managers, many bug fixes and so on.
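The multiple context manager form of the with statement is easy to demonstrate. The tracked helper below is a made-up context manager that just records the order in which things happen.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    # Hypothetical helper: logs entry and exit so the ordering is visible.
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


# One with statement, two context managers: entered left to right,
# exited in reverse order, exactly like nested with blocks.
with tracked("a") as a, tracked("b") as b:
    events.append("body " + a + b)
```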
For future development I'd like to see a focus on improving the standard library. In particular although the whole standard library has been converted to work with Python 3 (at least to the point where all the tests pass), a lot of code that previously worked with strings may need careful auditing to see whether it would be better off working with bytes.
When is the end of life of Python 2?
It looks likely that Python 2.7 will be the last major version of Python 2. This isn't set in stone, but maintaining the two branches of Python in parallel is a strain on the developers and most of them would rather just be working on Python 3.
(It isn't just two branches - we have Python 2.6 being maintained and Python 2.7 being developed, plus Python 3.1 being maintained and 3.2 being developed. That means bug fixes currently have to be pushed to four branches.)
Even if Python 2.7 is the last major release it will be maintained for several years to come, but the end of life is in sight.
Which GUI (Qt, GTK, etc.) do you prefer while programming with Python?
I've used both Tkinter (bundled as part of the Python standard library) and wxPython in the past. Qt has a very good reputation but PyQt on Windows was until recently only available under the GPL or a commercial license. That made it a difficult choice for cross-platform development.
wxPython is ok. It is greatly improved using it from Python with a wrapper library like Dabo that gives it a more Python-like feel.
To be honest my recent GUI development work has all been with Windows Forms for Windows, using IronPython. Surprisingly enough this has a reasonably good API and looks very good on Windows. The Mono implementation of Windows Forms is extensive, but it looks pretty bad by default on the Mac (my main development machine is a Mac). I've seen some very good looking Windows Forms applications run on the Mac with Mono, so I know it's possible; I just never learned the right magic tricks.
If the cross-platform license situation improves with PyQt then I would be tempted to use that for future projects. It looks like this is happening in the form of the PySide project, sponsored by Nokia who own the Qt library.
What about Tkinter? Is there any plan about developing Tkinter?
Tkinter looks pretty quirky! It also feels quirky to develop for, but if you just need something simple it is nice to use something that is already in the standard library.
Tkinter is based on Tk, part of the Tcl language. Tk itself has moved on a lot and in Tcl 8.5(?) a fancy new user-interface layer called Ttk became the standard. This is much better looking and has been incorporated into Tkinter in Python 2.7 and 3.2 as the Ttk module.
I've seen screenshots of some very good looking applications written with Ttk, but I have no idea how much magic you have to use to get attractive, native looking UIs or whether you get them for 'free'.
What do you think about Free Software?
I'm a huge fan of free software and very committed to the open source movement. I'm not an enemy of commercial software though. I think it is a good thing that programmers are able to earn a living selling programs. I don't see commercial software and free software as enemies, they can be very good for each other. The development of commercial software, and the money it brings in, sponsors a lot of open source development.
If you use any GNU/Linux distribution, could you say what it is?
I've used Linux on servers regularly. Most servers I've worked with have tended to be Debian based.
Have you ever heard of Pardus GNU/Linux?
I've certainly heard of Pardus, but mainly because of your use and support of Python.
It's known that Python's license is compatible with GPL. Why do you need that?
Well... personally that doesn't make much difference to me. I like, and tend to use for my own projects, the BSD license as it is "more free" than the GPL. On the other hand it would be a major downer if the Python license was incompatible with the GPL and GPL projects couldn't use Python. From that perspective it is important that the Python license is compatible with the GPL.
What, in your view, is the point of developing an open source implementation of the Python programming language that is tightly integrated with the .NET Framework?
I see IronPython as a great way of getting developers currently using the Microsoft .NET ecosystem into using free software. IronPython, despite being developed by Microsoft, is fully free software. It's a small step from using IronPython to using Python.
Free/open source alternatives are written in opposition to closed source software. Wouldn't it be better to develop an open source or free alternative to .NET instead of developing IronPython?
There is a very good free implementation of .NET called Mono - and IronPython runs on that too. IronPython is developed by Microsoft and so in no way does the development of IronPython take away from work on Mono.
What's your favorite free software written in Python?
Ha, actually quite a difficult question. Not many of the applications I use regularly are written in Python - except for a couple that aren't free software (the Wing IDE and the Resolver One spreadsheet application). Django is my current favourite free software library written in Python. Developing web applications with Django is great fun.
What would you like to say to our readers?
Thank you for the opportunity to speak to you, and thank you for reading what I have to say.
I've recently become involved in a Python related podcast called "A Little Bit of Python". Definitely worth listening to if you're interested in Python.
Posted by Fuzzyman on 2010-02-02 10:28:52
Categories: Python, Writing Tags: interview, Pardus, Linux, Turkish
A Little Bit of Python Episode 3
A Little Bit of Python is an occasional podcast on Python related topics with myself, Brett Cannon, Jesse Noller, Steve Holden and Andrew Kuchling. We still don't have our own website although that is due to land any day now. Meanwhile episode 3 has just gone live. The topics covered include the Python transition to using Mercurial, the release of the first alphas of Python 2.7 and the furore caused by comments on the Python Package Index.
General links for the podcast feeds and a webpage with an embedded flash player:
- A Little Bit of Python mp3 rss feed
- A Little Bit of Python m4a rss feed
- Temporary webpage home for A Little Bit of Python
If you have feedback, insults or suggestions for new topics you can email us on: all@bitofpython.com.
We don't yet have the podcast listed on iTunes; we'll set that up as soon as we have a permanent home for the podcast.
Psyco 2 Binaries for Windows and Python 2.4, 2.5 and 2.6
Psyco is a specializing compiler (a kind of JIT) for Python written by Armin Rigo. The difficulty of maintaining and extending psyco was one of the motivating factors behind the inception of PyPy. Psyco itself was maintained to remain compatible with recent versions of Python but still didn't optimise more recent features like Python generators and floats.
Development of psyco was recently taken over by Christian Tismer. Christian made many improvements to Psyco, but Windows binaries were never released. Available here are compiled binaries of psyco 2 for Windows for Python 2.4, 2.5 and 2.6.
- psyco 2.0.0 for 32bit Windows and Python 2.6 (.zip)
- psyco 2.0.0 for 32bit Windows and Python 2.5 (.zip)
- psyco 2.0.0 for 32bit Windows and Python 2.4 (.zip)
(Compiled from SVN head on January 11th 2010.)
From the official psyco project page on Sourceforge, prebuilt binaries are only available for Python 2.5. I've also built binaries for psyco 1.6 for Python 2.4 and 2.6:
Binaries for Python 2.2 and 2.3 (psyco 1.5.1) are available from my Python modules page.
Posted by Fuzzyman on 2010-01-12 00:16:10
Categories: Python, Website Tags: psyco, binaries, download, windows
A Rambling Recording on Member Lookup in Python (podcast)
I was thinking about the Python object model, in part as a result of my post on The Python Class Statement. Python is a really easy language to learn, but it also has advanced features like its protocols, descriptors and metaclasses, that make the full object model pretty complex - and that's before you start looking at the corner cases.
It would be really nice to write up a single document describing the Python object model, including all of its intricacies. That sounds too much like hard work, so instead I recorded a rambling hand-wavy description of member lookup in Python. I don't go into full blown detail, but then this is a podcast - it won't seriously mislead you and no-one is going to use it as a reference guide...
- Python Attribute Lookup Part 1 on Audioboo
- Python Attribute Lookup Part 2 on Audioboo
- Python Attribute Lookup in full mp3 (9 minutes, 8MB)
This was recorded using the Blue Fire iPhone app whilst I was wandering around outside. I chopped out about half my pauses and coughing using Audacity, so if you think the quality is rough you should have heard the first version.
Topics covered include:
- Member lookup on instances and classes
- How the interpreter looks up protocol ('magic') methods
- __getattr__ and its mysterious cousin __getattribute__
- Descriptors, bound methods, properties and friends
In the podcast I mention the new technique I have for dynamically mocking magic methods. Magic methods, when they are called for you by the interpreter, are usually looked up directly on the class. Unfortunately Python is not entirely consistent: some magic methods are still looked up on the instance first before the class. This is gradually being fixed in Python (in 2.7 they are pretty much all fixed), but the inconsistency is a pain for mocking the magic methods.
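The class-first lookup is easy to see with a plain class. This sketch assumes Python 3 (where every class is new-style); Plain is a throwaway class made up for the demonstration.

```python
class Plain(object):
    pass


obj = Plain()

# Assigning a magic method on the instance does not affect len().
obj.__len__ = lambda: 3
try:
    len(obj)
    instance_used = True
except TypeError:
    instance_used = False

# Explicit attribute access still finds the instance attribute.
explicit = obj.__len__()

# Defining it on the class is what the interpreter actually consults.
Plain.__len__ = lambda self: 5
implicit = len(obj)
```

This is why simply assigning a function to a magic method name on an ordinary instance is not enough to mock it.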
Mock now allows you to mock the magic methods by assigning an appropriate function, one that takes self as the first argument, to the magic method on the mock instance. By default a mock does not implement any magic methods except the ones it uses itself. When you assign one, the mock dynamically grows it on just that instance; all other mock instances are unaffected. Magic methods can then be looked up on the class or the instance, either way works (and you can delete them):
>>> from mock import Mock
>>> m = Mock()
>>> m
<mock.Mock object at 0x429770>
>>> m.__repr__ = lambda self: 'A Mock Object'
>>> m
A Mock Object
>>> m.__repr__()
'A Mock Object'
>>> del m.__repr__
>>> m
<mock.Mock object at 0x429770>
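For readers on Python 3, where mock ships in the standard library as unittest.mock, a rough equivalent might look like the sketch below. It assumes unittest.mock handles functions assigned to magic methods the way its documentation describes, and it uses __str__ rather than __repr__ purely for illustration:

```python
from unittest.mock import Mock

first = Mock()
second = Mock()

# Assign a function taking self to the magic method on one instance only.
first.__str__ = lambda self: 'A Mock Object'

configured = str(first)   # uses the function assigned above
untouched = str(second)   # the other mock keeps its default behaviour

# Deleting the magic method restores the default for this instance.
del first.__str__
after_delete = str(first)

print(configured)
```

The second mock never sees the assigned function, which is the per-instance behaviour described above.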
You can also use Mocks for magic methods. Here's an example of mocking out the built-in open function when used as a context manager:
@patch('__builtin__.open')
def test_with_statement(self, mock_open):
    # open() returns the context manager, so configure the return value
    mock = mock_open.return_value
    mock.__enter__ = Mock()
    mock.__exit__ = Mock()
    mock.__exit__.return_value = False
    with open('filename') as handle:
        handle.read()
    mock_open.assert_called_with('filename')
    mock.__enter__.assert_called_with()
    mock.__enter__.return_value.read.assert_called_with()
    mock.__exit__.assert_called_with(None, None, None)
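A self-contained variant for Python 3 could look something like this. It is only a sketch: read_file is a hypothetical function under test, builtins.open is the Python 3 spelling of __builtin__.open, and it relies on patch supplying a MagicMock, which already has __enter__ and __exit__ configured:

```python
from unittest.mock import patch


def read_file(name):
    """Hypothetical code under test: reads a file using a with statement."""
    with open(name) as handle:
        return handle.read()


with patch('builtins.open') as mock_open:
    # open() returns the context manager, so configure its return value.
    manager = mock_open.return_value
    manager.__enter__.return_value.read.return_value = 'fake contents'
    result = read_file('filename')

mock_open.assert_called_with('filename')
manager.__enter__.assert_called_with()
manager.__enter__.return_value.read.assert_called_with()
manager.__exit__.assert_called_with(None, None, None)
print(result)
```

Note that the magic methods are asserted on the object returned by open(), not on the patched open itself.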
The version of mock with magic method support hasn't yet been released, but you can pull it from the Google Code SVN repo. When I have time to write docs it will be released as 0.7.0.
There's a bit of trickery involved in making this work. If you're interested in how it's done look at the implementation of __new__ and __setattr__.
Posted by Fuzzyman on 2010-01-10 20:25:22 | |
Categories: Python, Projects, Hacking Tags: mocking, testing, podcast, magic methods
Notes on the Python Class Statement
Python classes are created at runtime, usually when you execute a script, or import the module they are defined in. Class creation is done primarily with the class statement. The class statement is executed by the Python runtime to create the class. Functions and names assigned in the body of the class statement become methods and attributes of the class.
You can easily see that the code inside the body of the class is executed, and that it can contain arbitrary code, by putting a print statement inside the class body:
>>> class ClassName(object):
...     print 'hello world'
...
hello world
Any assignments that happen in the body of the class definition create class members. Class and function definitions both cause names to be assigned, so classes defined inside the body of another class statement can be accessed as class attributes and functions defined inside the body of a class become methods.
Here's a trivial example that simply assigns a value to the name X:
>>> class SomeClass(object):
...     X = 3
...
>>> SomeClass.X
3
We can combine the fact that arbitrary code is executed with the assignment rule to conditionally define class members:
>>> import sys
>>> class SomeClass(object):
...     if sys.platform == 'darwin':
...         X = 3
...     else:
...         X = 4
...
>>> SomeClass.X
3
What happens in class creation (in Python 2; the rules change slightly in Python 3 as the metaclass mechanism is improved) is that the class body is executed and the resulting collection of names and values is passed as a dictionary, along with the class name and a tuple of the base classes, to the metaclass. The metaclass is then 'called' (if the metaclass is a type, which it usually is, the metaclass is instantiated) and the resulting class object is assigned to the name in the scope in which it was defined. The resulting class is an object like everything else in Python. Unless the class uses __slots__ the dictionary of members becomes the class __dict__. This dictionary is protected by being wrapped in a dictproxy. Although you can fetch members directly from the dictproxy you can't directly assign or delete members; instead you have to go through the normal attribute setting / deleting mechanisms:
>>> class SomeClass(object):
...     X = 3
...
>>> SomeClass.__dict__
<dictproxy object at 0x50b7b0>
>>> SomeClass.__dict__.keys()
['__dict__', 'X', '__module__', '__weakref__', '__doc__']
>>> SomeClass.__dict__['X']
3
>>> SomeClass.__dict__['Y'] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dictproxy' object does not support item assignment
>>> del SomeClass.__dict__['X']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dictproxy' object does not support item deletion
>>> SomeClass.Y = 4
>>> del SomeClass.X
>>> # X has now gone from the __dict__ and Y appeared
>>> SomeClass.__dict__.keys()
['__module__', 'Y', '__dict__', '__weakref__', '__doc__']
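The creation steps described above can be approximated by hand. The sketch below, written for Python 3, builds the members dictionary directly instead of executing a class body and then calls type as the metaclass; the double function is a made-up member for illustration:

```python
def double(self):
    return self.X * 2


# The names a class body would have assigned, gathered into a dictionary.
members = {'X': 3, 'double': double}

# Calling the metaclass (here type) with name, bases and members builds the class.
SomeClass = type('SomeClass', (object,), members)

print(SomeClass.X, SomeClass().double())

# The class __dict__ is a read-only proxy, so item assignment fails...
try:
    SomeClass.__dict__['Y'] = 4
    assignment_error = None
except TypeError as error:
    assignment_error = error

# ...but normal attribute assignment works.
SomeClass.Y = 4
```

On Python 3 the proxy type is called mappingproxy rather than dictproxy, but the restriction is the same.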
An interesting example of assignment creating class members is what happens when you put a list comprehension inside a class body. An implementation detail of list comprehensions is that variables used in the list comprehension 'leak' into their surrounding scope. A list comprehension in a class body creates an unexpected class member:
>>> class SomeClass(object):
...     [foo for foo in (1, 2, 3)]
...
>>> SomeClass.foo
3
The same isn't true of generator expressions where the variable doesn't leak:
>>> class AnotherClass(object):
...     list(bar for bar in (1, 2, 3))
...
>>> AnotherClass.bar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'AnotherClass' has no attribute 'bar'
The variable leaking from list comprehensions is a side-effect and should not be relied on.
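One reason not to rely on the leak is that Python 3 gives list comprehensions their own scope, just like generator expressions. A quick check, assuming a Python 3 interpreter (the squares attribute is only there to give the comprehension something to do):

```python
import sys


class SomeClass(object):
    squares = [foo * foo for foo in (1, 2, 3)]


# On Python 3 the loop variable stays inside the comprehension.
leaked = hasattr(SomeClass, 'foo')
print(sys.version_info[0], leaked)
```

Code that depended on SomeClass.foo existing would break when ported.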
Whilst the code in the class statement is being executed it creates a temporary namespace. Code can refer to names already assigned as if they were local variables.
>>> class SomeClass(object):
...     X = 3
...     b = [a * X for a in (1, 2, 3)]
...
>>> SomeClass.b
[3, 6, 9]
A common use for this is to create aliases, where you give the same member two or more names. In this example cost is an alias to the calculate_price method:
>>> class SomeClass(object):
...     def calculate_price(self, quantity):
...         return quantity * 10.0
...     cost = calculate_price
...
>>> instance = SomeClass()
>>> instance.calculate_price(20)
200.0
>>> instance.cost(20)
200.0
It is also the standard way of creating properties before Python 2.6:
>>> class SomeClass(object):
...     _value = None
...     def get(self):
...         return self._value
...     def set(self, value):
...         self._value = value
...     value = property(get, set)
...
The value property is created using the get and set functions from the scope that forms the class members.
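From Python 2.6 onwards the same property can also be written with decorators, which avoids leaving the get and set helpers behind as class members. A minimal sketch (the parameter name new_value is just for readability):

```python
class SomeClass(object):
    _value = None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value


instance = SomeClass()
instance.value = 42
print(instance.value)
```

Both spellings rely on the same rule: the name value is assigned in the class body, so it becomes a class member.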
Unfortunately we have a problem with generator expressions. Generator expressions create their own scope, causing names to be looked up lexically and ignoring the temporary class scope.
>>> class AnotherClass(object):
...     X = 3
...     b = list(a * X for a in (1, 2, 3))
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in AnotherClass
  File "<stdin>", line 3, in <genexpr>
NameError: global name 'X' is not defined
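One possible workaround, sketched here rather than taken from the original post, is to read the class-level name in the class body itself and pass it to a helper as an ordinary argument. The scale function is hypothetical; the second class reproduces the failure so the two can be compared:

```python
def scale(values, factor):
    """Hypothetical helper: multiply every value by factor."""
    return [value * factor for value in values]


class AnotherClass(object):
    X = 3
    # X is read here, in the class body, and passed in as an ordinary argument.
    b = scale((1, 2, 3), X)


try:
    class Broken(object):
        X = 3
        # The generator body has its own scope and cannot see the class-level X.
        b = list(a * X for a in (1, 2, 3))
    failure = None
except NameError as error:
    failure = error

print(AnotherClass.b)
```

The helper works because its arguments are evaluated in the temporary class scope before the call is made.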
If you're interested in how metaclasses are involved in class creation then you should read: Metaclasses in five minutes. (Hopefully readable even for non-gurus.)
An interesting reference on why the class statement in Python contains executable code is this article by Guido van Rossum, the creator of Python: How Everything Became an Executable Statement.
Posted by Fuzzyman on 2010-01-10 13:49:13 | |
Categories: Python, Hacking Tags: language, classes, objects
Release: ConfigObj 4.7.0 and validate 1.0.1
I've just released ConfigObj 4.7.0 and validate 1.0.1. ConfigObj is an easy to use configuration file reader and writer module, with a host of powerful features including:
- Nested sections (subsections), to any level
- List values
- Multiple line values
- String interpolation (substitution)
- Integrated with a powerful validation system
  - including automatic type checking/conversion
  - repeated sections
  - and allowing default values
- When writing out config files, ConfigObj preserves all comments and the order of members and sections
- Many useful methods and options for working with configuration files (like the 'reload' method)
- Full Unicode support
The headline feature of 4.7.0 is a ~25% performance improvement in reading and validating configuration files. The cost of this performance improvement is losing compatibility with Python 2.2 [1]. Python 2.3 is now the minimum version of Python supported by ConfigObj.
The next new feature is the addition of the extra_values section attribute and the get_extra_values function. extra_values is populated by validation and lists all config file members (in that section) that weren't specified in the configspec file. get_extra_values is a convenience function that returns a list of all values and sections that appear in the config file but aren't in the configspec.
Another important change is that passing in arguments to the ConfigObj constructor via an options dictionary is now deprecated. This was a hangover from previous versions, and if you were using it you can just change your code to the following (which works with older versions of ConfigObj as well):
config = ConfigObj(filename, **options)
4.7.0 also brings in a minor change in syntax. Previously spurious commas in list values would be ignored (e.g. value = first, , second). They are now invalid syntax.
There have been several other minor bugfixes. See the list below for details.
All Changes in Version 4.7.0
- Minimum supported version of Python is now 2.3
- ~25% performance improvement thanks to Christian Heimes
- String interpolation now works in list value members
- After validation any additional entries not in the configspec are listed in the extra_values section member
- Addition of the get_extra_values function for finding all extra values in a validated ConfigObj instance
- Deprecated the use of the options dictionary in the ConfigObj constructor and added explicit keyword arguments instead. Use **options if you want to initialise a ConfigObj instance from a dictionary
- Constructing a ConfigObj from an existing ConfigObj instance now preserves the order of values and sections from the original instance in the new one
- BUGFIX: Checks that failed validation would not populate default_values and restore_default_value() wouldn't work for those entries
- BUGFIX: clear() now clears 'defaults'
- BUGFIX: empty values in list values were accidentally valid syntax. They now raise a ParseError. e.g. "value = 1, , 2"
- BUGFIX: Change to the result of a call to validate when preserve_errors is True. Previously sections where all values failed validation would return False for the section rather than preserving the errors. False will now only be returned for a section if it is missing
- Distribution includes version 1.0.1 of validate.py
- Removed __revision__ and __docformat__
The only change in version 1.0.1 of validate was removing a test whose behaviour was dependent on platform / Python version and was impossible to make pass consistently.
[1] Some of the performance improvements come from using the in operator instead of has_key, hence losing compatibility with Python 2.2. The rest of the improvements come from short-circuiting interpolation checks if the config member doesn't contain the characters being used for interpolation.
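The short-circuit idea in the footnote can be illustrated with a small sketch. This is not ConfigObj's actual code: the interpolate function and the regular expression are assumptions made up for the example, showing only the general pattern of a cheap membership test before more expensive work:

```python
import re

# Hypothetical pattern for '%(name)s' style references; not ConfigObj's own.
_REFERENCE = re.compile(r'%\((\w+)\)s')


def interpolate(value, values):
    """Substitute %(name)s references, skipping the regex when none can match."""
    if '%(' not in value:
        # Cheap membership test with the in operator avoids the regex entirely.
        return value
    return _REFERENCE.sub(lambda match: str(values[match.group(1)]), value)


print(interpolate('plain text', {}))
print(interpolate('%(host)s:8080', {'host': 'localhost'}))
```

Most config values contain no references at all, so the early return is the common path.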
Posted by Fuzzyman on 2010-01-10 00:14:15 | |
Categories: Python, Projects Tags: release, configobj, validate
PyCrypto 2.1.0 Binaries for Windows 32bit Python 2.6, 2.5 and 2.4
PyCrypto is a Python cryptography package originally created by Andrew Kuchling and now maintained by Dwayne C. Litzenberger. For a while I've been hosting Windows binaries for version 2.0.1. Dwayne has now done a new release, version 2.1.0.
I've built installers for 32 bit Windows for Python 2.6, 2.5 and 2.4. The 2.6 installer was built with Visual Studio 2008. The Python 2.5 and 2.4 installers were built with Visual Studio .NET 2003. They were built without GMP support (only needed for Crypto.PublicKey._fastmath). The installers come without guarantees, but all the tests pass.
- PyCrypto 2.1.0 for Python 2.6 (.zip)
- PyCrypto 2.1.0 for Python 2.5 (.zip)
- PyCrypto 2.1.0 for Python 2.4 (.zip)
- PyCrypto 2.1.0 Announcement and Release Notes
- PyCrypto Homepage
- Windows Binaries Download (for 2.0.1 & 2.1)
Posted by Fuzzyman on 2010-01-09 15:57:29 | |
Categories: Python, Website Tags: PyCrypto, release, windows, download, crypto, binaries
Fun with Unicode, Latin-1 and a C1 Control Code
Unicode is a rabbit-warren of complexity; almost fractal in nature, the more you learn about it the more complexity you discover. Anyway, all that aside you can have great fun (i.e. pain) with fairly basic situations even if you are trying to do the right thing.
This particular problem was encountered by Stephan Mitt, one of my colleagues at Comsulting. I helped him find the solution, and with a bit of digging (and some help from #python-dev) worked out why it was happening.
We receive data from customers as CSV files that need importing into a web application. The CSV files are received in latin-1 encoding and we decode and then iterate over them to process a line at a time. Unfortunately the data from the customers included some \x85 characters, which were breaking the CSV parsing.
One of the problems with the latin-1 encoding is that it uses all 256 possible byte values, so it is never possible to detect badly encoded data. Arbitrary binary data will always successfully decode:
>>> data = ''.join(chr(x) for x in range(256))
>>> data.decode('latin-1')
u'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f...'
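The same point can be checked on Python 3, where byte strings and text are separate types. This sketch also contrasts latin-1 with UTF-8, which is a comparison added here for illustration and not part of the original problem:

```python
data = bytes(range(256))

# latin-1 maps every byte value to the code point with the same number.
text = data.decode('latin-1')
round_trips = [ord(char) for char in text] == list(range(256))

# A stricter encoding such as UTF-8 rejects the same arbitrary bytes.
try:
    data.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as error:
    utf8_error = error

print(round_trips, utf8_error is not None)
```

Because latin-1 never fails, a successful decode tells you nothing about whether the data really was latin-1.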
If you iterate over a standard file object in Python 2 (i.e. one that reads data as bytestrings) then you iterate over it a line at a time. This splits lines on carriage returns (\x0D) and line feeds (\x0A). If you're on Windows then the sequence \x0D\x0A (CRLF) signifies a new line. If you're trying to do-the-right-thing, and decode your data to Unicode before treating it as text, then you might use code a bit like the following to read it:
import codecs
handle = codecs.open(filename, 'r', encoding='latin-1')
for line in handle:
    ...
This was the cause of our problem. When decoding using latin-1, \x85 is decoded to u'\x85', which Unicode treats as a line break. So if your source data has \x85 embedded in it, and you are splitting on lines, where the lines break will differ depending on whether you are using byte-strings or Unicode strings:
>>> d = 'foo\x85bar'
>>> d.split()
['foo\x85bar']
>>> u = d.decode('latin-1')
>>> u
u'foo\x85bar'
>>> u.split()
[u'foo', u'bar']
This could still be a pitfall in Python 3, where all strings are Unicode, particularly if you are porting an application from Python 2 to Python 3. Suddenly your data will behave differently when you treat it as Unicode. The answer is to do the split manually, specifying which character to use as a line break.
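A small Python 3 sketch of the difference, and of splitting manually, might look like this (the sample data is invented for the example):

```python
raw = b'foo\x85bar\nbaz'
text = raw.decode('latin-1')

# Byte strings only treat \r and \n as line boundaries.
byte_lines = raw.splitlines()

# Unicode strings also break on NEL (\x85).
unicode_lines = text.splitlines()

# Splitting manually on an explicit separator keeps \x85 inside the line.
manual_lines = text.split('\n')

print(byte_lines, unicode_lines, manual_lines)
```

If the input may use \r\n line endings, the manual split would also need to strip the trailing \r from each line.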
The problem isn't restricted to \x85. The Unicode spec on newlines shows us why. \x85 is referred to by the acronym NEL, which is a C1 Control Code: "NEL: Next Line. Equivalent to CR+LF. Used to mark end-of-line on some IBM mainframes."
In fact NEL belongs to a general class of characters known as Paragraph Separators (Category B). This category includes the characters \x1C, \x1D, \x1E, \x0D, \x0A and \x85. Splitting on lines will split on any of these characters, which may not be what you expect. It certainly wasn't what we expected.
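If you want to see which characters your own interpreter splits on, a brute-force check over the first 256 code points is easy to write. This is a sketch for Python 3; depending on the version, the result may include a few characters beyond the ones listed above, such as \x0B and \x0C:

```python
# Find every code point below 256 that str.splitlines() treats as a line break.
breaking = [
    '\\x%02X' % code
    for code in range(256)
    if len(('a' + chr(code) + 'b').splitlines()) == 2
]
print(breaking)
```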
For us the solution was simple; we just strip out any occurrence of \x85 in the binary data before decoding.
Note
Marius Gedminas suggests that the data is probably encoded as Windows 1252 rather than Latin-1. He is probably right.
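Both the fix and Marius's suggestion are easy to try out. The sketch below uses Python 3 and invented sample data; cp1252 is Python's codec name for Windows 1252, in which the byte \x85 is a horizontal ellipsis rather than a control code:

```python
raw = b'foo\x85bar'

# The fix described above: drop the byte before decoding as latin-1.
cleaned = raw.replace(b'\x85', b'').decode('latin-1')

# If the data is really Windows 1252, \x85 is a horizontal ellipsis instead.
as_cp1252 = raw.decode('cp1252')

print(cleaned, as_cp1252 == 'foo\u2026bar')
```

Decoded as cp1252 the character is not a line break, so the text stays on one line without stripping anything.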
There are some interesting notes on Unicode line breaks in this Python bug report: What is an ASCII linebreak?.
Posted by Fuzzyman on 2010-01-07 12:42:27 | |
Categories: Python, Work, Hacking Tags: Unicode, latin-1, encoding