Python Programming, news on the Voidspace Python Projects and all things techie.

Open Source Licensing and Contributions

There have been discussions on the Python-dev and Testing in Python mailing lists about licensing and accepting contributions to open source projects.

This is an area that can be very confusing, and potentially problematic for open source projects. Just because a project is licensed under a free software license doesn't automatically say anything about the status of code contributed to the project. The copyright of contributed code is owned by the person who wrote it and if you merge it with your project you create a derivative work owned jointly by all contributors. You can't license the work to others (or change the license) without the explicit permission of all those who own the copyright. The 'standard' ways round this are either to require all contributors to assign copyright to the project or to have all contributors sign an agreement licensing all their contributions to the project. The second approach is the one taken by the Python project.

This advice, applicable to Python itself, was posted to the Python-Dev mailing list by Martin von Loewis:

Van's advice is as follows:

There is no definite ruling on what constitutes "work" that is copyright-protected; estimates vary between 10 and 50 lines. Establishing a rule based on line limits is not supported by law. Formally, to be on the safe side, paperwork would be needed for any contribution (no matter how small); this is tedious and probably unnecessary, as the risk of somebody suing is small. Also, in that case, there would be a strong case for an implied license.

So his recommendation is to put the words

"By submitting a patch or bug report, you agree to license it under the Apache Software License, v. 2.0, and further agree that it may be relicensed as necessary for inclusion in Python or other downstream projects."

into the tracker; this should be sufficient for most cases. For committers, we should continue to require contributor forms.

Contributor forms can be electronic, but they need to name the parties, include a signature (including electronic), and include a company contribution agreement as necessary.

For more information on copyright and how it applies to open source development, I highly recommend Van Lindberg's book, Intellectual Property and Open Source.


Posted by Fuzzyman on 2009-07-02 16:59:00 | |



Exception handling and duck typing

Exceptions are one of the great features of high level languages that make coding less tedious. Instead of manually checking for possible errors and returning error codes we can use exceptions. There is a slightly tautological saying: exceptions should only be used for exceptional circumstances. This, according to Bruce Eckel, comes from the C++ days when exception handling became mainstream. The goal of C++ was for exception handling code to have no performance impact on your code when no exception is raised. The trade-off was that actually raising exceptions became much slower [1].

In fact the use of exception handling for flow control has been enshrined within Python through the use of the StopIteration exception to indicate that an iterator is exhausted. This is an implementation detail, but one that you need to be aware of if you write your own iterators or are manually advancing an iterator by calling next.
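As a minimal sketch of that protocol (the Countdown class is a hypothetical name used only for illustration, and the code is written for Python 3), an iterator signals exhaustion by raising StopIteration, and code that advances it by hand has to deal with that exception itself:

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal to the caller that there are no more values
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# A for loop (or list) catches StopIteration on our behalf
collected = list(Countdown(3))

# Advancing manually means handling StopIteration ourselves
it = Countdown(1)
first = next(it)
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
```

The built-in next also accepts a default as its second argument, for example next(it, None), which returns the default instead of raising when the iterator is exhausted.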

There is another idiom that is often advocated in Python: the use of exceptions for duck typing. Where you have a code path receiving an object that may be one of several types, you often want to handle different objects in different ways. Rather than do strict type checking with isinstance, it is more idiomatic in Python to base how you use objects on what operations they support. This is the essence of duck typing. In general there are two ways of implementing this: 'look before you leap' and 'it is better to ask forgiveness than permission'.

An example of 'look before you leap' is to check whether an object has the attribute / method you want to use:

if hasattr(something, 'methodname'):
    something.methodname()

Or an alternative that only does the attribute lookup once:

method = getattr(something, 'methodname', None)
if method is not None:
    method()

The 'better to ask forgiveness' pattern uses exception handling instead:

try:
    something.methodname()
except AttributeError:
    pass

Exception handling can also be used for flow control, for example as a convenient way of breaking out of nested for loops:

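class Stop(Exception):
    pass

# max_x, max_y and match are assumed to be defined elsewhere
try:
    for x in range(max_x):
        for y in range(max_y):
            if match(x, y):
                raise Stop
except Stop:
    # match found: x and y hold the matching values
    pass
else:
    # match not found
    pass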

For this use I prefer to use an inner function with an early return:

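# max_x, max_y and match are assumed to be defined elsewhere
def find_match():
    for x in range(max_x):
        for y in range(max_y):
            if match(x, y):
                return x, y
    # no match: make the None result explicit
    return None

result = find_match()
if result is None:
    # match not found
    pass
else:
    x, y = result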

I'm not a fan of abusing exception handling for flow control or even duck typing. The main reason for this is code readability. If you see exception handling code it is natural to assume that it is there to handle exceptions. If exceptions are also used for flow control or duck typing then seeing exception handling code tells you nothing about the intent of the code, and you have to read it in detail to work out what it is for. As there are usually alternatives that are just as readable, I tend to prefer to only use exception handling for exceptional circumstances unless there is a compelling reason.

In Python 2.6 / 3.X the new Abstract Base Classes permit isinstance to support duck typing. This new machinery provides mechanisms like the ABCMeta metaclass, the __subclasshook__ hook, and a large number of abstract base classes in the collections module (collections.abc from Python 3.3 onwards) that you can inherit from. These not only signal the available behaviour of your class (classes that inherit from Set support set operations) but also provide relevant methods if you implement a core subset. At last it will be possible to distinguish between mapping and sequence types implemented as pure-Python classes! [2]
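A rough sketch of how this can look in practice, assuming Python 3.3 or later where the abstract base classes are imported from collections.abc. The ListBasedSet, Settings and describe names are hypothetical and not part of the original post:

```python
from collections.abc import Mapping, Sequence, Set


class ListBasedSet(Set):
    """Hypothetical set that stores its elements in a list."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    # The core subset that Set requires us to implement
    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


class Settings(Mapping):
    """Hypothetical read-only mapping written in pure Python."""

    def __init__(self, **values):
        self._values = values

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def describe(obj):
    # isinstance against the ABCs tells mappings and sequences apart,
    # even though both are indexed through __getitem__
    if isinstance(obj, Mapping):
        return "mapping"
    if isinstance(obj, Sequence):
        return "sequence"
    return "other"


# The & operator is supplied by the Set base class, not written by us
common = ListBasedSet("abc") & ListBasedSet("bcd")
```

Here describe(Settings(debug=True)) reports a mapping and describe([1, 2]) reports a sequence, while Settings also picks up methods such as get, keys and items from the Mapping base class.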

[1] A design philosophy that is still true in the Microsoft .NET framework, which is the reason why IronPython is faster than CPython for try...except blocks where an exception is not raised and much slower than CPython for try...except blocks where exceptions are raised.
[2] As the methods for indexing and deletion for both mapping and sequence types are __getitem__, __setitem__ and __delitem__ it has always been difficult to tell them apart.


Posted by Fuzzyman on 2009-07-01 18:50:43 | |


