Python Programming, news on the Voidspace Python Projects and all things techie.

Danger with Django expression objects

I've recently been dealing with a bunch of concurrency issues in our Django app. Some of the views modify database rows (model objects), which is unsurprising, but we had a few situations where concurrent modifications could overwrite one another, so that changes were silently lost. Later blog entries will describe some of the other fun we've had with this; this entry looks at Django expression objects and one particular caveat with their use.

Given the following trivial Django model definition:

from django.db import models

class Foo(models.Model):
    field = models.IntegerField()

Consider the following code:

foo = Foo.objects.get(id=some_id)
foo.field += 3
foo.save()

The problem is that if this code runs concurrently against the same database row in two requests, the second request can fetch the field value after the first request has fetched it but before the first has saved its change. Both requests then start from the same value: the first saves its change, and the second saves its own, overwriting the first.
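The interleaving can be reproduced without Django. What follows is a minimal sketch using Python's sqlite3 module; the table name foo and the helpers fetch and save are invented for illustration, and the two "requests" are interleaved by hand rather than run in real threads.

```python
import sqlite3

# Hypothetical stand-in for the Foo model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, field INTEGER NOT NULL)")
conn.execute("INSERT INTO foo (id, field) VALUES (1, 0)")


def fetch(row_id):
    """Read the current value, roughly what Foo.objects.get() does."""
    return conn.execute(
        "SELECT field FROM foo WHERE id = ?", (row_id,)
    ).fetchone()[0]


def save(row_id, value):
    """Write back an absolute value, roughly what a plain foo.save() does."""
    conn.execute("UPDATE foo SET field = ? WHERE id = ?", (value, row_id))


# Both requests read before either one writes.
first = fetch(1)
second = fetch(1)

save(1, first + 3)   # request one stores 3
save(1, second + 3)  # request two also stores 3, overwriting request one

lost_update_result = fetch(1)
print(lost_update_result)  # 3, not the 6 that two increments should give
```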

Django provides one solution: it lets the change be executed in the database, where it is atomic. This is done with the F() object, which refers to the field's current value in the database rather than to the value that was loaded into Python.

Here's the same example using F():

from django.db.models import F
foo = Foo.objects.get(id=some_id)
foo.field = F('field') + 3
foo.save()

foo.field (the instance attribute) now has an expression object assigned to it, and calling .save() causes the expression to be evaluated in the database.
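At the SQL level, the difference is roughly that the column is updated relative to its current value instead of being overwritten with a value computed in Python. The sketch below shows that kind of statement using sqlite3 directly; the table name foo and the helper increment are assumptions for illustration, and the exact SQL Django generates is not reproduced here.

```python
import sqlite3

# Hypothetical stand-in for the Foo model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, field INTEGER NOT NULL)")
conn.execute("INSERT INTO foo (id, field) VALUES (1, 0)")


def increment(row_id, amount):
    """Relative update: the database reads and writes the column in one statement."""
    conn.execute(
        "UPDATE foo SET field = field + ? WHERE id = ?", (amount, row_id)
    )


# Two requests, in either order; neither depends on a value read earlier.
increment(1, 3)
increment(1, 3)

atomic_result = conn.execute("SELECT field FROM foo WHERE id = 1").fetchone()[0]
print(atomic_result)  # 6
```

Because no value is carried through Python between the read and the write, there is no stale value for a second request to save.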

The hidden danger here is that this expression object will be evaluated every time you call save. If you have a code path that potentially makes several changes to a model instance, and saves those changes independently, then your expression can be evaluated multiple times.

This code illustrates the problem:

>>> from django.db.models import F
>>> from foo.models import Foo
>>> f = Foo.objects.create(field=3)
>>> f.field = F('field') + 27
>>> f.field
<django.db.models.expressions.ExpressionNode object at 0x...>
>>> f.save()
>>> Foo.objects.get(id=f.id).field
30
>>> f.save()
>>> Foo.objects.get(id=f.id).field
57

This happened to us when we changed a code path sometime after the switch to using these expressions. Fortunately our tests picked up on it, but it was "fun" to debug and work out where the failure was being caused.
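One way to defuse this is to make sure the instance holds a plain value again once the expression has been saved, for example by re-fetching it as the session above does with Foo.objects.get(id=f.id). The following toy model illustrates the idea with sqlite3; ToyFoo, Increment, reload and stored_value are invented names, and this is a simulation of the behaviour described here, not how Django implements save().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, field INTEGER NOT NULL)")
conn.execute("INSERT INTO foo (id, field) VALUES (1, 3)")


def stored_value(row_id):
    """Return the integer currently stored in the database."""
    return conn.execute(
        "SELECT field FROM foo WHERE id = ?", (row_id,)
    ).fetchone()[0]


class Increment:
    """Toy stand-in for F('field') + n: a change the database applies on save."""

    def __init__(self, amount):
        self.amount = amount


class ToyFoo:
    """Toy stand-in for a model instance (hypothetical, not Django's code)."""

    def __init__(self, row_id, field):
        self.id = row_id
        self.field = field

    def save(self):
        if isinstance(self.field, Increment):
            # The pending expression is applied again on every call.
            conn.execute(
                "UPDATE foo SET field = field + ? WHERE id = ?",
                (self.field.amount, self.id),
            )
        else:
            conn.execute(
                "UPDATE foo SET field = ? WHERE id = ?", (self.field, self.id)
            )

    def reload(self):
        # Replace whatever is on the attribute with the stored integer.
        self.field = stored_value(self.id)


f = ToyFoo(1, 3)
f.field = Increment(27)
f.save()
after_first_save = stored_value(1)   # 30
f.save()
after_second_save = stored_value(1)  # 57: the increment ran twice

f.reload()                           # f.field is now the plain integer 57
f.save()
after_reload_save = stored_value(1)  # still 57
print(after_first_save, after_second_save, after_reload_save)
```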

Note

jmv comments on this entry that for similar use cases he uses the QuerySet update method in conjunction with F() to make the updates atomic. This effectively applies the change and saves it in a single call.

It doesn't work on individual model instances but on a QuerySet, so the code would look like:

Foo.objects.filter(id=f.id).update(field=F('field') + 1)

update() runs directly against the database and returns the number of rows matched rather than any model instances, so if you need the model instance with the modified value you have to fetch it again after the update. As it doesn't leave the expression object on an instance, it solves the problem of multiple saves.

Posted by Fuzzyman on 2011-05-06 15:58:05