Akismet - The Python API

Stopping Comment Spam with Akismet

Author: Michael Foord
Contact: fuzzyman@voidspace.org.uk
Version: 0.2.0
Date: 2009/06/18
License: BSD License [1]
Online Docs: akismet.py online

The Akismet API

Note

As of version 0.2.0 akismet.py can be used with Google AppEngine.

Introduction

Akismet is a web service for recognising spam comments. It promises to be almost 100% effective at catching comment spam. They say that currently 81% of all comments submitted to them are spam.

It's designed to work with the WordPress blog tool, but it's not restricted to that, so this is a Python interface to the Akismet API.

You'll need a WordPress API key to use it. This script lets you plug Akismet into any CGI script or web application, and there are full docs in the code. It's easy to use, because the folks at Akismet have implemented a straightforward REST API.

Note

If possible your program should inform Akismet of false positives and false negatives.

Informing Akismet helps make the service more reliable.

To do this, use the submit-spam and submit-ham functionality.

Most of the work is done by the comment-check function.

Downloading

You can download akismet.py from :

This contains the docs, akismet.py, and a test CGI called test_akismet.py.

apikey.txt

The easiest way to use akismet.py is to provide the wordpress key and blog URL in a text file called apikey.txt.

This should be in the current directory when you create your Akismet instance (or call setAPIKey with no arguments).

The format for apikey.txt is simple (see the example one in the distribution).

Lines that start with a # are comments. The first non-blank, non-comment line should be the API key. The second line should be the blog URL (or application URL) to use.

# Lines starting with '#' are comments
# The first non-blank, non-comment, line should be your api key
# The second your blog URL
#
# You can get a wordpress API key from http://wordpress.com/
some_key
some_blog_url
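The format above is simple enough to parse in a few lines. The following is a minimal sketch of such a parser, not the code akismet.py itself uses. The function name `parse_apikey` is hypothetical, and it assumes the key and URL are the first two non-blank, non-comment lines.

```python
def parse_apikey(text):
    """Return (key, blog_url) from the text of an apikey.txt-style file.

    Hypothetical helper for illustration; akismet.py's own loader may differ.
    """
    # Keep only lines that are neither blank nor '#' comments.
    values = [line.strip() for line in text.splitlines()
              if line.strip() and not line.strip().startswith('#')]
    if len(values) < 2:
        raise ValueError('expected an API key line followed by a blog URL line')
    return values[0], values[1]


example = """# Lines starting with '#' are comments
# You can get a wordpress API key from http://wordpress.com/
some_key
some_blog_url
"""
key, blog_url = parse_apikey(example)
```

Here `key` and `blog_url` hold the two values from the example file shown above.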

The Akismet Class

The akismet API provides four functions. akismet.py gives you access to all of these through a single class.

The four akismet functions are :

  • verify-key
  • comment-check
  • submit-spam
  • submit-ham

In addition to these, the Akismet class has the following user methods and attributes :

  • setAPIKey
  • key
  • blog_url

Creating an Instance

Akismet(key=None, blog_url=None, agent=None)

To use the akismet web service you need an API key. There are three ways of telling the Akismet class what this is.

  1. When you create a new Akismet instance you can pass in the API key and blog url.
  2. If you don't pass in a key, it will automatically look for apikey.txt in the current directory, and attempt to load it.
  3. You can set the key and blog_url attributes manually, after creating your instance.

User-Agent

As well as setting your key, you ought to pass in a string for Akismet to create a User-Agent header with. This is the agent argument.

According to the API docs, this ought to be in the form :

Program Name/Version

akismet.py adds its version number to this, to create a User-Agent in the form liked by akismet.

The default User-Agent (if you don't pass in a value for agent) is:

Python Interface by Fuzzyman/0.2.0 | akismet.py/0.2.0
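Judging from the default shown above, the final header appears to be the caller's agent string, a ` | ` separator, and then the akismet.py name and version. The sketch below reproduces that composition. `build_user_agent` and `LIBRARY_VERSION` are hypothetical names, and the exact format is inferred from the default rather than taken from the library source.

```python
LIBRARY_VERSION = '0.2.0'  # version of akismet.py described in this document


def build_user_agent(agent, library_version=LIBRARY_VERSION):
    """Append the akismet.py name and version to the caller's agent string."""
    return '%s | akismet.py/%s' % (agent, library_version)


default_agent = build_user_agent('Python Interface by Fuzzyman/0.2.0')
example_agent = build_user_agent('Example/0.1')
```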

Example 1

#example 1
api = Akismet(api_key, url, agent='Example/0.1')

Example 2

#example 2
import os
from akismet import Akismet

if os.path.isfile('apikey.txt'):
    api = Akismet(agent='Example/0.2')

# The key and URL are loaded from
# 'apikey.txt'

Example 3

#example 3
import os
from akismet import Akismet

url = 'http://www.voidspace.org.uk/cgi-bin/voidspace/guestbook.py'
api_key = '0acdfg1fr'
if not os.path.isfile('apikey.txt'):
    api = Akismet(agent='Example/0.3')
    api.key = api_key
    api.blog_url = url

# The key and URL are set manually

setAPIKey

setAPIKey(key=None, blog_url=None)

Set the wordpress API key for all transactions.

If you don't specify an explicit API key and blog_url it will attempt to load them from a file called apikey.txt in the current directory.

This method is usually called automatically when you create a new Akismet instance.

Akismet Methods

These four methods equate to the four functions of the Akismet API.

verify-key

verify_key()

This equates to the verify-key call against the akismet API.

It returns True if the key is valid.

The docs state that your program ought to call this at the start of the transaction.

It raises APIKeyError if you have not yet set an API key.

If the connection to Akismet fails, an AkismetError is raised. (akismet.py uses urllib2; since version 0.1.4 connection errors are trapped and re-raised as AkismetError.)

comment-check

comment_check(comment, data=None, build_data=True, DEBUG=False)

This is the main function in the Akismet API. It checks comments.

It returns True for spam and False for ham.

If you set DEBUG=True then it will return the text of the response, instead of the True or False object.

It raises APIKeyError if you have not yet set an API key.

If the connection to Akismet fails, an AkismetError is raised.

As a minimum it requires the body of the comment. This is the comment argument.

Akismet requires some other arguments, and allows some optional ones. The more information you give it, the more likely it is to be able to make an accurate diagnosis.

You supply these values using a mapping object (dictionary) as the data argument.

If build_data is True (the default), then akismet.py will attempt to fill in as much information as possible, using default values where necessary. This is particularly useful for programs running in a CGI environment. A lot of useful information can be supplied from environment variables (os.environ). See below.

You only need to supply the values for which you don't want defaults filled in. All values must be strings.

There are a few required values. If they are not supplied, and defaults can't be worked out, then an AkismetError is raised.

If you set build_data=False and a required value is missing an AkismetError will also be raised.

The normal values (and defaults) are as follows :

  • 'user_ip': os.environ['REMOTE_ADDR'] (*)
  • 'user_agent': os.environ['HTTP_USER_AGENT'] (*)
  • 'referrer': os.environ.get('HTTP_REFERER', 'unknown') [2]
  • 'permalink': ''
  • 'comment_type': 'comment' [3]
  • 'comment_author': ''
  • 'comment_author_email': ''
  • 'comment_author_url': ''
  • 'SERVER_ADDR': os.environ.get('SERVER_ADDR', '')
  • 'SERVER_ADMIN': os.environ.get('SERVER_ADMIN', '')
  • 'SERVER_NAME': os.environ.get('SERVER_NAME', '')
  • 'SERVER_PORT': os.environ.get('SERVER_PORT', '')
  • 'SERVER_SIGNATURE': os.environ.get('SERVER_SIGNATURE', '')
  • 'SERVER_SOFTWARE': os.environ.get('SERVER_SOFTWARE', '')
  • 'HTTP_ACCEPT': os.environ.get('HTTP_ACCEPT', '')

(*) Required values
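The defaults listed above can be pictured as a merge between the caller's dictionary and the CGI environment. This is a rough sketch of that idea, not akismet.py's actual code. `fill_defaults` is a hypothetical name, and it raises ValueError where the library would raise AkismetError.

```python
def fill_defaults(data, environ):
    """Return a copy of data with missing values filled in from environ.

    Hypothetical helper for illustration only.
    """
    filled = dict(data)
    # Required values: there is no fallback if the environment lacks them.
    for name, env_name in (('user_ip', 'REMOTE_ADDR'),
                           ('user_agent', 'HTTP_USER_AGENT')):
        if name not in filled:
            if env_name not in environ:
                raise ValueError('missing required value: %s' % name)
            filled[name] = environ[env_name]
    # Optional values with fixed defaults.
    filled.setdefault('referrer', environ.get('HTTP_REFERER', 'unknown'))
    filled.setdefault('permalink', '')
    filled.setdefault('comment_type', 'comment')
    for name in ('comment_author', 'comment_author_email',
                 'comment_author_url'):
        filled.setdefault(name, '')
    # Server details are copied from the environment when present.
    for name in ('SERVER_ADDR', 'SERVER_ADMIN', 'SERVER_NAME', 'SERVER_PORT',
                 'SERVER_SIGNATURE', 'SERVER_SOFTWARE', 'HTTP_ACCEPT'):
        filled.setdefault(name, environ.get(name, ''))
    return filled


fake_environ = {'REMOTE_ADDR': '127.0.0.1',
                'HTTP_USER_AGENT': 'TestBrowser/1.0'}
result = fill_defaults({'comment_author': 'Alice'}, fake_environ)
```

In a real CGI script the second argument would be os.environ. Values supplied by the caller always win over the defaults.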

You may supply as many additional 'HTTP_*' type values as you wish. These should correspond to the http headers sent with the request.
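In a CGI environment the request headers are exposed as environment variables with an 'HTTP_' prefix, so one way to gather them is to filter os.environ before building the data dictionary. The helper below is a hypothetical sketch of that step and is not part of akismet.py.

```python
def collect_http_headers(environ):
    """Return the 'HTTP_*' entries of environ as a dict of strings."""
    return dict((name, str(value)) for name, value in environ.items()
                if name.startswith('HTTP_'))


fake_environ = {
    'HTTP_ACCEPT_LANGUAGE': 'en-GB',
    'HTTP_HOST': 'example.com',
    'REMOTE_ADDR': '127.0.0.1',
}
data = collect_http_headers(fake_environ)
```

The resulting dictionary can be extended with the comment author details and passed as the data argument.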

submit-spam

submit_spam(comment, data=None, build_data=True)

This function is used to tell Akismet that a comment it marked as ham is really spam.

It takes all the same arguments as comment_check, except for DEBUG.

submit-ham

submit_ham(comment, data=None, build_data=True)

This function is used to tell Akismet that a comment it marked as spam is really ham.

It takes all the same arguments as comment_check, except for DEBUG.
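A typical place to call these two methods is a moderation screen, where a human overrides Akismet's verdict. The sketch below shows one way to route the correction. `report_correction` and `RecordingAPI` are hypothetical names, and the stand-in class only records calls so that the example runs without a network connection or API key.

```python
def report_correction(api, comment, data, marked_spam, really_spam):
    """Tell Akismet when a moderator overrides its verdict.

    api is expected to offer submit_spam and submit_ham as described above.
    Returns the name of the call made, or None if the verdict was right.
    """
    if marked_spam == really_spam:
        return None  # Akismet was right, nothing to report
    if really_spam:
        api.submit_spam(comment, data)  # false negative: spam got through
        return 'submit_spam'
    api.submit_ham(comment, data)  # false positive: ham was blocked
    return 'submit_ham'


class RecordingAPI(object):
    """Stand-in for an Akismet instance that only records calls."""

    def __init__(self):
        self.calls = []

    def submit_spam(self, comment, data=None, build_data=True):
        self.calls.append(('spam', comment))

    def submit_ham(self, comment, data=None, build_data=True):
        self.calls.append(('ham', comment))


api = RecordingAPI()
first = report_correction(api, 'Buy pills', {}, marked_spam=False, really_spam=True)
second = report_correction(api, 'Nice post', {}, marked_spam=True, really_spam=False)
third = report_correction(api, 'Hello', {}, marked_spam=False, really_spam=False)
```

For the report to be useful, the data passed in should match what was sent with the original comment_check call.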

Error Classes

In the course of using akismet.py, there are two possible errors you could see.

AkismetError

This is for general Akismet errors. For example, if you didn't supply some of the required information.

This error is a subclass of Exception.

This error is also raised if there is a network connection error. This can happen when the Akismet service or domain goes down temporarily.

Your code should trap this and handle it appropriately (either let the comment through or push it onto a moderation queue).
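One way to follow that advice is to wrap the check in a small function that falls back to a moderation queue when the check cannot be made. This is only a sketch. `classify` and `FakeAPI` are hypothetical names, and the local AkismetError class stands in for the one defined in akismet.py so that the example is self-contained.

```python
class AkismetError(Exception):
    """Stand-in for akismet.AkismetError so the sketch runs on its own."""


def classify(api, comment, data, on_error='moderate'):
    """Return 'spam', 'ham', or on_error if the check could not be made."""
    try:
        is_spam = api.comment_check(comment, data)
    except AkismetError:
        # Service unreachable or request incomplete: use the fallback.
        return on_error
    return 'spam' if is_spam else 'ham'


class FakeAPI(object):
    """Minimal stand-in for an Akismet instance."""

    def __init__(self, result=None, fail=False):
        self.result = result
        self.fail = fail

    def comment_check(self, comment, data=None):
        if self.fail:
            raise AkismetError('network down')
        return self.result


spam_verdict = classify(FakeAPI(result=True), 'text', {})
ham_verdict = classify(FakeAPI(result=False), 'text', {})
fallback = classify(FakeAPI(fail=True), 'text', {})
```

Passing on_error='ham' would let comments through during an outage instead of queueing them. Which fallback is appropriate depends on how much spam the site can tolerate.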

APIKeyError

If apikey.txt is invalid, or you attempt to call one of the akismet methods without setting a key, you will get an APIKeyError.

This error is a subclass of AkismetError.

Usage Example

A simple example that loads the key automatically, verifies the key, and then checks a comment.

from akismet import Akismet

# placeholder values for the example
comment = 'The text of the comment to check'
data = {}

api = Akismet(agent='Test Script')
# if apikey.txt is in place,
# the key will automatically be set
# or you can call ``api.setAPIKey()``
#
if api.key is None:
    print("No 'apikey.txt' file.")
elif not api.verify_key():
    print("The API key is invalid.")
else:
    # data should be a dictionary of values
    # They can all be filled in with defaults
    # from a CGI environment
    if api.comment_check(comment, data):
        print('This comment is spam.')
    else:
        print('This comment is ham.')

Akismet Test CGI

Included in the distribution is a file called test_akismet.py.

This is a simple test CGI. It needs cgiutils to run.

When activated, it allows you to put a comment in and test it with akismet. It will tell you if the comment is marked as ham, or spam.

To confirm that your setup is working, any post with viagra-test-123 as the name should be marked as spam.

Obviously you will need an API key for this to work.

You can try this online at :


TODO

Make the timeout adjustable ?

Should we fill in a default value for permalink ?

What about automatically filling in the 'HTTP_*' values from os.environ ?

CHANGELOG

2009/06/18 Version 0.2.0

If 'blog' is not in the data dictionary passed to comment_check it will be added even if build_data is False. Thanks to Mark Walling.

Fix for compatibility with Google AppEngine. Thanks to Matt King.

Added a setup.py.

2007/02/05 Version 0.1.5

Fixed a typo/bug in submit_ham. Thanks to Ian Ozsvald for pointing this out.

2006/12/13 Version 0.1.4

Akismet now traps errors in connections. If there is a network error it raises an AkismetError.

This can happen when the Akismet service or domain goes down temporarily.

Your code should trap this and handle it appropriately (either let the comment through or push it onto a moderation queue).

2006/07/18 Version 0.1.3

Add the blog url to the data. Bugfix thanks to James Bennett.

2005/12/04 Version 0.1.2

Added the build_data argument to comment_check, submit_spam, and submit_ham.

2005/12/02 Version 0.1.1

Corrected so that ham and spam are the right way round.

2005/12/01 Version 0.1.0

Test version.

Footnotes

[1] Online at http://www.voidspace.org.uk/python/license.shtml
[2] Note the spelling "referrer". This is a required value by the akismet api - however, referrer information is not always supplied by the browser or server. In fact the HTTP protocol forbids relying on referrer information for functionality in programs.
[3] The API docs state that this value can be "blank, comment, trackback, pingback, or a made up value like 'registration'".