A Very Brief Introduction to Python

And its Data-Types

Introduction

The Python programming language is an Open Source, cross-platform, high level, dynamic, interpreted [1] language.

The Python 'philosophy' emphasises readability, clarity and simplicity, whilst maximising the power and expressiveness available to the programmer. The ultimate compliment to a Python programmer is not that his code is clever, but that it is elegant. For these reasons Python is an excellent 'first language', while still being a powerful tool in the hands of the seasoned and cynical programmer.

Python is a very flexible language. It is widely used for many different purposes. Typical uses include :

  • Web application programming with frameworks like Zope, Django and Turbogears
  • System administration tasks via simple scripts
  • Desktop applications using GUI toolkits like Tkinter or wxPython (and recently Windows Forms and IronPython)
  • Creating Windows applications, using the Pywin32 extension for full Windows integration and possibly Py2exe to create standalone programs
  • Scientific research using packages like Scipy and Matplotlib

This is a brief introduction to Python and its basic datatypes. It may someday become a full Python tutorial.

If you have any specific questions to do with Python programming, you should try the Python Newsgroup, which is generally a friendly place.

The Interactive Interpreter

You can download Python for almost every platform from the Python website. Linux and Mac OS X will almost certainly already have Python installed, but it may not be the latest version.

We recommend using the latest stable version of Python, which at the time of writing is Python 2.5.

After installing Python, launch the interpreter. On a Linux platform you should be able to do this by typing python at a command prompt. On Windows you should double-click on the Python icon.

This will launch the interactive interpreter. It will look something like this :

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

The interactive interpreter allows you to type in individual Python commands and see the results straight away. It is a great way to experiment, and a useful tool in its own right.

Type x = 3, then hit enter, followed by print x followed by enter. You should see the following results in your interpreter 'session' :

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 3
>>> print x
3
>>>

Creating and Running Programs

For a language to be useful, we have to be able to create programs and run them.

Python programs are just text files that contain instructions for the Python interpreter. You can create them with any text editor (including Notepad), or with a programmer's editor known as an IDE (Integrated Development Environment). IDEs for Python will often have useful built-in tools like syntax checkers, debuggers, code browsers and much more.

Most Python installs include a simple IDE called IDLE, which is very handy.

Comments

Any line which starts with a hash (#) is a comment and will be skipped by the interpreter. A comment can also follow code on the same line: everything from the hash to the end of the line is ignored. There is no multiline comment in Python; start every line with a hash instead.

Comments in code are good (so long as you keep them up to date), but nowhere near as important as making your code readable.
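
A minimal sketch of the comment rules described above. The variable names and values are purely illustrative.

```python
# This whole line is a comment and is skipped by the interpreter.
# A longer explanation simply uses several lines,
# each one starting with its own hash.
total = 3 + 4  # a comment can also follow code on the same line

# A hash inside a string literal is just text, not a comment.
label = 'item #1'
```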

Variables

In the interactive interpreter session we actually created a very simple Python program. We created a variable called 'x' and associated it with the value 3, the number. We then printed this value to the screen.

In almost any programming language, the 'variable' is the most basic concept. Variables are the way that we can store values and work with them. We might process these values and save them to a file, or perhaps send them across the internet, but variables are the way we keep track of them.

The line of code x = 3 is a statement, which means that it does a job. The job it does is to assign the value 3 to the variable 'x'. In Python terminology we say that it binds the name 'x' to the number three. This idea of variables being names bound to objects (or names which 'reference' objects) is important in Python.
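
To make the idea of 'names bound to objects' a little more concrete, here is a small sketch. The second name, y, and the helper variable same_object are only there for illustration.

```python
x = 3              # bind the name 'x' to the number 3
y = x              # bind a second name, 'y', to the same object

same_object = x is y   # True: both names reference one object

x = 4              # rebind 'x' to a different object
# 'y' is untouched by the rebinding: it still references 3
```

Assignment never copies the value. It only attaches a name to an object, and rebinding one name leaves any other names alone.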

A variable name in Python can be almost anything, so long as it obeys certain rules :

  • It can only contain ASCII letters, digits and underscores (no spaces or other symbols)
  • It cannot start with a digit
  • It cannot be one of Python's reserved keywords.

Basic Variable Types

Just as variables are a basic concept in programming, so are the different types of values that a variable can have. We call these 'data-types', because they are different types of ways we can store data. In programming the word 'type' has a specific meaning. It is a precise way of categorising different sorts of objects. Every object has a 'type'.

The most basic data types are listed below.

It is often possible to convert between these different types (for example converting the text '10' into the integer value 10), but that is a subject for another day.
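
As a small taster of what such conversions look like, here is a sketch using the built-in int, float and str functions. The variable names are illustrative only.

```python
text = '10'

number = int(text)            # the integer 10
as_float = float(text)        # the floating point number 10.0
back_to_text = str(number)    # the string '10' again

# type() reports the type of any object.
number_is_int = type(number) is int      # True

# Text that does not look like a number cannot be converted.
try:
    int('ten')
    converted = True
except ValueError:
    converted = False
```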

Numbers

  • Integers

    An integer is a whole number. In theory this is any whole number from minus infinity to plus infinity. In practice the range is limited by the computer, but with Python the maximum size is limited only by the amount of memory your computer has!

    'Under the hood' Python actually uses two types of integer: the integer, and the 'long integer' for integers bigger than a certain amount. Occasionally you may care whether a number is an integer or a long integer, but in practice it will rarely matter.

  • Floating Point Numbers

    Computers are very good at dealing with whole numbers. This is why the 'integer' is a basic data-type. A 'Floating point number' is any number that has digits after the decimal point. For example 6.3 or -1.8.

    You can also store whole numbers as floating point numbers: just include a decimal point when you write them, as in 1.0 or -6.0. Because dealing with floating point numbers is more complicated for computers, they can be slower to work with. If you know your numbers will be integers then use the integer type.

    You can also write floating point numbers using scientific notation: 1.75e26 (which is a very big number).

    If you need to do a lot of complicated maths with floating point numbers, then you may be interested in extensions like numpy or the General Multiprecision Python project. These extensions provide very fast ways of doing mathematical operations. For normal use the capabilities built in to Python will be sufficient.
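
A short sketch of the number types described above. The variable names are illustrative, and the code uses only built-in features.

```python
# Integers can grow far beyond the size of a machine word.
big = 2 ** 100                     # 1267650600228229401496703205376

# A decimal point (or scientific notation) makes a floating point number.
whole_as_float = 1.0
very_big_float = 1.75e26

# 1 and 1.0 compare equal, but they are different types.
equal_value = (1 == 1.0)                     # True
same_type = type(1) is type(1.0)             # False
```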

Warning

In current versions of Python, dividing an integer by an integer will return an integer. You can get unexpected results like :

>>> 7/2
3

To get the answer you expect, one of the numbers must be a floating point number :

>>> 7.0/2
3.5
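
If you want to be explicit about which kind of division you mean, the sketch below may help. It uses the floor division operator //, which always rounds down to a whole number, and float() to force a floating point result. (Later Python 3 releases changed plain / so that it always returns a floating point number, which is why the example avoids relying on 7/2.)

```python
floored = 7 // 2           # 3: floor division, whatever the Python version
exact = 7.0 / 2            # 3.5: one operand is a floating point number
converted = float(7) / 2   # 3.5: convert an integer before dividing
remainder = 7 % 2          # 1: what is left over after floor division
```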

There is also a basic complex number datatype in Python. Complex numbers aren't very common, but can be extremely useful for certain types of maths. They are written with a j after the imaginary part :

3 + 4j
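
A brief sketch of working with a complex number. Note that Python writes the imaginary part with a j suffix, so three plus four times the imaginary unit is 3 + 4j. The variable names are illustrative.

```python
z = 3 + 4j

real_part = z.real         # 3.0
imaginary_part = z.imag    # 4.0
magnitude = abs(z)         # 5.0, the distance from zero

# Ordinary arithmetic works as you would expect.
product = z * (1 - 2j)     # (11-2j)
```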

Strings

A string is any piece of text. Python does not distinguish between a string that is a single character or a large block of text. Because of the many ways that Python provides for working with strings, it is an ideal language to do text processing with.

The name 'string' probably comes from the fact that programming languages often treat text as a sequence of characters 'strung' together.

Python has two different string-types.

The basic string type (which I have called 'normal strings' below) stores the text as a sequence of bytes (numbers) with one byte per character. These should only really be used if you know your text will only have ASCII values in it. Unfortunately they are terribly convenient to use, so many programs just use this sort of string. They can also be used for storing binary data. They are sometimes called byte-strings, which is quite a good description of what they do.

Unicode strings store text internally using the unicode standard. They are slightly more complicated, because you must know the 'encoding' the text is stored in whenever you read the text in or save it out. In the long run this can save a great deal of confusion.

Below is an overview of the two different string types, and how to include them in your programs.

  • Normal Strings

    You include a string in your program by surrounding it with single quotes or double quotes :

    x = 'This is a string'
    y = "And so is this."

    If you want to include a string in your code (a 'string literal') which has line breaks in it, then you can use triple quotes. The quotes can again be single quotes or double quotes :

    x = """This string
    goes over multiple lines.
    So it includes line breaks."""


    y = '''This string uses single quotes.
    But it is still multi-line.
    As you might have guessed.'''

    If you need to include special characters (like a line break or a quote character) in a string then you can use escaped characters, shown on this page.

    Python follows the Unix convention of using the backslash as the escape character. For example :

    # The variable x is a string which contains
    # a single line break
    x = '\n'

    # The variable y has an escaped quote character
    # To include the quote in the string (instead of
    # it terminating the string)
    y = 'Don\'t worry about quotes'

    # z is a single character - a backslash which has to be escaped
    # with another backslash
    z = '\\'

    If you want to be able to include backslashes in your strings without having to escape them, you can use a 'raw string'. If you precede the string literal in your code with an r it is a raw string. This is particularly useful for regular expressions which tend to use a lot of backslashes.

    raw_string = r'a raw string may contain unescaped \s in it'

    Because raw strings are designed to be useful for regular expressions, and not say Windows file paths, they cannot end with a single backslash. The backslash will be interpreted as escaping the quote that follows it, as well as being included in the string. Annoying and not very useful in my opinion, but Python can't be too perfect.

    Python strings are stored internally using numbers. They are not encoded using any encoding: the interpreter just receives what you give it and stores it as a sequence of bytes. For this reason basic strings are often referred to as 'byte-strings'. You can also use them to store binary data.

    Because they are stored as bytes, your string literals (in your program) should only really contain ASCII characters. You can include non-ASCII by declaring an encoding at the start of your program, but you should only do this if you know what you are doing.

    If you intend to use (or allow your users to use) non-ASCII characters in your program, then storing strings without specifying an encoding is asking for trouble. Unless you are very careful you will probably end up treating text incorrectly and garbling your data. It is far better to use unicode strings, which are explained briefly below.

  • Unicode Strings

    As mentioned above, 'normal strings' in Python are byte-strings and store data as a sequence of numbers. This can be very convenient for ASCII text, but as soon as you use non-ASCII characters it is a recipe for trouble.

    For a good 'how-to' on using Unicode with Python, read A Crash Course in Character Encodings.

    Unicode strings have an internal representation which keeps text as text. Effectively the computer knows which characters your text comprises. You will still have to choose an encoding when you read or write your data.

    You can include unicode literals in your program by preceding the string with a u. Unless you specify the encoding at the start of your program then you should still only use ASCII characters, but you know your string will be stored as unicode rather than a byte string.

    The quoting and escaping rules for unicode strings are the same as for byte strings :

    x = u'A unicode string'

    y = u'''A multi-line unicode
    string.'''


    z = ur"""This is a
    raw, multi-line
    unicode string !!!
    """

    Forcing the Python interpreter to convert between unicode and byte-strings without specifying an explicit encoding can be done, but is just begging for a UnicodeEncodeError or UnicodeDecodeError to bring your program to a 'crashing' halt. This happens when one of the strings that you are 'implicitly' converting, for example by adding a unicode string to a byte string, has characters which can't be represented in the system default encoding (which will usually be ASCII). The article mentioned above (A Crash Course in Character Encodings) explains when this can happen and how to avoid it.
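
Here is a minimal sketch of converting explicitly rather than implicitly. It assumes UTF-8 as the encoding purely for illustration; in a real program you should use whatever encoding your data actually has. The non-ASCII character is written as an escape so that the source code itself stays plain ASCII.

```python
# A unicode string containing one non-ASCII character (e with an acute accent).
text = u'caf\xe9'

# Encoding turns unicode text into a byte-string.
encoded = text.encode('utf-8')

# Decoding turns the bytes back into unicode text.
decoded = encoded.decode('utf-8')

# Four characters become five bytes, because the accented
# character needs two bytes in UTF-8.
character_count = len(text)       # 4
byte_count = len(encoded)         # 5

# ASCII cannot represent the accented character, so this fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Naming the encoding at every boundary like this means the conversion can only fail where you expect it to.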

Python strings, like the other built-in types, have lots of useful methods for doing string-type-things with them. Refer to the string methods in the Python docs for a full list.
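
A few of the most commonly used string methods, as a rough sketch. The sample text and variable names are made up for the example.

```python
greeting = '  Hello, World  '

stripped = greeting.strip()                      # 'Hello, World'
shouted = stripped.upper()                       # 'HELLO, WORLD'
words = stripped.split(', ')                     # ['Hello', 'World']
joined = '-'.join(words)                         # 'Hello-World'
replaced = stripped.replace('World', 'Python')   # 'Hello, Python'
position = stripped.find('World')                # 7
begins_hello = stripped.startswith('Hello')      # True
```

None of these methods change the original string. Each one returns a new value.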

Other Useful Values

There are a few other useful values which can be very handy.

  • None: Often written null in other languages, it is usually used to represent 'nothing'.
  • True: The 'boolean' value of true.
  • False: The boolean value of false.

These values turn up a great deal when programming. You can include them in your program by writing them literally :

x = None
y = True
z = False
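
A small sketch of how these values tend to turn up in practice. The names result and message are illustrative only.

```python
# None often stands for 'no value yet'.
result = None
nothing_yet = result is None      # True

# Comparisons produce True or False.
bigger = 3 > 2                    # True
same_text = 'a' == 'b'            # False

# These values are what 'if' statements make decisions on.
if result is None:
    message = 'no result yet'
else:
    message = 'got a result'
```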

More Complex Data Types

In order to work with information, you will often want to group data together and work on it in that group. Python contains lots of powerful concepts for doing this, including fancy things like iterators, generators and list comprehensions.

The most fundamental way of grouping data together is to use 'data-structures'. These are objects that store your values together. With Python you are free to create your own custom data structures, but it has four very powerful ones built in. Objects that store data for you are called containers.

All four of these data-types (the list, the tuple, the set and the dictionary) have lots of different operations you can perform on them. Many of these are built in to them and are called 'methods'. This introduction will not cover these in detail; instead it will just introduce you to the types and how you include them in your programs.

The List

A list stores a group of items one after the other. It keeps them in order for you. You can add or remove items as you wish, sort your list or search through it to see if it contains a specific value.

You can start with an empty list and add things to it, or use a pre-populated list. Items in your list can be any object, including other lists. This means that you can have lists of lists, and use them to create complex nested structures. They are a very useful fundamental building block.

A list is written using square brackets, with commas between the items :

# w is an empty list
w = []

# x has only one member
x = [3.0]

# y is a list of numbers
y = [1, 2, 3, 4, 5]

# z is a list of strings
# and other lists
z = ['first', [], 'second', [1, 2, 3, 4], 'third']

We access the members of a list by 'indexing' into it. The index must be an integer representing the position of the item. The first item is in position zero. You can index a variable by placing square brackets after the variable, with the index inside them.

If you try to access a position that does not exist (for example the fifth item of a list with only four members), you will get an IndexError.

You can access from the end of the list, by using negative indexes. Position -1 is the last item, -2 is the one before the last etc.

>>> x = [1, 2, 3, 4, 5]
>>> print x[0]
1
>>> print x[4]
5
>>> print x[-1]
5
>>> print x[-5]
1
>>> print x[10]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: list index out of range

Lists have lots of methods for working with them. They also have special syntax for working on parts of a list. This is called slicing. See the page on sequence types for details.
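
As a rough illustration of the list operations mentioned above (adding, removing, sorting, searching and slicing), here is a short sketch. The list contents and names are made up.

```python
items = []                 # start with an empty list
items.append('b')
items.append('a')
items.append('c')

has_a = 'a' in items       # True: search for a value
items.sort()               # sorts in place: ['a', 'b', 'c']
items.remove('b')          # now ['a', 'c']

# Slicing makes a new list from part of an existing one.
numbers = [1, 2, 3, 4, 5]
first_two = numbers[:2]    # [1, 2]
middle = numbers[1:4]      # [2, 3, 4]
last_two = numbers[-2:]    # [4, 5]

# Indexing twice reaches inside a nested list.
nested = [[1, 2], [3, 4]]
inner = nested[1][0]       # 3
```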

Tuples

Tuples are a lot like lists in that they are used to store sequences of information. Unlike lists they can't be altered after they have been created.

Because a tuple cannot change once it has been created, the interpreter knows its length in advance, so working with tuples can be slightly more efficient than using lists.

You can index and slice tuples, but not change them.

You come across tuples 'implicitly' in many places in Python. For example, functions that return multiple values return them as a tuple.

Tuples are created using parentheses instead of square brackets :

>>> x = (1, 2, 3, 4)
>>> x
(1, 2, 3, 4)
>>> x[0]
1
>>> x = (1,)
>>> x
(1,)
>>> x[0]
1
>>> x = ()
>>> x
()
>>> x[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: tuple index out of range

You may have noticed that to create a tuple with only one member you need to add a comma into the mix: (1,). (Without the comma, (1) is just the number 1 in parentheses, not a tuple.) To create an empty tuple (there must be a reason why you'd want to do that sometime), write () or even tuple().

By convention in Python, tuples are used when the elements are heterogeneous (different types of objects), while lists are used for homogeneous collections. This is a distinction observed more in theory than in practice.
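
The sketch below shows a tuple being returned from a function, unpacked into separate names, and refusing to be changed. The function min_and_max is a hypothetical helper written just for this example.

```python
def min_and_max(values):
    # The comma builds a tuple, so both values are returned together.
    return min(values), max(values)

result = min_and_max([3, 1, 4, 1, 5])    # (1, 5)

# A tuple can be unpacked into separate names in one step.
lowest, highest = result

# Trying to change a member of a tuple raises a TypeError.
point = (1, 2)
try:
    point[0] = 10
    changed = True
except TypeError:
    changed = False

# The comma, not the parentheses, makes a one-member tuple.
just_a_number = (1)
one_member = (1,)
```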

Dictionaries

Dictionaries are an extremely useful and flexible datatype. In fact Python is built on them.

Dictionaries store values with keys, so you can look them up with the key. You can use them to build complex nested data-structures or just for storing values.

Like the other data-types we've looked at so far, Python has built in syntax for creating dictionaries :

>>> x = {'key1': 'value1', 'key2': 'value2'}
>>> print x
{'key2': 'value2', 'key1': 'value1'}

You can retrieve the values by indexing the dictionary, using the same syntax as for a list but with the key instead of the position. (Dictionaries are unordered, by the way.)

>>> x = {'key1': 'value1', 'key2': 'value2'}
>>> print x['key1']
value1

You don't have to use strings as keys. You can use any object that doesn't change (any 'immutable' object). This means that strings and numbers are in, and dictionaries and lists are out.

This is another place where tuples come in handy.

>>> attemptedKey = {}
>>> x = {attemptedKey: 'some value to store'}
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: dict objects are unhashable

>>> anotherTry = (None, 2)
>>> x = {anotherTry: 'some value to store'}
>>> print x
{(None, 2): 'some value to store'}

See the Mapping Types Reference for a list of methods available on dictionary objects.
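
A few everyday dictionary operations, as a hedged sketch. The keys and values are invented for the example.

```python
ages = {'alice': 30}

ages['bob'] = 25            # add a new key
ages['alice'] = 31          # replace the value for an existing key
has_bob = 'bob' in ages     # True: test whether a key is present

# Looking up a missing key raises a KeyError...
try:
    ages['carol']
    found = True
except KeyError:
    found = False

# ...whereas get() returns a default value instead.
fallback = ages.get('carol', 0)    # 0

del ages['bob']             # remove a key and its value

# Tuples work as keys, for example for grid coordinates.
grid = {(0, 0): 'origin', (1, 0): 'east'}
cell = grid[(1, 0)]         # 'east'
```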

Sets

The set is an extremely useful data type. It was introduced in Python 2.3 and became a built-in in Python 2.4.

Sets contain unique members. They are very efficient if you want to test if a value is in the set (test for membership). It is a lot faster than checking if an object is in a list.

Unlike the other data-types we've seen, there is no special syntax for creating them. To create a set, you use the set constructor. See below :

>>> x = set()
>>> x
set([])
>>> x.add(1)
>>> x
set([1])
>>> x.add(1)
>>> x
set([1])
>>> x.add(2)
>>> 1 in x
True
>>> 2 in x
True
>>> 3 in x
False
>>>
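
Sets can also be built from an existing list and combined with each other. The following is a small sketch; the names and numbers are illustrative.

```python
# Building a set from a list throws away the duplicates.
unique = set([1, 2, 2, 3, 3, 3])
member_count = len(unique)         # 3

evens = set([2, 4, 6])

common = unique & evens            # members in both sets
combined = unique | evens          # members in either set
only_in_unique = unique - evens    # members of 'unique' not in 'evens'

# discard() removes a member, and does nothing if it is absent.
evens.discard(4)
evens.discard(100)
```
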

[1] Programs are actually compiled to byte-code before being executed, but the byte-code is then interpreted. In some ways this is similar to Java or .NET which also compile to byte-code. Because these languages are statically typed with JIT compilers, they are generally considered as compiled languages whilst Python is generally considered as being interpreted.

Last edited Tue Aug 2 00:51:34 2011.
