The Future of Programming
How High Level Can We Get ?
Dynamic Languages and the Future
Recently my Grandpa asked me to explain the difference between Python and the programming language used to write Windows. I blogged my reply. My comparison of Python and C/C++ included something like :
Python is a high level language, C and C++ are low level languages.
This seemingly innocuous statement caused a couple of raised eyebrows.
First my father, who is a real old timer (and thankfully doesn't read this blog), recalled the early days of computers. He remembers the days when assembly language was new, and considered a high level programming abstraction. For the first time programmers didn't have to use numbers (the machine code) to program, but could use symbols (like JMP for a jump instruction) that carried more obvious meaning for programmers. Using labels and relative offsets the assembler could also calculate memory locations, meaning that the programmer no longer needed to reference absolute locations.
A colleague of mine (well, my boss) brought to mind game programming not quite so far back in history. The speed intensive parts of games had to be written in assembly language, but modern computers were so fast (and had so much memory) that large parts of a game could now be written in a high level language like C, or even a really high level language like C++.
This is all a question of degree of course. A programming language is only low level or high level relative to an alternative.
In his excellent essay In the Beginning Was the Command Line, Neal Stephenson explores similar issues from the point of view of computer operating systems. He explains that operating systems are metaphors. They provide a recognisable representation of the mysterious incantations actually performed by a computer. For example, when you drag and drop a file from one location to another the computer is performing a mind-boggling number of operations. (These involve moving the disk head to the right location to read the file, adjusting the 'on disk' representation of the file system and writing a new file, then determining whether it needs to perform a similar set of operations to delete the original.) It is much easier for humans to understand this in terms of dragging and dropping a file. The danger of course is that when the abstraction leaks, in other words when something goes wrong, we don't understand what has really happened and can't deal with the problem.
Computer programming languages are similarly a metaphor. They bridge the gap between something the computer can understand (binary) and something that humans can understand and use to craft programs.
Even machine code numbers are an abstraction. The real language of computers is a whirling electromagnetic dance that no human will ever speak. Perhaps the early days of programming computers by flicking switches and feeding in punched cards is closest to that dance, but it is still a level of abstraction that permitted us to communicate with our lovingly created inanimate servants.
Assembly language, which uses symbols to represent instructions, is effectively one level of abstraction up from machine code. It is easier to write and understand (STOR is easier to remember than the hex number 2F for a store instruction!). There is however almost a one-to-one correlation between the code written and the instructions executed by the processor. It is low level because it is very close to the hardware.
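To make that one-to-one correspondence concrete, here is a minimal sketch of a toy assembler in Python. Everything in it is an assumption made for illustration: the opcode numbers (apart from the 2F borrowed from the example above), the two-cell instruction layout and the `assemble` function belong to an imaginary machine, not a real instruction set. It shows symbols being swapped for numbers, and a label being resolved to an address so the programmer never has to work it out by hand.

```python
# Hypothetical opcode table for an imaginary machine (not a real instruction set).
OPCODES = {"LOAD": 0x1A, "STOR": 0x2F, "JMP": 0x3C}


def assemble(source):
    """Translate toy assembly text into a flat list of numbers (machine code)."""
    # First pass: note the address of every label. In this toy machine each
    # instruction occupies two cells: the opcode, then its operand.
    labels = {}
    instructions = []
    address = 0
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0].endswith(":"):
            labels[parts[0][:-1]] = address
            parts = parts[1:]
        if parts:
            instructions.append(parts)
            address += 2

    # Second pass: replace each mnemonic with its number, and each label with
    # the address recorded for it. Other operands are read as hex numbers.
    code = []
    for mnemonic, operand in instructions:
        value = labels[operand] if operand in labels else int(operand, 16)
        code.extend([OPCODES[mnemonic], value])
    return code


program = """
start: LOAD 10
       STOR 11
       JMP start
"""

machine_code = assemble(program)
print([f"{number:02X}" for number in machine_code])
```

Each line of assembly becomes exactly one opcode and one operand in the output, which is why assembly still counts as low level despite being so much easier to read.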
When C was written, it was an attempt (successful) to abstract away the hardware (and instruction set) differences between different processors. Code written for one computer could be easily ported to another machine by first writing a C compiler. This led to the Unix operating system - a whole programming environment that could be ported to a new platform by creating a C compiler and compiling the Unix source code.
With C and C++ the hardware is abstracted away a bit. We still work with pointers, which refer to locations in memory of values and functions, and we also have to manage our own memory.
Python (and other languages) are another level of abstraction up; higher up and further from what the underlying processor is actually doing. We don't have to manage our own memory or declare variables, and we have a rich set of basic datatypes (like lists and dictionaries) that are built into the core language. When you do a simple action in Python, like fetch a value stored in a dictionary, the computer actually runs many instructions to fetch the value stored against the key you supply and return it as a Python object you can use.
With the Python mapping protocol (and subclassing dict or building dictionary like objects) you can define for yourself what it means to fetch a value from a dictionary. Perhaps querying a database to get the value. This means that the semantics of the code you write is abstracted up another level from what is happening under the hood. All this is at the cost of some efficiency of course.
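As a minimal sketch of that idea, the class below looks like a read-only dictionary but answers every lookup by querying a database. The class name `DatabaseDict`, the `settings` table and its contents are all invented for this example; it uses an in-memory SQLite database from the standard library so it can be run as is.

```python
import sqlite3
from collections.abc import Mapping


class DatabaseDict(Mapping):
    """A read-only, dictionary-like view onto a table of key/value pairs."""

    def __init__(self, connection):
        self._conn = connection

    def __getitem__(self, key):
        # What looks like a plain dictionary fetch is really a database query.
        row = self._conn.execute(
            "SELECT value FROM settings WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __iter__(self):
        for (key,) in self._conn.execute("SELECT key FROM settings ORDER BY key"):
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM settings").fetchone()[0]


# Hypothetical data, held in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO settings VALUES (?, ?)",
    [("language", "Python"), ("level", "high")],
)

settings = DatabaseDict(conn)
print(settings["language"])       # same syntax as an ordinary dictionary
print(settings.get("missing"))    # get() comes for free from Mapping
```

The calling code can't tell (and doesn't need to know) that `settings["language"]` runs an SQL query, which is exactly the extra level of abstraction, and the extra cost, described above.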
Now whilst it is theoretically possible to write a large desktop application in assembly language it would be insanely difficult. When it came to debugging, or even worse another person trying to read your code, insanity doesn't begin to describe it. There comes a level of complexity where you need a higher level of abstraction. In order to be able to understand and visualize what the computer is doing you need a better metaphor. So the question is, in ten or twenty years' time will we still regard Python as a high level language, or will it have been replaced by a new and more effective metaphor ?
The advantage of the lower level, closer to the metal, abstractions is that you have complete control over what the computer is doing. The disadvantage is that with power comes responsibility, and you have this level of control whether you want it or not. Modern computers (for some value of modern) do have an enormous amount of power. A typical desktop application may spend most of its time waiting for user input. In this situation, it is far more important that your code is easy to write and easy to understand than that you have complete control over when memory is allocated and deallocated or what shortcuts you can take to break out of your loops early on the machine code level. When coding processor intensive tasks, like games or number crunching algorithms, it may be very important that you squeeze the maximum performance out of your hardware. Python enables this by easily (!!?!???) allowing you to write performance bottlenecks as C (or C++ or Fortran etc) extensions and integrating them relatively seamlessly with your Python code.
Modern tools and compiler tricks mean that higher level abstractions (where you hand low level control over to your tools) can actually provide better performance. For example it is said that optimising compilers are capable of producing faster assembly language than even good coders. Similarly, JIT compilers (like the one in the Common Language Runtime of the .NET framework) are purportedly pushing for "better than C" performance. So the trend is inexorably towards higher level languages and tools.
The programming environment is also changing. The two major changes I see are :
- Distributed processing where the platform a program may be running on is no longer a single machine or processor
- Virtual environments where a program is an object in a 3D virtual world
The first of these problems is a concurrency issue. Languages address this with models like threading, inter-process communication, remote object invocation libraries and the like. Some languages build concurrency support into the core language, Erlang being the obvious example. It's not yet clear that programming languages are going to need to change fundamentally to handle distributed processing. It looks like languages can evolve (and add-on libraries be created) to facilitate what is a new programming paradigm within existing languages.
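As a small illustration of concurrency arriving as a library rather than as a change to the language, here is a sketch using the standard library's `concurrent.futures` module. The `word_count` function and the sample documents are made up for the example; the point is only that ordinary Python functions are handed to a pool of workers without any new syntax.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """An ordinary function; nothing about it knows it will run concurrently."""
    return len(text.split())


# Hypothetical work items standing in for independent pieces of a larger job.
documents = [
    "high level languages",
    "low level",
    "a whirling electromagnetic dance",
]

# The pool farms the calls out to worker threads; map() returns the results
# in the same order as the inputs.
with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(word_count, documents))

print(counts)
```

Swapping the thread pool for a process pool, or for a library that talks to other machines, changes the plumbing but not the language the work is written in.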
In fact there is a good reason not to abstract away concurrency issues altogether. Local resources (memory and objects etc) are generally fast to access. Remote resources are typically hundreds of times slower (currently), and may not be available at all if a network path fails. A programmer should treat local and remote resources differently. An abstraction that allows us to handle them using the same techniques is fine, one which prevents us from knowing how our resources are allocated or created isn't.
Irrespective of this, Python (and other languages like Java or .NET languages) are defined in terms of syntax and semantics. They are run by virtual machines created for the platforms they run on. The implementation of the language could change dramatically without automatically requiring the language to change.
The second new paradigm above, virtual worlds, is much more interesting.
The online game Second Life provides a 3D virtual environment. You can create objects and script their behaviour. There are animals and flying cars, and all sorts of other things, that have a visual appearance and interact with their environment in a programmed way. The 3D world is like the operating system, and the objects like processes; programs. There is an Open Source operating system project called Croquet which is a 3D shared virtual world operating system which looks very interesting.
This is all but a foreshadowing of what is to come. Imagine (if you will...) a massive shared virtual world of the kind envisaged by Neal Stephenson in Snow Crash or William Gibson in Neuromancer. Very complex systems operate and interact in this environment. The programs that form their 'real' existence have to understand data in the same way we see things, and respond to changes (like movement) forced on them by their environment.
Computer operating systems become a better metaphor for the user as they better resemble the everyday world. The more recognisable the elements of the metaphor (trashcans and folders anyone ?) are, the more usable they are. For communication, entertainment and business use humans will inevitably push towards a more immersive 'virtual reality' experience.
It is possible to visualize a form of programming that is much closer to carpentry or mechanics. Virtual people assemble 'programs' (which are virtual artefacts really) from components and raw materials with certain properties, perhaps applying filters like textures and plugging in data storage blocks and defining triggers for patterns of behaviour. At last programming would become a true craft like carpentry.
This would be a much higher level of abstraction than we have previously seen in programming. There is however a serious problem with this idea, which I'm sure you've spotted. The problem is complexity and precision. For performing complex analysis on large business data sets (for example) we need to be able to precisely specify complex behaviour. What we are looking for is the appropriate metaphor that allows the most flexibility whilst providing the simplest and clearest interface to that.
I commute daily to London, about an hour train journey each way. I don't mind this as I love my job. The first thing I noticed about my early morning silent companions was the plethora of iPods. Maybe they are just more recognisable, but they seem to have captured the market. The next thing that I noticed was the number of people reading books, good old-fashioned printed books. In fact since I've started commuting, as well as it being an excellent opportunity to write blog entries, I'm reading more books than at any time since I was a teenager. Words, whether printed or spoken, are a concrete and awesome way of communicating. One of my favourite sayings of my grandfather (the other one) was that he preferred the radio to television because the pictures were better. Words are one of the most powerful mediums of communication that humans have devised, and that isn't about to change.
I like the way Stephenson presents the Metaverse in Snow Crash. Partly because it is a world in which hackers are kings.
He describes how in a programming environment with defined laws (like the laws of physical science that largely govern nature) hackers script objects, like motorbikes. One of the sports in the young metaverse is hackers racing each other on bikes they have coded themselves. These "scripts" would be recognisable to us as computer programs, but what language would they be written in ?
The programming paradigm 'du jour' is object oriented programming. This packages data together with code relevant to the data type. So you get an object, and it can interact with other objects based on their properties, attributes and the interface they present.
It's not just me asking what paradigm will come next. In an article "The Post-OOP Paradigm", Brian Hayes asks "has OOP had its day ?". He suggests that Aspect Oriented Programming is the next step. (You can read an extract of this article in A Mini History of Programming.)
There is another body of opinion that isn't convinced that OOP was ever necessary. Paul Graham is looking for a programming language that will last one hundred years. He argues that with forty years of computer science, we have enough experience to examine the issues and see which programming constructs and fundamental operators are enduring. His cut is a dialect of Lisp called Arc, which isn't object oriented.
A typical tutorial on Object Oriented Programming might describe classes and objects something like this.
If we wanted to categorise certain animals, say certain reptiles, we might start with a reptile class. All reptiles have certain things in common, skin type and that they are cold blooded for example. Snakes are particular types of reptiles. All snakes have characteristics that they inherit by virtue of being reptiles, they also have some characteristics that are specific to snakes. A Python is yet more specific still with features specific to its species as well as its genus.
In OOP terms reptile, snake and Python are all classes. Pythons inherit properties from their baseclass (sometimes called superclass) snake which in turn inherits from reptile. In this way we can define related hierarchies of classes which inherit from each other. This corresponds well to the biological analogy which says that reptile, snake and python are terms useful for classifying animals.
We can also have specific Pythons. In OOP terms these are individual instances of our classes. Zac and Phoebe may both be Pythons, but they will have differences depending on the data they were initialised with (to mix the metaphors).
Object orientation is a good programming metaphor because it accords with our experience of reality. Because of this it is easy to visualize programs in terms of objects interacting. It uses patterns of thinking that we are familiar with and are ingrained in our consciousness. Perhaps this is why functional programming languages, oft claimed to be more powerful in the grasp of the right minds, have never taken off. The metaphor is too alien to most people for it to fit our brains.
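The tutorial example above might be sketched in Python (naturally) like this. The particular attributes (`skin`, `legs`, `length_m`) and the `move` method are invented for illustration; only the class names and the two snakes come from the description.

```python
class Reptile:
    """Things all reptiles have in common."""

    cold_blooded = True
    skin = "scales"

    def __init__(self, name):
        self.name = name


class Snake(Reptile):
    """Inherits everything from Reptile and adds snake-specific features."""

    legs = 0

    def move(self):
        return f"{self.name} slithers"


class Python(Snake):
    """More specific still: a particular kind of snake."""

    venomous = False

    def __init__(self, name, length_m):
        super().__init__(name)
        self.length_m = length_m  # hypothetical per-instance data


# Two instances of the same class, initialised with different data.
zac = Python("Zac", 3.5)
phoebe = Python("Phoebe", 4.2)

print(zac.move())              # behaviour inherited from Snake
print(phoebe.cold_blooded)     # attribute inherited from Reptile
print(isinstance(zac, Reptile))
```

Zac and Phoebe share everything defined on `Python`, `Snake` and `Reptile`, and differ only in the data they were created with.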
So if we are looking for a new paradigm, we need one that better fits our thinking and makes it easier to visualize our program models. I'm not convinced a better model will ever exist. Many who have architected medium to large projects find that visualising data types and processes as objects and object interactions makes describing systems and prototyping them much easier. Being able to abstract away the details and provide a project overview through UML modelling and the like makes it easier to examine big design issues. Once again OOP matches very well the structures you create in this sort of visualisation.
So does OOP (or indeed Python) have what it takes to lead us into the future and beyond ?
We've looked at how programming languages can be categorised in levels corresponding to how close to the machine instructions they are. In a similar way individual programs are usually structured with high and low level objects. The low level objects are usually closer to carrying out the instructions from elements higher up in the program structure. As a quick example imagine an application that creates charts from complex datasets. The user clicks on a UI button asking for a chart showing sales of a product against time.
In the program this may cause a controller object to ask the model for the dataset that matches the query. This is likely to be a very simple method call asking for the correct data in the right form. The model might pass this request on to its registered data sources. These in turn might be represented by lower level objects which know how to access the data they represent. So a high level API call on one object may mean many calls filtering down the levels and involving lots of different objects. This is analogous to our brain issuing a high level 'walk' instruction. This results in hundreds of complex actions involving our motor-neurone system with corrections from our balance feedback systems. These processes (thanks to hours of tortuous practice in childhood) are almost entirely subconscious, and our higher level brain functions only need to cope with a simpler API involving walk, stop, left and right functions.
So even in an entirely new programming environment, like the 3D virtual worlds that fiction promises us, the OOP paradigm may still be appropriate. Low level objects can handle interacting with data from the environment (like the individual rods and cones in our eyes). Although the data these objects must work with is complex, they only need to be able to process it enough to provide slightly more structured information to the next objects up in the structure (like device drivers in current operating systems). Finally, astride the top layer, is the humble programmer who is now able to issue commands to his virtual motorbike like accelerate, faster, fire rockets. So who's ready to build the future ? It looks fun and it looks object oriented.
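Here is a rough sketch of that chain of calls for the charting example. All of the class names, method names and sales figures are hypothetical; a real application would have a proper UI and real data sources, but the shape (one simple call at the top fanning out into several lower level calls) is the same.

```python
class ListDataSource:
    """Low level: knows how to get at the raw records it represents."""

    def __init__(self, records):
        self._records = records

    def fetch(self, product):
        return [r for r in self._records if r["product"] == product]


class SalesModel:
    """Middle level: passes requests on to its registered data sources."""

    def __init__(self):
        self._sources = []

    def register(self, source):
        self._sources.append(source)

    def sales_over_time(self, product):
        rows = []
        for source in self._sources:
            rows.extend(source.fetch(product))
        # Hand back the data in the form the chart wants: (month, units) pairs.
        return sorted((r["month"], r["units"]) for r in rows)


class ChartController:
    """High level: responds to the user clicking the chart button."""

    def __init__(self, model):
        self._model = model

    def on_chart_button(self, product):
        # One simple call here filters down through the levels below.
        return self._model.sales_over_time(product)


# Made-up sales records split across two sources.
model = SalesModel()
model.register(ListDataSource([
    {"product": "widget", "month": "2011-02", "units": 7},
    {"product": "gadget", "month": "2011-01", "units": 3},
]))
model.register(ListDataSource([
    {"product": "widget", "month": "2011-01", "units": 5},
]))

controller = ChartController(model)
chart_data = controller.on_chart_button("widget")
print(chart_data)
```

The controller only needs the equivalent of 'walk': it never sees how many sources there are or how each one stores its records.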
A comparison that is itself a metaphor I suppose...
The current design of Python, with its GIL, can't take advantage of multiple processors on a single machine. At some point in the future that is going to be a real big issue.
Last edited Tue Aug 2 00:51:34 2011.