Python Programming, news on the Voidspace Python Projects and all things techie.



emoticon:lighton Well... I've discovered the source of my layout problems described in the post below. I attributed the problem to the fact that IE doesn't respect min-width or max-width CSS properties. In fact it was down to another IE peculiarity. Docutils renders inline literals like this - <tt><span class="pre">this is an inline literal</span></tt>. It sets the background colour with CSS and it also has the following rule:

span.pre {
  white-space: pre }

This rule breaks my page layout in IE.

I've already discovered that using the <pre>..</pre> tags with long lines breaks my layout. IE renders the long line on a single line (which is probably correct), which squeezes my sidebar out of existence. But even having a small inline literal in a line seems to affect the whole line. Bizarre.
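I haven't verified this against IE, but one plausible workaround is to override the docutils rule so inline literals are allowed to wrap:

```css
/* Override the docutils default so inline literals wrap normally.
   The background colour still marks them out as literals. */
span.pre {
  white-space: normal;
}
```

Long literals would then break across lines instead of forcing the content column to squeeze out the sidebar.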


Posted by Fuzzyman on 2005-04-14 20:06:33



Aaargh... Again

emoticon:html The CSS2 properties min-width, max-width, etc. are ignored by Internet Explorer. This means that this blog is currently difficult to read in IE, because of the sidebar. Good job only about 90% of my viewers use IE, hey!

There are a few articles on the web about it. This piece of JavaScript promised to solve all my problems, but didn't.

This article seems to be the way to do it, so I'll have to hack around later and make it work. It's yet another browser-specific hack - which is what makes web work such a pain. I might just go back to tables for my layout - they work and do what you think they do.

Speaking of JavaScript, I've been tinkering with it a bit recently and hope to make better use of it soon.


Posted by Fuzzyman on 2005-04-14 12:35:57



Being Duped

emoticon:python There have been several implementations of duplicate file finders knocked around on comp.lang.python recently. Today I acquired some new music and needed to check which of it I already had. Not only was I not connected to the internet, but it's obviously far more fun to do my own implementation.

As I'd already written an object that recursively represents a directory structure, I thought it would be a cinch to modify it to keep a record of all files with the same file size, and then compare hashes for all the same-sized files.

Well, it was a cinch - a neat little 4k chunk of code that compares files. It checked 70 gig of files in about a minute [1]. It only hashes the first 10k of each file... but heck - not bad for five minutes of hacking.

try:
    # psyco JIT speeds things up if it's available
    import psyco
    psyco.full()
except ImportError:
    pass

import sys, os, time, md5
from os.path import *

sizedict = {}
dupdict = {}

class CompObj(str):
    def __new__(cls, dirpath, relpath='', parent=None):
        """This method overloads __new__ to stop our extra arguments confusing it."""
        return str.__new__(cls, relpath)

    def __init__(self, dirpath, relpath='', parent=None):
        thisdir = join(dirpath, relpath)
        self.files = {}
        self.dirs = {}
        self.size = self.numfiles = self.numdirs = 0
        self.name = basename(thisdir)
        self.path = relpath
        self.parent = parent    # link to parent dir
        for apath in os.listdir(thisdir):
            thepath = join(thisdir, apath)
            if islink(thepath): continue
            if isdir(thepath):
                thedir = CompObj(dirpath, join(relpath, apath), self)
                self.dirs[apath] = thedir
                self.size += thedir.size
                self.numdirs += (thedir.numdirs + 1)
                self.numfiles += thedir.numfiles
            else:
                thisfile = self.makefiledict(apath, thepath, relpath)
                self.files[apath] = thisfile
                self.size += thisfile['size']
                self.numfiles += 1

        self.ctime = getctime(thisdir)
        self.mtime = getmtime(thisdir)

    def makefiledict(self, filename, filepath, relpath):
        """Form a dictionary of file attributes."""
        me = join(self, filename)
        filedict = { 'path' : relpath }
        filedict['name'] = filename
        s = filedict['size'] = getsize(filepath)
        filedict['ctime'] = getctime(filepath)
        filedict['mtime'] = getmtime(filepath)
        fileext = splitext(filename)[1].lower()
        if fileext not in ['.jpg', '.txt', '.ini']:    # excluded types !
            a = sizedict.get(s)
            if a is None:
                # first file we've seen with this size
                sizedict[s] = me
            elif s in dupdict:
                dupdict[s].append(me)
            else:
                # second file with this size - start a duplicates list
                dupdict[s] = [a, me]
        return filedict


logfile = 'logfile.txt'
thedir = 'F:\\music'

if __name__ == '__main__':
    start = time.time()
    log = open(logfile, 'w')
    CompObj(thedir)    # walking the tree fills sizedict and dupdict

    for entry in dupdict:
        hashdict = {}
        minidupdict = {}

        for something in dupdict[entry]:
            thisfile = join(thedir, something)
            bin_data = open(thisfile, 'rb').read(10000)
            m = md5.new(bin_data).hexdigest()
            a = hashdict.get(m)
            if a is None:
                hashdict[m] = something
            elif m in minidupdict:
                minidupdict[m].append(something)
            else:
                minidupdict[m] = [a, something]

        for member in minidupdict:
            log.write('\n\n\nHash Value : %s\n' % member)
            log.write('\n'.join(minidupdict[member]))

    log.close()
    print 'That took :', str((time.time() - start))[:6], 'seconds'

Point the thedir variable at a directory and the code builds an object that represents the whole directory structure. It also uses a couple of dictionaries, sizedict and dupdict, to keep track of any files with the same size.

Once it's done that, it compares the md5 hash of the first 10k of every set of same-sized files, and writes a list of the duplicates to the logfile. Lovely :-)
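For comparison, the same size-then-hash idea can be sketched in a few lines of modern Python - this is my own illustrative version, not the code above, and it uses hashlib rather than the old md5 module:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root, chunk=10000):
    """Group files under root by size, then by the md5 of their first `chunk` bytes."""
    by_size = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique file size can't be a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, 'rb') as f:
                digest = hashlib.md5(f.read(chunk)).hexdigest()
            by_hash[digest].append(path)
        duplicates.extend(group for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Only files that share a size ever get opened and hashed, which is why this approach is quick even over a large music collection.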

[1] That does seem excessively quick. It may be that a lot of the disk information was cached from a previous run through. Even so, it performs well.


Posted by Fuzzyman on 2005-04-13 23:14:17



Wonderful Web

Seeing as this is ReST - let's try a contents list

Daily Dose

emoticon:globepage Just over a week ago this weblog got subscribed to by PlanetPython. The editors of the Pythonware Daily URL subscribe to PlanetPython, and in that period three or so of my blog entries have been featured on the Python Daily URL page. Well, that's nice... the times that's happened before have brought a couple of hundred or so extra visitors from the mention, which gives me a nice warm fuzzy feeling.

This month I've had 2782 referrals and counting... blimey, Norah Batty with wheels on!


emoticon:dollars I'm still experimenting with ways of getting voidspace to pay for itself. I'm giving Google AdSense a try. Obviously I've had to agree to the contract terms, and promise not to ask you to click on the links on the sidebar. :-)

Fame and Fortune

emoticon:exclaim PyZine has published the next of my articles. This is a tutorial on writing CGI Applications. What's nice is that the article is one of the free ones. This means I can point people to it. The next one in the series covers a few more complex issues related to CGI - like character encodings and fun things like that.

Have a ReST

ReST is a very nice, relaxed text markup language. Text marked up in ReST basically looks like nicely formatted plain text. This blog is written in ReST; I'm in the process of building a website creation tool that will allow me to store my content in ReST, and PyZine also accepts article submissions in ReST. How lovely.
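To give a flavour of it, here is a small illustrative fragment of ReST markup - notice how readable it is even before docutils renders it:

```rst
A Section Title
===============

Text can carry *emphasis*, **strong emphasis** and ``inline literals``.
A literal block is introduced with a double colon::

    print 'hello'

- bullet lists are just dashes
- like so
```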


Posted by Fuzzyman on 2005-04-12 08:28:24



Getting Emotional

emoticon:baldguy Or at least testing the emoticon macro in firedrop... woohoo, it works and it's really easy. As for what emotion it's portraying... hmmm....


Posted by Fuzzyman on 2005-04-11 19:20:34


Hosted by Webfaction