#13-09-04
# v1.1.5
# **build 3**

# dataenc.py
# password encoding and comparing module

# The strength of the SHA hashing module - in an ascii safe, timestamped format.
# uses a binary to ascii encoding
# and will timestamp the encodings as well.
# (Binary watermarking of data).

# to be used in a CGI called 'custbase' - to check logins are correct
# (For the CGI to function it needs the user's password (or its SHA hash) to be encoded into
# each page as a hidden form field. This exposes the encrypted password in the HTML source of each page.
# This module provides functions to interleave a timestamp *into* the hash.
# Even if the encoded 'timestamped hash' is extracted from the HTML source, the CGI can tell
# that the password has expired).

# Contains functions to :
# do binary to ascii encoding using a TABLE mapping (and ascii to binary)
# binary interleave - to disperse one set of binary data into another (e.g. as a 'watermark' or date/time stamp)
# to extract the watermark again
# convert a decimal value to base 64 digits, and base 64 digits back to 8 bit digits..
# Creating and retrieving a timestamp from the current time/date
# functions for testing and setting bits in a byte (or larger value)
# (including a bitwise operator object that comes from the python cookbook
# and is no longer used here, but included for reference).
# Wrapping all that together to return an ascii encoded, date stamped SHA hash of a string

# Copyright Michael Foord
# You are free to modify, use and relicense this code.
# No warranty express or implied for the accuracy, fitness to purpose or otherwise for this code....
# Use at your own risk !!!

# If you have any bug reports, questions or suggestions please contact me.
# If you would like to be notified of bugfixes/updates then please contact me and I'll add you to my mailing list.
# E-mail michael AT foord DOT me DOT uk
# Maintained at www.voidspace.org.uk/atlantibots/pythonutils.html

"""
DOCS for dataenc as a module

When run it should go through a few basic tests - see the function test()

This module provides low-level functions to interleave two pieces of data into
each other and separate them. It will also encode this binary data to and from
ascii - for inclusion in HTML, cookies or email transmission.

It also provides high level functions to use these functions for time stamping
passwords and password hashes, and also to check that a 'time-stamped hash' is
both valid and unexpired.

The check_pass function is interesting. Given an encoded and timestamped hash
it compares it with the hash (using SHA) of a password. If it matches *and* is
unexpired (you set the time limit) it returns a new encoded time stamp of the
hash with the current time. I use this for secure, time limited, logins over
CGI. (Could be stored in a cookie as well).
(On the first login you will need to compare the password with the stored hash
and use that to generate a time stamped hash to include in the page returned.
Thereafter you can just use the check_pass function and include the
time-stamped hash in a hidden form field for every action.)

The binary data is interleaved on a 'bitwise' basis - every byte is mangled.

--

CONSTANTS

The main constant defined in dataenc.py is :

TABLE = '_-0123456789' + \
        'abcdefghijklmnopqrstuvwxyz'+ \
        'NOPQRSTUVWXYZABCDEFGHIJKLM'

TABLE should be exactly 64 printable characters long... or we'll all die horribly
Obviously the same TABLE should be used for decoding as for encoding....
note - changing the order of the TABLE here can be used to change the mapping.
Versions 1.1.2+ of TABLE use only characters that are safe to pass in URLs (e.g.
using the GET method for passing FORM data)

OLD_TABLE is the previous encoding map used for versions of dataenc.py previous to 1.1.2
See the table_dec function for how to decode data encoded with that map.

PSYCOIN = 1
This decides if we attempt to import psyco or not (the specialising compiler).
Set to 0 to not import.
If we attempt but fail to import psyco then this value will be set to 0.

DATEIN = 1
As above but for the dateutils and time module.
We need to import dateutils for the expired and pass_enc functions (amongst
others) to work fully.


FUNCTIONS

Following are the docstrings extracted from the public functions :

pass_enc(instring, indict = {}, **keywargs)
    Returns an ascii version of an SHA hash or a string, with the date/time
    stamped into it. e.g. For ascii safe storing of password hashes.

    It also accepts the following keyword args (or a dictionary containing the
    following keys). (Keywords shown - with default values).

    lower = False, sha_hash = False, daynumber = None, timestamp = None, endleave = False

    Setting lower to True makes instring lowercase before hashing/encoding.

    If sha_hash is set to True then instead of the actual string passed in being
    encoded, its SHA hash is encoded. (In either case the string can contain any
    binary data).

    If a daynumber is passed in then the daynumber will be encoded into the
    returned string. (daynumber is an integer representing the 'Julian day
    number' of a date - see the dateutils module).
    This can be used as a 'datestamp' for the generated code and you can detect
    anyone reusing old codes this way. If 'daynumber' is set to True then
    today's daynumber will automatically be used.
    (dateutils module required - otherwise it will be ignored).

    Max allowed value for daynumber is 16777215 (9th May 41222)
    (so daynumber can be any integer from 1 to 16777215 that you want to
    'watermark' the hash with - it could be used as a session ID for a CGI,
    for example).
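
The 16777215 limit quoted above is 2**24 - 1, the largest value that fits in
three bytes. The sketch below (Python 3) only illustrates that arithmetic. How
dataenc actually stores the daynumber is not stated here, so the three-byte
big-endian layout and the helper names daynumber_to_bytes and
bytes_to_daynumber are assumptions made for illustration.

```python
MAX_DAYNUMBER = 16777215  # documented maximum; equal to 2**24 - 1


def daynumber_to_bytes(daynumber):
    # Hypothetical helper: pack a daynumber into three bytes.
    # The big-endian byte order is an assumption, not dataenc's documented layout.
    if not 1 <= daynumber <= MAX_DAYNUMBER:
        raise ValueError('daynumber must be between 1 and 16777215')
    return daynumber.to_bytes(3, 'big')


def bytes_to_daynumber(raw):
    # Hypothetical inverse of daynumber_to_bytes.
    if len(raw) != 3:
        raise ValueError('expected exactly three bytes')
    return int.from_bytes(raw, 'big')
```

Any integer in the documented range survives the round trip, which is also why
such a value is wide enough to double as a session ID.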
    If a timestamp is passed in it should either be timestamp = True, meaning
    use 'now', or it should be a tuple (HOUR, MINUTES).
    HOUR should be an integer 0-23
    MINUTES should be an integer 0-59

    The time and date stamp is *binary* interleaved, before encoding, into the data.

    If endleave is set to True then the timestamp is interleaved more securely.
    Shouldn't be necessary in practice because the stamp is so short and we
    subsequently encode using table_enc.
    If the string is long this will slow down the process - because we
    interleave twice.

pass_dec(incode)
    Given a string encoded by pass_enc - it returns it decoded.
    It also extracts the datestamp and returns that.
    The return is :
    (instring, daynumber, timestamp)

expired(daynumber, timestamp, validity)
    Given the length of time a password is valid for, it checks if a
    daynumber/timestamp tuple is still valid.
    validity should be an integer tuple (DAYS, HOURS, MINUTES).
    Returns True for valid or False for invalid.
    Needs the dateutils module to get the current daynumber.

unexpired is an alias for expired - because it makes for better tests.
(The return results from the expired function are logically the wrong way
round, expired returns True if the timestamp is *not* expired..)

check_pass(inhash, pswdhash, EXPIRE)
    Given the hash (possibly from a webpage or cookie) it checks that it is
    still valid and matches the password it is supposed to have.
    If so it returns a new hash - with the current time stamped into it.
    EXPIRE is a validity tuple to test for (see expired function)
    e.g. (0, 1, 0) means the supplied hash should be no older than 1 hour
    If the hash is expired it returns -1.
    If the pass is invalid or doesn't match the supplied pswdhash it returns False.
    This is a high level function that can do all your password checking and
    'time-stamped hash' generation after initial login.

makestamp(daynumber, timestamp)
    Receives a Julian daynumber (integer 1 to 16777215) and an (HOUR, MINUTES)
    tuple timestamp.
    Returns a 5 digit string of binary characters that represent that date/time.
    Can receive None for either or both of these arguments.

    The function 'daycount' in dateutils will turn a date into a daynumber.

dec_datestamp(datestamp)
    Given a 5 character datestamp made by makestamp, it returns it as the tuple :
    (daynumber, timestamp).
    daynumber and timestamp can either be None *or*
    daynumber is an integer between 1 and 16777215
    timestamp is (HOUR, MINUTES)

    The function 'counttodate' in dateutils will turn a daynumber back into a date.

sixbit(invalue)
    Given a value it returns a list representing the base 64 version of that number.
    Each value in the list is an integer from 0-63...
    The first member of the list is the most significant figure... down to the remainder.
    Should only be used for positive values.

sixtoeight(intuple)
    Given four base 64 (6-bit) digits... it returns three 8 bit digits that
    represent the same value.
    If length of intuple != 4, or any digits are > 63, it returns None.
    **NOTE**
    Not quite the reverse of the sixbit function.

table_enc(instring, table=TABLE)
    The actual function that performs TABLE encoding.
    It takes instring in three character chunks (three 8 bit values) and turns
    it into four 6 bit characters.
    Each of these 6 bit characters maps to a character in TABLE.
    If the length of instring is not divisible by three it is padded with Null bytes.
    The number of Null bytes to remove is then encoded as a semi-random
    character at the start of the string.
    You can pass in an alternative 64 character string to do the encoding with
    if you want.

table_dec(instring, table=TABLE)
    The function that performs TABLE decoding.
    Given a TABLE encoded string it returns the original binary data - as a string.
    If the data it's given is invalid (not data encoded by table_enc) it returns None
    (definition of invalid : not consisting of characters in the TABLE, or a
    length for which len(instring) % 4 = 1 does not hold).
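
As a rough illustration of the three-bytes-to-four-characters mapping that
table_enc and table_dec are described as performing, here is a minimal sketch
in Python 3. It is not the module's own code: encode_chunks and decode_chunks
are hypothetical names, the bit order (most significant six bits first) is an
assumption, and the leading padding marker is left out because its exact rule
is not given in these docs. The sketch therefore only accepts input whose
length is a multiple of three.

```python
TABLE = ('_-0123456789'
         'abcdefghijklmnopqrstuvwxyz'
         'NOPQRSTUVWXYZABCDEFGHIJKLM')


def encode_chunks(data, table=TABLE):
    # Sketch only: turn each group of three bytes into four 6-bit digits
    # and map every digit to one character of the 64 character table.
    if len(data) % 3:
        raise ValueError('this sketch needs a length divisible by three')
    out = []
    for start in range(0, len(data), 3):
        value = int.from_bytes(data[start:start + 3], 'big')  # 24 bits
        for shift in (18, 12, 6, 0):
            out.append(table[(value >> shift) & 63])
    return ''.join(out)


def decode_chunks(text, table=TABLE):
    # Sketch only: reverse encode_chunks, four characters back to three bytes.
    if len(text) % 4:
        raise ValueError('this sketch needs a length divisible by four')
    out = bytearray()
    for start in range(0, len(text), 4):
        value = 0
        for char in text[start:start + 4]:
            value = (value << 6) | table.index(char)  # ValueError if not in table
        out += value.to_bytes(3, 'big')
    return bytes(out)
```

For instance, decode_chunks(encode_chunks(b'abcdef')) gives back b'abcdef', and
the encoded form is eight characters long, all drawn from TABLE.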
    You can pass in an alternative 64 character string to do the decoding with
    if you want.

return_now()
    Returns the time now.
    As (HOUR, MINUTES).

binleave(data1, data2, endleave = False)
    Given two strings of binary data it interleaves data1 into data2 on a
    bitwise basis and returns a single string combining both. (not just the
    bytes interleaved).
    The returned string will be 4 bytes or so longer than the two strings passed in.
    Use binunleave to return the two strings again.
    Even if both strings passed in are ascii - the result will contain
    non-ascii characters.
    To keep ascii-safe you must subsequently encode with table_enc.
    Max length for the smaller data string (one string can be of unlimited
    size) is about 16meg (increasing this would be easy if anyone needed it -
    but would be very slow anyway).
    If either string is empty (or the smaller string is greater than 16meg) -
    we return None.
    The first 4 characters of the string returned 'define' the interleave.
    (actually the size of the watermark)
    For added safety you could remove this and send it separately.

    Version 1.0.0 used a bf (bitfield) object from the python cookbook.
    Version 1.1.0 uses the binary AND & and OR | operations and is about 2.5
    times faster.
    On my AMD 3000, leaving and unleaving two 20k files took 1.8 seconds.
    (instead of 4.5 previously - with Psyco enabled this improved to 0.4 seconds.....)

    Interleaving a file with a watermark of pretty much any size makes it
    unreadable - this is because *every* byte is changed.
    (Except perhaps a few at the end - see the endleave keyword).
    However it shouldn't be relied on if you need a really secure method of
    encryption. For many purposes it will be sufficient however.

    In practice any file not an exact multiple of the size of the watermark
    will have a chunk at the end that is untouched. To get round this you can
    set endleave = True.. which then re-leaves the end data back into itself.
    (and therefore takes twice as long - it shouldn't be necessary where you
    have a short watermark.)

    data2 ought to be the smaller string - or they will be swapped round internally.
    This could cause you to get them back in an unexpected order from binunleave.

binunleave(data)
    Given a chunk of data woven by binleave - it returns the two separate
    pieces of data.


For the binary operations of binleave and binunleave, version 1.0.0 used a bf
(bitfield) object from the python cookbook.

class bf(object)
    the bf(object) from activestate python cookbook - by Sebastien Keim - Many Thanks
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799

Version 1.1.0 replaced these with specific binary AND & and OR | operations
that are about 2.5 times faster.
They are 'inline' in the functions for speed (avoiding function calls) but are
available separately as well.

def bittest(value, bitindex)
    This function returns the setting of any bit from a value.
    bitindex starts at 0.

def bitset(value, bitindex, bit)
    Sets a bit, specified by bitindex, in 'value' to 'bit'.
    bit should be 1 or 0

There are also the 'private functions' which actually contain the substance of
binleave and binunleave.
You are welcome to 'browse' them - but you shouldn't need to use them directly.

Any comments, suggestions and bug reports welcome.

Regards,

Fuzzy

michael AT foord DOT me DOT uk
"""