find - simple Python example inspired by Tom Christiansen's BYTE

Steven D. Majewski (sdm7g@elvis.med.Virginia.EDU)
Wed, 6 Apr 1994 07:13:54 GMT

The latest issue of Byte magazine had an article on Perl by
Tom Christiansen, in which he used some 'find'-like problems
(find big files, sort files by modification date, etc.)
as Perl programming examples.

This is not functionally the same program: I didn't
have a copy of his Perl program on hand when I wrote it,
but I thought it would be a good demonstration for Python
novices of how to do the same sort of problems in Python.

Some notes:

Python modules can define functions and classes, and can
also execute arbitrary statements the first time they
are imported (or any time they are reloaded with reload(module)).
When Python modules are imported, their __name__ attribute
is usually equal to the name of the module. However, when
a Python script is run as a stand-alone program, its
__name__ is equal to "__main__". This allows a module both to
contain common function definitions imported and used by
other modules, and to contain a stand-alone main routine.
(If it is not a stand-alone program, I will often include
test code that exercises the module's functions.)
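
A minimal sketch of that idiom, written in present-day Python 3 syntax rather than the 1994 syntax of the script in this post. The module name shapes.py and the function area are hypothetical, made up only for illustration:

```python
# shapes.py (hypothetical module name, for illustration only)

def area(width, height):
    """Reusable function that other modules can import."""
    return width * height

def main():
    # Test code that exercises the module's functions.
    print(area(3, 4))

# __name__ is "__main__" only when this file is run as a program.
# When the file is imported, __name__ is the module's own name
# and main() is skipped.
if __name__ == "__main__":
    main()
```

Running "python shapes.py" calls main(), while "import shapes" from another module only makes area available.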

The Python module os.path has a function 'walk' that recursively
walks through a directory tree and applies a function to the file list
of each directory. The function may modify the file list to control
further processing, so you can prune the tree at a certain
point. The function in this version doesn't do anything fancy:
it just appends files to a filename list.
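
For readers on a current Python: os.path.walk no longer exists in Python 3, and os.walk is the usual replacement. The following is only a sketch of the same collect-and-prune idea under that assumption; the skip_dirs parameter is my own invention and is not part of the script in this post:

```python
import os

def find(top, skip_dirs=()):
    """Collect paths under top, pruning any directory named in skip_dirs."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place tells os.walk not to descend into
        # the removed entries; this is the pruning step.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):   # skip symbolic links
                found.append(path)
    return found
```

A call such as find(".", skip_dirs=("CVS",)) would list everything below the current directory except the contents of any directory named CVS.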

Later processing and selection can also be done by selecting
items from the list with 'filter', a builtin Python function
that takes a function that returns a boolean value (0 or 1),
and a list, and returns the elements of the list for which
the function is true.

filter(lambda f: os.stat(f)[ST_SIZE] > size, apply(find, args))

The test here is a simple one-line expression, so a lambda
expression is used. For a more complicated test,
a multi-line function can be defined with 'def'.
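
As a sketch of the 'def' alternative, in Python 3 syntax: the names is_big_regular_file and select_big, and the default threshold of 1024 bytes, are hypothetical choices for illustration only.

```python
import os

def is_big_regular_file(path, size=1024):
    """Multi-line test: true only for regular files larger than size bytes."""
    if not os.path.isfile(path):
        return False
    return os.stat(path).st_size > size

def select_big(paths, size=1024):
    # filter() returns an iterator in Python 3, so wrap it in list().
    return list(filter(lambda p: is_big_regular_file(p, size), paths))
```

The lambda is still there, but only to pass the size threshold along; the test itself lives in an ordinary named function.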

'apply' is used because find takes a variable number of arguments
and args holds them as a tuple. If find were defined to take a single
arg (which could be a list or a tuple of items), then "find( arg )"
would be sufficient.
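
In Python 3 the apply builtin is gone, and the star-argument call syntax does the same job. A small sketch of the two calling conventions follows; both functions are stand-ins invented for illustration, not the find defined in this post:

```python
def find(*paths):
    """Stand-in that accepts a variable number of pathnames."""
    return ["found:" + p for p in paths]

def find_one(paths):
    """Stand-in that accepts a single sequence of pathnames."""
    return ["found:" + p for p in paths]

args = (".", "/tmp")

# Star-argument call: unpacks the tuple into separate arguments,
# which is the job apply(find, args) did in older Pythons.
unpacked = find(*args)

# With a single sequence parameter, the plain call is enough.
direct = find_one(args)
```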

The default sort function in Python can sort different types of
objects (the order of different types is arbitrary but consistent)
and composite objects. One sorting trick is to create tuples
that have the sort keys as their leading elements.
A ( number, string, anything-else ) tuple will always sort
first by the number, then by the string, and then by the anything-else.
So sort-by-date/size is done by turning each file name in the list
into a ( date|size, filename ) tuple.
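
A tiny sketch of the tuple trick, with made-up sizes and file names. It is written for Python 3, where tuples still compare element by element, although ordering comparisons between unrelated types such as numbers and strings now raise TypeError:

```python
# Hypothetical (size, filename) records; the numbers are made up.
records = [(2048, "b.txt"), (512, "z.txt"), (2048, "a.txt")]

# Tuples compare element by element, so sorting orders by size first
# and breaks ties on the filename.
records.sort()

names_by_size = [name for size, name in records]
```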

Python's builtin map function is like Lisp's map. It takes a
function of N args, followed by N sequences (here N is 1):

files = map( lambda f: (os.stat(f)[ST_MTIME], f), files )
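
For the case where N is greater than 1, here is a sketch with two sequences, in Python 3 syntax (where map returns an iterator, hence the list() call). The names and sizes are invented for illustration:

```python
names = ["a.txt", "b.txt", "c.txt"]
sizes = [10, 200, 3000]

# One function of two arguments, two sequences: map() calls the
# function with one item taken from each sequence in turn.
pairs = list(map(lambda size, name: (size, name), sizes, names))
```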

Just an example: not complete, not completely debugged, and
not checked for portability. (In fact, I just noticed that
I used '.' as the default instead of getting the current
directory, which on a Mac probably has a different literal designation.)

Usage:
    find.py [-D|-N|-S] [-R] [-s size] [pathnames...]
        -D : sort by modification date/time
        -N : sort by filename
        -S : sort by size
        -R : reverse order of sort
        -s size : include only files larger than size
                  size can be an expression using + - * /
                  and ending in a [KkMm][Bb] multiplier code:
                  0.5Mb, 100K, 8*2048+512, etc.
        pathnames can include ~user or $VAR expansions.
        No pathname indicates the current directory.
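
To make the size syntax concrete, here is a rough Python 3 sketch of how such a string could be turned into a byte count. It follows the same strip-the-suffix-then-eval approach as the script, but the name parse_size, the MULTIPLIERS table and the character whitelist are my own additions, and the whitelist is a convenience check rather than a hardened parser:

```python
import re

MULTIPLIERS = {"k": 1024, "m": 1024 * 1024}

def parse_size(text):
    """Turn '100K', '0.5Mb' or '8*2048+512' into a number of bytes."""
    text = text.strip()
    if text[-1:] in ("b", "B"):          # optional trailing 'b'
        text = text[:-1]
    mult = 1
    if text[-1:].isalpha():              # multiplier letter must be last
        try:
            mult = MULTIPLIERS[text[-1].lower()]
        except KeyError:
            raise ValueError("No multiplier for: " + text[-1])
        text = text[:-1]
    # Only digits, '.', blanks, parentheses and + - * / may reach eval().
    if not re.fullmatch(r"[0-9.+\-*/() \t]+", text):
        raise ValueError("Not an arithmetic expression: " + text)
    return eval(text) * mult
```

With this sketch, "100K" gives 102400 and "8*2048+512" gives 16896. Note that '/' is true division in Python 3, so "1/2m" here means half a megabyte rather than zero.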

Maybe I'll try a more complete program later, but my interest
is not so much in duplicating the functionality and interface
of find (as find2perl does): I often find myself struggling
to figure out how to get find to do what I want, and I would prefer
to try to figure out a *better* interface.

- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU>
- UVA Department of Molecular Physiology and Biological Physics

#!/usr/local/bin/python
# find.py
#
import os,sys,string
from stat import ST_SIZE,ST_MTIME
from time import ctime
expanduser = os.path.expanduser
expandvars = os.path.expandvars

def expand( name ):     # handle $HOME/pathname or ~user/pathname
    return expandvars( expanduser( name ))

if os.name in ( 'posix', 'dos' ):
    DO_NOT_INCLUDE = ( '.', '..' )  # don't include these for dos & posix
else:
    DO_NOT_INCLUDE = ( )            # I don't know what this should be for Mac

def find( *paths ):
    list = []
    for pathname in paths:
        os.path.walk( expand(pathname), append, list )
    return list

def append( list, dirname, filelist ):
    for filename in filelist:
        if filename not in DO_NOT_INCLUDE:
            filename = os.path.join( dirname, filename )
            if not os.path.islink( filename ): list.append( filename )

def evalsize( sizestr ):
    mult = 1
    # this allows "10k", "1M" as abbreviations:
    if sizestr[-1] in ( 'b', 'B' ): sizestr = sizestr[:-1]
    if sizestr[-1] in string.letters:
        if sizestr[-1] in ( 'k', 'K' ): mult = 1024
        elif sizestr[-1] in ( 'm', 'M' ): mult = 1024*1024
        else: raise RuntimeError, 'No multiplier for: ' + sizestr[-1]
        sizestr = sizestr[:-1]
    # using 'eval()' rather than 'string.atoi()'
    # allows "10*1024", "0.25*1M" expressions:
    size = eval( sizestr ) * mult
    # also "1024*1024/2" or "1/2.0m" but NOT "1M/2"
    # multiplier letter must be last char!
    # 1/2.0m is (1/2.0)*1m , NOT 1/(2m)
    # and 1/2m is integer division 1/2 = 0 !
    return size

from Getopt import Getoptd

by_date = '-D'
by_size = '-S'
by_name = '-N'
reverse = '-R'
opt_str = 'DSNRs:'
if __name__ == '__main__' :
    opt = Getoptd( sys.argv[1:], opt_str )
    if opt.has_key('-s'): size = evalsize( opt['-s'] )
    else: size = 0
    args = ()
    for each in opt['']: args = args + ( each, )
    if not args : args = ( '.', )   # current directory
    if size:
        # select files > size
        files = filter(lambda f: os.stat(f)[ST_SIZE] > size, apply(find, args))
    else:
        files = apply( find, args )
    if opt.has_key(by_date):
        # turn each filename into a ( MTIME, filename ) tuple pair
        files = map( lambda f: (os.stat(f)[ST_MTIME], f), files )
        files.sort()
        if opt.has_key(reverse): files.reverse()
        for f in files:
            print string.ljust(f[1],52), ctime( f[0] )
    elif opt.has_key(by_size):
        # turn each filename into a ( SIZE, filename ) pair
        files = map( lambda f: (os.stat(f)[ST_SIZE], f ), files )
        files.sort()
        if opt.has_key(reverse): files.reverse()
        for f in files:
            print string.ljust(f[1],52), '%9d' % f[0]
    else:
        if opt.has_key(by_name) :
            files.sort()
            if opt.has_key(reverse): files.reverse()
        for f in files: print f

#
# Getopt.py:
#
# getopt returns a tuple-pair: a list of (option, value) tuple-pairs
# and a list of args.
# The results are easier to handle if they are returned as a
# dictionary.
# Use Getoptd().has_key() to check for existence of a switch
# and Getoptd()[] to retrieve its value.
# non-option arguments are entered with an empty string '' key.
#
from getopt import getopt

def Getoptd( args, argstr ):
    opts, args = getopt( args, argstr )
    d = { '':args }
    for opt in opts:
        d[opt[0]] = opt[1]
    return d
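
For comparison, a sketch of the same options-as-dictionary idea in Python 3, where dictionaries no longer have has_key() and the 'in' operator is used instead. The name getopt_dict and the sample command line are hypothetical:

```python
from getopt import getopt

def getopt_dict(argv, optstring):
    """Return options as a dict; non-option arguments go under the '' key."""
    opts, rest = getopt(argv, optstring)
    d = {"": rest}
    d.update(opts)        # opts is a list of (option, value) pairs
    return d

# Hypothetical command line, using the option string from find.py.
opt = getopt_dict(["-D", "-s", "100K", "src", "doc"], "DSNRs:")

sort_by_date = "-D" in opt        # replaces opt.has_key('-D')
size_text = opt.get("-s", "0")    # value of -s, or a default
```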