find - simple Python example inspired by Tom Christiansen's BYTE

Steven D. Majewski (sdm7g@elvis.med.Virginia.EDU)
Wed, 6 Apr 1994 07:13:54 GMT

The latest issue of Byte magazine had an article on Perl by
Tom Christiansen, in which he used some 'find'-like problems
( find big files, sort files by modification date, etc. )
as Perl programming examples.

This is not functionally the same program, since I didn't
have a copy of his Perl program on hand when I wrote it,
but I thought it would be a good demonstration for Python
novices of how to solve the same sort of problems in Python.

Some notes:

Python modules can define functions and classes, and can
also execute arbitrary statements the first time they
are imported (or any time they are re-executed with reload(module)).
When a Python module is imported, its __name__ attribute
is usually equal to the name of the module. However, when
a Python script is run as a stand-alone program, its
__name__ is equal to "__main__". This allows a module both to
contain common function definitions that are imported and used by
other modules, and to contain a stand-alone main routine.
(If it is not a stand-alone program, I will often include
test code that exercises the module's functions.)
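A minimal sketch of that pattern in modern Python 3 syntax; the names `double` and `_selftest` are hypothetical and only for illustration. Importing the file just defines `double` for other modules, while running it as a script also runs the self-test.

```python
def double(x):
    """A library function that other modules can import and use."""
    return 2 * x


def _selftest():
    # Test code that exercises the module's functions.
    assert double(21) == 42
    print("self-test passed")


if __name__ == "__main__":
    # True only when this file is run as a stand-alone program;
    # when it is imported, __name__ is the module's name instead.
    _selftest()
```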

The Python module os.path has a function, 'walk', that recursively
walks through a directory tree and applies a function to the file list
of each directory. The file list can be modified in place to control
further processing, so you can prune the tree at a certain
point. The function in this version doesn't do anything fancy:
it just appends files to a filename list.
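os.path.walk no longer exists in Python 3; its replacement is os.walk, a generator that yields (dirpath, dirnames, filenames) triples. A rough sketch of the same find() on top of os.walk, assuming Python 3. The `prune` parameter is a hypothetical addition showing how trimming `dirnames` in place keeps the walk out of those directories. Unlike the listing below, this collects only files, because os.walk reports subdirectory names separately.

```python
import os


def find(*paths, prune=()):
    """Return the non-symlink files found under each of `paths`.

    `prune` is a hypothetical extra parameter: directory names not to enter.
    """
    found = []
    for top in paths:
        # Same ~user and $VAR expansion as expand() in find.py.
        top = os.path.expandvars(os.path.expanduser(top))
        for dirpath, dirnames, filenames in os.walk(top):
            # Trimming dirnames in place stops os.walk from descending
            # into the removed directories (this is how you prune).
            dirnames[:] = [d for d in dirnames if d not in prune]
            for name in filenames:
                full = os.path.join(dirpath, name)
                if not os.path.islink(full):
                    found.append(full)
    return found
```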

Later processing and selection can be done by picking
items from the list with 'filter', a builtin Python function
that takes a function returning a boolean value (0 or 1)
and a list, and returns the elements of the list for which
the function is true.

filter(lambda f: os.stat(f)[ST_SIZE] > size, apply(find, args))
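The same selection in Python 3, as a sketch: os.stat() now returns an object with named fields such as st_size, and filter() returns a lazy iterator, so it is wrapped in list(). The function names are hypothetical; the list comprehension version is the more common modern spelling.

```python
import os


def bigger_than(files, size):
    """Return the files whose size in bytes exceeds `size`."""
    # filter() is lazy in Python 3, so materialize the result.
    return list(filter(lambda f: os.stat(f).st_size > size, files))


def bigger_than_comp(files, size):
    """The same selection written as a list comprehension."""
    return [f for f in files if os.stat(f).st_size > size]
```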

The test here is a simple one-line expression, so a lambda
expression function is used. For a more complicated test,
a multi-line function can be defined with 'def' instead.
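For example, a test that combines size and age needs several statements, which a lambda cannot hold. A sketch assuming Python 3; `is_big_and_recent` and its default thresholds are hypothetical, not part of find.py. It can be passed to filter() exactly like the lambda, e.g. `list(filter(is_big_and_recent, files))`.

```python
import os
import time


def is_big_and_recent(path, min_size=1024, max_age_days=7):
    """A multi-statement test that would be awkward as a lambda.

    min_size and max_age_days are illustrative defaults.
    """
    st = os.stat(path)
    if st.st_size < min_size:
        return False
    # Age in days since the last modification.
    age_days = (time.time() - st.st_mtime) / 86400.0
    return age_days <= max_age_days
```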

'apply' is used because args may hold a variable number of arguments.
If find were defined to take a single argument (which could be a list or
a tuple of items), then "find( arg )" would be sufficient.
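apply() was removed in Python 3; the call syntax f(*args) unpacks a tuple into separate arguments and does the same job. A small sketch contrasting the two designs mentioned above; `count_paths` and `count_paths_seq` are hypothetical stand-ins for find().

```python
def count_paths(*paths):
    """Variadic: called as count_paths('a', 'b', ...)."""
    return len(paths)


def count_paths_seq(paths):
    """Takes a single sequence argument instead."""
    return len(paths)


args = (".", "/tmp", "~")
n_variadic = count_paths(*args)     # what apply(count_paths, args) did
n_sequence = count_paths_seq(args)  # no unpacking needed
```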

The default sort in Python can sort objects of different types
(the order between different types is arbitrary but consistent)
as well as composite objects. One sorting trick is to build tuples
whose leading elements are the sort keys.
A ( number, string, anything-else ) tuple will always sort
first by the number, then by the string, and then by the anything-else.
So sort-by-date/size is done by turning each file name in the list
into a ( date|size, filename ) tuple.
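A sketch of the tuple trick (often called decorate-sort-undecorate) in Python 3, next to the key= form of sorted() that does the same thing without building the tuples by hand. The dict is a hypothetical stand-in for os.stat() results. Note that Python 3 raises TypeError when comparing unrelated types such as int and str, so the "arbitrary but consistent" mixed-type ordering only holds for older Pythons; sorted(..., reverse=True) replaces a separate reverse() call.

```python
def sort_by_size(sizes):
    """Decorate-sort-undecorate: `sizes` maps filename -> size in bytes."""
    decorated = [(size, name) for name, size in sizes.items()]
    decorated.sort()  # size first; equal sizes fall back to the name
    return [name for size, name in decorated]


def sort_by_size_key(sizes):
    """The same ordering using the key= argument of sorted()."""
    return sorted(sizes, key=lambda name: (sizes[name], name))
```

For example, `sort_by_size({"b.txt": 10, "a.txt": 10, "c.txt": 5})` returns `['c.txt', 'a.txt', 'b.txt']`: the 5-byte file first, then the two 10-byte files ordered by name.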

Python's builtin map function is like Lisp's map. It takes a
function of N args, followed by N sequences:

files = map( lambda f: (os.stat(f)[ST_MTIME], f), files )
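A sketch of map() with more than one sequence, assuming Python 3, where map() returns an iterator (hence list()) and stops at the shortest sequence; older Pythons padded short sequences with None instead. The file names and sizes are made-up sample data.

```python
# map() with two sequences: the function receives one item from each.
sizes = [300, 100]
names = ["big.dat", "small.txt"]
pairs = list(map(lambda size, name: (size, name), sizes, names))
# pairs == [(300, 'big.dat'), (100, 'small.txt')]

# Any function of N arguments works with N sequences, e.g. pow(base, exp):
powers = list(map(pow, [2, 3, 10], [3, 2, 0]))
# powers == [8, 9, 1]
```

For building pairs specifically, `list(zip(sizes, names))` gives the same result.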

Just an example: not complete, not completely debugged, and
not checked for portability. (In fact, I just noticed that
I used '.' as the default instead of getting the current directory,
which on a Mac probably has a different literal designation.)
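One portable way to pick the default, sketched for Python 3: os.curdir holds the platform's own literal for the current directory ('.' on POSIX and Windows; the old classic-Mac macpath module used ':'), and os.getcwd() returns its absolute path. Using either in place of the hard-coded '.' is an assumption about what the fix would look like, e.g. `args = ( os.curdir, )`.

```python
import os

# Platform literal for "current directory" ('.' on POSIX and Windows).
default_relative = os.curdir
# Absolute path of the current working directory.
default_absolute = os.getcwd()
```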

Usage:
    find.py [-D|-N|-S] [-R] [-s size] [pathnames...]
        -D      : sort by modification date/time
        -N      : sort by filename
        -S      : sort by size
        -R      : reverse the order of the sort
        -s size : include only files larger than size
                  size can be an expression using + - * /
                  and ending in a [KkMm][Bb] multiplier code:
                  0.5Mb, 100K, 8*2048+512, etc.
    pathnames can include ~user or $VAR expansions.
    No pathname indicates the current directory.
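evalsize() in the listing below hands the -s text to eval(), which will run any Python expression typed on the command line. A sketch of a stricter Python 3 parser using the ast module that accepts only numbers and + - * /, with the same [KkMm][Bb] suffix convention; `parse_size` is a hypothetical name. Python 3's / is true division, so "1/2m" gives 524288.0 here rather than the 0 that the evalsize() comments warn about.

```python
import ast
import operator

# Only these arithmetic operators are accepted in a size expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_MULT = {"k": 1024, "m": 1024 * 1024}


def _eval(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported size expression")


def parse_size(text):
    """Turn strings like '100K', '0.5Mb' or '8*2048+512' into a byte count."""
    s = text.strip()
    if s[-1:] in ("b", "B"):      # optional trailing 'b'/'B'
        s = s[:-1]
    mult = 1
    last = s[-1:].lower()
    if last in _MULT:             # multiplier letter must be the last char
        mult = _MULT[last]
        s = s[:-1]
    elif last.isalpha():
        raise ValueError("no multiplier for: " + s[-1])
    return _eval(ast.parse(s, mode="eval")) * mult
```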

Maybe I'll try a more complete program later, but my interest
is not so much in duplicating the functionality and interface
of find (as find2perl does): I often find myself struggling
to figure out how to get find to do what I want, and I would prefer
to try to figure out a *better* interface.

- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU>
- UVA Department of Molecular Physiology and Biological Physics

#!/usr/local/bin/python
# find.py
#
import os,sys,string
from stat import ST_SIZE,ST_MTIME
from time import ctime
expanduser = os.path.expanduser
expandvars = os.path.expandvars

def expand( name ): # handle $HOME/pathname or ~user/pathname
    return expandvars( expanduser( name ))

if os.name in ( 'posix', 'dos' ):
    DO_NOT_INCLUDE = ( '.', '..' ) # don't include these for dos & posix
else:
    DO_NOT_INCLUDE = ( ) # I don't know what this should be for Mac

def find( *paths ):
    list = []
    for pathname in paths:
        os.path.walk( expand(pathname), append, list )
    return list

def append( list, dirname, filelist ):
    for filename in filelist:
        if filename not in DO_NOT_INCLUDE:
            filename = os.path.join( dirname, filename )
            if not os.path.islink( filename ): list.append( filename )

def evalsize( sizestr ):
    mult = 1
    # this allows "10k", "1M" as abbreviations:
    if sizestr[-1] in ('b', 'B' ) : sizestr = sizestr[:-1]
    if sizestr[-1] in string.letters :
        if sizestr[-1] in ( 'k', 'K' ): mult = 1024
        elif sizestr[-1] in ( 'm', 'M' ) : mult = 1024*1024
        else: raise RuntimeError, 'No multiplier for: '+ sizestr[-1]
        sizestr = sizestr[:-1]
    # using 'eval()' rather than 'string.atoi()'
    # allows "10*1024", "0.25*1M" expressions:
    size = eval( sizestr ) * mult
    # also "1024*1024/2" or "1/2.0m" but NOT "1M/2"
    # multiplier letter must be last char!
    # 1/2.0m is (1/2.0)*1m , NOT 1/(2m)
    # and 1/2m is integer division 1/2 = 0 !
    return size

from Getopt import Getoptd

by_date = '-D'
by_size = '-S'
by_name = '-N'
reverse = '-R'
opt_str = 'DSNRs:'
if __name__ == '__main__' :
    opt = Getoptd( sys.argv[1:], opt_str )
    if opt.has_key('-s'): size=evalsize( opt['-s'])
    else: size = 0
    args = ()
    for each in opt['']: args = args + ( each, )
    if not args : args = ( '.', ) # current directory
    if size:
        # select files > size
        files = filter(lambda f: os.stat(f)[ST_SIZE] > size, apply(find, args))
    else:
        files = apply( find, args )
    if opt.has_key(by_date):
        # turn each filename into a ( MTIME, filename ) tuple pair
        files = map( lambda f: (os.stat(f)[ST_MTIME], f), files )
        files.sort()
        if opt.has_key(reverse): files.reverse()
        for f in files:
            print string.ljust(f[1],52), ctime( f[0] )
    elif opt.has_key(by_size):
        # turn each filename into a ( SIZE, filename ) pair
        files = map( lambda f: (os.stat(f)[ST_SIZE], f ), files )
        files.sort()
        if opt.has_key(reverse): files.reverse()
        for f in files:
            print string.ljust(f[1],52), '%9d' % f[0]
    else:
        if opt.has_key(by_name) :
            files.sort()
        if opt.has_key(reverse): files.reverse()
        for f in files: print f

#
# Getopt.py:
#
# getopt returns a tuple-pair: a list of (option, value) pairs and a list of args.
# The results are easier to handle if they are returned as a
# dictionary.
# Use Getoptd().has_key() to check for the existence of a switch
# and Getoptd()[] to retrieve its value.
# Non-option arguments are entered with an empty string '' key.
#
from getopt import getopt

def Getoptd( args, argstr ):
    opts, args = getopt( args, argstr )
    d = { '':args }
    for opt in opts:
        d[opt[0]] = opt[1]
    return d
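The getopt module still exists in Python 3 with the same getopt(args, shortopts) call, so Getoptd carries over almost unchanged; the main difference for callers is that dict.has_key() is gone and the `in` operator replaces it. A sketch, with `getoptd` as a hypothetical lowercase name and a made-up command line:

```python
from getopt import getopt


def getoptd(argv, optstring):
    """Parse argv with getopt and return a dict.

    Each switch ('-s') maps to its value; switches that take no argument
    map to ''; non-option arguments are stored under the '' key.
    """
    opts, args = getopt(argv, optstring)
    d = {"": args}
    for switch, value in opts:
        d[switch] = value
    return d


opt = getoptd(["-D", "-s", "100K", "src", "docs"], "DSNRs:")
# Python 3 spelling of opt.has_key('-s'):
if "-s" in opt:
    size_text = opt["-s"]
```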
