Re: Another request for comments on beginner code

Tim Peters (tim@ksr.com)
Sun, 06 Mar 94 15:01:08 EST

Guido and Steve make more good points, and Jeffrey should be encouraged
to hear that, unlike in other languages, style in Python is an
objective question of right versus wrong <grin>.

> [guido]
> ... my taste for loops is always
>
> while 1:
>     get next value
>     if not more values: break
>     process value
>
> rather than your
>
> get next value
> while more values:
>     process value
>     get next value
>
> If you're ever going to change the way you get the next line, you're
> less likely to forget that you have to modify *two* identical
> statements.

In general, I agree! Another advantage becomes apparent if you ever want
to put a "continue" statement in the "process value" part (the way I
wrote the loop, you have to remember to duplicate the "get next value"
part yet another time for each continue).
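A minimal sketch in modern Python 3 of the two loop shapes quoted above, with a "continue" added to show the extra duplication. The input (an io.StringIO standing in for a file) and the function names count_while_1 and count_primed are hypothetical, chosen only for illustration; "process value" here just counts non-comment lines.

```python
import io

def count_while_1(f):
    """'while 1' shape: the 'get next value' step appears exactly once."""
    count = 0
    while True:
        line = f.readline()          # get next value
        if not line:                 # if not more values: break
            break
        if line.startswith('#'):     # skip comment lines
            continue                 # no extra readline needed here
        count += 1                   # process value
    return count

def count_primed(f):
    """Primed shape: 'get next value' is repeated before the loop,
    at the bottom of the body, and again before every continue."""
    count = 0
    line = f.readline()              # get next value (copy 1)
    while line:                      # while more values
        if line.startswith('#'):
            line = f.readline()      # copy 2: forget this and it loops forever
            continue
        count += 1                   # process value
        line = f.readline()          # get next value (copy 3)
    return count

text = "# header\nalpha\n# note\nbeta\ngamma\n"
print(count_while_1(io.StringIO(text)))   # 3
print(count_primed(io.StringIO(text)))    # 3
```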

Still, if a loop is very small and the trip count is very large, I'll
sometimes do it the latter way deliberately: if "process value" and "get
next value" are _very_ cheap, I've seen the latter way run about 10%
faster. I suspect this is simply because there's one fewer line in the
loop body in the latter way (an interpreter burns _some_ time just to
move from one line to the next, and the less work the lines do, the more
significant that overhead becomes). People should not in general worry
about that, though (I've run some number-crunching Python programs for
_days_, and at that extreme 10% here and there can add up to hours ...).
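One rough way to check a figure like that on a current interpreter is a timeit comparison of the two shapes with deliberately cheap "get" and "process" steps. This is a sketch under assumptions: the loop bodies (an increment and a running sum) and the names while_1_loop and primed_loop are hypothetical stand-ins, and the ratio you see will depend on the Python version and machine, so no particular result is claimed.

```python
import timeit

N = 100_000  # trip count; each "get next value" is just an increment

def while_1_loop(n=N):
    i = 0
    total = 0
    while True:
        i = i + 1            # get next value
        if i > n:            # if not more values: break
            break
        total = total + i    # process value
    return total

def primed_loop(n=N):
    i = 1                    # get next value (before the loop)
    total = 0
    while i <= n:            # while more values
        total = total + i    # process value
        i = i + 1            # get next value
    return total

if __name__ == '__main__':
    # both shapes must compute the same thing before timing means anything
    assert while_1_loop() == primed_loop()
    for fn in (while_1_loop, primed_loop):
        secs = min(timeit.repeat(fn, number=10, repeat=3))
        print(f'{fn.__name__:>13}: {secs:.3f}s')
```

Taking the minimum of several repeats is the usual timeit convention, since slower runs mostly reflect interference from other processes.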

> ...
> Also I don't think that binding psfile.readline to getline will save
> much given that there is real I/O as well as other processing going on
> as well; a simple name lookup can't cost *that* much...

No argument; the "time savings" in this program is swamped by I/O. I
just wanted to pass on a perhaps non-obvious use for instance method
objects. Their use can improve clarity and speed in some cases.
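A small Python 3 sketch of that idea: bind the bound method f.readline to a local name once, so each trip through the loop does a plain local lookup instead of an attribute lookup on f. The function name count_long_lines and its length threshold are hypothetical; as said above, with real I/O involved the saving is usually swamped.

```python
import io

def count_long_lines(f, minlen=5):
    getline = f.readline     # a bound method object: it remembers its instance f
    count = 0
    while True:
        line = getline()     # same as f.readline(), minus the attribute lookup
        if not line:
            break
        if len(line.rstrip('\n')) >= minlen:
            count += 1
    return count

data = io.StringIO("short\nlonger line\nab\n")
print(count_long_lines(data))   # 2
```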

> [guido & steve both package the program as a function, and avoid
> the name "string"]

Again, you're both objectively correct <harmonious smile>.

Now, for Jeffrey's sake, I'm going to push this one more level: the
_real_ value of Python isn't grasped until you take some aspect of your
program and abstract it into a reusable module. The thing that struck me
as most irritating about the "ps" program is the reliance on magic
integers to index into fields. E.g., your magic integers weren't quite
right for Steve's version of ps, and in any case you'd have to laboriously
figure out another set of magic integers if you wanted to extract a
different set of fields.

So, attached is a module (FixedParse.py) that supplies a class
(FixedParse) that understands how to extract info from fixed-format
lines. You create a "parsing object" by instantiating the class with a
string containing the field names, and an ASCII "picture" of where the
fields are. The class does the work of figuring out the string indices,
and class instances supply methods for extracting fields by field name,
and for creating a function that parses with respect to a fixed set of
field names (the latter is very efficient).
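The heart of the class is turning the picture into slice indices: each maximal run of one repeated non-blank character is a field, and its start and end columns become the slice bounds. A compact Python 3 sketch of just that step, using itertools.groupby (a swapped-in technique; the attached module scans the template by hand), might look like this. The function name template_slices is hypothetical. As with the module, adjacent fields must be drawn with different characters, e.g. "pppppnnnnn".

```python
from itertools import groupby

def template_slices(fieldnames, template):
    """Map each field name to (start, end) slice indices taken from the
    runs of identical non-blank characters in the template picture."""
    runs = []
    column = 0
    for char, group in groupby(template):
        width = len(list(group))
        if char != ' ':
            runs.append((column, column + width))
        column += width
    names = fieldnames.split()
    if len(set(names)) != len(names):
        raise ValueError('field names not unique')
    if len(names) != len(runs):
        raise ValueError('%d field names but %d template fields'
                         % (len(names), len(runs)))
    return dict(zip(names, runs))

slices = template_slices('user pid cpu', 'uuuuuuuu pppppnnnnn')
print(slices)   # {'user': (0, 8), 'pid': (9, 14), 'cpu': (14, 19)}

line = 'tim'.ljust(8) + ' ' + '42'.rjust(5) + '1.5'.rjust(5)
print({name: line[a:b].strip() for name, (a, b) in slices.items()})
# {'user': 'tim', 'pid': '42', 'cpu': '1.5'}
```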

Of _course_ this is rabid overkill for the specific ps example. The
point is that _truly_ idiomatic Python consists largely of building
reusable modules, and that vital point can't be illustrated by solving a
specific problem in isolation.

Modifying Steve's solution to use the module yields:

#!/usr/local/bin/python
from os import popen
from FixedParse import FixedParse

psparser = FixedParse(
    'user pid cpu mem sz rss tt stat start time command',
    'uuuuuuuu pppppnnnnnmmmmmsssssrrrrr tt ssss xxxxxttttttt' + 'x'*200 )
parse = psparser.fieldsfunc('user','cpu','mem','command')
del psparser

def ps( *host ):
    cmd = 'ps uax | grep -v root'
    if host and host[0]:
        cmd = 'rsh ' + host[0] + ' ' + cmd
    for line in popen( cmd, 'r' ).readlines():
        for field in parse(line): print field,

if __name__ == '__main__' :
    import sys
    if sys.argv[1:] and sys.argv[1]:
        ps( sys.argv[1] )
    else: ps( )

carry-on-abstracting-enough-and-you'll-have-no-idea-what-it-does-anymore<grin>-ly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp

# FixedParse.py: extract named fields from fixed-format lines

class FixedParse:
    def __init__(self, fieldnames, template):
        import string
        names = string.split(fieldnames)
        indices = {}

        column, templen = 0, len(template)
        for name in names:
            # deduce name's slice indices from template
            if indices.has_key(name):
                raise ValueError, 'fieldname ' + `name` + ' not unique'
            # advance to next template field
            try:
                while template[column] == ' ':
                    column = column + 1
                startchar = template[column]
            except IndexError:
                raise ValueError, 'no template field corresponding ' + \
                      'to fieldname ' + `name`
            # consume template field
            startcolumn = column
            while startchar == template[column]:
                column = column + 1
                if column >= templen: break
            indices[name] = startcolumn, column

        # ensure the template is used up
        leftover = template[column:]
        if string.strip( leftover ):
            raise ValueError, 'no field name(s) given for template ' + \
                  'field(s) ' + `leftover`

        # remember the important stuff
        self.names, self.indices = names, indices

    # parse 'str', returning a list of the fieldnames' values
    def fields(self, str, *fieldnames):
        answer = []
        for name in fieldnames:
            try:
                a, b = self.indices[name]
            except KeyError:
                raise ValueError, 'unknown fieldname ' + `name`
            answer.append( str[a:b] )
        return answer

    # return a _function_ f such that f(str) returns the same list as
    # x.fields(str, fieldname1, fieldname2, ...); this is more efficient
    # if the same fields are going to be extracted many times
    def fieldsfunc(self, *fieldnames):
        f = None # will be bound to the return function
        code = 'def f(s): return ['
        for name in fieldnames:
            try:
                indices = self.indices[name]
            except KeyError:
                raise ValueError, 'unknown fieldname ' + `name`
            code = code + 's[%d:%d],' % indices
        # note that a trailing comma doesn't hurt
        exec code + ']\n'
        return f

# end of module
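For readers on Python 3, where "raise E, msg", backquotes, dict.has_key and the exec statement are gone, here is one possible port of the module. It is a sketch, not the original: the error messages are kept, but fieldsfunc now runs the generated source through the exec() function with an explicit namespace dictionary and fetches f from it, because in Python 3 exec can no longer rebind a function's local variables. The helper _slice and the sample ps-style line are made up for illustration.

```python
class FixedParse:
    def __init__(self, fieldnames, template):
        names = fieldnames.split()
        indices = {}
        column, templen = 0, len(template)
        for name in names:
            if name in indices:
                raise ValueError('fieldname %r not unique' % name)
            # advance to the next template field
            while column < templen and template[column] == ' ':
                column += 1
            if column >= templen:
                raise ValueError('no template field corresponding '
                                 'to fieldname %r' % name)
            # consume the run of identical characters that forms the field
            startchar, startcolumn = template[column], column
            while column < templen and template[column] == startchar:
                column += 1
            indices[name] = startcolumn, column
        # ensure the template is used up
        leftover = template[column:]
        if leftover.strip():
            raise ValueError('no field name(s) given for template '
                             'field(s) %r' % leftover)
        self.names, self.indices = names, indices

    def _slice(self, name):
        """Return the (start, end) indices for name, or raise ValueError."""
        try:
            return self.indices[name]
        except KeyError:
            raise ValueError('unknown fieldname %r' % name) from None

    def fields(self, s, *fieldnames):
        """Return the listed fields of s, in the order requested."""
        return [s[a:b] for a, b in map(self._slice, fieldnames)]

    def fieldsfunc(self, *fieldnames):
        """Return f with f(s) == self.fields(s, *fieldnames), compiled
        once so each call is a single list display of slices."""
        body = ''.join('s[%d:%d],' % self._slice(n) for n in fieldnames)
        namespace = {}
        exec('def f(s): return [%s]' % body, namespace)
        return namespace['f']


p = FixedParse('user pid cmd', 'uuuu ppp ' + 'c' * 20)
line = 'tim   42 python FixedParse.py'
print(p.fields(line, 'user', 'cmd'))        # ['tim ', 'python FixedParse.py']
print(p.fieldsfunc('user', 'cmd')(line))    # ['tim ', 'python FixedParse.py']
```

Like the original, the generated function avoids the per-call dictionary lookups and error checks that fields() does, which is where its speed advantage comes from.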