Re: Another request for comments on beginner code

Tim Peters (tim@ksr.com)
Fri, 04 Mar 94 21:51:13 EST

> [jeffrey t, begging for abuse <wink>]
> ...
> # rps.py - remote ps script
> # prints owner, size, CPU time used, and command name
> #
> # give as first argument the remote machine name
> #
> import regsub, os, sys
>
> string = 'rsh '+ sys.argv[1] + ' ps uax | grep -v root'
>
> psfile = os.popen(string,"r")
> lines = psfile.readlines()
>
> for i in range(0,len(lines)):
> lines[i][:10] + lines[i][24:30] + ' ' + regsub.sub('\012','',lines[i][50:])
>
> psfile.close()

If it works, leave it alone <grin>. Nevertheless, here are things that
caught my Python-sensitive eye:

1) "range(0,N)" is always equivalent to "range(N)". So
for i in range(len(lines))
is more common.

2) But since you're just iterating over the elements of a list, there's
no need to index into it. So more idiomatic is ("idiomatic" is often
synonymous with "less work" &, when the sun is shining, "clearer"
too):

for line in lines:
line[:10] + line[24:30] + ' ' + regsub.sub('\012','',line[50:])

3) "lines" is now more noise than documentary, so toss the assignment and
change the "for" to:
for line in psfile.readlines():

4) .readline and .readlines break input _based on_ examining the input
for newlines, so there's no need to search for the newline (there's
exactly one occurrence of '\n' per string, and it's the last
character). So the newline can be chopped off as part of the slice:
for line in psfile.readlines():
line[:10] + line[24:30] + ' ' + line[50:-1]

[note: in general, the last line may not have a newline, but in this
particular application your system is buggy if ps doesn't terminate
each line properly, so there's little point in guarding against that
here]

5) The quotes in the output are ugly, and can be suppressed by using
"print":
print line[:10] + line[24:30] + ' ' + line[50:-1]

6) Now that "print" is in the picture, it's more efficient to let print
space output than it is to catenate the strings yourself:
print line[:10], line[24:30], ' ', line[50:-1]

That changes the spacing a little, so it's a tradeoff.

7) Only an irritatingly nice person would bother to close the file at the
end <grin>. So, in all, I'm left with:

import os, sys

string = 'rsh '+ sys.argv[1] + ' ps uax | grep -v root'
psfile = os.popen(string,"r")

for line in psfile.readlines():
print line[:10], line[24:30], ' ', line[50:-1]

8) "psfile" is also mostly noise now, so a fanatic might go on to change
the "for" to
for line in os.popen(string,"r").readlines():

I'm not sure why, but that rubs me the wrong way. Then again, my
inability to count above 2 is well-known here <0.9 grin>.

One higher-level thing to think about: popen sets up a _pipe_, and by
using .readlines() we're stalling the program until the producing end of
the pipe is entirely finished. Depending on your system (this really
isn't a Python issue), it may or may not be a throughput win to read from
the pipe incrementally. E.g., on a multi-processor system, "ps" and
"grep" and the Python script could all _potentially_ being doing useful
work concurrently.

This requires a few changes:

line = psfile.readline()
while line:
print line[:10], line[24:30], ' ', line[50:-1]
line = psfile.readline()

Now we're only reading one line at a time, so the Python program can
start printing as soon as the psfile pipe produces its first line of
output, and so on.

I actually do this kind of thing a slightly different way in practice,
but make sure you understand how the preceding works first:

getline = psfile.readline # an instance method object
line = getline()
while line:
print line[:10], line[24:30], ' ', line[50:-1]
line = getline()

I find this clearer, and at one time talked myself into believing that it
runs faster too (because it doesn't need to do the method lookup
"psfile.readline" each time thru the loop).

no-accounting-for-tastes-ly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp