more on awk vs. python example

Lou Kates (louk@research.teleride.on.ca)
Sun, 13 Sep 1992 19:02:00 -0400

This is a reworked Python vs. awk (actually nawk) comparison
using the improvement suggested by Guido. (Note that I have an
older version of Python; the exec function used in the program
below is apparently called match in more recent versions.) The
Python program is now only 50% larger than the awk program, and
the part that does the real work is actually smaller.

The size advantage of awk comes entirely from its automatic
option processing, argument processing, input loop processing and
field splitting. The Python algorithm actually beats the awk one
in the portion that does the real work.

The following is the breakdown:

                               Python     Awk
Declaration statements         1 stmt     0 stmts
Default options                1 stmt     1 stmt
Option processing              3 stmts    0 stmts
Argument processing            3 stmts    0 stmts
Reading & looping over input   3 stmts    0 stmts
Field splitting                1 stmt     0 stmts

Get list of whitespace fields  0 stmts    3 stmts
Check number of fields         1 stmt     1 stmt
Swap & print fields            6 stmts    6 stmts (4 lines)
End of block stmts             0 stmts    1 stmt

Total                          18 stmts   12 stmts (10 lines)

It is also worth mentioning that the Python approach eliminates
the loop over the arguments, which is nice (although it does
require an explicit loop over the input lines, which awk does
not).

On the other hand, the line:

    line = line[:a] + line[c:d] + line[b:c] + line[a:b] + line[d:]

in the Python program is a bit difficult to follow and what one
really wants to say (to be closer to how you might think when
developing pseudocode) is:

    swap line[a:b] and line[c:d]

or

    line[a:b], line[c:d] = line[c:d], line[a:b]

but one can't because strings are immutable.
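
As a rough illustration of the point about mutability, the swap
can be written with slice assignment once the string is copied
into a list. This is only a sketch: swap_spans is a hypothetical
helper name, and it assumes a <= b <= c <= d. Note that the later
span is assigned first; if the two spans differ in length,
assigning the earlier span first would shift the indices of the
later one.

```python
def swap_spans(line, a, b, c, d):
    """Return line with line[a:b] and line[c:d] exchanged.

    Hypothetical helper; assumes a <= b <= c <= d.
    """
    chars = list(line)  # lists are mutable, strings are not
    # The right-hand side is evaluated before any assignment happens.
    # The later span (c:d) is assigned first so that the indices of
    # the earlier span (a:b) are still valid afterwards.
    chars[c:d], chars[a:b] = chars[a:b], chars[c:d]
    return "".join(chars)


line = "a  bcd e"
swapped = swap_spans(line, 0, 1, 3, 6)
# Same result as the concatenation used in the Python program
same = line[:0] + line[3:6] + line[1:3] + line[0:1] + line[6:]
print(swapped == same)
```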

The advantages listed for awk are a roughly fixed saving per
program, so as a percentage of code size they would tend to
shrink as programs grow. It is likely that past a certain size
the Python program is the smaller one, while below that size the
awk program is.

Given flags to enable automatic processing of arguments, options
and the input loop, automatic field splitting, and mutable
strings, there is really no reason the Python code could not be
made to beat the awk code even for programs of this size.
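
To make the idea of an automatic input loop and field splitting
concrete, here is a small sketch of what such a layer might look
like. The records generator is a hypothetical name, not an
existing library function; it takes any iterable of lines. The
standard fileinput module in later Python versions can supply
those lines from the files named on the command line (or from
standard input), which covers the argument processing as well.

```python
def records(lines):
    """Yield (line, fields) pairs for each input line, awk-style.

    Hypothetical helper: 'line' plays the role of $0 and 'fields'
    the role of $1..$NF (zero-based here, so $1 is fields[0]).
    """
    for line in lines:
        line = line.rstrip("\n")
        yield line, line.split()


# Possible use with the standard library (not run here, because it
# would read the files named in sys.argv or standard input):
#
#     import fileinput
#     for line, fields in records(fileinput.input()):
#         if len(fields) >= 2:
#             print(fields[1], fields[0])

sample = ["a b\n", "  c\n"]
for line, fields in records(sample):
    print(len(fields), fields)
```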

It would also be nice and certainly could not hurt to have the
sub and gsub functions of awk in the Python string library.
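
For comparison, a minimal sketch of sub and gsub written on top
of the re module found in later Python versions. The wrapper
names and the (new_text, count) return value are assumptions made
for illustration: awk modifies its target in place and returns
the count, which immutable strings do not allow. The replacement
text follows Python's re syntax rather than awk's "&" convention.

```python
import re


def sub(pattern, repl, text):
    """Replace the first match of pattern; return (new_text, count)."""
    return re.subn(pattern, repl, text, count=1)


def gsub(pattern, repl, text):
    """Replace every match of pattern; return (new_text, count)."""
    return re.subn(pattern, repl, text)


# The separator trick from the awk program: collapse each field to ":"
marked, n = gsub(r"[^ \t]+", ":", "  foo bar   baz")
print(repr(marked), n)
```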

=== awk program ===

BEGIN { if (x == 0) x = 1; if (y == 0) y = 2 }
NF >= x && NF >= y {
    line = $0;
    gsub(/[^ \t]+/, ":", line);
    split(line, s, ":");
    t = $y; $y = $x; $x = t;
    for (i = 1; i <= NF; i++)
        printf("%s%s", s[i], $i);
    print "";
}

=== Python program ===

import getopt, regexp, string, sys
options, files = getopt.getopt(sys.argv[1:], 'x:y:')
x, y = 1, 2
for (a, b) in options:
    if 'x' in a: x = string.atoi(b)
    if 'y' in a: y = string.atoi(b)
if x > y: x, y = y, x
fp = sys.stdin
if files: fp = open(files[0], 'r')
while 1:
    line = fp.readline()
    if not line: break
    if len(string.split(line)) >= y:
        t = (regexp.compile('[ \t]*([^ \t]+)' * y)).exec(line)
        (a, b), (c, d) = t[x], t[y]
        line = line[:a] + line[c:d] + line[b:c] + line[a:b] + line[d:]
        print line[:-1]
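
=== Sketch for a current Python ===

The regexp module and string.atoi used above no longer exist in
current Python versions. The following is a rough sketch of the
same field swap for Python 3, not a faithful port: swap_fields
is a hypothetical name, the field spans come from re.finditer
rather than one compiled pattern repeated y times, and the case
x == y is handled explicitly because the concatenation would
otherwise duplicate the field. It assumes x and y are at least 1.

```python
import re


def swap_fields(line, x=1, y=2):
    """Swap 1-based whitespace-delimited fields x and y of line.

    The original spacing between fields is preserved. Returns None
    when the line has fewer than max(x, y) fields.
    """
    if x > y:
        x, y = y, x
    # (start, end) offsets of every run of non-whitespace characters
    spans = [m.span() for m in re.finditer(r"\S+", line)]
    if len(spans) < y:
        return None
    if x == y:
        return line
    (a, b), (c, d) = spans[x - 1], spans[y - 1]
    return line[:a] + line[c:d] + line[b:c] + line[a:b] + line[d:]


for text in ["alpha  beta gamma\n", "one\n"]:
    result = swap_fields(text)
    if result is not None:
        print(result, end="")
```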