The size advantage of awk comes entirely from its automatic
option processing, argument processing, input loop processing and
field splitting. The Python algorithm actually beats the awk one
in the portion that does the real work.
The following is the breakdown:

                                Python      Awk
Declaration statements          1 stmt      0 stmts
Default options                 1 stmt      1 stmt
Option processing               3 stmts     0 stmts
Argument processing             3 stmts     0 stmts
Reading & looping over input    3 stmts     0 stmts
Field splitting                 1 stmt      0 stmts
Get list of whitespace fields   0 stmts     3 stmts
Check number of fields          1 stmt      1 stmt
Swap & print fields             6 stmts     6 stmts (4 lines)
End of block stmts              0 stmts     1 stmt
Total                           18 stmts    12 stmts (10 lines)

It is also worth mentioning that the Python approach eliminates
the loop over the arguments, which is nice (although it does
require an explicit loop over the input lines, which awk does
not).

On the other hand, the line:

    line = line[:a] + line[c:d] + line[b:c] + line[a:b] + line[d:]

in the Python program is a bit difficult to follow. What one
really wants to say (closer to how one might think when
developing pseudocode) is:

    swap line[a:b] and line[c:d]

or

    line[a:b], line[c:d] = line[c:d], line[a:b]

but one can't, because strings are immutable.
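
As a minimal sketch in present-day Python, the slicing expression
can be hidden behind a small helper so that the call site reads
more like the pseudocode. The name swap_slices is hypothetical
and not part of any library; the bounds check is an added
assumption about what a caller would want.

```python
def swap_slices(line, a, b, c, d):
    """Return a copy of line with line[a:b] and line[c:d] exchanged.

    Strings are immutable, so a new string is built; the two
    slices must be ordered and must not overlap.
    """
    if not (0 <= a <= b <= c <= d <= len(line)):
        raise ValueError("need 0 <= a <= b <= c <= d <= len(line)")
    # head + second slice + middle + first slice + tail
    return line[:a] + line[c:d] + line[b:c] + line[a:b] + line[d:]
```

With this helper the statement in the program could be written
as line = swap_slices(line, a, b, c, d).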

The advantages listed for awk amount to a roughly fixed number
of statements per program, so the savings as a percentage of
code size would tend to shrink as programs grow. It is likely
that past a certain size the Python version is the smaller one,
while under that size the awk version is.

There is really no reason why, given flags that enable automatic
processing of arguments, options and the input loop, automatic
field splitting, and mutable strings, the Python code could not
be made to beat the awk code even for programs of this size.

It would also be nice, and certainly could not hurt, to have
awk's sub and gsub functions in the Python string library.
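
For comparison, later versions of Python provide this kind of
substitution in the standard re module rather than in string.
The sketch below wraps re.subn under the hypothetical names sub
and gsub. Unlike awk, which modifies its target in place and
returns the count, these return the new string together with the
count, and the replacement text follows Python's conventions
rather than awk's "&".

```python
import re

def gsub(pattern, repl, text):
    """Replace every match of pattern; return (new_text, count)."""
    return re.subn(pattern, repl, text)

def sub(pattern, repl, text):
    """Replace only the first match of pattern; return (new_text, count)."""
    return re.subn(pattern, repl, text, count=1)

# The substitution used in the awk program, applied to one line:
separators, n = gsub(r"[^ \t]+", ":", "  a  bb c")
```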

=== awk program ===

BEGIN { if (x == 0) x = 1; if (y == 0) y = 2 }
NF >= x && NF >= y {
    line = $0;
    gsub(/[^ \t]+/, ":", line);
    split(line, s, ":");
    t = $y; $y = $x; $x = t;
    for (i = 1; i <= NF; i++)
        printf("%s%s", s[i], $i);
    print "";
}

=== Python program ===

import getopt, regexp, string, sys
options, files = getopt.getopt(sys.argv[1:], 'x:y:')
x, y = 1, 2
for (a, b) in options:
    if 'x' in a: x = string.atoi(b)
    if 'y' in a: y = string.atoi(b)
if x > y: x, y = y, x
fp = sys.stdin
if files: fp = open(files[0], 'r')
while 1:
    line = fp.readline()
    if not line: break
    if len(string.split(line)) >= y:
        t = (regexp.compile('[ \t]*([^ \t]+)' * y)).exec(line)
        (a, b), (c, d) = t[x], t[y]
        line = line[:a] + line[c:d] + line[b:c] + line[a:b] + line[d:]
        print line[:-1]
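
The regexp module and string.atoi used above are no longer
available in current Python. The following is only a sketch,
under stated assumptions, of how the swap step might look today
using the standard re module. The name swap_fields is
hypothetical, returning None for short lines mirrors the awk
pattern that skips them, and the field pattern here excludes the
newline, which is a small departure from the original.

```python
import re

FIELD = re.compile(r"[^ \t\n]+")  # one run of non-blank characters

def swap_fields(line, x=1, y=2):
    """Return line with 1-based fields x and y exchanged.

    The whitespace between fields is preserved. Returns None
    when the line has fewer than max(x, y) fields.
    """
    if x > y:
        x, y = y, x
    # (start, end) offsets of every field in the line
    spans = [m.span() for m in FIELD.finditer(line)]
    if len(spans) < y:
        return None
    if x == y:
        return line
    (a, b), (c, d) = spans[x - 1], spans[y - 1]
    return line[:a] + line[c:d] + line[b:c] + line[a:b] + line[d:]
```

A caller would loop over the input lines itself, for example
over sys.stdin, and print each result that is not None.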