Sugar for regular expression groupings.

Tracy Tims (tracy@snitor.sni.ca)
Fri, 19 Feb 1993 16:59:27 -0500

I find myself using regex.compile() frequently for parsing lines from
various data-files. I write a regular expression that contains a
number of \( and \) groupings, and then I use slice notation to fetch
the tokens I want from the line. Here's an example:

format = regex.compile( a_pattern)

if format.match( data) != -1:
    old_ver = data[format.regs[1][0]:format.regs[1][1]]
    new_ver = data[format.regs[2][0]:format.regs[2][1]]
    user = data[format.regs[3][0]:format.regs[3][1]]
    date = data[format.regs[4][0]:format.regs[4][1]]
    time = data[format.regs[5][0]:format.regs[5][1]]
    host = data[format.regs[6][0]:format.regs[6][1]]
    dir = data[format.regs[7][0]:format.regs[7][1]]

I use the slice notation/regex register code over and over again, in
many programs. This is so common that perhaps there should be a
clearer way to do this. (I've seen the idiom in fpformat.py, and it
doesn't make me completely happy.)

What if a compiled regular expression object had an attribute which
was a reference to the last string on which it successfully matched,
and a simple method for returning groups out of that matched string?
(I think the attribute is a good idea anyway. I have always been
uncomfortable with the fact that a regular expression object only
contains half of the information needed to extract groups.)

I could reduce the example above to the following:

if format.match( data) != -1:
    old_ver, new_ver, user, date, time, host, dir \
        = format.groups(1,2,3,4,5,6,7)

Code using the groups() method is easier to modify and maintain
because it doesn't have the internal interdependencies of the first
example (or of the fpformat.py idiom), where every line has to name
the same register index twice and repeat the string that was matched.
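
For illustration, here is a minimal sketch of how such an object might
behave. It is not an existing interface: the class name GroupingPattern,
its "last" attribute, and the sample pattern and data line are all made
up. It is also built on the re module rather than regex so that it runs
on a current interpreter, which means groups are written ( and ) instead
of \( and \).

```python
import re


class GroupingPattern:
    """Hypothetical wrapper that remembers the last string it matched."""

    def __init__(self, pattern):
        self._compiled = re.compile(pattern)
        self.last = None      # last string that matched successfully
        self._match = None    # match object belonging to that string

    def match(self, data):
        # Follow the regex convention used above: return the length of
        # the match, or -1 if the pattern does not match.
        m = self._compiled.match(data)
        if m is None:
            return -1
        self.last = data
        self._match = m
        return m.end()

    def groups(self, *indices):
        # Return the requested groups of the last successful match.
        if self._match is None:
            raise ValueError("no successful match yet")
        return tuple(self._match.group(i) for i in indices)


# Made-up pattern and data line, for illustration only.
fmt = GroupingPattern(r"(\S+) (\S+) (\S+)")
data = "1.4 1.5 tracy"

if fmt.match(data) != -1:
    old_ver, new_ver, user = fmt.groups(1, 2, 3)
```

In this sketch a failed match leaves the remembered string untouched, so
groups() keeps referring to the last string that did match. Whether
that is the right behaviour is a design choice, not something settled
here.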

Tracy Tims