I also figured that Guido would be more likely to add a simpler
solution, because it delivers most of the benefits for less work. I
can live with integer group names. I'll either symbolically name
them, use them as is, or store complete argument lists for groups()
along with my regular expressions.

If necessary, symbolic group names can be added later without
obsoleting 'regs' or the additions I propose.

Improvement 1:
Add an optional dictionary attribute to regex objects.
Ex: regex.groupnames
Add varargs to the compile() method that automatically
initialize the optional dictionary.
Ex: regex.compile( re, 'group_1_name', 'group_2_name')
Modify groups() so that if a string is given as a
group-selector argument it is used to index the dictionary to
obtain the group number.
Ex: decode.groups( 1, 'group_2_name')
This ends up looking like your solution, but the relationship
between 'regs', groups(), and 'groupnames' is explicit. This
is useful because it increases the number of "fruitful
interactions".
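A rough sketch of how Improvement 1 could behave. This is not the regex module's actual interface: the NamedRegex class, its search() method and the sample input line are all hypothetical, and it is built on Python's present-day re module (which writes groups as plain parentheses) only so that it runs.

```python
import re


class NamedRegex:
    """Hypothetical wrapper: a compiled pattern plus a name -> group number dict."""

    def __init__(self, pattern, *names):
        self.pattern = re.compile(pattern)
        # Names passed as varargs map to group numbers 1, 2, ... in order.
        self.groupnames = {name: i for i, name in enumerate(names, start=1)}
        self.last = None  # most recent match object, or None

    def search(self, text):
        # Remember the match so groups() can be called afterwards.
        self.last = self.pattern.search(text)
        return self.last is not None

    def groups(self, *selectors):
        # A string selector is looked up in groupnames; an integer is used as is.
        numbers = [self.groupnames[s] if isinstance(s, str) else s
                   for s in selectors]
        return tuple(self.last.group(n) for n in numbers)


decode = NamedRegex(r'[^0-9]*([0-9]+)[ \t]+([A-Za-z_.-]+)', 'number', 'label')
decode.search('item 42   foo_bar')   # made-up input line
result = decode.groups(1, 'label')   # integer and string selectors can be mixed
```

The dictionary stays an ordinary attribute, so code that prefers integers can ignore it, and code that wants names can fill it in after compiling.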

Improvement 2:
Add syntax to regular expressions so that groups can be named
in place, yielding the group dictionary. (This is a *big*
advantage over Perl.)
For example:
re = '[^0-9]*\(<number>[0-9]+\)[ \t]+\(<label>[A-Za-z_.-]+\)'
decode = regex.compile( re)
n, l = decode.groups( 'number', 'label')
I like this idea, because then I can build complicated regular
expressions in substrings, and then catenate them together
into the final regular expression before compiling. It also
completely eliminates group-counting, and it provides a visual
indication of which groups are just for grouping, and which
are for substring extraction.
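For comparison, here is a sketch of the same example in Python's present-day re module, which spells an in-place named group as (?P<name>...) rather than the \(<name>...\) form proposed above. The substring names (NUMBER, LABEL, SEPARATOR) and the input line are made up for illustration.

```python
import re

# Sub-patterns built separately; each one names its own extraction group.
NUMBER = r'(?P<number>[0-9]+)'
LABEL = r'(?P<label>[A-Za-z_.-]+)'
# A group that exists only for grouping is left unnamed (non-capturing here).
SEPARATOR = r'(?:[ \t]+)'

# Concatenate the substrings into the final expression, then compile.
pattern = re.compile(r'[^0-9]*' + NUMBER + SEPARATOR + LABEL)

match = pattern.search('item 42   foo_bar')
n, l = match.group('number', 'label')

# groupindex exposes the name -> group number dictionary built from the pattern.
name_to_number = dict(pattern.groupindex)
```

No group is ever counted by hand: adding another sub-pattern in the middle does not disturb the names used at the extraction site.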
But what Python really needs are LALR(1) parser objects, don't you
think?
Tracy Tims