Sugar for regular expression groupings.

Tracy Tims (tracy@snitor.sni.ca)
Mon, 22 Feb 1993 10:07:13 -0500

Next message: Jaap Vermeulen: "Re: Sugar for regular expression groupings."
Previous message: Tim Peters: "Re: Sugar for regular expression groupings."
In reply to: Guido.van.Rossum@cwi.nl: "Re: Sugar for regular expression groupings."

I ended up not writing my own regex class with symbolic group names
for two reasons: if the feature is supplied in the standard python
module it will be faster, and it will be more portable (or
distributable).

I also figured that Guido would be more likely to add a simpler
solution, because it gives most of the benefits for less work. I
can live with integer group names. I'll either name them
symbolically, use them as is, or store complete argument lists for
groups() along with my regular expressions.

If necessary, symbolic group names can be added later without
obsoleting 'regs' or the additions I propose.

Improvement 1:
Add an optional dictionary attribute to regex objects.
Ex: regex.groupnames

Add varargs to the compile() method that supply the initial
contents of the optional dictionary.
Ex: regex.compile( re, 'group_1_name', 'group_2_name')

Modify groups() so that if a string is given as a
group-selector argument it is used to index the dictionary to
obtain the group number.
Ex: decode.groups( 1, 'group_2_name')

This ends up looking like your solution, but the relationship
between 'regs', groups(), and 'groupnames' is explicit. This
is useful because it increases the number of "fruitful
interactions".
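
Improvement 1 can be approximated as a thin wrapper. What follows is only a sketch under stated assumptions: it is written against the present-day `re` module rather than the `regex` module discussed here, and the class name NamedRegex and its match() method are hypothetical, introduced purely for illustration. Only `groupnames`, the varargs to compile, and the string-or-integer selectors to groups() come from the proposal.

```python
import re


class NamedRegex:
    """Hypothetical wrapper: maps symbolic names onto integer group numbers."""

    def __init__(self, pattern, *names):
        self._compiled = re.compile(pattern)
        # groupnames: name -> group number, numbered from 1 in argument order.
        self.groupnames = {name: i for i, name in enumerate(names, start=1)}
        self._match = None

    def match(self, text):
        # Remember the last match so groups() can read from it.
        self._match = self._compiled.match(text)
        return self._match is not None

    def groups(self, *selectors):
        if self._match is None:
            raise ValueError("no successful match to extract groups from")
        # Strings are looked up in groupnames; integers are used as is.
        numbers = [
            self.groupnames[s] if isinstance(s, str) else s for s in selectors
        ]
        return tuple(self._match.group(n) for n in numbers)


decode = NamedRegex(r"[^0-9]*([0-9]+)[ \t]+([A-Za-z_.-]+)", "number", "label")
decode.match("item 42 widget_a")
result = decode.groups(1, "label")
```

Because `groupnames` is an ordinary dictionary, names can be added or changed after compilation without touching the pattern, and integer selectors keep working alongside them.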

Improvement 2:
Add syntax to regular expressions so that groups can be named
in place, yielding the group dictionary. (This is a *big*
advantage over perl.)

For example:
re = '[^0-9]*\(<number>[0-9]+\)[ \t]+\(<label>[A-Za-z_.-]+\)'
decode = regex.compile( re)
n, l = decode.groups( 'number', 'label')

I like this idea, because then I can build complicated regular
expressions in substrings, and then catenate them together
into the final regular expression before compiling. It also
completely eliminates group-counting, and it provides a visual
indication of which groups are just for grouping, and which
are for substring extraction.
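
For comparison, here is a sketch of the same idea using the present-day `re` module, which spells an in-place group name as `(?P<name>...)` rather than the `\(<name>...\)` form suggested above. The variable names and the sample input string are assumptions made for illustration.

```python
import re

# Sub-patterns built separately, each naming its own group in place.
number = r"(?P<number>[0-9]+)"
label = r"(?P<label>[A-Za-z_.-]+)"
# (?:...) groups without capturing, so it is visibly "just for grouping".
separator = r"(?:[ \t]+)"

# Concatenate the pieces into the final expression before compiling.
decode = re.compile(r"[^0-9]*" + number + separator + label)

m = decode.match("item 42 widget_a")
n, l = m.group("number", "label")

# groupindex is the name -> group number dictionary derived from the pattern.
names = dict(decode.groupindex)
```

No group is counted by hand here: rearranging or inserting sub-patterns changes the numbers in `groupindex`, but lookups by name are unaffected.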

But what python really needs are LALR(1) parser objects, don't you
think?

Tracy Tims
