Cool!
> I don't think the interface to access substrings by name instead of by
> number buys you much
Not initially, no ... it's a year later when the format changes that the
pain begins <smile>. Still, I wouldn't _recommend_ people generally use
a name interface either, cuz no matter how it's done it's gonna be pretty
slow. For that reason, I don't use a name interface myself for regexp-
crunching on large volumes of data.
> (except an advantage over Perl :-).
Having a regexp _object_ is an advantage over Perl already; the Perl
folks ask for "something like that" regularly. But getting the effect of
named fields is already easy in Perl. E.g., for the fpformat.py example:
    >>> fpre = '^\([-+]?\)0*\([0-9]*\)\(\(\.[0-9]*\)?\)\(\([eE][-+]?[0-9]+\)?\)$'
    >>> decoder = regex2.compile(fpre, 'all', 'sign', 'int', 'frac', 'junk', 'exp')
    >>> decoder.match('-2.3e45')
    7
    >>> decoder.matches_by_name('sign', 'int', 'frac', 'exp')
    ('-', '2', '.3', 'e45')
In idiomatic Perl that looks like:
    $fpre = '^([-+]?)0*(\d*)((\.\d*)?)((e[-+]?\d+)?)$';
    ($sign,$int,$frac,$junk,$exp) = '-2.3e45' =~ /$fpre/io;
    print "$sign $int $frac $exp\n";
which prints "- 2 .3 e45". Python could get pretty close if a compiled
regexp's search method returned a tuple with one element per pair of
meta-parentheses in the regexp:
    >>> sign,int,frac,junk,exp,junk2 = decoder.hypothetical_search('-2.3e45')
where `sign' etc. are bound to None if the search fails. (The pattern has
six pairs of parentheses, so Python needs six names on the left; Perl just
drops the extra value.) This way has attractions too.
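For the curious, here is a minimal sketch of that tuple-returning search. It assumes the standard `re` module rather than the regex module discussed above, so the pattern is rewritten with plain parentheses for grouping. `hypothetical_search` is a made-up helper, not part of any library, and `int_` and `junk2` are illustrative names (`int_` just avoids shadowing the builtin).

```python
import re

# The floating-point pattern from above, translated to `re` syntax
# (plain parentheses group here; no backslashes needed).
FPRE = r'^([-+]?)0*([0-9]*)((\.[0-9]*)?)(([eE][-+]?[0-9]+)?)$'
decoder = re.compile(FPRE)

def hypothetical_search(compiled, text):
    """Return one element per group, or all Nones if the search fails."""
    m = compiled.search(text)
    if m is None:
        # compiled.groups is the number of capturing groups in the pattern.
        return (None,) * compiled.groups
    return m.groups()

# Six groups in the pattern, so six names on the left.
sign, int_, frac, junk, exp, junk2 = hypothetical_search(decoder, '-2.3e45')
print(sign, int_, frac, exp)
```

Returning a tuple of Nones on failure (rather than a bare None) keeps the unpacking on the left from blowing up, at the cost of having to test one of the names afterwards.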
Lots of ways to skin this cat <smile>! I hope someone who does a lot of
regexp crunching (I really don't) tries out several approaches & says what
they like best.
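If anyone wants to try the name-based flavor, one possible sketch follows. It is a guess at the shape of such an interface, built on the standard `re` module: the class `NamedRegex` and its methods are hypothetical, and returning -1 from a failed match is an assumption of this sketch, not a documented behavior of anything above.

```python
import re

class NamedRegex:
    """Hypothetical wrapper mapping caller-chosen names to group numbers."""

    def __init__(self, pattern, *names):
        self._re = re.compile(pattern)
        # First name labels group 0 (the whole match), the next group 1, etc.
        self._index = {name: i for i, name in enumerate(names)}
        self._match = None

    def match(self, text):
        """Remember the match; return its length, or -1 on failure."""
        self._match = self._re.match(text)
        return -1 if self._match is None else self._match.end()

    def matches_by_name(self, *names):
        """Return the substrings for the given names from the last match."""
        if self._match is None:
            raise ValueError('no successful match to read from')
        return tuple(self._match.group(self._index[n]) for n in names)

fpre = r'^([-+]?)0*([0-9]*)((\.[0-9]*)?)(([eE][-+]?[0-9]+)?)$'
decoder = NamedRegex(fpre, 'all', 'sign', 'int', 'frac', 'junk', 'exp')
print(decoder.match('-2.3e45'))
print(decoder.matches_by_name('sign', 'int', 'frac', 'exp'))
```

Every lookup goes through a dictionary before it reaches the match object, which is the extra cost a name interface pays on large volumes of data.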
> You can always define constants to name the substrings near the place
> where you write down the pattern.
That's fine by me, although there's a little danger from unintended
namespace collisions.
A suggestion for people who intend to do that: Instead of defining the
"constants" like this:
    ALL = 0
    SIGN = 1
    INT = 2
    FRAC = 3
    JUNK = 4
    EXP = 5
Do it like this:
    [ALL, SIGN, INT, FRAC, JUNK, EXP] = range(6)
You'll be glad you did when things change ...
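To make "when things change" concrete, here is a small sketch using the standard `re` module (an assumption; the pattern is rewritten in its syntax). The format change is invented for illustration: a new group for leading blanks, called `BLANKS` here, gets added at the front.

```python
import re

ALL, SIGN, INT, FRAC, JUNK, EXP = range(6)

fpre = r'^([-+]?)0*([0-9]*)((\.[0-9]*)?)(([eE][-+]?[0-9]+)?)$'
m = re.match(fpre, '-2.3e45')
print(m.group(SIGN), m.group(INT), m.group(FRAC), m.group(EXP))

# Hypothetical format change: a group for leading blanks goes in front.
# Only this one line changes; nothing is renumbered by hand.
ALL, BLANKS, SIGN, INT, FRAC, JUNK, EXP = range(7)

fpre2 = r'^( *)([-+]?)0*([0-9]*)((\.[0-9]*)?)(([eE][-+]?[0-9]+)?)$'
m2 = re.match(fpre2, '  -2.3e45')
print(m2.group(SIGN), m2.group(INT), m2.group(FRAC), m2.group(EXP))
```

With the one-per-line form, the same change means editing five numbers, and getting any one of them wrong silently picks up the wrong substring.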
agreeably y'rs - tim
Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp