Re: Is this a regex bug or just me?

Martin Green (martin.a.green@hydro.on.ca)
Tue, 7 Feb 1995 09:45:08 -0500

On February 7 at 12:45:47 you (Tatu Ylonen) wrote:
[...]
>
> In general it is not very clear what should be in the register in an
> expression such as \([a-z]\)+. There are several possible semantics:
> - make it the first thing that the expression matched
> - make it the last thing that the expression matched
> - some intermediate thing it matched
> - empty because the last time it did not match
> - concatenation of all the things it ever matched
> - all characters from beginning of first match to end of last match
>
> Remember, the regexp can also be like \(\([a-z]\)[0-9]\)+.
> What should now be in register 2? Suppose you match it against
> "d7f7a9f6g8sdd". Should \2 be "7", "9", "6", "8", "", or "7f7a9f6g8"?
> Should \1 be "d7", "f7", "g8", "d7f7a9f6g8", "", or something else?

I would expect the register to contain the first maximal-length match
of the parenthesized regular expression, and nothing more.
The numbering of registers is, however, perhaps less obvious. Should
register numbers not be assigned according to the order of the opening
parentheses? In your example, I would expect:

\1 = "d"
\2 = "d7"
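
To make the candidate values concrete, here is a minimal sketch that
steps through the repetition one iteration at a time. It uses Python's
`re` module purely as a stand-in, which is an assumption on my part:
Python writes groups as ( ) rather than \( \), and it numbers them by
opening parenthesis, so its group 1 is the outer pair and its group 2 is
the inner letter. The helper `iterations` is hypothetical, written only
for this illustration.

```python
import re

text = "d7f7a9f6g8sdd"

# One iteration of the repeated group. Python numbers groups by opening
# parenthesis: group 1 is the outer pair, group 2 is the inner letter.
body = re.compile(r"(([a-z])[0-9])")

def iterations(pattern, s):
    """Return (outer, inner) captures for each back-to-back iteration from index 0."""
    found = []
    pos = 0
    while True:
        m = pattern.match(s, pos)
        if m is None:
            break
        found.append((m.group(1), m.group(2)))
        pos = m.end()
    return found

steps = iterations(body, text)
first_outer, first_inner = steps[0]    # "first thing matched" semantics
last_outer, last_inner = steps[-1]     # "last thing matched" semantics

# What Python's own engine leaves in the groups after the full repetition.
m = re.match(r"(([a-z])[0-9])+", text)
engine_outer, engine_inner = m.group(1), m.group(2)

print(steps)
print(first_outer, first_inner)
print(last_outer, last_inner)
print(engine_outer, engine_inner)
```

Under the "first match" reading the captures are "d7" and "d". Python's
own engine follows the "last match" reading and reports "g8" and "g".
Other regex packages may well differ.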

In GNU regex, the "+" repeats the smallest possible preceding regular
expression as many times as possible (at least once) to yield the
maximum-length match. In your example, the smallest possible
preceding regular expression is defined, by the parentheses, to be the
two-character expression "[a-z][0-9]", so \(\([a-z]\)[0-9]\)+ matched
against "d7f7a9f6g8sdd" should match "d7f7a9f6g8" as a whole.
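
A quick way to check the overall match is sketched below. It again
assumes Python's `re` module as a substitute for GNU regex, with ( )
in place of \( \). The non-greedy "+?" form is a Python feature, shown
only for contrast.

```python
import re

text = "d7f7a9f6g8sdd"

# Greedy "+": repeat the letter-digit pair as many times as possible.
greedy = re.match(r"(([a-z])[0-9])+", text)

# Non-greedy "+?": stop after the fewest repetitions (at least one).
lazy = re.match(r"(([a-z])[0-9])+?", text)

print(greedy.group(0), greedy.span())   # d7f7a9f6g8 (0, 10)
print(lazy.group(0), lazy.span())       # d7 (0, 2)
```

The greedy match stops at "s", because "sd" is not a letter followed by
a digit.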

>
> I don't have a clear opinion what should be in either \1 or \2.
> Currently, in my regexp package the value of a register is not
> well-defined if it is inside '+', '*', '?', or '|'.
>
> I am open to suggestions.
>
> Tatu

Martin

--
Martin A. Green                     Net :  green@rd.hydro.on.ca
Ontario Hydro Technologies          Tel :  (416) 207-5745
800 Kipling Ave,  KR236             FAX :  (416) 207-6216
Toronto, Ontario, CANADA, M8Z5S4