Re: Overloading function calls

Tim Peters (tim@ksr.com)
Thu, 02 Jun 94 04:06:09 -0400

> [guido]
> (Only responses to points where we don't already fully agree or don't
> all know it yet :-)

Ditto!

> ...
> (We may decide that < > <= >= should fail for objects where they aren't
> explicitly defined, but == must certainly be defined on all types.)

I suspect we can finish off _almost_ this whole sub-area now:

1) == maps to __eq__, else __cmp__, else default comparison
2) <= maps to __le__, else __cmp__, else default comparison
3-5) similarly for < > >=
6) (the oddball) != maps to "not __eq__", else "__cmp__ != 0", else
default comparison
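
In rough pseudo-Python, rule 1 and its oddball twin rule 6 might
dispatch like so. This is a sketch only; default_compare is a
hypothetical stand-in for the interpreter's built-in default comparison
(by address for instances), not a real function:

    def default_compare(x, y):
        # hypothetical stub for the built-in default comparison
        return cmp(id(x), id(y))

    def resolve_eq(x, y):            # sketch of rule 1: "x == y"
        if hasattr(x, '__eq__'):
            return x.__eq__(y)
        if hasattr(x, '__cmp__'):
            return x.__cmp__(y) == 0
        return default_compare(x, y) == 0

    def resolve_ne(x, y):            # sketch of rule 6: "x != y"
        if hasattr(x, '__eq__'):
            return not x.__eq__(y)
        if hasattr(x, '__cmp__'):
            return x.__cmp__(y) != 0
        return default_compare(x, y) != 0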

There are no special cases to support objects that want <, <= etc to
fail: The base language is happy to apply any comparison operator to any
two objects, so, for consistency, it's best for that to apply to user-
defined objects too.

OTOH, it's legitimate-- but unusual --for a class to _want_ to support
only e.g. "==", "!=", and "<". The way to do that is to define __eq__
and __lt__ to do what you want, and also define a __cmp__ method that
raises an exception. By the uniform operation of the rules above, ">",
">=" and "<=" will then be caught by the __cmp__, and raise the
exception.
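
For concreteness, a minimal sketch of such a class (names and bodies
are illustrative only):

    class EqAndLtOnly:
        def __init__(self, value):
            self.value = value
        def __eq__(self, other):      # supplies "==" and, via rule 6, "!="
            return self.value == other.value
        def __lt__(self, other):      # supplies "<"
            return self.value < other.value
        def __cmp__(self, other):     # catches ">", ">=" and "<="
            raise TypeError, 'only ==, != and < are supported here'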

Special support from the language simply isn't needed, and an explicit
exception-raising __cmp__ makes clear to the reader of the class that the
class is doing something unusual with relational operators.

Alternative for the user: Because of #A and #B below, a user who wants
">", ">=" and "<=" to fail may nevertheless want __cmp__ to work
normally. In that case they need to define __gt__, __ge__, and __le__,
and make _them_ raise an exception instead.
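
Sketched, that variation looks like (again just an illustration):

    class OrderedButPicky:
        def __init__(self, value):
            self.value = value
        def __cmp__(self, other):     # normal three-way compare, so
            return cmp(self.value, other.value)   # #A and #B still work
        def __gt__(self, other):
            raise TypeError, '> not supported'
        def __ge__(self, other):
            raise TypeError, '>= not supported'
        def __le__(self, other):
            raise TypeError, '<= not supported'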

Red Flag: Getting this (IMO) clean little scheme to work requires
removing the current implementation restriction that exceptions raised
during comparisons are lost. Is that a show-stopper? I'm not clear on
where the difficulty lies.

Best stabs at the immediately-related issues (not all of which are
documented today):

A) When [].sort() finds a user-defined object, it applies its __cmp__
method if it exists, else uses default comparison. __eq__, __le__,
__lt__, __ge__ and __gt__ methods, if any exist, are all ignored by
.sort().

Debatable: Insert a hairier version of the "Debatable" comment in the
next section.
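
To illustrate #A with a made-up class (descending order, just to show
__cmp__ really is the method consulted):

    class Descending:
        def __init__(self, value):
            self.value = value
        def __cmp__(self, other):
            return cmp(other.value, self.value)   # note the swapped args

    stuff = [Descending(1), Descending(3), Descending(2)]
    stuff.sort()      # consults __cmp__; values now in order 3, 2, 1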

B) Ditto max and min: they invoke __cmp__ if it exists, else do default
comparisons; ignore the individual relational methods; and won't work
on objects that define an exception-raising __cmp__.

Debatable: It's possible to define max & min solely in terms of
any fixed one of the relational methods other than __eq__; e.g.,
__lt__ is enough. A full-blown __cmp__ isn't logically necessary.
I happen to think a full-blown __cmp__ is a reasonable requirement
for this purpose, though; anyone disagree?
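
E.g., a hypothetical min built on nothing but "<", so __lt__ alone
would logically suffice:

    def min_via_lt(seq):
        best = seq[0]
        for item in seq[1:]:
            if item < best:       # the only comparison ever made
                best = item
        return best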

C) Boolean "x in y": if y.__contains__ isn't defined, determines
membership using the "==" strategy. I.e., applies __eq__ when it's
defined, else __cmp__, else default comparison.

Implication: Unlike #A & #B, default "in" will work for a class that
supports only == (i.e., defines all of __[lt,le,gt,ge,cmp]__ to raise
exceptions).

Light Yellow Flag: Insert a milder version of the "Yellow Flag"
comment in the next section.
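
So, under #C, a sketch like this would support "in" while refusing
everything in #A and #B:

    class EqOnly:
        def __init__(self, value):
            self.value = value
        def __eq__(self, other):
            return self.value == other.value
        def __cmp__(self, other):
            raise TypeError, 'no ordering here'

    x = EqOnly(2)
    print x in [EqOnly(1), EqOnly(2)]    # 1 (true), via __eq__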

D) Dictionary lookup: As in #C, dict[x] uses the "==" strategy on x
(ignoring that it uses a hash strategy first, just cuz that's not
relevant to the relationals).

Yellow Flag: This implies dict lookup works for classes that support
only __hash__ and __eq__ (even if all the other relationals, including
__cmp__, raise exceptions). But if that's advertised, it constrains
the implementation of dicts (e.g., the implementation couldn't be
changed to use skip lists, or any other organization that requires
comparisons beyond equality).
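
I.e., this sketch would work as a dict key under #D despite refusing
all ordering:

    class HashEqKey:
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return hash(self.value)
        def __eq__(self, other):
            return self.value == other.value
        def __cmp__(self, other):
            raise TypeError, 'no ordering here'

    d = {}
    d[HashEqKey(1)] = 'one'
    print d[HashEqKey(1)]     # 'one', found via __hash__ then __eq__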

Are there any other contexts where relationals are invoked by magic?

Open issues: Given e.g. "x < y", what happens if e.g. x.__lt__ is not
defined but both x.__cmp__ and y.__lt_rev__ (didn't really expect that
one, did you <grin>?) are defined? There are a lot of variations on
this. To make it even ickier, suppose x.__cmp__(y) raises an exception
but y.__lt_rev__(x) wouldn't? I.e., is the user entitled to assume that,
if there's more than one way to do an operation, Python will (if
possible) find a way that doesn't gripe? Or suppose x doesn't define any
comparisons, but y does define __cmp__: then in "x < y" does Python use
the default comparison (the best that x can do), or does it promise to
yield to y's presumably-superior knowledge?

Similar questions pop up in "x any_binary_operator y": if both x and y
define the operation, rules are needed to resolve who gets it. I think
comparisons appear more complicated still only because resolving a
relational may go through a sequence of decisions (__lt__ defined? else
__cmp__ defined? else use the default comparison); if so, the "extra"
complication is more apparent than real.

Today's coercion rules are complicated, but they do often save us from
the new complications above. I.e., today most binary operators require
__coerce__ to be defined, and __coerce__ is usually invoked magically
before the binary operator method is invoked.

So what to do about all this _seems_ to be tied to what to do about
__coerce__. In the absence of coercions, I favor the simplest rule: in
"x binary_operator y", x handles it if possible (and if that raises an
exception even though y wouldn't, tough luck; and "use default
comparison" counts as x "handling it"), else y handles it if possible,
else exception. The problem with a more-complex approach is that code
that _relied_ on a more-complex approach would probably be
incomprehensible. A dumb left-to-right rule is at least predictable.
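
Spelled out for a sample operator (a sketch; __sub_rev__ is the
reversed-method naming floated earlier, and built-in operands are
glossed over):

    def resolve_sub(x, y):           # the dumb rule for "x - y"
        if hasattr(x, '__sub__'):
            return x.__sub__(y)      # if this raises, tough luck
        if hasattr(y, '__sub_rev__'):
            return y.__sub_rev__(x)
        raise TypeError, 'nobody can handle the subtraction'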

> [about the implementation of __contains__]
> The latter operation should of course be implemented as if '==' was
> used.

Agree, and believe it needs to be explicitly spelled out (& took a stab
at that in #A-#D above). Of course you believe that too <wink>!

> The rest of Tim's message is about how to handle things like
> "3/complex_number". ... Tim mentions the possibility of defining
> separate methods for reversed operators (I imagine with names like
> __div_rev__) but doesn't like it.

I didn't say why, but will now: (A) There are so _many_ of them! (B) I
suspect that the bodies of "__method__" and "__method_rev__" will usually
be nearly identical, and (B1) duplication is tedious, and (B2) *near*-
duplication (unlike exact duplication) is error-prone ("cut-and-paste-
in-haste" mistakes).

> But I don't see any other solution that would really work.

One that seemed "obviously better", but also obviously incompatible with
the current scheme, is just to tell the (single) method the ordering of
the arguments. E.g., if x is an instance of class X,

    x - 3.1   invokes   X.__sub__(x, 3.1, 0)
    3.1 - x   invokes   X.__sub__(x, 3.1, 1)

    class X:
        def __sub__(x, y, swapped):
            answer = x.value - y
            if swapped: answer = -answer
            return answer
            # or
            # return (x.value,y)[swapped] - (y,x.value)[swapped]
            # or
            # return pow(-1,swapped) * (x.value - y)
            # for Steve <grin>

A backward-compatible variation you'd _really_ hate is to make
"__swapped__" a magic variable installed in the method's local NS as part
of the binop method invocation <groan>.

> [notes that putting all the smarts in __coerce__ instead pretty
> much amounts to putting _all_ the smarts in __coerce__;
> considers a __coerce_rev__ and develops the notion for a full
> sentence before pausing to vomit <wink>; and explains how coerce
> came to be]
> ...
> I would almost concur that __coerce__ is not such a good idea after
> all -- each individual __add__, __mul__ etc. could easily call it
> explicitly if necessary.

This is what I was getting at with the "simplification" I withdrew; but
then I was laboring under the misconception that __coerce__ was somehow
"more necessary" for relational operators than the others, and somehow
_needed_ to exist to supply magic support for .sort()/max/min/in/
dict-key-lookup. I'm less (or more <wink>) confused now. Dropping it
altogether is much more attractive than the sorry mess I withdrew!

> On the other hand, if you are implementing a regular numeric type
> (e.g. Rational or Complex), then having __coerce__ can save you a line
> per function. (Having __coerce__ called automatically may also be a
> little faster but I'm less sure there.)
>
> As long as user-defined types are at the top of the numerical type
> hierarchy, I'm not sure that having __coerce__ is more than a
> convenience. How about junking it?

I want to take some time out to try recoding some "regular numeric"
classes as if __coerce__ didn't exist; it's the best way to find the
surprises <0.3 grin>. _Some_ of its functionality seems to be needed;
e.g. in
    complex + matrix

and assuming Complex is a class that existed long before Matrix,
Complex.__add__ has no idea what to do, and needs to pass control on to
Matrix.__add__. Actually, we have that same problem today, because
__add__ doesn't go thru __coerce__ today. The way I've been doing that
is

    class Complex:
        def __add__(x, y):
            # code deleted that handles the types for y I know about

            if type(y) is type(x):
                # y is some instance type but I don't know about it;
                # reissue the operation and let Python figure it out
                return y+x

            else:
                raise TypeError, 'what kind of crap you feeding me?!'

And that works out well -- old classes don't have to learn about new
classes, unless they want to run a little faster. OTOH, __add__ is
commutative in this example, so re-dispatching in a clear way is easier
than in the general case.
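
For a non-commutative op, one way the re-dispatch might go (a sketch,
with the same caveats as the __add__ above, plus the assumption that
the re-dispatched result supports unary minus):

    class Complex:
        def __sub__(x, y):
            # code deleted that handles the types for y I know about

            if type(y) is type(x):
                # subtraction isn't commutative, so negate the
                # re-dispatched result: x - y == -(y - x)
                return -(y - x)
            else:
                raise TypeError, 'what kind of crap you feeding me?!'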

This needs more thought; not surprising, though, since it is the most
complicated part!

when-we-don't-sort-out-the-types-until-runtime-i-think-that-implies-
runtime-code-needs-to-sort-out-the-types<wink>-ly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp