Re: Fast union/intersection of two sequences?

Guido.van.Rossum@cwi.nl
Tue, 24 Jan 1995 22:53:05 +0100

Jim Roskind writes:

> IF you want to use the address (`cause you make one-and-only-one
> object for each "thing" in the list) then you'd use:

> def __hash__(self):
>     return id(self)
>
> def __cmp__(self, other):
>     return id(self) - id(other)

and then goes on to explain what to do if you *don't* want this.

I'd just like to point out that the default behaviour is exactly the
example shown above, so you don't actually need to define these two
methods.
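
To illustrate, here is a minimal sketch. It assumes a current Python 3
interpreter (which postdates this message), and Thing is a made-up
class name. A class that defines neither method is hashed and compared
by identity, so every instance is its own dictionary key:

```python
# Thing is a hypothetical class; it defines neither __hash__ nor a comparison.
class Thing:
    pass


a = Thing()
b = Thing()

# Each instance is a distinct key, because hashing and equality
# both fall back on object identity.
table = {a: "first", b: "second"}

same_object_found = table[a] == "first"
distinct_objects_equal = (a == b)
```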

As an additional rule, if __cmp__ is defined, the default __hash__ is
disabled, and the object effectively cannot be used as a key unless a
__hash__ is also defined. This is done to prevent a casual user from
combining the default hash function with an explicit comparison: the
lack of __hash__ means that the class designer probably didn't think
about this possibility, and the presence of __cmp__ means that the
objects should not be compared trivially by comparing their id (==
address).
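
A rough sketch of this rule follows. It is written for Python 3, which
is an assumption on my part: there __eq__ plays the role of __cmp__,
and defining it without __hash__ makes instances unhashable. Version
and HashableVersion are hypothetical names.

```python
# Hypothetical classes; __eq__ stands in for __cmp__ (assumption: Python 3).
class Version:
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return isinstance(other, Version) and self.number == other.number


class HashableVersion(Version):
    # Hash on the same data that __eq__ compares.
    def __hash__(self):
        return hash(self.number)


try:
    {Version(1): "one"}
    usable_as_key = True
except TypeError:
    # Comparison defined without a hash: the object is refused as a key.
    usable_as_key = False

releases = {HashableVersion(1): "one"}
found = releases[HashableVersion(1)]  # looked up with an equal but distinct object
```

Once __hash__ agrees with the comparison, an equal object finds the
entry even though it is not the same object.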

Sure, you can still mess things up (as Jim explained) by mis-designing
your hash or cmp function -- but then no force in the world can
*prevent* broken code :-)

As a final warning, *if* a class defines a __hash__, it has the
responsibility to keep its instances immutable, or at least to make
sure that their hash value never changes. If you violate this rule, it
is possible to use an object as a dictionary key and then change it,
after which it may become impossible to find it again in the
dictionary...
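
A small sketch of how this goes wrong, assuming CPython 3 and a made-up
class Key whose hash depends on an attribute that can be reassigned:

```python
# Key is a hypothetical class whose hash depends on a mutable attribute.
class Key:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value


k = Key(1)
table = {k: "payload"}
found_before = k in table

k.value = 2  # mutate the key while it sits in the dictionary
found_after = k in table

# The entry is still stored, but it was filed under the old hash.
still_stored = len(table) == 1
```

Here found_before is true and found_after is false, although the entry
never left the dictionary.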

--Guido van Rossum, CWI, Amsterdam <mailto:Guido.van.Rossum@cwi.nl>
<http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>