Moron hashing (rehashing hashing again)

Aaron Watters (aaron@funcity.njit.edu)
Fri, 10 Feb 1995 13:09:57 GMT

(the original title was "more on hashing" but I thought
"moron hashing" was funnier.)

I've been using some "bugs" in my kjbuckets module with
great success and it opened a train of thought I wanted to
air.

1) First, you can hash, for example a kjSet, which is a mutable
type.

>>> from kjbuckets import *
>>> x = kjSet([1,2,3])
>>> hash(x)
485380

this means x can be a key to a dict, but if it's subsequently
modified, you'll never find it again in the dict (as Guido points
out). I currently endeavor to never make this mistake.

2) You can even compute the hash function of a kjGraph if one of its
map items is unhashable. Roughly, the unhashable item is ignored in
the hash calculation, which means that if two values only differ on
unhashable maps they will have the same hash value. This is "safe"
because if the top-level graph remains fixed but a mutable member
changes, the hash value remains constant:

>>> D = {"hi":"hello"}
>>> G = kjGraph( [ (1,D), (2,3) ] )
>>> hash(G)
-364114
>>> D["bye"]="seeya"
>>> G
kjGraph([(1, {'hi': 'hello', 'bye': 'seeya'}), (2, 3)])
>>> hash(G)
-364114

The down side is that if two graphs differ only on non-mutable
maps they will have the same hash value, and they can collide
as keys to dictionaries, making dict accesses slower.

These bug/features makes it possible to write some nasty,
fast, space efficient code without having to constantly worry
about what is hashable/mutable and what is not.

I'm thinking of fixing (1) by adding a "mutable" flag which is
reset for a kj[Set|Graph|Dict] whenever its hash value is computed,
causing the table to raise an error on subsequent modification
attempts. (2) I'll just keep, and add a warning about hash
collisions in the documentation.

-->I was wondering whether these features should be extended to
the "core" python types: eg, allow computation of the hash
value of a tuple that contains a dictionary, by "ignoring"
the dictionary in the hash computation. Comments?<--

[You could also add a "mutable" flag to dictionaries, allowing
them to be safely used as keys -- but I don't think this would
be worth the extra overhead (use my slower kjDicts if you want
to hash using dictionaries).]

-a
===
-- Quick, Robin, catch it in your teeth! [Throws batarang.]
-- [Later, massaging jaw] Holy molars, Batman! That was close!
-- Yes, Robin, I geuss all those years of good dental hygiene
really paid off. --from _Batman_ the TV show.