Re: Invertible id(), or maybe hash the world, or ?

Tim Peters (tim@ksr.com)
Tue, 05 Apr 94 23:39:47 -0400

> [jim]
> ...
> func_data[frame.f_code] = d1, d2, d3
>
> The perfect use of associative arrays (I was telling myself :-) ).
> Alas, the bad news came when I tried to execute the above code, and I
> found out that "code objects are not hashable." Hence, I *cannot*
> index an associative array with a code object (oh poo!). :-(
> ...
> [and several slick workarounds that are "too slow", and one (keying off
> `frame.f_code`) that's fast enough]

Nice set of tricks! Don't think you missed any. I would have stuck with
this one:

> I could bundle the inverter relation in with the basic array:
>
> func_data[id(frame.f_code)] = d1, d2, d3, frame.f_code
>
> but then every unpacking/packing of this tuple will have an extra
> assignment/re-assignment pair. :-(. Yes, even this time costs.

but would have assigned frame.f_code to a local temp first (if I was
worried about speed; a local ref is extremely cheap, and every time you
use it instead of the "frame.f_code" expression you avoid the
dramatically more expensive attribute lookup).

BTW, if code objects aren't mutable (not sure offhand), it's probably
just an oversight that they're not hashable (i.e., that they can't be
used to index dicts).

> ... [discovered too that mutable objects in general (like lists)
> can't be used as dict keys]

If you're not excessively <grin> worried about speed, you can fool a dict
into reversibly hashing anything via object identity, like so:

class Hasher:
def __init__(self, val):
self.val = val
def __hash__(self):
return hash(id(self.val))
def __cmp__(x, y):
return cmp(id(x.val), id(y.val))

Then, e.g.,

>>> d = {}
>>> myunhashableobj = [1,2,3]
>>> d[myunhashableobj] = 'ouch' # can't use list as dict key
Traceback (innermost last):
File "<stdin>", line 1
TypeError: unhashable type
>>> b = Hasher(myunhashableobj) # but can use a Hasher object
>>> c = Hasher(myunhashableobj)
>>> d[b] = 'OK!'
>>> d[c] # succeeds cuz c.val is same _object_ as b.val
'OK!'
>>> d.keys()[0].val is myunhashableobj # showing reversibility
1
>>> x = Hasher([1,2,3]) # showing that the same _value_ isn't a hit
>>> d[x]
Traceback (innermost last):
File "<stdin>", line 1
KeyError: <Hasher instance at 86b20>
>>>

That's a hack too, of course -- but it's general & it works.

> ...
> Note that the standard "id()" function can't really be inverted,
> because it does *not* bump the refcount on the object it processes.

Dollars to doughnuts sez Guido wouldn't invert it anyway, cuz there's no
(reasonably cheap) way for Python to guarantee that inverse_id's integer
argument points to a valid object (you're right that id() returns the
address of the object).

> ...
> I think IF I had an invertible id function, it would really produce a
> "integer like" object that effectively included a reference to the
> "real" class, with the ref-count bumped accordingly.

And that would also solve the "safety" problem above, presuming that
inverse_jim_id only accepts objects of the type produced by jim_id (hence
are guaranteed to be the address of a valid & still-active object).

In fact, we can write it all in Python today:

class JimIdNumber:
def __init__(self, obj):
self.obj = obj # a reference to keep obj alive
self.adr = id(obj)
self.hash = hash(self.adr) # cache for speed (ya, right ...)
def __repr__(self):
return '<safe address ' + `self.adr` + '>'
def __hash__(self):
return self.hash
def __cmp__(x, y):
return cmp(x.adr, y.adr)

def jim_id(obj):
return JimIdNumber(obj)

def inverse_jim_id(num):
if hasattr(num, '__class__') and num.__class__ is JimIdNumber:
return num.obj
raise TypeError, 'arg must be JimIdNumber'

Last step: Since it can all be done in Python now, Guido couldn't object
if you rewrote it as a C extension module <0.9 grin>.

> [amrit@xvt.com (get a real name, pal <wink>), explains the problems
> with hashing mutable types]
> ...
> I never did like or see much use to Icon's hash by ID semantics.

It favors speed, which is all anyone lately talks about <grin>.

Seriously, I can live (& have lived) happily with either flavor. But it
always struck me as a tad ironic that, when comparing structured things,
the object-oriented Python compares by value while the non-OO Icon
compares by identity ...

liking-python's-approach-better-but-not-by-a-whole-lot-ly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp