> > -- What techniques do you use to isolate unexpected heap growth? The
>
> This question is about as general as "how do you debug a program".
> You don't need to read every module you use but you must understand
> the general data flow within your program.
Yeah, this is a tough question. I asked because in my own projects
I use my own allocator wrapper (a "mallocator") layered on top of
malloc et al. It collects a lot of information on allocation/free
patterns, and the depth of the checking it performs is controlled
by #defines. I noticed various #defines that you use for profiling
certain things, but did not grok them all. I was sort of hoping
that some of them were for tracking memory behavior.
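
For comparison, here is a rough Python-level analogue of that kind of
allocation tracking, using the standard tracemalloc module. It is only
a sketch: tracemalloc sees allocations made through Python's own
allocator, so a C extension that calls malloc directly would not show
up, and grow() is a hypothetical stand-in for whatever code is under
suspicion.

```python
import tracemalloc


def grow(store, n):
    """Hypothetical stand-in for the code suspected of growing the heap."""
    store.extend(bytes(100) for _ in range(n))


tracemalloc.start()
retained = []

before = tracemalloc.take_snapshot()
grow(retained, 10_000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Rank source lines by how much more memory they hold now than before.
stats = after.compare_to(before, "lineno")
net_growth = sum(s.size_diff for s in stats)

print("net growth in bytes:", net_growth)
for s in stats[:3]:
    print(s)
```

The lines at the top of the ranking are where I would start reading.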
With respect to "reading modules": I did find leaks in some of the
modules that I use (e.g., dbm.keys(), I think). That left me with a
deep fear that I should eyeball every module I use that deals with
"big" things or is used heavily in long-running processes. You
impressed me so much with the rest of Python that I was vaguely
hoping you had done something whizzy with certifying performance
profiles.
> > -- Is anyone aware of memory footprint related problems with dbm (I use it
> > rather heavily)?
>
> Hmm... Dbm is magic for me. If you suspect anything here, perhaps
> you can construct a small test program that repeatedly opens a dbm
> database, accesses it a few times (or a lot), and closes it -- and see
> if your process size changes.
As I mentioned in another posting after this first one, I tracked down
a leak in dbm.keys(): it seems to leak all the keys it builds into
the list you get back from the call. This is not so good when
the list is hundreds of thousands or millions of entries long.
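
The small test program suggested above could look something like the
following. It is a minimal sketch, assuming a Unix system (for the
resource module) and whichever backend dbm.open() happens to pick. The
file name, key count, and round count are arbitrary choices of mine.
Note that ru_maxrss is the peak resident size, reported in kilobytes
on Linux and bytes on macOS, so it can show growth but never
shrinkage.

```python
import dbm
import os
import resource  # Unix only
import tempfile


def peak_rss():
    """Peak resident set size of this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def populate(path, nkeys):
    """Create the database and fill it with nkeys small records."""
    with dbm.open(path, "c") as db:
        for i in range(nkeys):
            db[f"key{i}"] = "x"


def exercise(path, rounds):
    """Repeatedly open the database, call keys(), and close it again."""
    samples = []
    for _ in range(rounds):
        with dbm.open(path, "r") as db:
            nkeys = len(db.keys())
        samples.append((nkeys, peak_rss()))
    return samples


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "leaktest")
    populate(path, 500)
    samples = exercise(path, 20)

for round_no, (nkeys, rss) in enumerate(samples):
    print(round_no, nkeys, rss)
```

If the last column keeps climbing round after round while the key
count stays the same, something in the open/keys/close cycle is
holding on to memory.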
> > -- has anyone created iterator types for sucking the contents out of a
> > mapping type (especially something like dbm)? The 'keys()' method
> > doesn't seem to scale real well with dbm hash files that are large.
> > Am I thinking about this wrongly?
>
> This is indeed a problem with dbm. I suppose it would be an easy
> modification to provide access to the firstkey and nextkey operators
> that the dbm library provides.
I started to implement one, but I like iterators to have the
semantics that you can create more than one iterator on a list
and they are unrelated to one another (dealing with deletions
is another matter, but for read only...).
The dbm(3) API has about enough hooks to do this (e.g., nextkey() takes
a 'key' as an arg to be used as the "base for finding next"), but dbm
only allows one instance per app. ndbm(3) allows multiple instances
per app, but forces only one iterator (e.g., nextkey() does not take
the 'last-key' arg). Bummer.
So, I stopped for the moment. Generalizing to multiple iterators
is going to be harder or less efficient than I would have liked.
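
To illustrate the semantics I am after, here is a minimal sketch of
independent iterators built on a dbm(3)-style interface where
nextkey() takes the previous key. Because each iterator remembers its
own last key, the database needs no shared cursor. SortedStore is a
hypothetical in-memory stand-in, not a real dbm binding. I believe the
same generator would work on any object offering firstkey() and
nextkey(key), which is the shape Python's dbm.gnu module exposes, but
treat that as an assumption.

```python
import bisect


class SortedStore:
    """Hypothetical stand-in offering a dbm(3)-style key walk."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def firstkey(self):
        return self._keys[0] if self._keys else None

    def nextkey(self, key):
        # Find the first stored key strictly greater than the given one.
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None


def iter_keys(db):
    """Yield keys one at a time; all iteration state lives in this generator."""
    key = db.firstkey()
    while key is not None:
        yield key
        key = db.nextkey(key)


store = SortedStore([b"c", b"a", b"b"])
first = iter_keys(store)
second = iter_keys(store)

head = [next(first), next(first)]  # advance the first iterator twice
other = next(second)               # the second still starts at the beginning
print(head, other)
```

This only covers the read-only case. Deleting keys while an iterator
is live is another matter, as noted above.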
Cheers,
lef