jprofile.py and pstats.py -- a new profiler by JAR

Guido.van.Rossum@cwi.nl
Mon, 06 Jun 1994 16:16:36 +0200

Jim Roskind has finally gotten permission from his company to release
his new profile module. A gzipped tar file containing his code and
documentation can be found on ftp.cwi.nl in pub/python as
jprofile.tar.gz.

This will probably end up in standard Python library in the next
release. Would people prefer to use the new name "jprofile" or is is
OK to replace the original "profile" module with Jim's version?

>From Jim's documentation:

HOW IS jprofile DIFFERENT FROM THE CURRENT profile MODULE?

The big changes from standard profiling module are that you get more
information, and you pay less CPU time. It's not a trade-off, it's a
trade-up.

To be specific:

bugs removed: local stack frame is no longer molested, execution time
is now charged to correct functions, ....

accuracy increased: profiler execution time is no longer charged to
user's code, calibration for platform is supported, file reads
are not done *by* profiler *during* profiling (and charged to
user's code!), ...

speed increased: Overhead CPU cost was reduced by more than a factor of
two (perhaps a factor of five), lightweight profiler module is
all that must be loaded, and the report generating module
(pstats) is not needed during profiling.

recursive functions support: cumulative times in recursive functions
are correctly calculated; recursive entries are counted; ...

large growth in report generating UI: distinct profiles runs can be added
together forming a comprehensive report; functions that import
statistics take arbitrary lists of files; sorting criteria is now
based on keywords (instead of 4 integer options); reports shows
what functions were profiled as well as what profile file was
referenced; output format has been improved, ...

INSTANT USERS MANUAL

This section is provided for users that "don't want to read the
manual." It provides a very brief overview, and allows a user to
rapidly perform profiling on an existing application.

To profile an application with a main entry point of "foo()", you
would add the following to your module:

import jprofile
jprofile.run("foo()")

The above action would cause "foo()" to be run, and a series of
informative lines (the profile) to be printed. The above approach is
most useful when working with the interpreter. If you would like to
save the results of a profile into a file for later examination, you
can supply a file name as the second argument to the run() function:

import jprofile
jprofile.run("foo()", 'fooprof')

When you wish to review the profile, you should use the methods in the
pstats module. Typically you would load the statistics data as
follows:

import pstats
p = pstats.Stats('fooprof')

The class "Stats" (the above code just created an instance of this
class) has a variety of methods for manipulating and printing the data
that was just read into "p". When you ran jprofile.run() above, what
was printed was the result of three method calls:

p.strip_dirs().sort_stats(-1).print_stats()

The first method removed the extraneous path from all the module
names. The second method sorted all the entries according to the
standard module/line/name string that is printed (this is to comply
with the semantics of the old profiler). The third method printed out
all the statistics. You might try the following sort calls:

p.sort_stats('name')
p.print_stats()

The first call will actually sort the list by function name, and the
second call will print out the statistics. The following are some
interesting calls to experiment with:

p.sort_stats('cumulative').print_stats(10)

This sorts the profile by cumulative time in a function, and then only
prints the ten most significant lines. If you want to understand what
algorithms are taking time, the above line is what you would use.

If you were looking to see what functions were looping a lot, and
taking a lot of time, you would do:

p.sort_stats('time').print_stats(10)

to sort according to time spent within each function, and then print
the statistics for the top ten functions.

You might also try:

p.sort_stats('file').print_stats('__init__')

This will sort all the statistics by file name, and then print out
statistics for only the class init methods ('cause they are spelled
with "__init__" in them). As one final example, you could try:

p.sort_stats('time', 'cum').print_stats(.5, 'init')

This line sorts stats with a primary key of time, and a secondary key
of cumulative time, and then prints out some of the statistics. To be
specific, the list is first culled down to 50% (re: .5) of its
original size, then only lines containing "init" are maintained, and
that sub-sub-list is printed.

If you wondered what functions called the above functions, you could
now (p is still sorted according to the last criteria) do:

p.print_callers(.5, 'init')

and you would get a list of callers for each of the listed functions.

If you want more functionality, you're going to have to read the
manual (or guess) what the following functions do:

p.print_callees()
p.add('fooprof')

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
URL: <http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>