speeding up python

Doug Moen (dmoen@io.org)
Thu, 16 Dec 93 07:52 WET

Python is a beautiful language, but it's too slow. Sending a message to an
instance is *really* slow, so that people who use a lot of data abstraction
are heavily penalized. We can do better.

This article is basically a brain dump of strategies for speeding up Python.

First, here are the language changes:

1. Variables must be declared before they are used. Suggested syntax:
var x
var y = 2
Declarations are required for local variables, for (module level) global
variables, and for instance variables in classes.

2. Module level variables (module attributes) can neither be created nor
destroyed once a module has completed initialization.

3. By default, function definitions and class definitions create *read only*
bindings. Since these names cannot be reassigned, we can better optimize
references to classes and functions from within the module where they are
defined.

4. We introduce named constants, which can't be reassigned. Suggested
syntax:
const pi = 3.1416
References to named constants from within the module where they are defined
are very efficient (we can even do constant folding in expressions where
they occur).
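
To illustrate the kind of folding step 4 makes possible, here is a minimal sketch (the 'CONSTS' table and 'fold' function are hypothetical, not part of any existing Python) that substitutes declared constants into an expression and collapses the constant subexpressions:

```python
import ast
import operator

# Hypothetical table of the module's 'const' bindings.
CONSTS = {"pi": 3.1416}

# Arithmetic operators we are willing to fold at compile time.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

class Folder(ast.NodeTransformer):
    def visit_Name(self, node):
        # Substitute known constants for their values.
        if node.id in CONSTS:
            return ast.Constant(CONSTS[node.id])
        return node

    def visit_BinOp(self, node):
        self.generic_visit(node)   # fold children first
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.op) in OPS):
            return ast.Constant(OPS[type(node.op)](node.left.value,
                                                   node.right.value))
        return node

def fold(expr):
    tree = Folder().visit(ast.parse(expr, mode="eval"))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(fold("2 * pi"))       # 6.2832
print(fold("2 * pi * r"))   # 6.2832 * r
```

Because 'pi' can never be reassigned, substituting its value is always safe, and the multiplication by 2 disappears from the emitted code.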

5. Classes are immutable. The set of attributes of an instance is fixed
at instance creation time. Instance variables can neither be created nor
destroyed (which is why they must be explicitly declared in the class).
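
This proposal predates it, but modern Python grew an opt-in version of exactly this idea in '__slots__': the class fixes the attribute set up front, no per-instance dictionary is allocated, and undeclared attributes cannot be created. A sketch of the behaviour:

```python
class Point:
    __slots__ = ("x", "y")     # the declared instance variables

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 10                       # reassigning a declared slot works

try:
    p.z = 3                    # creating an undeclared attribute fails
except AttributeError:
    print("no attribute 'z'")
```

The attribute values live in a fixed-size array inside the instance, which is the storage layout described in point (b) below.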

Now, here are the performance optimizations that these changes make possible.

a. Once a module has been initialized, a code optimization pass is run that
resolves all variable references, which means that no dictionary lookups
are required for variable refs. Calls to primitive functions like 'range'
can be optimized. E.g., efficient code can be emitted for:
for i in range(1, 10):
References to read-only bindings (like consts, function names, class names)
are further optimized so that a pointer to the object being named is
inserted directly into the code as an immediate operand. Constant folding
is performed when possible.

Another way to implement a subset of these optimizations is to do lazy
optimization at run time. A 'global variable reference' opcode (which
requires a dictionary lookup) patches itself at runtime, replacing
itself with an alternative opcode that loads or stores a fixed slot in
an array that represents all of the module's variables; this patching
only occurs if the instruction is executed after the module has finished
initializing itself.
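
Here is a toy sketch of that self-patching scheme (all names are made up for illustration). During initialization the opcode does a dictionary lookup; once the module has finished initializing, the instruction rewrites itself into a fixed-slot load:

```python
class Module:
    def __init__(self):
        self.names = {}        # name -> value, used during initialization
        self.slots = []        # fixed array of variable values
        self.slot_of = {}      # name -> slot index, frozen at init end
        self.initialized = False

    def finish_init(self):
        for i, (name, value) in enumerate(self.names.items()):
            self.slot_of[name] = i
            self.slots.append(value)
        self.initialized = True

def run(code, module):
    """Execute [opcode, arg] pairs; instructions are mutable lists so
    an opcode can patch itself in place."""
    result = None
    for instr in code:
        op, arg = instr
        if op == "LOAD_NAME":
            if module.initialized:
                instr[0] = "LOAD_SLOT"            # patch this call site
                instr[1] = module.slot_of[arg]
                result = module.slots[instr[1]]
            else:
                result = module.names[arg]        # slow dictionary lookup
        elif op == "LOAD_SLOT":
            result = module.slots[arg]            # fast array index
    return result

m = Module()
m.names["x"] = 42
m.finish_init()

code = [["LOAD_NAME", "x"]]
print(run(code, m))    # 42, and the instruction has patched itself
print(code)            # [['LOAD_SLOT', 0]]
```

After the first execution, every later execution of that instruction is an array index rather than a dictionary lookup.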

b. Classes & instances are reimplemented. An instance is represented by a
block of memory containing a pointer to a class followed by a fixed size
array of instance variables. This makes instances as storage efficient
as tuples.

c. Since classes are immutable, we can cache superclass information in
subclasses to speed up method calls. A class contains a single dictionary
listing *all* of the attributes of its instances: instance variable names,
method names, and every inherited attribute. An instance attribute lookup
then requires a *single* dictionary lookup, instead of a search through a
chain of dictionaries.
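
The merge itself is cheap and, because classes are immutable, only ever needs to happen once, at class-creation time. A sketch (the 'flatten' helper and the example classes are illustrative):

```python
def flatten(bases, own):
    """Combine inherited attribute dicts with this class's own;
    more-derived entries override inherited ones."""
    flat = {}
    for base in bases:
        flat.update(base)
    flat.update(own)
    return flat

animal = flatten([], {"kind": "animal", "speak": "<Animal.speak>"})
dog = flatten([animal], {"speak": "<Dog.speak>", "fetch": "<Dog.fetch>"})

print(dog["speak"])   # <Dog.speak>: the override, found in one lookup
print(dog["kind"])    # animal: inherited, also found in one lookup
```

With mutable classes this precomputation would be unsound, since a change to a base class would have to invalidate every flattened subclass dictionary.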

Some other performance optimizations:

A. A new opcode is introduced for expressions of the form
foo.bar(x)
Right now, if 'foo' is an instance, then
foo.bar
is evaluated to create a method object, and this temporary object is
then called with the argument list, and thrown away. The new 'attribute
call' opcode eliminates the need to create a temporary method object.
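
The temporary object is easy to observe (and CPython did eventually adopt this very optimization, as the method-call opcodes introduced in 3.7):

```python
class Foo:
    def bar(self, x):
        return x + 1

foo = Foo()

m1 = foo.bar          # each attribute access builds a fresh
m2 = foo.bar          # bound-method object
print(m1 is m2)       # False: two distinct temporaries
print(foo.bar(41))    # 42: the temporary is built, called, thrown away
```

The proposed opcode performs the lookup and the call in one step, so no bound-method object is ever allocated.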

B. An attribute call instruction contains some extra space that is used to hold
a pointer to the 'class' of the last object referenced by this instruction,
and a pointer to the 'method' that was found by dictionary lookup in the
'class' the last time the attribute call instruction was executed. These
'class' and 'method' slots are a cache that is used to avoid dictionary
lookups if the 'class' of the object in the attribute call remains the same
on successive executions of the same attribute call instruction. This
technique has been used to obtain significant performance improvements in
several implementations of Smalltalk (I can provide references on request).
I think it can be successfully applied to Python too, particularly if we
unify classes with types, and instances with objects, and adopt a more
uniform, dictionary based internal mechanism for looking up attributes
within objects.
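
A sketch of such a per-call-site (monomorphic) inline cache, with illustrative names; the real thing would live inside the instruction itself:

```python
class CallSite:
    def __init__(self, name):
        self.name = name
        self.cached_class = None
        self.cached_method = None
        self.lookups = 0          # counts slow dictionary lookups

    def call(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Slow path: dictionary lookup, then fill the cache.
            self.lookups += 1
            self.cached_class = cls
            self.cached_method = getattr(cls, self.name)
        # Fast path: reuse the cached method.
        return self.cached_method(receiver, *args)

class Dog:
    def speak(self):
        return "woof"

site = CallSite("speak")
d = Dog()
for _ in range(1000):
    site.call(d)
print(site.lookups)     # 1: only the first call pays for the lookup
```

As long as the same call site keeps seeing receivers of the same class, which is the common case, every call after the first skips the dictionary entirely.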

C. We put a state of the art garbage collector into Python. Reference
counting is the *slowest known* technique for automatic storage reclamation,
and cannot deal with circular references. By contrast, modern storage
management algorithms provide allocation and deallocation that is much
faster than malloc and free. Several of the superstars in the GC world
(including Hans Boehm and Paul Wilson) are giving away unencumbered
implementations of high quality garbage collectors. [Guido has expressed
concerns about the portability of Boehm's 'conservative' garbage collector.
We can maintain a high degree of portability by having the storage manager
carve up large blocks obtained using malloc.]
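
CPython as it stands today kept reference counting but later added a cycle collector on top of it; the circular-reference problem described above is easy to demonstrate with the 'gc' module:

```python
import gc

class Node:
    pass

a = Node()
b = Node()
a.other = b
b.other = a        # a reference cycle: neither count ever reaches zero

del a, b           # pure reference counting would leak both objects
found = gc.collect()
print(found >= 2)  # True: the cycle collector found the unreachable pair
```

A tracing collector of the kind proposed here handles such cycles as a matter of course, with no separate cycle-detection pass.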

The language changes I am proposing can be justified on other bases as well.

Mandatory variable declarations in modules & classes make it easier to read,
understand and maintain large Python programs.

Mandatory declarations of local variables provide a stepping stone to
supporting genuine, Scheme-like nested scopes. I want nested functions to
have access to the local variables of the enclosing function. This meshes
very nicely with the new 'lambda' keyword, and the new higher order functions
like 'map', which encourage people to write nested functions.
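
Python did later gain Scheme-like nested scopes (around version 2.2); this is the behaviour the proposal is reaching for:

```python
def make_adder(n):
    def add(x):
        return x + n          # reads a local of the enclosing function
    return add

add3 = make_adder(3)
print(add3(4))                # 7

# ...and it meshes with 'lambda' and 'map' exactly as described:
print(list(map(lambda x: x + 10, [1, 2, 3])))   # [11, 12, 13]
```

Without access to enclosing locals, every such helper has to be parameterized by hand or rely on default-argument tricks.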

Named constants (including read-only function and class names) provide
module implementors with more control over the interface that is exported
to clients. They also eliminate a class of bugs, typified by those
introduced in math code when 'pi' is reassigned.