The advantage of wrapping things into a function

Guido.van.Rossum@cwi.nl
Mon, 07 Mar 1994 15:26:01 +0100

> I noticed over and over, here and in the manual, this tendency to
> write everything as one big function and then call it - now I see
> the answer - optimization of local variables. Can someone tell me
> exactly what this means (what precisely is the difference in the way
> the memory is allocated between calling it as a main() function or
> just executing the statements?)

Some background: Python has a "compiler" which turns Python statements
into instructions for the "Python virtual machine". The Python
virtual machine is a high-level abstract stack machine which knows
about Python objects and operations. Compilation is done
automatically, for all Python code, and does not generate real machine
language. The result of the compilation for a module is saved in the
".pyc" file for that module, in order to save parsing and compilation
time the next time the module is imported (by a different process --
importing the same module twice in the same process only increments
the reference count to the already imported module).
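
A minimal sketch of the steps above, assuming a current Python 3
interpreter (much newer than the one this message describes). The file
names "<example>" and "spam.py" are made up for illustration.

```python
import importlib.util
import sys

# compile() runs the same kind of compilation the interpreter applies to a module.
source = "limit = 1000\nlimit = limit + 1\n"
code = compile(source, "<example>", "exec")   # "<example>" is a made-up file name
print(type(code).__name__)                    # a code object, not machine language

# The code object can be executed against a dictionary acting as the scope.
namespace = {}
exec(code, namespace)
print(namespace["limit"])

# Where this interpreter would cache the compiled form of a hypothetical spam.py.
pyc_path = importlib.util.cache_from_source("spam.py")
print(pyc_path)

# A second import in the same process hands back the module already loaded.
import sys as sys_again
same_module = sys_again is sys
print(same_module)
```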

The Python compiler currently performs only very limited
optimizations, the most important one being that for functions, local
variables (including arguments) are treated specially if at all
possible. The default, unoptimized model for variables is that per
(dynamic instantiation of a) scope there is a dictionary (a standard
Python object!) which holds the variables defined in that scope.

E.g. if I have a module containing the following statements:

    import sys
    limit = 1000
    def factorial(n):
        if n <= 1: return 1
        else: return n * factorial(n - 1)

then this module's dictionary has contents like the following:

    {'sys': <module 'sys'>, \
     'factorial': <function factorial at 100fb23c>, \
     'limit': 1000}
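
The dictionary can be inspected directly. The following is only a
sketch, assuming a current Python 3 interpreter: it builds a throwaway
module object (the name "demo" is hypothetical) from statements like
the ones above and lists the keys of its dictionary.

```python
import types

source = """
import sys
limit = 1000
def factorial(n):
    if n <= 1: return 1
    else: return n * factorial(n - 1)
"""

demo = types.ModuleType("demo")   # hypothetical module name
exec(source, demo.__dict__)       # run the statements in the module's scope

# Names starting with "__" are bookkeeping entries added by the interpreter.
names = sorted(name for name in vars(demo) if not name.startswith("__"))
print(names)
print(vars(demo)["limit"])
print(demo.factorial(5))
```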

This means that a statement like

    limit = limit + 1

later in the module requires the interpreter to look up the key
'limit' in the dictionary, add one to the value found, and place the
result back in the dictionary under the key 'limit'. In total this
means two dictionary lookup operations, which probably take more time
than the actual addition, and will take more time if there are more
variables in the scope (even though dictionaries are implemented using
hash tables).
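
The two name-based operations show up in the compiled code. A sketch
using the standard "dis" module, assuming a current Python 3
interpreter; the exact instruction set differs between versions, so
only the instructions whose names end in "_NAME" are picked out.

```python
import dis

# Compile the statement the way it would be compiled at module level.
code = compile("limit = limit + 1", "<module-level>", "exec")

# Keep only the instructions that refer to a variable by name.
name_ops = [ins.opname for ins in dis.get_instructions(code)
            if ins.opname.endswith("_NAME")]
print(name_ops)
```

One instruction fetches 'limit' by name and one stores it by name; both
go through the scope's dictionary.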

For functions, the optimizer analyzes the entire body of the
function, and determines the names of the local variables. This is
done by looking at all assignment statements (and statements that
imply a local name binding, such as "import" and class and function
definitions). It then assigns numbers to the variables and instructs
the interpreter to load and store the variables in an array indexed by
this number. The array is allocated with the proper length when the
function is called. Thus local variable accesses in functions don't
require dictionary lookups but only array indexing, which is much
faster and independent of the size of the array.
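
The result of this analysis is visible on a function's code object. A
sketch assuming a current Python 3 interpreter, where the attributes
"co_varnames" and "co_nlocals" expose the numbered local variables; the
function "example" and the names inside it are made up.

```python
import dis

def example(a, b):
    total = a + b          # an assignment binds a local name
    import os              # so does an import
    def helper(): pass     # and a nested function definition
    class Marker: pass     # and a class definition
    return total

code = example.__code__
print(code.co_varnames)    # arguments first, then the other local names
print(code.co_nlocals)     # number of slots in the array of locals

# Local accesses compile to "FAST" instructions that index into that array.
ops = [ins.opname for ins in dis.get_instructions(code)]
fast_ops = sorted({name for name in ops if "FAST" in name})
print(fast_ops)
```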

In some cases, the names of the local variables cannot be determined:
if a function contains an "exec" statement or "from <module> import *"
then the complete set of variables is only known at runtime. These
situations are detected by the optimizer and then the dictionary
approach is used. (There are some subtle semantic differences -- I
leave it to Steve Majewski to point these out. :-)
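
For comparison only, and not a description of the interpreter discussed
here: in current Python 3, "exec" is a function that stores the names
it creates in a dictionary passed to it, and "from <module> import *"
inside a function is rejected at compile time. A small sketch, with
made-up names:

```python
# Names created by exec() land in the dictionary handed to it.
namespace = {}
exec("answer = 6 * 7", {}, namespace)
print(namespace)

# A star import inside a function body does not compile in Python 3.
source = "def f():\n    from math import *\n"
try:
    compile(source, "<example>", "exec")
    star_import_allowed = True
except SyntaxError:
    star_import_allowed = False
print(star_import_allowed)
```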

The same kind of optimization cannot be performed for modules, since
a module can have global variables added dynamically from other
modules -- e.g.

    import foo
    foo.bar = 12

adds the variable 'bar' to module 'foo'. This is a feature.
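
A sketch of the same thing without needing a real file "foo.py",
assuming a current Python 3 interpreter: "types.ModuleType" creates an
empty module object that stands in for the imported module.

```python
import types

foo = types.ModuleType("foo")      # stand-in for "import foo"
had_bar_before = "bar" in vars(foo)

foo.bar = 12                       # add a global to the module from outside
bar_in_dict = vars(foo)["bar"]

# Code running in the module's own scope now sees the new variable.
exec("result = bar + 1", vars(foo))
print(had_bar_before, bar_in_dict, foo.result)
```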

A totally different reason why it's often a good idea to place
(almost) all code inside a function is that sooner or later, when it
grows, you will want to split it up into multiple functions anyway. If
it's already a function, you don't have to fiddle with the indentation.

Actually, that sounds like a lousy excuse. One thing I like about
functions is that you can use top-down programming using forward
references to functions, as in the following:

    import string, sys

    def main():
        for file in sys.argv[1:]:
            process(file)

    def process(file):
        try:
            fp = open(file, 'r')
        except IOError, msg:
            print "%s: can't open (%s)" % (file, str(msg))
            return
        lineno = 0
        for line in fp.readlines():
            lineno = lineno + 1
            handle(line, lineno, file)

    def handle(line, lineno, file):
        if string.find(string.lower(line), '__') >= 0:
            print "Dubious code found at line", lineno,
            print "of file", file
            print "Contents:", string.strip(line)

    main()
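
Forward references work because a function name is looked up when the
call is executed, not when the "def" is compiled. A minimal sketch in
current Python 3 syntax, with made-up function bodies: calling main()
before process() exists fails, and the same call succeeds once the
later definition has run.

```python
def main():
    return process("spam")     # process is looked up only when main() runs

# Calling too early: process has not been defined yet.
try:
    main()
    early_call = "worked"
except NameError:
    early_call = "NameError"

def process(item):
    return "processed " + item

# Same call, after the definition has been executed.
late_call = main()
print(early_call, late_call)
```

This is why the call to main() goes at the very end of the script.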

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
URL: <http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>