Jonathan Riehl
United Space Alliance
July 16, 1998
PyFront is a system for the conversion of Python modules into C extension modules. PyFront is related to the Paths static analysis tool for Python, but employs a higher fidelity data flow model. In building these higher fidelity models, PyFront bridges the gap between the interpreted Python language, and the compiled C language. The C extension modules generated by PyFront will provide faster execution times but identical results to the original Python source. PyFront has the potential to offer an intermediate, but automatic, step in the optimization of Python modules and routines.
PyFront is designed to be a system for increasing the Python language's utility as a rapid application development tool. Using PyFront, working Python code can be translated into faster C code. The primary objective of PyFront is to minimize or eliminate any overhead added to execution times due to the interpretation of Python byte code and the frame stack. Additionally, PyFront has been designed with the following objectives in mind:
PyFront was conceived after some discussion of a "Python compiler" took place on the USENET. The bulk of the discussion concerned the feasibility of a static Python compiler. Near the end of the thread, some had concluded that the most feasible approach to Python translation and analysis was something that was not quite a compiler, but not an interpreter (termed at the time a "space potato.") The implied use of such a system would be generating faster executables from Python code.
The PyFront system is also related to work done for the GRAD and Paths systems that were under development during PyFront's inception [Fly]. PyFront borrows many of the objectives in the previous section from GRAD. GRAD, or Grammar-based Rapid Application Development, uses language grammars to drive automatic interface generation for extending Python. Acting like GRAD in reverse, PyFront uses Python's grammar to automatically build C extension modules. The Paths test automation system, a testing tool and direct descendant of the GRAD system, was designed to perform the majority of the analysis chores that PyFront uses. While Paths builds lower fidelity models (computation is abstracted to a set of "ideal" operations, causing the model to lose some accuracy), it also served as a proof of concept for static data flow analysis of Python code. What remained was to apply the Paths technology to build higher fidelity models. These high fidelity models would then be used in the generation of similar or equivalent C code, with the added bonus of code optimization, a by-product of the data flow model employed [Weise, et al.].
The primary methodology used in the development of PyFront consisted of analysis of the Python interpreter. Once the C basis of Python was understood, control and data flow models could be associated with Python operations. The most obvious model for control flow analysis centers around partitioning input code into basic blocks and building a control flow graph [Aho, et al.]. However, the data flow models used in prior systems would be more applicable to optimization, an intended use of the PyFront system. The following subsections discuss the inquiry made into the Python interpreter's operation, as well as various representations considered for use in Python translation.
Figure 1 illustrates the exact routines used by Python and the intermediate data structures generated during the first time execution of a Python module (during subsequent executions, the byte code object is saved and reused, saving the time spent parsing the code). A system to generate C code would need to mimic these routines: first generating an abstract syntax tree, then determining the byte code for the given syntax tree, and finally generating code that would carry out the sequence of instructions represented in the byte code.
One particular aid in the analysis of the interpreter process is the exposure of some of these routines in the Python standard library. In an introspective fashion, the parse tree and the byte code of most Python constructs may be examined in the interpreter itself. Specifically, the parser module allows the generation of abstract syntax trees, while the dis module allows Python byte code objects to be "disassembled" and their constituent byte code to be viewed on an instruction by instruction basis.
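As a concrete illustration, a function's byte code can be inspected from within the interpreter itself. The sketch below uses the dis module's instruction iterator; note that the exact op code names and output format vary across Python versions (and the parser module's interface has changed since the version described here), so this is illustrative rather than version-exact:

```python
import dis

def sample(a, b):
    return a + b

# Walk the compiled byte code one instruction at a time, printing
# the offset, op code name, and argument of each instruction.
for instr in dis.get_instructions(sample):
    print(instr.offset, instr.opname, instr.argrepr)
```
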
Since Python is capable of building abstract syntax trees, and handling byte code objects directly, all that remains to be analyzed is the code in the byte code interpreter. The interpreter (located in the Python/ceval.c source module) runs as a loop, handling byte code instructions one at a time.
Like a modern CPU, the interpreter loop keeps an instruction pointer, and has exception and error logic that halts script execution when a problem occurs. The evaluation loop isolates each instruction as a literal byte value and then takes an action based on the byte code encountered. The actions taken are to be found in a switch statement that references the entire Python byte code set as individual case statements. From these case statements are calls to the Python API [Van Rossum], with arguments to the functions either to be found in a stack or as an additional set of byte codes, following the operation's coded value.
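The shape of that evaluation loop can be sketched in Python. This is an illustrative miniature, not the ceval.c code itself: a handful of stand-in instructions replace the full byte code set, and the if/elif chain stands in for the switch statement:

```python
# A toy evaluation loop: fetch each (op, arg) pair, dispatch on the
# op code, and manipulate a value stack, as ceval.c does in C.
def run(program, constants):
    stack, pc = [], 0           # value stack and instruction pointer
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_CONST":      # push a constant onto the stack
            stack.append(constants[arg])
        elif op == "BINARY_ADD":    # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "RETURN_VALUE":  # leave the loop with the result
            return stack.pop()
        else:
            raise RuntimeError("unknown op code: %s" % op)

result = run([("LOAD_CONST", 0), ("LOAD_CONST", 1),
              ("BINARY_ADD", None), ("RETURN_VALUE", None)],
             constants=[1, 2])
```

The RuntimeError branch plays the role of the error logic that halts script execution when a problem occurs.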
Looking at the Python world from "in the loop" shows that as long as the Python data elements (which are always PyObject pointers in the C code) and byte code arguments are managed correctly, there is no difference between running the byte codes and making the Python API calls directly. If one were to walk Python byte code and emit the C code found in the Python evaluation loop for each instruction, the result would be indistinguishable to Python. The Python API was designed to interface easily with C code, allowing extension modules to be written in the faster C language, compiled into shared libraries, and then imported into the Python interpreter.
There is a catch to an "evaluation loop inlining" strategy. There are byte codes that jump to different locations in the byte code string, not just the next byte code. There is a byte code to return from a function, returning control to a higher level Python function or the interpreter itself. There is an op code that raises exceptions which break out of the evaluation loop. Exceptions may also occur after any given API call, and require an extension routine in C to stop what it is doing, deinitialize any data being used in the function, and return immediately.
In order to handle these problems, the inlining routine would simply have to keep track of where the jump and raise instructions were (essentially partitioning the code into a set of basic blocks), and where their targets were. In cases where the function being converted has no try-finally or try-except constructs, the following procedures would apply:
The generated C code would take the interpreter logic and copy parts of it for each byte code instruction. It would only remove the necessity of regenerating and maintaining byte code (remember that this must now be handled by the developer using the inline processor and then a C compiler, which would be slower), in addition to the minimal overhead of looping over the byte codes. Furthermore, it still ties us to the object stack, with inlined code pushing and popping values, unaware of the PyObject structures used by neighboring instructions (this is illustrated in figure 2).
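The jump bookkeeping described above amounts to finding basic block leaders. The sketch below reconstructs that step with the modern dis module; it illustrates the idea and is not PyFront's own partitioner:

```python
import dis

def block_leaders(func):
    # A byte code offset starts a basic block if it is the function
    # entry, the target of a jump, or the instruction after a jump.
    jump_ops = set(dis.hasjrel) | set(dis.hasjabs)
    instrs = list(dis.get_instructions(func))
    leaders = {0}
    for i, instr in enumerate(instrs):
        if instr.opcode in jump_ops:
            leaders.add(instr.argval)              # the jump target
            if i + 1 < len(instrs):
                leaders.add(instrs[i + 1].offset)  # the fall-through
    return sorted(leaders)

def loop(i):
    while i > 0:
        i = i - 1

print(block_leaders(loop))
```
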
Figure 2 shows various aspects of the data flow within a Python session, in addition to depicting the "evaluation loop inlining" process described above. The diagram also shows a third process called "symbolic execution." Simply put, symbolic execution is a bridge between interpreter and compiler. Symbolic execution acts like an interpreter for parsed code, but instead of using real data, data flow is captured and placeholders for real information are substituted. These placeholders and their interconnection through the various operations in the code constitute a data flow graph.
An initial benefit of symbolic execution would be the removal of the stack from the generated code. The stack would not be required since symbolic execution, in order to create data flow graphs, must simulate the values being pushed on and popped off the stack. The principles of symbolic execution may be viewed in the following process:
The above process is not trivial, but it may be more intuitively shown in the following example:
Figure 3 shows an illustration of three things: a sample function (1), a byte code disassembly of the function (2), and a data flow graph (3). For the purpose of brevity, the sample function is very simple. There are no jumps in the function, and there is only one operation. Shown in the second area is an actual "disassembly" of the byte code Python generates for the given function. The byte code was obtained using the dis.dis routine, as illustrated in the flow. The final window shows a data flow graph generated from symbolic execution of the byte code, with two operations, two constants, and a value output.
To begin symbolic execution in this example, the byte code string for the given routine is analyzed (a function's byte code string is found in the co_code attribute of the actual function object's func_code attribute, or "a.func_code.co_code" in the example). In this case, the byte code has no jumps, no exceptions, and only a single return at the end (actually there is a second return, automatically placed at the end of a function's byte code, that returns None; since this code will never execute, it was omitted from the disassembly in figure 3). Since the execution order of the entire function is linear, the byte code shown is the only basic block to be considered.
Shown beside the byte code listing for the example function is a stack list. The stack shown shows the result of symbolic execution of each byte code instruction. The following would occur during symbolic execution of the byte code:
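The general shape of such a pass can be sketched as follows. This is a simplified reconstruction under stated assumptions: a straight-line instruction sequence, a hand-rolled graph of (node, label, inputs) tuples, and only the handful of op codes from the example; PyFront's internal representation is not specified here.

```python
# Symbolic execution of straight-line byte code: placeholder node
# names are pushed on a simulated stack instead of real values, and
# each operation records data flow edges from its input placeholders.
def symbolic_execute(code, constants):
    stack, graph, counter = [], [], 0

    def fresh(label):
        nonlocal counter
        counter += 1
        return "%s_%d" % (label, counter)

    for op, arg in code:
        if op == "LOAD_CONST":
            node = fresh("const")
            graph.append((node, repr(constants[arg]), []))
            stack.append(node)
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            node = fresh("add")
            graph.append((node, "BINARY_ADD", [a, b]))
            stack.append(node)
        elif op == "RETURN_VALUE":
            graph.append((fresh("out"), "RETURN_VALUE", [stack.pop()]))
    return graph

g = symbolic_execute([("LOAD_CONST", 0), ("LOAD_CONST", 1),
                      ("BINARY_ADD", None), ("RETURN_VALUE", None)],
                     constants=[1, 2])
```

After the pass the simulated stack is empty; the data flow of the function survives only in the graph, which is why the generated code no longer needs a run-time stack.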
The data flow graph, with its inputs (downward pointed triangles) and outputs (upward pointed triangles) would then represent how data flows through the given function. At this point, the graph would be walked in a second pass of symbolic execution. The second pass would model value assignments and references. Ideally the second pass would extract such operations as "LOAD_NAME" and "STORE_NAME", by internally maintaining data flow aliases for stores, and building edges to these aliases for later load operations.
The end result of PyFront is a completed C extension module. Figure 4 shows the template used for the output extension code. The Python extension module is broken into four sections: the file prologue, the module functions, the method map, and the module initialization function. The models discussed in the methodology section are used to generate C code for module functions, and the initialization routine, while top level analysis is used to integrate the functions with the module name space and the method map.
The file prologue consists of a comment header, an include directive for the PyFront API, and a module constant section. The comment header simply identifies the source Python code, the version of PyFront used to generate the extension module, and the date of the conversion. PyFront generated C modules require that only the PyFront API header file is included. The PyFront API includes the Python API, and will be discussed later. Finally, a static array of Python object pointers is declared, and a constant initialization routine is generated.
In the Python interpreter, constants are instantiated during the byte code conversion process, and serialized as a part of the compiled module. PyFront initially generated C code that created constants during function initialization, and then destroyed the constants at the end of the function. The overhead was quickly determined to be undesirable, and constant handling was moved to the module level. All module constants encountered during PyFront's code analysis process are assigned an index in an array of Python object pointers. PyFront generates a static routine for instantiation of the constant array as a part of the module prologue. The constant handler routine is then called by the module initialization function.
After the module prologue, functions corresponding to function definitions in the source module are created. The C code for each function declares a set of Python object pointers, a call to the tuple argument handler, C code for the function, and code for handling various exit conditions. The function variables replace the object stack, the local variable array, and add the functionality of two registers. Then, the function arguments are loaded into local variables using the tuple parsing routine, and the function code is generated.
Function code is generated using symbolic execution of the byte code instructions. The byte code instructions may be divided into roughly four categories: control flow, processors, operations, and store operations. PyFront does not use the control flow op codes, but rather builds control flow models that are translated directly into C flow constructs. Such byte codes include the "JUMP_IF_FALSE" and "JUMP_FORWARD" instructions. The second category of instructions are processors. These op codes do not return a Python object, but rather return an integer error code. A good example of a processor operation is the "PRINT_ITEM" byte code. Next, there are operations. Operations will pop objects off the stack, perform a function on the popped objects, and push an object (or several) back on the stack. The "BINARY_ADD" instruction is an operation. Finally, there are store operations that provide interaction between the program name space and the stack, or solely modify the stack. These operations include the load and store prefixed op codes and stack modifiers such as "POP_TOP."
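The four-way split can be sketched as a classifier over op code names. The particular name patterns below are illustrative guesses at category membership based on the examples in the text, not PyFront's actual tables:

```python
def classify(opname):
    # Control flow op codes are modeled directly as C flow constructs.
    if opname in ("JUMP_FORWARD", "JUMP_IF_FALSE", "JUMP_IF_TRUE",
                  "JUMP_ABSOLUTE"):
        return "control flow"
    # Processors return an integer error code, not a Python object.
    if opname.startswith("PRINT_"):
        return "processor"
    # Operations pop operands and push a result object.
    if opname.startswith(("BINARY_", "UNARY_")):
        return "operation"
    # Store operations mediate between the name space and the stack.
    if opname.startswith(("LOAD_", "STORE_")) or opname == "POP_TOP":
        return "store operation"
    return "other"
```
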
During symbolic execution, instructions are dispatched to handler routines that simulate the instruction's stack actions, and build data flow representations. Currently, C code is generated at this phase of program analysis, generating roughly one function call in C for each instruction. Since several instructions make one or more Python API calls, and may involve special control logic, PyFront uses its own API. The PyFront API simply wraps the Python evaluation actions with some minor modifications made to stack dependent code, and includes the Python API. The PyFront API provides a single function for each Python op code, simplifying code generation and reducing the size of the C extension code.
Error handling is done for each API call emitted, based on the type of the instruction. Processor instructions return a non-zero value to indicate an error, and are handled as follows:
if ((err = PyFront_Process (s0, ...)) != 0) goto exit_sx;
The goto is a reference to an exit label that is determined based on the simulated stack size. From the target exit handler on, the stack replacement variables are properly deallocated using the DECREF macro. Operation instructions return a Python object that must be non-NULL, hence a call to an operation instruction handler will appear as follows:
if ((x = PyFront_Op (s0, ...)) == NULL) goto exit_sx;
The object returned from the API call is placed first in a "register" variable, which is then moved to the stack simulator variables upon determination of a non-NULL result. Figure 5 shows an example of the C code generated by PyFront revision 0.4 for the demonstration code in figure 3.
The current implementation of PyFront is in its fourth revision (0.4), and consists of a 5,500 line Python to C code generator and a 1,000 line C extension API. This version of PyFront was informally benchmarked against Python version 1.5 for performance in simple iterative loops and recursive loops. The benchmarks were run on an IBM 43P 604e at 166 MHz with 64MB RAM under AIX 4.2.1. The benchmark functions were implemented in a single 29 line Python module and converted to an 804 line C file (about 27.7 times larger). In this case, PyFront was in a debug mode that generates comments about the corresponding Python code line number and op code (similar to those shown in figure 5). The extension module generated was compiled using IBM C Set++ with the same optimization level as was used for the Python interpreter. Times were measured using the Python time module and averaged over ten trials.
The while loop was tested first, using the following routine:
def while_test (i):
    while i > 0:
        i = i - 1
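A timing harness along the lines described (the time module, averaged over ten trials) might look like the sketch below; the iteration magnitudes and the harness structure are assumptions, since the paper's driver code is not shown:

```python
import time

def while_test(i):
    while i > 0:
        i = i - 1

def benchmark(func, arg, trials=10):
    # Average wall-clock time over the given number of trials.
    total = 0.0
    for _ in range(trials):
        start = time.time()
        func(arg)
        total += time.time() - start
    return total / trials

for magnitude in (1000, 10000, 100000):
    print(magnitude, benchmark(while_test, magnitude))
```
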
Figure 6 shows the results of executing the while_test routine over a range of iteration magnitudes. The data for a thousand iterations shows a very large maximum time trial. The varied data may be the result of sensitivity to system resources. Both Python and PyFront call the PyInt_FromLong API routine extensively in this test. At numbers beyond 100, for the Python build used, memory must be allocated for the new integer objects being created by PyInt_FromLong. At greater iteration sizes, the variance disappears as the iteration time is larger than the memory allocation overhead. Ultimately, the while loop proves to be PyFront's strong point, converging at speeds over 2.5 times faster than Python.
The next loop tested was the for loop, which employed the following routine:
def for_test (i):
    y = 0
    for x in xrange (0, i):
        y = y + 1
Figure 7 shows the results of running the for_test routine over the same number of iteration magnitudes used for the while loop. The PyFront implementation of "for" loop logic employs a comparable number of Python API calls to the interpreted version. The PyFront for loop shows a higher variance in timing data when compared to the while loop, and sometimes exceeds the Python measurements. Yet at higher iteration counts, the PyFront "for" loop converges at speeds roughly 1.6 times faster than Python.
The final test used was a recursive loop, which ran the following function:
def recursive_test (i):
    if (i > 0):
        recursive_test (i - 1)
The data taken for the recursive trials was not available for higher recursive call sizes due to a frame stack limitation built into Python. Figure 8 shows that PyFront currently introduces a larger function call overhead, running over two times slower than the Python recursion trials. Recursion exposes an implementation trade-off in the current version of PyFront. PyFront builds a Python code object to wrap the generated C extension functions. By generating a native Python wrapper, function arguments (specifically list and keyword arguments) are handled correctly, and a frame object is built, providing proper traceback information and global name space resolution. However, this effectively makes two function calls instead of one: one to the Python wrapper function, and one to the C extension function. Note that calls from PyFront generated code to other Python functions do not add this overhead; the overhead only occurs for calls to PyFront generated functions.
Numerical results for the time benchmarks are available in the Benchmark Data section.
Since the current implementation of PyFront is only a prototype, future efforts will focus on building a system that fully meets the objectives of non-intrusion, compatibility, modularity, and optional typing. Specifically, the current prototype implements correct exception passing, but not exception handling (using the "try-except" and "try-finally" constructs). The next goal is to increase the utility of PyFront as an optimization tool. Finally, as an enhancement to the optimization utility, a proposed typing system will be implemented and tested for compatibility and a low level of intrusion.
Completion of Python semantics will require that exception handling code be added to the extension modules generated. Complicating this objective, "return" and "break" statements may be interrupted by "finally" clauses before they exit their target subroutine or loop. Therefore, emission of "finally" clauses will require the code generated for "break" and "return" to jump to any intervening "finally" clause rather than to the exit point of the given loop or function. The "finally" exit point will require the use of an exit flag similar to the Python evaluation loop's "why" variable. The implication for symbolic execution is that the frame stack must be simulated in addition to the value stack.
Once full compatibility is achieved, the second goal is to optimize the PyFront API and generated code. The current version of PyFront includes the use of local variables for the local name space (which was required to get loop times below Python 1.5 speeds), and module wide constant initialization. Future optimizations may include constant folding, common sub-expression elimination, local variable aliases, and type based optimization. Constant folding would evaluate constant expressions, and generate the resulting constant instead of generating the Python API call. Common sub-expression elimination would use an operation hashing method to determine when a value may be reused, generating a temporary holding variable in the C code which would be used instead of performing redundant computation. Local variable aliases would simulate the local variable name space during symbolic execution, allowing any constant assignments to propagate through and be folded where appropriate.
Finally, type based optimizations will be prototyped in PyFront. The data flow graph built by PyFront provides a means for propagation of constant types, but would most likely be unable to consistently resolve the type of inputs and variables. PyFront would address the type resolution problem by providing a type extension scheme. The proposed type extension scheme would involve the insertion of a uniquely named dictionary at the top level of a module. The dictionary would map from regular expressions to strings containing type information. When a name is referenced during symbolic execution, the type system would attempt to match the label to one of the regular expression keys in the type dictionary. If type information was available, it would be used to select type specific operations, circumventing Python API calls. Otherwise, the standard Python API call would be emitted as already done by the current PyFront.
The PyFront type dictionary would offer the following features:
For example, Hungarian notation uses an "n" prefix for integers [Simonyi]. In PyFront, adding a line such as "__pyf_type_dict = {'n[A-Za-z0-9_]*' : 'int'}" would implement the integer prefix association. The prime drawback to this system (as with any similar type system) is that legacy Python code would have to be modified to take advantage of type optimizations.
A final note about the type system: it could be used to express such concepts as static module variables (allowing global variables to be optimized in a similar fashion to local variables), or even constness (allowing global constant assignments to be treated like macros). If these type extensions were employed, recursive calls and calls to other functions in the module could be replaced by direct calls to the C function, eliminating the recursive call overhead seen in the Results section.
By translating Python control flow into C control flow, replacing the Python value stack with C variables, and inlining Python API calls normally made by the Python interpreter, PyFront achieves modest increases in Python script performance. While certain design trade-offs in the current implementation add overhead to calling converted functions, they are also needed to assist PyFront with its goal of compatibility. As these trade-offs are weighed and future plans are implemented, PyFront performance should continue to improve. Further development of such features as the proposed type system will enable PyFront to provide a non-intrusive, and mostly automatic path from Python modules to faster and statically compiled C code.
Aho, A.V., Sethi, R., and Ullman, J.D. Compilers: Principles, Techniques, and Tools. Computer Science Series. Reading, Massachusetts: Addison-Wesley, 1986.
Fly, Charles. "Grammar-based Rapid Application Development." http://www.python.org/workshops/1996-11/papers/GRAD/html/GRADpaper.book.html
Simonyi, Charles. "Program Identifier Naming Conventions." http://www.strangecreations.com/library/c/naming.txt
Van Rossum, Guido. Python/C API Reference Manual. Reston, Virginia: Corporation for National Research Initiatives, December 31, 1997.
Weise, D., Crew, R., Ernst, M., and Steensgaard, B. "Value Dependence Graphs: Representation Without Taxation." In Principles of Programming Languages, pages 297-310. Portland, Oregon: ACM Press, January 1994.