python byte code disassembler - example of a module using redirect

Steven D. Majewski (sdm7g@elvis.med.Virginia.EDU)
Tue, 29 Mar 1994 19:51:11 GMT

And - as an example ( and in a further effort to raise the
comp.lang.python content of this group ), here is an
example of the use of redirect.tolines().

Python statements and functions compile into a byte-code that is
interpreted by the Python-interpreter virtual machine.
[ When a source code module is imported for the first time, a
cache file of compiled byte-code is written as module.pyc.
'import' checks the date of this file against the source file
to determine if the source file needs to be read in and the
cache file rewritten. ]
I have been looking at the possibility of applying further
optimizations to the byte-code produced by the compiler.
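
For readers on a current interpreter, the same two ideas (a code object holding raw byte-code, and a cache file for the compiled module) can be poked at as follows. This is only a sketch in today's Python 3, not the 1994 interpreter discussed here: the function double and the file name module.py are made up for illustration, and the cache location shown is where a modern CPython would put the file, not the module.pyc next to the source described above.

```python
import importlib.util

def double(x):
    return x * 2

# Every function carries a code object; co_code holds the raw byte-code.
code = double.__code__
raw = code.co_code
print(type(raw).__name__, len(raw))

# Where a current interpreter would cache the compiled form of a
# hypothetical module.py (the path is only computed, nothing is written).
cache_path = importlib.util.cache_from_source("module.py")
print(cache_path)
```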

[ Looking at the code gives the impression that there are plenty
of standard optimization techniques that can be applied, but
Python is a very dynamic object oriented language, and not only
does it have the usual method lookup problems of a language like
Smalltalk, but it is further complicated by the fact that bindings
that don't usually change are not disallowed from changing. It is
possible ( though I haven't seen it used ) for objects to change their
class and/or methods on the fly. I don't think that there is even a
firm guarantee that object.method does NOT either rebind method or
even rebind object before its return. This would be bizarre
obfuscated Python, but not, as far as I can tell, illegal.
That is one reason I'm considering a post compiler optimizer separate
from the interpreter's compiler. It may be possible to be more
aggressive if other constraints/promises about the code can be made.
Adding optional declarations to function arguments may be another
possibility, but I'm not sure how much that can win, in the absence
of other dynamic restrictions. ]
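
To make the worry about rebinding concrete, here is a small illustration written for a current Python 3 interpreter. The classes Duck, Robot and Shifty are invented for the example. It shows an object changing its class on the fly, and a method that rebinds itself while it is running, which is exactly the kind of thing an optimizer cannot assume away.

```python
class Duck:
    def speak(self):
        return "quack"

class Robot:
    def speak(self):
        return "beep"

class Shifty:
    def method(self):
        # Rebind the method on the class while it is still executing.
        Shifty.method = lambda self: "second"
        return "first"

d = Duck()
before = d.speak()
d.__class__ = Robot      # the object changes its class on the fly
after = d.speak()

s = Shifty()
calls = [s.method(), s.method()]   # same expression, different code each time
print(before, after, calls)
```

An optimizer that cached the result of looking up d.speak or s.method after the first call would get the second call wrong.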

There is a byte-code disassembler in the standard Python library,
but it appears to have been written before some other internal
changes were made that make it easier to get at the corresponding
source code lines. I didn't want to try to figure out how to
modify the existing disassembler to insert source code lines, so
I used redirect.tolines to grab the output from the python/lib/dis.py
disassembler, and wherever there is a SET_LINENO code, I output the
corresponding source code lines ( using another module from python/lib
- linecache.py to fetch the line. )
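
The redirect module itself is not reproduced in this post. As a rough idea of what tolines does, here is a guess at its behaviour written for a current Python 3 interpreter: call a function, capture whatever it prints, and hand the output back as a list of lines. The body below is an assumption built on contextlib.redirect_stdout and is not the original implementation. The helper hello exists only to have something to capture.

```python
import contextlib
import io

def tolines(func, *args):
    """Call func(*args) and return what it printed, as a list of lines."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    # Keep the trailing newlines, so a caller can re-print lines unchanged.
    return buffer.getvalue().splitlines(keepends=True)

def hello(n):
    for i in range(n):
        print("row", i)

lines = tolines(hello, 2)
print(lines)
```

The same call shape works with any function that writes to standard output, which is what makes it usable on a disassembler that only knows how to print.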

import dis
from linecache import getline
import string
from redirect import tolines

def disfun( func ):
    disco( func.func_code )

def disco( fcode ):
    # disco( function.func_code )
    # disassemble code with source statements.
    print fcode
    for m in fcode.__members__ :
        if m == 'co_code' :
            print m+':',
            for icode in getattr( fcode, m ):
                print ord(icode),
            print
        else: print m+':', getattr( fcode, m )
    print
    # capture the output from dis.disco ...
    for line in tolines( dis.disco, fcode ):
        field = string.split( line )
        if 'SET_LINENO' in field:
            # and if it's a SET_LINENO, print the source line first
            print
            lineno = string.atoi( field[-1] )
            print '|', getline( fcode.co_filename, lineno )[:-1],
            print '\t #',lineno,'\n'
        if field: print line,
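
The code above targets the 1994 interpreter ( print statements, func_code, SET_LINENO ) and will not run on a current one. For comparison, here is a sketch of the same idea in today's Python 3, where the dis module exposes line information directly and no output capture is needed. The names annotated and sample are made up for the example. It uses dis.findlinestarts and dis.get_instructions from the current standard library, and the opcode names it prints will differ from the ones in this post.

```python
import dis
import linecache

def annotated(code):
    """Return disassembly lines with the source lines interleaved."""
    starts = dict(dis.findlinestarts(code))    # offset -> line number
    out = []
    for instr in dis.get_instructions(code):
        lineno = starts.get(instr.offset)
        if lineno is not None:
            # Source text is empty if the file cannot be found.
            source = linecache.getline(code.co_filename, lineno).rstrip()
            out.append("| %s\t # %d" % (source, lineno))
        out.append("%6d %-20s %s" % (instr.offset, instr.opname, instr.argrepr))
    return out

def sample(n):
    total = 0
    for i in range(n):
        total = total + i
    return total

listing = annotated(sample.__code__)
for text in listing:
    print(text)
```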

And if you are interested in the internals, here is the disassembled
output of disco:
( dumped to a file with tofile( "disco.tmp", disfun, disco ) )

<code object disco at 200768b8, file "../Lib/mydis.py", line 10>
co_code: 127 10 0 123 9 0 94 1 0 125 0 0 127 13 0 124 0 0 71 72 127 14 0 120 132
0 124 0 0 105 1 0 100 1 0 127 14 0 114 116 0 125 1 0 127 15 0 124 1 0 100
2 0 106 2 0 111 68 0 1 127 16 0 124 1 0 100 3 0 23 71 127 17 0 120 43 0 101
3 0 124 0 0 124 1 0 102 2 0 26 100 1 0 127 17 0 114 20 0 125 2 0 127 18 0
101 5 0 124 2 0 102 1 0 26 71 113 93 0 87 127 19 0 72 110 27 0 1 127 20 0
124 1 0 100 3 0 23 71 101 3 0 124 0 0 124 1 0 102 2 0 26 71 72 113 35 0 87
127 21 0 72 127 23 0 120 165 0 101 6 0 101 7 0 105 8 0 124 0 0 102 2 0 26
100 1 0 127 23 0 114 139 0 125 3 0 127 24 0 101 10 0 105 11 0 124 3 0 102
1 0 26 125 4 0 127 25 0 100 4 0 124 4 0 106 6 0 111 77 0 1 127 27 0 72 127
28 0 101 10 0 105 13 0 124 4 0 100 5 0 11 25 102 1 0 26 125 5 0 127 29 0 100
6 0 71 101 15 0 124 0 0 105 16 0 124 5 0 102 2 0 26 100 5 0 11 32 71 127 30
0 100 7 0 71 124 5 0 71 100 8 0 71 72 110 1 0 1 127 31 0 124 4 0 111 11 0 1
127 31 0 124 3 0 71 110 1 0 1 113 187 0 87 100 0 0 83
co_consts: [None, 0, 'co_code', ':', 'SET_LINENO', 1, '|', '\011 #', '\012',
{'m': 1, 'line': 3, 'field': 4, 'lineno': 5, 'fcode': 0, 'icode': 2}]
co_filename: ../Lib/mydis.py
co_name: disco
co_names: ['fcode', '__members__', 'm', 'getattr', 'icode', 'ord', 'tolines',
'dis', 'disco', 'line', 'string', 'split', 'field', 'atoi', 'lineno',
'getline', 'co_filename']

| def disco( fcode ): # 10

0 SET_LINENO 10
3 RESERVE_FAST 9 ({'m': 1, 'line': 3, 'field': 4, 'lineno': 5, 'fcode': 0, 'icode': 2})
6 UNPACK_ARG 1
9 STORE_FAST 0

| print fcode # 13

12 SET_LINENO 13
15 LOAD_FAST 0
18 PRINT_ITEM
19 PRINT_NEWLINE

| for m in fcode.__members__ : # 14

20 SET_LINENO 14
23 SETUP_LOOP 132 (to 158)
26 LOAD_FAST 0
29 LOAD_ATTR 1 (__members__)
32 LOAD_CONST 1 (0)

| for m in fcode.__members__ : # 14

>> 35 SET_LINENO 14
38 FOR_LOOP 116 (to 157)
41 STORE_FAST 1

| if m == 'co_code' : # 15

44 SET_LINENO 15
47 LOAD_FAST 1
50 LOAD_CONST 2 ('co_code')
53 COMPARE_OP 2
56 JUMP_IF_FALSE 68 (to 127)
59 POP_TOP

| print m+':', # 16

60 SET_LINENO 16
63 LOAD_FAST 1
66 LOAD_CONST 3 (':')
69 BINARY_ADD
70 PRINT_ITEM

| for icode in getattr( fcode, m ): # 17

71 SET_LINENO 17
74 SETUP_LOOP 43 (to 120)
77 LOAD_NAME 3 (getattr)
80 LOAD_FAST 0
83 LOAD_FAST 1
86 BUILD_TUPLE 2
89 BINARY_CALL
90 LOAD_CONST 1 (0)

| for icode in getattr( fcode, m ): # 17

>> 93 SET_LINENO 17
96 FOR_LOOP 20 (to 119)
99 STORE_FAST 2

| print ord(icode), # 18

102 SET_LINENO 18
105 LOAD_NAME 5 (ord)
108 LOAD_FAST 2
111 BUILD_TUPLE 1
114 BINARY_CALL
115 PRINT_ITEM
116 JUMP_ABSOLUTE 93
>> 119 POP_BLOCK

| print # 19

>> 120 SET_LINENO 19
123 PRINT_NEWLINE
124 JUMP_FORWARD 27 (to 154)
>> 127 POP_TOP

| else: print m+':', getattr( fcode, m ) # 20

128 SET_LINENO 20
131 LOAD_FAST 1
134 LOAD_CONST 3 (':')
137 BINARY_ADD
138 PRINT_ITEM
139 LOAD_NAME 3 (getattr)
142 LOAD_FAST 0
145 LOAD_FAST 1
148 BUILD_TUPLE 2
151 BINARY_CALL
152 PRINT_ITEM
153 PRINT_NEWLINE
>> 154 JUMP_ABSOLUTE 35
>> 157 POP_BLOCK

| print # 21

>> 158 SET_LINENO 21
161 PRINT_NEWLINE

| for line in tolines( dis.disco, fcode ): # 23

162 SET_LINENO 23
165 SETUP_LOOP 165 (to 333)
168 LOAD_NAME 6 (tolines)
171 LOAD_NAME 7 (dis)
174 LOAD_ATTR 8 (disco)
177 LOAD_FAST 0
180 BUILD_TUPLE 2
183 BINARY_CALL
184 LOAD_CONST 1 (0)

| for line in tolines( dis.disco, fcode ): # 23

>> 187 SET_LINENO 23
190 FOR_LOOP 139 (to 332)
193 STORE_FAST 3

| field = string.split( line ) # 24

196 SET_LINENO 24
199 LOAD_NAME 10 (string)
202 LOAD_ATTR 11 (split)
205 LOAD_FAST 3
208 BUILD_TUPLE 1
211 BINARY_CALL
212 STORE_FAST 4

| if 'SET_LINENO' in field: # 25

215 SET_LINENO 25
218 LOAD_CONST 4 ('SET_LINENO')
221 LOAD_FAST 4
224 COMPARE_OP 6
227 JUMP_IF_FALSE 77 (to 307)
230 POP_TOP

| print # 27

231 SET_LINENO 27
234 PRINT_NEWLINE

| lineno = string.atoi( field[-1] ) # 28

235 SET_LINENO 28
238 LOAD_NAME 10 (string)
241 LOAD_ATTR 13 (atoi)
244 LOAD_FAST 4
247 LOAD_CONST 5 (1)
250 UNARY_NEGATIVE
251 BINARY_SUBSCR
252 BUILD_TUPLE 1
255 BINARY_CALL
256 STORE_FAST 5

| print '|', getline( fcode.co_filename, lineno )[:-1], # 29

259 SET_LINENO 29
262 LOAD_CONST 6 ('|')
265 PRINT_ITEM
266 LOAD_NAME 15 (getline)
269 LOAD_FAST 0
272 LOAD_ATTR 16 (co_filename)
275 LOAD_FAST 5
278 BUILD_TUPLE 2
281 BINARY_CALL
282 LOAD_CONST 5 (1)
285 UNARY_NEGATIVE
286 SLICE+2
287 PRINT_ITEM

| print '\t #',lineno,'\n' # 30

288 SET_LINENO 30
291 LOAD_CONST 7 ('\011 #')
294 PRINT_ITEM
295 LOAD_FAST 5
298 PRINT_ITEM
299 LOAD_CONST 8 ('\012')
302 PRINT_ITEM
303 PRINT_NEWLINE
304 JUMP_FORWARD 1 (to 308)
>> 307 POP_TOP

| if field: print line, # 31

>> 308 SET_LINENO 31
311 LOAD_FAST 4
314 JUMP_IF_FALSE 11 (to 328)
317 POP_TOP

| if field: print line, # 31

318 SET_LINENO 31
321 LOAD_FAST 3
324 PRINT_ITEM
325 JUMP_FORWARD 1 (to 329)
>> 328 POP_TOP
>> 329 JUMP_ABSOLUTE 187
>> 332 POP_BLOCK
>> 333 LOAD_CONST 0 (None)
336 RETURN_VALUE
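
The raw co_code numbers near the top of the dump line up with the listing: 127 10 0 is SET_LINENO 10, 123 9 0 is RESERVE_FAST 9, and so on. Here is a sketch of a decoder ( written for a current Python 3, so it can be run today ) that reproduces the first few entries. The opcode numbers in the table are read off this dump and apply only to the interpreter that produced it. The cutoff of 90 for opcodes that carry a two-byte argument is an assumption, though it is consistent with every instruction shown. The relative_target helper reflects the arithmetic visible in the listing, e.g. SETUP_LOOP 132 at offset 23 goes to 158.

```python
# Opcode numbers read off the dump in this post (1994-era interpreter only).
OPNAMES = {
    71: "PRINT_ITEM", 72: "PRINT_NEWLINE", 94: "UNPACK_ARG",
    123: "RESERVE_FAST", 124: "LOAD_FAST", 125: "STORE_FAST",
    127: "SET_LINENO",
}
HAVE_ARGUMENT = 90   # assumption: opcodes >= 90 carry a 2-byte argument

def decode(data):
    """Return (offset, name, arg) triples; arg is None for bare opcodes."""
    result = []
    i = 0
    while i < len(data):
        op = data[i]
        name = OPNAMES.get(op, "<%d>" % op)
        if op >= HAVE_ARGUMENT:
            arg = data[i + 1] + 256 * data[i + 2]   # low byte first
            result.append((i, name, arg))
            i += 3
        else:
            result.append((i, name, None))
            i += 1
    return result

def relative_target(offset, arg):
    """Relative jumps count from the end of the 3-byte instruction."""
    return offset + 3 + arg

# The first twenty bytes of co_code from the dump.
first_bytes = [127, 10, 0, 123, 9, 0, 94, 1, 0, 125, 0, 0,
               127, 13, 0, 124, 0, 0, 71, 72]
decoded = decode(first_bytes)
for offset, name, arg in decoded:
    print(offset, name, "" if arg is None else arg)
```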

- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU>
- UVA Department of Molecular Physiology and Biological Physics