Re: re-creating objects and security ( & "safe" Python )

Steven D. Majewski (sdm7g@elvis.med.virginia.edu)
Wed, 31 Aug 1994 17:19:33 -0400

On Aug 31, 19:07, Dave Brennan wrote:
>
> I'm considering storing data that needs to be used in a Python
> program as eval-able or marshalled Python objects. The only
> problem is that I need to make sure that when the objects are
> constructed from the store that no methods are called. This
> prevents people from sticking virus code in the data files.
>
> Is this possible with the current version of Python? If not
> is it possibly an easy change to have a boolean flag that would
> make Python raise an error on an attempt to call a method?
>

> [ ... ] as eval-able or marshalled Python objects.

Unmarshalling ( marshal.load[s] ) an object doesn't execute any code
( other than marshal.load[s] itself ). So that is no problem.

( i.e. marshal.loads( marshal.dumps( [1,2,3] )) just gives you [1,2,3].
marshal.loads( code-object ) gives you the code object - it doesn't
execute anything. 'marshal.loads( anything )' is safe, but
'exec marshal.loads( anything )' is dangerous. )
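
[ A minimal sketch of that distinction. It assumes a current Python 3
interpreter, where exec is a function rather than a statement; the
names 'side_effects' and '<stored>' are made up for the example. ]

```python
import marshal

# Round-trip plain data: only marshal's own code runs.
data = marshal.loads(marshal.dumps([1, 2, 3]))

# A code object whose body would leave a visible trace if it ever ran.
side_effects = []
code = compile("side_effects.append('ran')", "<stored>", "exec")

# Dumping and loading hands the code object back without running it.
restored = marshal.loads(marshal.dumps(code))
loaded_without_running = (side_effects == [])

# Only the explicit exec runs the stored code.
exec(restored, {"side_effects": side_effects})
```

After the last line side_effects holds one entry; before it, none.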

However, not every Python object can be dumped (or loaded) with marshal.
Numbers, strings, and some lists and tuples can.
Classes and instances and functions, for example, can't.
( and: x = [1] ; x.append(x); marshal.dumps(x); for example, dies! )
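
[ A rough way to see which objects marshal accepts, again assuming a
current Python 3 interpreter, where an unsupported object makes
marshal.dumps raise ValueError. 'Point' and 'try_dump' are names
invented for the illustration. ]

```python
import marshal

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def try_dump(obj):
    """Return True if marshal can serialise obj, False if it refuses."""
    try:
        marshal.dumps(obj)
        return True
    except ValueError:
        return False

results = {
    "number": try_dump(42),
    "string": try_dump("spam"),
    "nested list": try_dump([1, (2, 3), "four"]),
    "instance": try_dump(Point(1, 2)),
    "function": try_dump(try_dump),
    "class": try_dump(Point),
}
```

The first three entries come out True, the last three False.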

[ Here I go again, Guido - I guess it's about time for me to write this
up and submit it to the Python FAQ! ;-) ]

So, how do you save unmarshal-able objects ?

The .pyc file actually contains a single marshalled code object which
is executed to create the functions, classes, and other objects
defined in a module.

So now we are away from safe marshalling to unsafe exec-ing.

However, note that what is in the .pyc file is a code-object, not
a function. In execing a code object, you can optionally indicate
the namespace the code should be executed in. ( If you don't expressly
indicate the namespace, the current scope is used. )
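
[ A sketch of that mechanism, assuming a current Python 3 interpreter:
marshal a compiled module body, load it back, and exec it in a
dictionary of our choosing. A real .pyc also carries a short header in
front of the marshalled code, which is left out here; the source text
and the name 'namespace' are invented for the example. ]

```python
import marshal

source = (
    "def double(n):\n"
    "    return 2 * n\n"
    "\n"
    "answer = double(21)\n"
)

# What gets stored: the marshalled code object for the module body.
stored = marshal.dumps(compile(source, "<module>", "exec"))

# Loading gives back a code object; nothing is defined yet.
code = marshal.loads(stored)

# Exec-ing it in an explicit dictionary creates the definitions there,
# not in the caller's scope.
namespace = {}
exec(code, namespace)
```

The function 'double' and the value 'answer' end up as entries in
namespace, and only there.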

It would be nice to be able to use that namespace control to limit
the side effects of possibly unsafe modules. However:
(1) in Python's three level scoping, the __builtin__ module is
always at the root, and it contains open(), and lots of other
possibly unsafe functions.
(2) And there's no restriction on 'import', or modifying sys.path,
so anything is potentially accessible. [ and if you try to
mask open() with another definition, they can still do:
'import __builtin__; __builtin__.open( ... )' ]
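
[ Point (2) in a few lines, assuming a current Python 3 interpreter,
where the __builtin__ module is spelled 'builtins'. The 'untrusted'
string stands in for whatever came out of the data file. ]

```python
import builtins

# Code we did not write and do not trust.
untrusted = "import builtins\nfound = builtins.open\n"

# Try to mask open() by shadowing it in the namespace we hand over.
namespace = {"open": None}
exec(compile(untrusted, "<untrusted>", "exec"), namespace)

# The mask is still in place, but the real open() was reached anyway.
mask_intact = namespace["open"] is None
real_open_reached = namespace["found"] is builtins.open
```

Both flags come out True: the shadowing works, and it doesn't matter.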

I've mostly ignored 'access' in this discussion, because I don't recall
in detail how exactly it is proposed to ( or does currently ) work.
But that certainly seems like a likely place for hooks to restrict
function.

[ The reference manual states that 'access' is a reserved word for
the parser whose syntax and semantics are currently undefined.
Actually, there is an (incomplete?) implementation of 'access'
in Python now - so what that note really means is that Guido is
not guaranteeing that future syntax and semantics will be at all
compatible with what's implemented now. ]

Allowing another module than __builtin__ to be the root of the
default namespace would be another useful safety feature.
( A fourth optional argument to 'exec' ? - or a way to save,
modify, and restore __builtin__ ? I think it was a goal of
the implementation that you couldn't accidentally step on or
wipe out __builtin__, so that it was impossible to lose the
base functionality of the language. But for 'safe python'
that's the wrong goal. )
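
[ For comparison: in current Python 3 interpreters ( an assumption
about later versions, not the Python discussed here ) a '__builtins__'
entry in the globals dictionary passed to exec is used as the root
namespace. The sketch below shows only the mechanics. It is not a
sandbox, since object introspection still offers ways around it.
'restricted' and 'outcome' are names made up for the example. ]

```python
# A replacement root namespace that offers len() and nothing else.
restricted = {"__builtins__": {"len": len}}

def outcome(source):
    """Exec source under the restricted root; report 'ok' or the error name."""
    try:
        exec(source, dict(restricted))
        return "ok"
    except Exception as exc:
        return type(exc).__name__

results = {
    "len": outcome("n = len('abc')"),
    "open": outcome("open('somefile')"),
    "import": outcome("import os"),
}
```

Here the call to len() succeeds, open() is simply an unknown name, and
the import fails because there is no __import__ in the root to call.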

Python byte code has a nice regular format, so it would be
simple to scan a code object for IMPORT_NAME & IMPORT_FROM
opcodes, for example. Excluding EXEC_STMT would also be a
good idea. But restricting BINARY_CALL, UNARY_CALL ( any
other unsafe ones ? ) would be too restrictive for a general
"safe" application, although it might work for your case.
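
[ A sketch of such a scan, assuming a current Python 3 interpreter and
its 'dis' module. IMPORT_NAME and IMPORT_FROM still exist there;
EXEC_STMT does not, since exec became a function, so it is left out.
'banned_opcodes' and 'BANNED' are invented names. ]

```python
import dis
import types

BANNED = {"IMPORT_NAME", "IMPORT_FROM"}

def banned_opcodes(code, banned=BANNED):
    """Return the banned opcode names used in code or in any nested code object."""
    found = set()
    for instr in dis.get_instructions(code):
        if instr.opname in banned:
            found.add(instr.opname)
    # Function and class bodies are separate code objects stored as constants.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            found |= banned_opcodes(const, banned)
    return found

plain = compile("x = 1 + 2", "<data>", "exec")
direct = compile("import os", "<data>", "exec")
hidden = compile("def f():\n    from os import path\n", "<data>", "exec")
```

Calling banned_opcodes() on 'plain' gives an empty set, while 'direct'
and 'hidden' are both flagged, the latter only because the scan
recurses into the nested function body. A scan like this says nothing
about names reached through calls, which is the hard part.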

( Or a modified and restricted version of eval_code that
raises an exception for unsafe instructions. That might
be better than a flag that changes its behaviour. The
problem with a flag is how do you restrict access to the
flag. A 'safe_eval_code' would recursively call itself
instead of (unsafe) 'eval_code', and return to the procedure
that called safe_eval_code would restore the full function
eval. )

The proposal for a meta-object protocol for Python ( which,
someone mentioned, could be used to implement 'access' more
generally and extensibly ) sounds like a great idea(*). But how
do we ensure that this powerful construct isn't used to produce
more powerful "backdoors" ?

[(*) If I've been conspicuous by my absence in that thread, it's
because I *thought* I had discovered a "fatal flaw" in the idea,
and then, in the middle of writing a followup, discovered the fatal
flaw in my analysis of the idea - "Nevermind!". So I have to reread
some of that thread, but I'm all for the general concept. Glad to
hear that Gregor's talk is still resonating in people's heads! ]

- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU>
- UVA Department of Molecular Physiology and Biological Physics