Python tries to hide the problems with threads as much as it can.
However, some problems are inherent to multi-threaded programming, and
Python can't (and shouldn't!) hide those. And, unfortunately, Python
threads don't deliver all the benefits of threading either.

The main missed benefit only becomes apparent on multi-CPU machines:
Python has a global "interpreter lock" which means that most of the
time only one thread at a time can proceed with a computation. All
other threads are waiting. This is fine if you only have a single CPU:
multi-threaded programming is then mostly used to allow one thread to
progress while another thread is waiting for I/O, either on the
network (e.g. a server waiting for a client or vice versa) or on a
disk (waiting for a seek operation to complete). However, a benefit of
multiple CPUs is lost: a CPU-intensive calculation can't be sped up
in Python by using multiple threads.
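To illustrate the I/O case, here is a minimal sketch. It assumes the standard `threading` and `time` modules of a current Python, and it uses `time.sleep` as a stand-in for a blocking I/O call (sleeping releases the interpreter lock, as blocking I/O does). The names `fake_io` and `run_waiters` are made up for this example.

```python
import threading
import time

def fake_io(delay, results, index):
    # Stand-in for a blocking read: while this thread sleeps, the
    # interpreter lock is released and other threads can run.
    time.sleep(delay)
    results[index] = delay

def run_waiters(count=4, delay=0.2):
    """Start `count` waiting threads and return (elapsed seconds, results)."""
    results = [None] * count
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(count)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start, results

elapsed, results = run_waiters()
# The waits overlap, so the total should be close to one delay,
# not count * delay.
print("elapsed: %.2f seconds" % elapsed)
```

Replacing the sleep with a pure-Python computation would not show the same overlap, since only one thread can compute at a time.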
Fortunately, most systems have only a single CPU; parallel
hardware is still a relative rarity, so there's still a lot of benefit
to be gained from multiple threads in Python.

Now to the problems of parallel programming. Although only one Python
thread can run at a time, Python has full pre-emptive scheduling (at
least to the extent the underlying OS has it): at any time, even in
the middle of calculating the value of an expression, the scheduler
can suspend the current thread and run any other thread that's ready
to run. Python's primitive operations are all atomic, but anything
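more complicated is at risk. For instance, if two threads
simultaneously execute "a = a+1", where a is a global variable, a may
be incremented by one or by two depending on how the scheduler happens
to choose the threads for execution: the statement reads a, adds one,
and stores the result in separate steps, so both threads may read the
same old value of a.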
Thus, whenever there's shared writable state, there's a problem, and
all changes to such state should be protected by locks. Sounds simple
enough? Unfortunately, shared state is often implied by the semantics
of the library or even the interpreter. Since threads are an optional
feature, most library code written in Python doesn't use locks to
protect shared state. (All C code, on the other hand, is protected by
a lock, so thread problems become apparent as Python stack traces, not
as core dumps.)
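As a concrete case of protecting shared state, the "a = a+1" increment can be guarded with a lock. This is only a sketch: it assumes the standard `threading` module of a current Python, and the names `counter`, `counter_lock`, `add_many` and `run` are invented for the example.

```python
import threading

counter = 0                      # shared writable state
counter_lock = threading.Lock()  # protects every change to counter

def add_many(times):
    global counter
    for _ in range(times):
        # Only one thread at a time can be inside this block, so the
        # read, the add and the store happen as one unit.
        with counter_lock:
            counter = counter + 1

def run(num_threads=4, times=10000):
    """Reset the counter, increment it from several threads, return it."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

total = run()
print(total)
```

With the lock in place the result is always num_threads * times. Without it, increments may occasionally be lost, depending on how the scheduler interleaves the threads.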
[While I'm writing this, another mail from Jeff arrives...]
> Luckily, I do plenty of multithreaded programming on other platforms,
> so I'm familiar with the issues and pitfalls. And yep, you sure do
> have to be fully awake to write a correct program.
>
> What interests me is the python IMPLEMENTATION of threads. For example,
> Guido mentioned in passing that the "import" statement isn't
> thread-safe in python. That was news to me, and it got me wondering
> about the thread-safety of other features of the python language.
The particular problem I was referring to is the following. It's
easily avoided once you know about it. When you import a module
(written in Python) for the first time, the Python code in the module
is executed to initialize the module. As soon as the Python
interpreter is invoked (recursively), it allows other Python threads
to execute as well. Now, for a number of reasons (mostly to do with
modules mutually importing each other), a module is installed in
sys.modules before execution of the module's code is complete, and
when the import statement finds that the module to be imported already
exists in sys.modules, it immediately retrieves that module. Thus
when a second thread also imports the same module, it will find that
the module already exists, but since the module's initialization
hasn't completed yet, the use of functions imported from that module
may fail.
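The ordering described above (the module appears in sys.modules before its code has finished running) can be observed from a single thread. The sketch below writes a throwaway module to a temporary directory and imports it; the module name `half_built_demo` and its contents are invented for the example. It shows only the ordering, not the race between two threads.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "half_built_demo"  # hypothetical module, created below

MODULE_SOURCE = '''
import sys
# Both checks run while this module is still being initialized.
registered_during_init = __name__ in sys.modules
helper_visible_during_init = hasattr(sys.modules[__name__], "helper")

def helper():
    return 42
'''

def load_demo_module():
    """Write the demo module to a temporary directory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, MODULE_NAME + ".py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module(MODULE_NAME)
        finally:
            sys.path.remove(tmp)

demo = load_demo_module()
print(demo.registered_during_init)      # module was already in sys.modules
print(demo.helper_visible_during_init)  # but helper was not defined yet
print(demo.helper())                    # fine once initialization is done
```

Any code that picked up the module object between those two points would see a module without `helper`.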
I don't think there's an easy fix in the interpreter (e.g., waiting
for the initialization to complete would break mutually recursive
imports). However, the fix in Python code is trivial: if you put all
your imports at the (global) module level instead of inside functions,
all imported modules are initialized before the threads are started.
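A small sketch of that arrangement, assuming the standard `threading` module of a current Python and using `json` purely as an example of an imported module (the names `encode` and `encode_all` are invented here):

```python
import json        # module-level import: initialized before any thread starts
import threading

def encode(item, out, index):
    # The worker only uses the already-initialized module; it contains
    # no import statement of its own.
    out[index] = json.dumps(item)

def encode_all(items):
    """Encode each item in its own thread and return the results in order."""
    out = [None] * len(items)
    threads = [
        threading.Thread(target=encode, args=(item, out, i))
        for i, item in enumerate(items)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

encoded = encode_all([{"a": 1}, [1, 2], "x"])
print(encoded)
```

Each worker writes to its own slot of the result list, so no lock is needed for this particular example.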
> And, yes, I'm disappointed that python doesn't have better support for true
> threads. I imagine it traces back to Guido's well-founded concern for
> platform independence. But I'm at the point in life where I'll write
> off operating systems that don't support threads.
Python threads are now supported on Solaris, IRIX, Windows NT, and
probably OS/2 (and anything supporting pthreads). What support is
missing in your eyes?
--Guido van Rossum, CWI, Amsterdam <mailto:guido@cwi.nl>
<http://www.cwi.nl/~guido/>