Re: Why are interpreters so slow today

Wayne Throop (throopw%sheol@concert.net)
Sat, 16 Apr 1994 18:14:32 GMT

: From: nagle@netcom.com (John Nagle)
: I need something that can do computation reasonably fast, say no worse
: than 1/10 of the speed of compiled C code. Interpreters have been
: written in that speed range quite often in the past. [.. But when
: interpreters commonly available now are tested, they ..] are slower than
: the C version by factors of greater than 1000. This is excessive.

I think there's a very good reason why such interpreters don't
exist anymore. Their "niche" has eroded away to almost nothing.

Consider: the language families most often implemented in interpreters
nowadays, like the awk family, lisp, perl, python, smalltalk, and so on,
most often provide things like dynamic typing, with automatic
string/numeric conversions, bignums, generic types, and so on. Even
*compiled* code in such languages generally runs at 1/3 the speed of
floating-point-crunching-in-C, because of all the bounds, type,
conversion, and storage bookkeeping going on that is generally absent in
C (for small looping benchmarks, anyhow).
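
To see where that time goes, here is a rough C sketch of the bookkeeping
behind a single dynamic-language "+" (a made-up tagged representation of
my own, not any particular implementation):

#include <stdio.h>

/* Hypothetical tagged value, as a dynamic language might store one. */
typedef enum { T_INT, T_STR } tag_t;
typedef struct { tag_t tag; union { long i; char *s; } u; } value;

/* Even a simple add must check tags and dispatch; a real
   implementation also handles coercions and bignum overflow. */
value dyn_add(value a, value b) {
    value r;
    if (a.tag == T_INT && b.tag == T_INT) {
        r.tag = T_INT;
        r.u.i = a.u.i + b.u.i;   /* plus an overflow check, in real life */
    } else {
        /* string->number coercion, bignum promotion, errors ... */
        r.tag = T_INT; r.u.i = 0;
    }
    return r;
}

int main(void) {
    value a = { T_INT, { 2 } }, b = { T_INT, { 3 } };
    value c = dyn_add(a, b);     /* many instructions ...           */
    long  x = 2 + 3;             /* ... versus one native add       */
    printf("%ld %ld\n", c.u.i, x);
    return 0;
}

All of that runs on every operation, compiled or not, which is why the
1/3 figure holds even for compiled code in these languages.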

Thus, if you are going for speed, it is *very* hard to get within 1/10
the speed of C without going to a bytecoded interpreter, or at least a
threaded interpreter, and probably some pretty sophisticated
translations to get from the textual form of the language to the
interpreted code. It's very hard to get fine-grain performance within
1/10 when the stored form of the program is as easy to produce as a
parse tree. And as long as you need to go to the trouble of producing a
linear instruction stream anyway, you may as well go the rest of the
way and *compile* the silly thing.
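
To make the two stored forms concrete, here is a toy C sketch (my own
illustration, not taken from any real interpreter) of evaluating
(2 + 3) * 4 both by walking a parse tree and by dispatching over a
linear bytecode stream:

#include <stdio.h>

/* Parse-tree form: every operation is a pointer chase plus a switch. */
typedef struct node { int op; long val; struct node *l, *r; } node;
long eval_tree(node *n) {
    switch (n->op) {
    case 'n': return n->val;
    case '+': return eval_tree(n->l) + eval_tree(n->r);
    case '*': return eval_tree(n->l) * eval_tree(n->r);
    }
    return 0;
}

/* Bytecode form: a flat instruction stream and a stack; far fewer
   pointer chases and much better locality per operation. */
long eval_code(const int *code, const long *consts) {
    long stack[64]; int sp = 0;
    for (;;) {
        switch (*code++) {
        case 'p': stack[sp++] = consts[*code++]; break;
        case '+': sp--; stack[sp-1] += stack[sp]; break;
        case '*': sp--; stack[sp-1] *= stack[sp]; break;
        case 'h': return stack[sp-1];
        }
    }
}

int main(void) {
    /* (2 + 3) * 4, in both stored forms */
    node n2 = {'n', 2, 0, 0}, n3 = {'n', 3, 0, 0}, n4 = {'n', 4, 0, 0};
    node add = {'+', 0, &n2, &n3}, mul = {'*', 0, &add, &n4};
    int code[] = {'p', 0, 'p', 1, '+', 'p', 2, '*', 'h'};
    long consts[] = {2, 3, 4};
    printf("%ld %ld\n", eval_tree(&mul), eval_code(code, consts));
    return 0;
}

And once you've written the translator that produces the flat stream on
the right, emitting native code instead is not that much further to go.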

The niche for such interpreters used to be a much larger one. Back when
computers didn't have floating point units, and even arithmetic on
integers wider than 16 (or even 8) bits was a library call away
(especially for multiplication or division), even the best compilation
didn't give much better results than a threaded interpreter. Once you
had linearized the instruction stream into bytecodes, there wasn't a
large gain to be had by the extra work of native compilation; further,
that extra work was harder, and involved writing a set of runtime
routines almost as big as a clever interpreter anyway.

Thus, numerical calculations just aren't what interpreters are good at
nowadays. They are as fast as they need to be, but if you want speed,
you go to a compiled strategy.

Not that today's interpreters are to be sneezed at. Simple string
processing operations in (say) perl can be as fast as or even *faster*
than typical C programs doing similar things. This is because manipulating
strings and such is not built into the hardware (just as arithmetic
operations were not, once upon a time), and hence the
niche between interpreting and compiling such operations is still large
enough to make interpretation speed-competitive. Even tcl, a notoriously
slow interpreter, can be speed-competitive with C programs for X windows
operations, because the primitive operations (the widgets, essentially)
are doing all the work.

For a quick example, consider an oosp expression that concatenates
1000 1000-character strings and then prints the length of the result,
compared to a C program that does the same thing:

length
for set'i'0 int< val'i'1000 set'i + val'i'1
right'1000 ""

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void) {
    char *p = malloc((size_t)1000001);  /* 1000 * 1000 chars + NUL */
    char *q = malloc((size_t)1001);     /* one 1000-char piece + NUL */
    int i;
    p[0] = '\0';
    for (i = 0; i < 1000; i += 1) {
        sprintf(q, "%1000s", "");       /* q becomes 1000 blanks */
        strcat(p, q);
    }
    printf("%lu\n", (unsigned long)strlen(p));
    return 0;
}

This naively coded C program (as sophisticates probably realize right off)
has N^2 behavior: strcat must rescan p from the beginning on every call
to find the end of the string, so the total work grows as the square of
the number of pieces. The oosp expression was 1000 times *faster* than
the compiled C code. Even when one fixes the strcat to eliminate the
N^2 problem:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void) {
    char *p = malloc((size_t)1000001), *r = p;
    char *q = malloc((size_t)1001);
    int i;
    p[0] = '\0';
    for (i = 0; i < 1000; i += 1) {
        sprintf(q, "%1000s", "");
        strcat(r, q); r += 1000;        /* append at the known end: O(N) overall */
    }
    printf("%lu\n", (unsigned long)strlen(p));
    return 0;
}

the oosp code was *still* within a factor of 3 of the speed of the C.
Thus, it isn't just interpreters written in the "good old days" that
were within 1/10 of the speed of compiled code. This is still true
today. It's just that a typical target application for interpreters is
different today. Within their target applications, interpreters are
still competitive.

The bottom line is, if you want within 1/10 of the speed of compiled
code for arithmetic calculations nowadays, you need a *very* carefully
crafted interpreter (say, a Forth-style threaded interpreter, which might
get you pretty close to 1/10, sometimes better on older hardware), or you
need to compile those portions of your code that need to run fast (say,
by developing in interpreted Scheme, at a 20- or 100-to-1 speed hit, and
then compiling for a 3-to-1 production version).
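
For the curious, here is a minimal sketch of what "threaded" dispatch
looks like, written in C with GCC's computed-goto extension (classic
Forth systems do the equivalent in assembly; the toy program and its
instruction names below are my own illustration). Each operation ends
by jumping directly to the next one's code, so there is no central
dispatch loop to pay for between operations:

#include <stdio.h>

int main(void) {
    long stack[16], *sp = stack;
    void *prog[16];
    void **ip = prog;

    /* The "program" is an array of label addresses with inline
       operands: push 2, push 3, add, print the top, halt.
       (Stuffing small integers into the stream via casts is a
       sketch-level shortcut, not portable C.) */
    prog[0] = &&lit; prog[1] = (void *)2L;
    prog[2] = &&lit; prog[3] = (void *)3L;
    prog[4] = &&add;
    prog[5] = &&print;
    prog[6] = &&halt;

    goto **ip++;                           /* start the thread */

lit:   *sp++ = (long)*ip++;   goto **ip++; /* push inline operand */
add:   sp--; sp[-1] += sp[0]; goto **ip++;
print: printf("%ld\n", sp[-1]); goto **ip++;
halt:  return 0;
}

The point is that the per-operation dispatch cost shrinks to a single
indirect jump, which is why a carefully built threaded interpreter can
get within shouting distance of compiled code on operation-starved
hardware.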

--
Wayne Throop   throopw%sheol@concert.net
               throop@aur.alcatel.com