a buffered input stream class

Steven D. Majewski (sdm7g@elvis.med.virginia.edu)
Wed, 8 Sep 1993 15:54:04 -0400

This is a somewhat awkward example of an input stream class.

The first half was written for a practical purpose.
The main function I needed was the readuntil( delimiter )
method. I'm sure that some of you would just slurp in
the whole file and then process it as a string. Well -
I may need to process very large files, I may need to
test it on smaller files on my 640K PC at home, but mostly,
I would like to keep a sequential flow to the processing.
Still, for that requirement, it could be simpler (and faster),
but I wanted to play around with python classes.
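
For comparison, here is a minimal sketch of the readuntil( delimiter ) idea in present-day Python 3. It is not the code attached below: the class name BufferedReader and the bsize parameter are made up for illustration, and it assumes the source is any text object with a read(n) method. It reads one chunk at a time, so the whole file is never held in memory unless the delimiter never appears.

```python
import io

class BufferedReader:
    """Hypothetical Python 3 sketch: read a text source in fixed-size chunks."""

    def __init__(self, source, bsize=1024):
        self.source = source      # any object with a read(n) method
        self.buffer = ''          # text read from source but not yet consumed
        self.bsize = bsize

    def readuntil(self, delim):
        """Return text up to and including delim, or whatever is left at EOF."""
        while True:
            end = self.buffer.find(delim)
            if end >= 0:
                end += len(delim)          # include the delimiter itself
                break
            chunk = self.source.read(self.bsize)
            if chunk == '':                # EOF: hand back the remainder
                end = len(self.buffer)
                break
            self.buffer += chunk
        result, self.buffer = self.buffer[:end], self.buffer[end:]
        return result

# A tiny chunk size forces several reads per call.
stream = BufferedReader(io.StringIO('name:value\nrest'), bsize=4)
first = stream.readuntil(':')
second = stream.readuntil('\n')
third = stream.readuntil('\n')     # no newline left, so this returns the tail
```

Like the readuntil method below, the sketch returns the delimiter as part of the result and returns a short (possibly empty) string at end of input.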

Which leads me to comment on the 2nd half - which was tacked
on later as an experiment. One result of the experiment was
discovering that my attempt at a "string index" for a sequence
didn't work : the index can only be an int!

Initially, I was going to add a version of readuntil where delimiter
could be a regexp. Then, after I started playing around with getslice,
I thought it would be neat to be able to write "stream[0:regexp]"
and have it return from the current position through the end of
the regexp match. But now that I know stream[0:'\n'] won't work, I'll go
back to stream.readuntil( )
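
In Python versions released long after this post, the restriction no longer applies in the same way: a slice reaches __getitem__ as a slice object whose start and stop may be arbitrary objects. Assuming Python 3, the stream[0:'\n'] spelling could be sketched as below. SliceStream and _fill_until are hypothetical names used only for this illustration, and only the plain-string delimiter case is handled, not the regexp case.

```python
import io

class SliceStream:
    """Hypothetical sketch: a stream whose slice stop may be a delimiter string."""

    def __init__(self, source, bsize=1024):
        self.source = source
        self.buffer = ''
        self.bsize = bsize

    def _fill_until(self, delim):
        # Read chunks until delim shows up in the buffer or the source runs dry.
        while delim not in self.buffer:
            chunk = self.source.read(self.bsize)
            if chunk == '':
                break
            self.buffer += chunk

    def __getitem__(self, key):
        # stream[start:'delim'] arrives here as slice(start, 'delim', None).
        if isinstance(key, slice) and isinstance(key.stop, str):
            self._fill_until(key.stop)
            pos = self.buffer.find(key.stop)
            if pos < 0:
                raise IndexError('delimiter not found')
            return self.buffer[key.start:pos + len(key.stop)]
        raise TypeError('this sketch only handles stream[start:delimiter]')

s = SliceStream(io.StringIO('first line\nsecond line\n'), bsize=4)
line = s[0:'\n']
```

The slice does not consume anything, so repeating s[0:'\n'] gives the same line again; whether slicing should advance the stream is a separate design decision.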

No effort has been made to optimize the reads or the copying
assignments, except some slight effort to avoid reading in the
whole file. ( And in one place I break that rule too - I was in
a hurry to finish and use it! )

But it's more fodder for discussion of Classes and Types in python.

It (mostly) works on posix.popen input pipes, but it doesn't yet
work exactly the way I want it to. That's 'cause it isn't yet
written to do what I want. ( I probably need to read with
a timeout, so that it will only read in what's available. I don't
know if I will pursue this - it's not necessary for me at the
moment. ) So you will hang on len( pipestream ), for example, if
the output is not complete. I would like to process only up to
what is immediately available.
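
One way to read only what is immediately available, sketched in present-day Python 3 and assuming a POSIX system where select works on pipe descriptors. The helper name read_available and its parameters are hypothetical; nothing like it exists in the istream code below.

```python
import os
import select

def read_available(fd, maxbytes=1024, timeout=0.0):
    """Return bytes readable from fd right now, or b'' if none arrive within timeout.

    Note: os.read also returns b'' at end of file, so this sketch alone
    cannot tell "nothing yet" apart from "the writer has closed the pipe".
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return b''
    return os.read(fd, maxbytes)

# Demonstration on a plain pipe within one process.
r, w = os.pipe()
nothing_yet = read_available(r)       # nothing written so far: returns at once
os.write(w, b'partial output')
some = read_available(r)              # returns only what has been written
os.close(w)
os.close(r)
```

With a timeout of 0.0 the call polls and never blocks, which is the behaviour wanted for len( pipestream ) on an unfinished pipe.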

But I *have* noticed an anomaly with how the read method handles EOF
from the terminal:

>>>sys.stdin.read( 2 )
\n
\n
returns => '\012\012'

>>>sys.stdin.read(2)
^D
returns immediately => ''

>>>sys.stdin.read(2)
\n
^D
\n
finally returns with => '\012\012'

Maybe this is a 'feature' of gnu-readline ?

And - while on the subject of 'read' - I was considering trying the new
array module if I make an attempt to optimize the istream module, and I
noticed that the read and write methods in the array module are
"backwards" compared to the other read methods:
array_obj.read( file_obj, count )
vs.
string = file_obj.read( nchars )

The reason this is so is clear. But maybe array_obj.read() should
be array_obj.readfrom() ? ( Am I pursuing a foolish consistency here? )
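
For what it is worth, later releases of the array module did move in this direction: the read and write methods were deprecated in favour of fromfile and tofile. A short Python 3 sketch of the two calling conventions side by side (the temporary file is used only to make the example self-contained):

```python
import array
import tempfile

values = array.array('i', [1, 2, 3, 4])

with tempfile.TemporaryFile() as f:
    values.tofile(f)                 # array-side call: the file is an argument
    f.seek(0)
    raw = f.read(values.itemsize)    # file-side call: the count is the argument
    f.seek(0)
    loaded = array.array('i')
    loaded.fromfile(f, 4)            # read 4 items from f into the array
```

The argument order is still the reverse of file_obj.read( nchars ), but the distinct method names make it harder to confuse the two.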

-- Steve Majewski

----------
#!/usr/local/bin/python
#
import string

numeric_types = ( type(0), type(0L), type(0.0) )

class IStream:
    def __init__( self, source ):
        self.buffer = ''        # a string, so slicing and '+' work before the first read
        self.indx = 0
        self.source = source
        self._bsize = 1024
    def next( self ):
        # return the next character, or '' at end of input
        if ( self.indx == len( self.buffer )) :
            self.buffer = self.source.read( self._bsize )
            self.indx = 0
        self.indx = self.indx + 1
        return self.buffer[self.indx-1:self.indx]
    def readuntil( self, delim ):
        # delim is a single character; it is included in the result
        temp = ''
        c = None
        while( '' <> c <> delim ):
            c = self.next()
            temp = temp + c
        return temp
    def pushback( self, oldstring ):
        self.buffer = oldstring + self.buffer[self.indx:]
        self.indx = 0
        return None
    # note: all of the "sequential" methods will "sequentialize" the stream -
    # i.e. set start of buffer to indx, and possibly read in more ( or the
    # entire stream ).
    def _normal_( self ):
        if len(self.buffer) == 0 :
            self.buffer = self.source.read( self._bsize )
            self.indx = 0
        elif ( self.indx <> 0 ):
            self.buffer = self.buffer[self.indx:] + self.source.read( self._bsize )
            self.indx = 0
    def _readall_( self ):
        self._normal_()
        tmp = self.source.read( self._bsize )
        while ( tmp <> '' ):
            self.buffer = self.buffer + tmp
            tmp = self.source.read( self._bsize )
    def __len__( self ):
        self._readall_()
        return len( self.buffer )
    def __getitem__( self, key ):
        self._normal_()
        if ( key < 0 ):
            self._readall_()    # a negative index needs the whole stream
        elif ( key >= len(self.buffer) ):
            # read just enough more to make buffer[key] valid
            self.buffer = self.buffer + self.source.read( key + 1 - len(self.buffer) )
        return self.buffer[key]
    def _string_index_( self, k ):
        if type(k) in numeric_types : return k
        if type(k) <> type('') : raise IndexError
        i = string.find( self.buffer, k )
        if ( i < 0 ) :
            self._readall_() # this should be incremental, not READALL.
            i = string.find( self.buffer, k )
        if i < 0 : raise IndexError
        return i

    def __getslice__( self, i, j ):
        self._normal_()
        limit = len( self.buffer )
        if type(i) == type('') : i = self._string_index_( i )
        if type(j) == type('') : j = self._string_index_( j )
        if ( i < 0 ) or ( j < 0 ):
            self._readall_()
        m = max( i, j )
        if m > limit:
            m = m - limit
            self.buffer = self.buffer + self.source.read( m )
        return self.buffer[i:j]


def goodtest( filename ):
    S = IStream( open( filename, 'r' ) )
    print 'S[0]: ', S[0]
    print 'S[0:10]: ', S[0:10]
    print 'for I in range( 1,20 ): S.next(): '
    for i in range( 1, 20 ):
        print S.next(),
    for i in range( 1, 3 ):
        print i,': S.readuntil(":")'
        print S.readuntil(':')
    x = S.readuntil( '\n' )
    print 'x = readuntil( newline ): '
    print x
    S.pushback( x[-3:] )
    print 'S.pushback( x[-3:] ); x = readuntil( newline ): '
    x = S.readuntil( '\n' )
    print x
    print 'S[0:120]: ', S[0:120]
    print 'S[-120:]: ', S[-120:]
    return S

def alltest( filename ):
    S = goodtest( filename )
    print '\n*This test will *FAIL*... '
    print 'S[0:\'\\n\']: ... '
    print S[0:'\n']

import sys
if ( len(sys.argv) > 1 ) :
    for F in sys.argv[1:] :
        print ' Filename: ', F
        alltest( F )