Re: Ideas about enhancements to fileobjects

Guido.van.Rossum@cwi.nl
Tue, 23 Nov 1993 15:59:37 +0100

(Key: "> >>" is me in response to Tracy Tims' suggestions for changes
to file objects. ">" is J. Redford in response to me.)

> >> filename: good. It's saved already so should be accessible.
>
> There are no guarantees that this file has not been moved, or removed,
> replaced with another file or link, or otherwise modified.
>
> This is data you had when you made the file. I think if you want it
> around you should keep it from then. Pass around a tuple with the file
> name & the file object, don't try to put the name & other application
> data into the object. Next someone will want the address & port of a
> socket to be part of the object.

Well actually the file object *already* keeps the filename around (for
repr(file)) but it doesn't make it accessible. Of course there are no
guarantees that reopening the name will give you the same file but
that's well understood. It's just a convenience which means that when
you are passing a file to a subroutine and the subroutine finds a
problem with the file, you don't have to also pass it the filename
(which I can tell you happens a lot in code that passes files around).
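
To illustrate the convenience, here is a minimal sketch. It assumes the
file object exposes its filename as an attribute called "name" (an
assumed spelling for the purposes of this example); check_header and
the '#!' check are hypothetical, made up only to show a subroutine
reporting a problem without being handed the filename separately.

```python
def check_header(f):
    """Return the first line of f, or complain, naming the file if we can."""
    line = f.readline()
    if not line.startswith("#!"):
        # Assumption: the file object exposes its filename as `name`.
        # Fall back to a placeholder for file-like objects that lack it.
        name = getattr(f, "name", "<unknown>")
        raise ValueError("%s: missing '#!' header" % name)
    return line
```

The caller passes only the file; the error message still says which
file was at fault, with the usual caveat that the name may no longer
refer to the same file on disk.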

> >> lineno: useful, but has one problem: it can't always be correct.
> >> Keeping it up-to-date after read() is possible but may slow read() of
> >> large files down a bit; keeping it up-to-date after seek() is
> >> (realistically speaking) impossible. And a minor detail: should it
> >> represent the number of lines read so far or the number of the next
> >> line?
> >>
> >> Suggestion: make it a writable attribute, initialized to 0; set to -1
> >> by seek(); if it's -1, it's left unchanged by read() and readline();
> >> if >= 0, readline() bumps it by 1, read() bumps it by the number of \n
> >> characters in the string read. Well, why not do the same for
> >> writeline() and write()... Finally, initialize it to -1 when the file
> >> is opened with mode 'rb' or 'wb'. I suggest that the filename be made
> >> a writable attribute as well -- might be useful to cheat etc.
>
> This is completely untrustworthy. If you want to count the number of
> '\n's you have read, that's fine, but that doesn't prevent someone from
> inserting more into the top of the file. If you want a number that
> equals the number of times you call readline(), that's easy enough to
> keep on your own.

Of course you can't trust it as the state of the universe, but neither
can you trust a count you maintain separately, and if the semantics are
understood, it's a convenience that can often avoid having to pass a
more complicated object than a file around.
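
To make the semantics concrete, here is a rough sketch of the lineno
rules suggested above, written as a wrapper class rather than as a
change to the file object itself. The class name CountingFile and the
"binary" flag are hypothetical, and not bumping the count at end of
file is my own assumption; it also assumes a text-mode file.

```python
class CountingFile:
    """Wrapper sketching the suggested lineno rules (hypothetical name)."""

    def __init__(self, f, binary=False):
        self.f = f
        # lineno is a plain writable attribute; -1 means "unknown".
        # Suggested start: 0 normally, -1 for 'rb'/'wb' files.
        self.lineno = -1 if binary else 0

    def readline(self):
        line = self.f.readline()
        # Assumption: an empty string (end of file) does not count.
        if self.lineno >= 0 and line:
            self.lineno += 1
        return line

    def read(self, size=-1):
        data = self.f.read(size)
        if self.lineno >= 0:
            # Bump by the number of newlines in the string read.
            self.lineno += data.count("\n")
        return data

    def seek(self, pos, whence=0):
        # After a seek the count can no longer be trusted.
        self.lineno = -1
        return self.f.seek(pos, whence)
```

Since lineno is writable, code that seeks to a position whose line
number it knows can simply assign the right value afterwards and
counting resumes from there.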

> >> peek functions: I'm less convinced that this is worth the additional
> >> complexity -- and I've a feeling that it might encourage bad style (oh
> >> there he goes again I hear some of you thinking :-). On the other
> >> hand it might be a good idea. I've a suggestion for a slightly
> >> different style of interface: f.peekline() would return the next
> >> unpeeked line and f.peekline(n) would return the n'th line (counting
> >> from 0, obviously). I don't see when f.peekreset() would be necessary
> >> -- for definiteness, code should always use f.peekline(n) if there may
> >> be different pieces of code peeking in the same file. Maybe
> >> f.peekline() should mean f.peekline(n+1) when called after
> >> f.peekline(n) if I understand correctly how you would use this most of
> >> the time.
>
> I don't think this has any redeeming aspect. 'peek' semantics are not
> guaranteed past 1 character. Peeking a regular file makes no sense. If
> you want to read the next 2 lines then seek back to where you are, do
> that. Or open 2 file descriptors & use one for read ahead. Using these
> peek functions on a file that represented a socket would be a minor
> nightmare, as it would break any other dup'd readers of the socket.

Well that's why I have some difficulties with it myself. I also have
a feeling that it wouldn't be very expensive to implement as a Python
class which would act as a wrapper around a file object and would be
used only when needed.
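
Such a wrapper might look roughly like the sketch below. The class name
PeekFile is hypothetical, and so is the choice of what readline() does
to the peek position; it implements the peekline()/peekline(n)
interface described above by buffering lines read ahead.

```python
class PeekFile:
    """Wrapper adding peekline() to a file object (hypothetical name)."""

    def __init__(self, f):
        self.f = f
        self.buf = []        # lines read ahead but not yet consumed
        self.next_index = 0  # index of the next unpeeked line

    def _fill(self, n):
        # Read ahead until buf holds line n, or the file runs out.
        while len(self.buf) <= n:
            line = self.f.readline()
            if not line:
                return False
            self.buf.append(line)
        return True

    def peekline(self, n=None):
        # peekline() returns the next unpeeked line; peekline(n) returns
        # the n'th upcoming line, counting from 0. Either way, a
        # following peekline() continues at n+1.
        if n is None:
            n = self.next_index
        self.next_index = n + 1
        if self._fill(n):
            return self.buf[n]
        return ""  # no such line: same convention as readline() at EOF

    def readline(self):
        # Assumption: consuming a line shifts the peek position down by one.
        self.next_index = max(self.next_index - 1, 0)
        if self.buf:
            return self.buf.pop(0)
        return self.f.readline()
```

Only code that goes through the wrapper sees the read-ahead, so this
does nothing for the shared-socket objection; it just keeps the cost
away from plain file objects.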

> Oh, and this would definitely encourage bad style. _using_ it is bad
> style.

Well if you write a certain kind of parsing code it may be very good
style.

> This mostly looks like cruft that would slow down files just to make
> some applications slightly easier. Parsers aren't really the kind of
> thing one expects to write more than once, if that, and it isn't
> supposed to be trivial even then.

Mind your language! Keeping the filename is zero overhead, keeping
the lineno would be one instruction per readline call. (The overhead
for read() could be reduced to the same by having it simply set lineno
to -1 instead of counting newlines.)
Anyway, "parser" is a word that can mean a lot of things -- in a sense
any program that scans a file can be called a parser, even though it
may not have a notion of a grammar. And what begins as a program to
search for very simple patterns may over time evolve to a fairly
complex parser. Python will be used during the transition phase from
grep job to C program...

> Speaking of such things, is there or has someone considered adding a
> M3-style Sx module?

Eh, not everybody on the list may know what you mean. I presume that by
M3 you mean Modula-3. But what is Sx?

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>