Re: Ideas about enhancements to fileobjects

jredford@lehman.com
Tue, 23 Nov 93 17:51:50 -0500

>> Text with >> is by Guido, responding to me.
>> Text with > is John Redford, jredford@lehman.com, responding to Guido.
>> Text with no prefix is me, responding to John Redford. Most of my response
>> follows the included message text.
>>
>> Actually, the address and port ARE part of a socket "object".
>> If you look at the file-descriptor/socket/TCP/IP connection
>> implementation, you will find something very much like
>> object-oriented programming, with methods in the derived
>> classes calling methods in their superclasses. The address
>> and port are in the "instance" of a socket object. And
>> certainly you must agree that the information associated
>> with the address and port are public, even though their
>> storage is not. -TT

AF_UNIX.
SO_REUSEADDR.

>> Actually, the peek function isn't what causes the problem. Any
>> buffered i/o causes the problem. Another strawman burns merrily. -TT

Peek is the one that guarantees it won't cause a problem, though.

>> >Oh, and this would definitely encourage bad style. _using_ it is bad
>> >style.
>>
>> Wrong. (I've always wanted to say that.) -TT

Everyone's entitled to have your opinion.

>> >This mostly looks like cruft that would slow down files just to make
>> >some applications marginally easier. Parsers aren't really the kind of
>> >thing one expects to write more than once, if that, and it isn't
>> >supposed to be trivial even then.
>>
>> I have proposed a natural, useful enrichment of an existing (de facto) python
>> abstract data type. You have objected to my proposal by saying:
>>
>> a) since my implementation can be subverted, it is wrong
>> to do it.
Somewhat, yes. Ever hear of 'safe' programming?

>> b) enriching the line-oriented abstraction will reduce
>> python i/o performance.
Subclass and decrease your own.

>> c) that instead of using an appropriate abstract data-type
>> to build parsers that I should use implementation specific
>> features of files. Part of your reasoning seems to be that
>> parsers are supposed to be difficult to build.
For some value of parser, yes. Subclass. Put your happy generic 'peek'
nonsense in its own module. Peek files, peek sockets, peek anything
you like, but don't pretend that peek is a file semantic.
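
A rough sketch of what such a module could look like, in Python. The
class name PeekReader and its peekline() method are made up for
illustration; the only thing assumed of the wrapped object is a
readline() method, so a file, a socket wrapper or an in-memory buffer
would all do.

```python
import io


class PeekReader:
    """Hypothetical wrapper: one line of lookahead over any readline() source."""

    def __init__(self, source):
        self._source = source   # any object with a readline() method
        self._pending = None    # line read ahead but not yet consumed

    def peekline(self):
        """Return the next line without consuming it ('' at end of input)."""
        if self._pending is None:
            self._pending = self._source.readline()
        return self._pending

    def readline(self):
        """Return the next line and consume it."""
        line = self.peekline()
        self._pending = None
        return line


reader = PeekReader(io.StringIO("alpha\nbeta\n"))
first_peek = reader.peekline()    # looks at "alpha\n" without consuming it
first_read = reader.readline()    # consumes "alpha\n"
second_read = reader.readline()   # consumes "beta\n"
```

Anything that reads from the wrapped object directly while a line is
pending will skip that line, which is the subversion problem argued
about above.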

>> I'll take these points in order.
>>
>> So what if the implementation can be subverted by calling related
>> functions that are not part of the abstraction? There is no
>> guarantee that a private data-attribute of a python instance or
>> module won't be changed by a programmer. Does this mean that classes
>> and modules should be removed from python? It is also possible to
>> subvert stdio by calling system i/o calls. Does this mean that
>> buffering should be removed from stdio?
Get real.

>> You were worried that a file's name could change, invalidating the
>> saved copy of the name. But most programs don't have files open
>> long enough to make it worth designing them for expected concurrent
>> access to their files. Let's take a poll: how many people write
>> text-processing scripts that check to see if a file has been
>> renamed, so that they can always issue a correct error message if
>> an error occurs? Furthermore there are more important properties
>> (like the data) of a file which can be changed concurrently with
>> a program that is using the file. This whole argument is completely
>> irrelevant.

"more important properties (like the data) of a file which can be
changed concurrently". Like the Number Of Lines?

It's irrelevant to lie to the user of the interface more than
necessary? If the user has to save the filename in a variable he will
know damn well why it is the same when he prints it out. When the file
claims to tell him the name of itself, he might rightly expect the
current name.
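
To make the stale-name case concrete, here is a small sketch. It
assumes a Unix-like system (where an open file can be renamed) and a
file object whose name attribute simply records the string passed to
open(); the file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    old_path = os.path.join(workdir, "old.txt")
    new_path = os.path.join(workdir, "new.txt")
    with open(old_path, "w") as out:
        out.write("data\n")

    with open(old_path) as f:
        os.rename(old_path, new_path)          # rename while the file is open
        recorded_name = f.name                 # still the string given to open()
        old_exists = os.path.exists(old_path)  # the old name is gone from disk
        contents = f.read()                    # the open descriptor still works
```

The object keeps reporting the old path even though nothing with that
name exists any more.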

>> As for a performance disadvantage in i/o, I doubt you will be able
>> to find any significant disadvantage within the existing overhead
>> of a python program. I have been doing some python profiling, so
>> I'm pretty confident about this. Calling a method in python swamps
>> many other performance issues. Checking an int for -1 on each i/o
>> call is not something you'll notice. If and when methods get
>> faster, we can make the i/o system faster (one way is to have python
>> incorporate its own buffering, so that we can guarantee that we
>> only check for -1 once per buffer fill).
It's slow now, so we can make it slower for free. I don't care if you
make your programs slow. Subclass.

>> Now the meat of your argument seems to be your disagreement with how
>> I want to program, and I must admit that it took me a little while to
>> figure out that you think that good style is to program by using
>> operating-system specific features of Unix files, sockets and pipes.

Depends on what one has to do, of course.

>> Let me see if I can understand how you would have me write a parser.
>> I should use seek() or two file descriptors if I want to read ahead.
>> I assume I am out of luck if I can't seek the fd, because you don't
>> like readahead mechanisms ("cruft", remember). So that means that
>> I can only write parsers for seekable fd's, unless I want to keep
>> all of the state internally. And I have to write my parser with
>> an OS-dependent mechanism. Not only that, but Unix doesn't let me
>> find out if an fd is seekable, so I can't even have the parser
>> check its arguments for correct type.
>>
>> This is total nonsense.

Subclass yourself. Make your own fake peeking. Yes, I believe in giving
the programmer access to the _real_ internals, not to fake ones.

"Unix doesn't let me find out if an fd is seekable"

fstat -> regular files and /dev/null are. Pipes, ports, sockets & other
character devices aren't.

PROCEDURE New (fd: INTEGER): T RAISES {Rd.Failure} =
  VAR statbuf: Ustat.struct_stat;
  BEGIN
    IF Ustat.fstat (fd, ADR (statbuf)) # 0 THEN Fail (IOFailure.fstat); END;
    CASE Word.And (statbuf.st_mode, Ustat.S_IFMT) OF
    | Ustat.S_IFCHR =>
        IF IsDevNull (statbuf)
          THEN RETURN NewDiskReader (fd, statbuf.st_size);
          ELSE RETURN NewTerminalReader (fd);
        END;
    | Ustat.S_IFPIPE, Ustat.S_IFPORT, Ustat.S_IFSOCK =>
        RETURN NewTerminalReader (fd);
    | Ustat.S_IFREG =>
        RETURN NewDiskReader (fd, statbuf.st_size);
    ELSE
        RETURN NewDiskReader (fd, statbuf.st_size);
    END;
  END New;
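
Roughly the same check can be sketched in Python with os.fstat() and
the stat module. The function name looks_seekable is made up here, a
Unix-like system is assumed, and the /dev/null special case from the
Modula-3 code is left out, so every character device is reported as
not seekable.

```python
import os
import stat
import tempfile


def looks_seekable(fd):
    """Guess from fstat() whether fd can be seeked (hypothetical helper)."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return True     # regular disk file
    if stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode):
        return False    # pipes and sockets
    if stat.S_ISCHR(mode):
        return False    # terminals and other character devices
    return True         # mirrors the ELSE branch of the code above


read_end, write_end = os.pipe()
pipe_seekable = looks_seekable(read_end)
os.close(read_end)
os.close(write_end)

with tempfile.TemporaryFile() as tmp:
    file_seekable = looks_seekable(tmp.fileno())
```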

>> As for placing the filename and the fileobject into a tuple and
>> passing that around--a tuple is NOT a structure. You seem to be
>> arguing that unnamed aggregates of different data types are a good
>> idea. This runs counter to my own experience and most of modern
>> programming practice. I think that the performance advantage of
>> python tuples over class instances is one of python's greatest
>> weaknesses: it makes people comfortable with this kind of programming
>> style and argument. (Hey Guido, I've just given you advance notice
>> of a gripe I was going to make one of these days!)

So use a different language.

>> So here we have this de facto abstract data-type in python: the
>> sequential, line-oriented input source. It has some operations
>> already implemented (f.readline(), f.readlines()), and what's more
>> important, it's an ADT which harmonizes with the python view of the
>> world. Python lists, python strings, and python pattern-matching
>> encourage a line-oriented view of file-scanning. So it makes complete
>> and utter sense to recognise it as a crucial python ADT, and to add
>> important functionality to it. (I recognise that this ADT could in the
>> future be superseded by a wonderful generalization of sequences, but
>> that won't happen immediately.)
>>
>> The interesting questions are "what should the data type do?" and "where
>> should it live?"
>>
>> A file handle, and the name used to create it, are (I would argue)
>> sufficiently closely related that they belong in the same object. If
>> it helps you, think of the file name as a printable comment on the
>> origin of the file handle. That certainly belongs in the same object.

Be sure to put all of the 'stat' data there too, as well as the time
of day and the name of the polity leader of your choice.

>> The line number can be computed more easily, and more reusably, in
>> the sequential line object than anywhere else. It is the line object's
>> equivalent of a file pointer. A clue: programmers all over the world
>> keep rewriting the same code for tracking the line number of an input
>> file. They shouldn't have to.

They can always quit their jobs. I know how tough that extra code is.
It's too bad I don't even have to do a method call to get at it.
line := f.GetLine(); INC(lineno);
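
The Python equivalent of that one-liner, as a sketch. The function
first_bad_line and its "every line must be a number" rule are made up
purely to give the counter something to report.

```python
import io


def first_bad_line(source):
    """Return the 1-based number of the first non-numeric line, or None."""
    lineno = 0
    while True:
        line = source.readline()
        if not line:
            return None          # end of input, nothing wrong found
        lineno += 1              # the "extra code" under discussion
        if not line.strip().isdigit():
            return lineno


bad = first_bad_line(io.StringIO("1\n2\nx\n4\n"))
clean = first_bad_line(io.StringIO("1\n2\n"))
```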

>> As for pushback/readahead, I always thought it was pretty
>> non-controversial, until now. Parsers need to backtrack. The
>> question is, do they do it on their input, or do they do it with
>> their own data structures? And here I disagree with the thesis
>> that parsers are supposed to be hard. Let me produce a concrete
>> example. With an appropriate API and with lookahead over the input
>> stream, it is possible to write recursive-descent parsers in which
>> you can use other people's parsers to parse some of your non-
>> terminals. The nice thing about a pushback or readahead model is
>> that a function doesn't have to consume input unless it understands
>> it. I fail to see how writing a program where the state of the
>> input stream is easily characterised (only consume what you recognise)
>> and where the backtrack state is easily understood (it's your input
>> language) is "bad style". Has there been a revolution in programming
>> that has passed me by?

Yeah. Read buffering.
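
For reference, the "only consume what you recognise" model from the
quoted paragraph can be sketched in Python as below. Everything here
is invented for illustration: the PushbackLines class, its unread()
method, and the two toy parsers. It is one possible shape for the
idea, not anyone's actual proposal.

```python
class PushbackLines:
    """Hypothetical line source that lets a caller hand a line back."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._pushed = []            # lines handed back, most recent last

    def readline(self):
        if self._pushed:
            return self._pushed.pop()
        return next(self._lines, "")   # '' signals end of input

    def unread(self, line):
        self._pushed.append(line)


def parse_headers(src):
    """Consume 'key: value' lines; stop at the first line that is not one."""
    headers = {}
    while True:
        line = src.readline()
        if ":" not in line:
            if line:
                src.unread(line)     # not ours: hand it back untouched
            return headers
        key, value = line.split(":", 1)
        headers[key.strip()] = value.strip()


def parse_body(src):
    """Consume every remaining line."""
    body = []
    while True:
        line = src.readline()
        if not line:
            return body
        body.append(line.rstrip("\n"))


src = PushbackLines(["From: tt\n", "Subject: peek\n", "\n", "hello\n"])
headers = parse_headers(src)
body = parse_body(src)
```

parse_headers() neither knows nor cares who parses the rest; it just
leaves the first line it does not understand in the stream.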

Lastly: You seem to think that all code must either be written over &
over again or be a low-level service provided everywhere. Wrap your
mind around subclasses & 'friend' classes.

P.S. I don't care this much about efficiency; I care about safety.
Python as is is unsafe, but it doesn't need to be misleading too.

--
John Redford (AKA GArrow) | 3,600 hours of tape.
jredford@lehman.com       | 5 cans of Scotchguard.