Re: Multi-line string extension

Tim Peters (tim@ksr.com)
Tue, 19 Apr 94 04:27:11 -0400

> > [tim]
> > ...
> > Entering & reformatting text blocks written in this style is pleasant;
> > _reading_ them is pleasant in the interior, but icky at the boundaries.

> [guido]
> ...
> And so is the Perl style "here" document. The advantage of that is
> that it's less likely that a missing end quote confuses you.

Agreed, except the "here" style is a _little_ less icky at the start,
because the "open quote" (label) is on the line preceding the text block
(so doesn't disturb WYSIWYG on the 1st line). But that's a nit. Hmm!
According to the detailed rules you gave later, even that minor
distortion could be avoided pleasantly via

longmsg = """\
blah blah blort floom blam bloom
wogga wogga ding dong
"""

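For anyone trying this in a present-day Python 3, where triple-quoted strings follow the rules quoted later in this message, the effect of the trailing backslash is easy to check. This is a minimal sketch; the name plainmsg is made up for the comparison.

```python
# A backslash right after the opening quotes swallows the newline that
# follows, so the text starts on its own source line but the string
# itself has no leading newline.
longmsg = """\
blah blah blort floom blam bloom
wogga wogga ding dong
"""

# Without the backslash, the string begins with a newline.
plainmsg = """
blah blah blort floom blam bloom
wogga wogga ding dong
"""

print(repr(longmsg))
print(plainmsg == "\n" + longmsg)
```

The only difference between the two strings is the leading newline.
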
> One (ugly?) alternative that comes to my own mind is triple quotes,
> e.g.:
>
> err(
> """Each non-empty non-comment line in a substitution file must
> contain exactly two words: an identifier and its replacement.
> Comments start with a # character and end at end of line.
> If an identifier is preceded with a *, it is not substituted
> inside a comment even when -c is specified.
> """)

Ya, that is ugly, but in a very attractive way <smile>. I particularly
like that runaway strings would be easy to find via a dumb search for
triple quotes, and that it's easy enough to write a regexp to match it.
Good show, Guido! I like it very much.
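
As a rough illustration of the "easy enough to write a regexp" claim, here is one possible pattern in present-day Python 3. It is a sketch, not a tokenizer: it handles only the tripled double-quote form and knows nothing about comments or ordinary strings. The names TRIPLE_DQ and source are invented for the example.

```python
import re

# One """...""" literal: the opening quotes, then as few items as
# possible (each item is a backslash-escaped character or any
# character other than a backslash), then the closing quotes.
TRIPLE_DQ = re.compile(r'"""(?:\\.|[^\\])*?"""', re.DOTALL)

# Sample text containing one triple-quoted string with embedded single
# quotes, an escaped triple quote, and an ordinary string afterwards.
source = '''err(
"""Each line must contain "two" words.
Second \\""" line.
""")
x = "plain"
'''

matches = TRIPLE_DQ.findall(source)
print(len(matches))
print(matches[0])
```

The escaped quote inside the literal does not end the match, because the backslash and the character after it are consumed as one item.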

> > ... over the long run, you'll be happier if you leave initial codegen
> > as stupid as possible ...

> I'm happy with that, since I can leave writing the optimizer to
> someone else :-)
> ...
> My own defense against premature optimization is simply that I don't
> care if something runs 30% slower, but there is a lot of pressure in
> this group from people who disagree!

I have a lot of bitter, vicious and cynical (yet entertaining and true)
things to say about the optimization business -- but am saving myself for
marriage <wink>. In the meantime, you have a tough problem deciding how
to squander the limited time (& interest <0.7 grin>) you do have for
implementing optimizations! The good news is that because we all sense
that, we're happy to _tell_ you how to squander it <wink>.

> > ...
> > print <<PLEA unless $match;
> > for '$name', please eyeball the original defn following, to
> > make sure it's compatible with its deduced size $size:
> > $origdef
> > PLEA
>
> Actually, I don't see why this is so much better than
>
> if not match():
>     print "for '%s', please eyeball the original defn following, to" % name
>     print " make sure it's compatible with its deduced size %d:" % size
>     print origdef

I think this is, mutatis mutandis, exactly the same disagreement we had
over the desirability of accessing matched substrings (after a regexp
search) via symbolic names instead of via snaky little integers --
believe it's a question of personal style that we just won't agree on.
But that's cool! Tracy Tims bailed us out of the regexp squabble, and
I'll happily join Don in saying I don't mind doing the substitution
business (when desired) myself. We might be able to use some (non-
syntactic) help with that, though (mentioned at the end of this msg).

In fact, given your triple-quote idea, and my sense of esthetics, I'd
already be much happier with:

if not matched:
    print """\
   for '%s', please eyeball the original defn following, to
      make sure it's compatible with its deduced size %d:
   %s
""" % (name, size, origdef)

That's 90% of the battle; now the msg can be changed in a WYSIWYG way.
BTW, many of the output sections in the script I talked about were
created by pasting in existing assembler output and just replacing
selected field values with symbolic names; that's so much _easier_ if
there's a way to do it without needing to decorate each line with a
leading
<whitespace>print "
and a trailing
" % tuple_of_names_specific_to_this_line

So what would you rather hear: people whining about the speed of things
you don't care about, or people whining about extra keystrokes you don't
care about <grin>?

> ... I have a feeling that if I ignore you I will get email about this
> until the end of times (which is defined as the day the last Python
> user dies:).

I won't even dignify that abuse with a response -- but don't think E-Mail
doesn't reach to Heaven, pal <snort>.

> My problem with Perl-style 'here' documents remains that it is a very
> un-Python-like piece of syntax (how's that for an emotional argument:)
> and that I can't think of an "intuitive" operator, since << is already
> taken for left shift.

It's the _functionality_ that's important; fully agree that Python wants
a different syntax for it than Perl's.

> ...
> To be specific, I'm thinking of the following rules.
> Either single or double quotes can be tripled to start a different kind
> of quoted string.

Staring at the "matched" example above (& some others) really does seem
to make an appealing case for stripping leading whitespace in one of the
versions. How about if the tripled double-quote version stripped leading
whitespace from each line, but translated '\t' to a non-stripped tab and
'\ ' to a non-stripped blank? E.g., the "matched" example could be the
less visually jarring

if not matched:
    print """\
        \ \ \ for '%s', please eyeball the original defn following, to
        \ \ \ \ \ \ make sure it's compatible with its deduced size %d:
        \ \ \ %s
    """ % (name, size, origdef)

You're not supposed to know whether the leading spaces are blanks and/or
tabs there, because both would be stripped. And this is meant to
illustrate pathologies (in that I want leading whitespace on every line
here, so would _probably_ have used the tripled single-quote version
instead), just to nail down the intent. More typical would be

err("""\
    Each non-empty non-comment line in a substitution file must
    contain exactly two words: an identifier and its replacement.
    Comments start with a # character and end at end of line.
    If an identifier is preceded with a *, it is not substituted
    inside a comment even when -c is specified.
    """)

One thing I don't like about this is that Python doesn't do any
translation of '\ ' today, and it's Not Good for '\ ' to mean different
things in different kinds of strings.

OTOH, if you say (as e.g. "cat <<-EOF" in Bourne shell says) that _only_
initial manifest tabs are stripped (and not any blanks after them), you
can't tell from visual inspection how much will get stripped.

On the third hand, if Python changed its rules so that '\ ' _always_ got
changed to a blank (in any kind of string), there's a good chance no
existing code would break (e.g., there's no instance of the digraph '\ '
in the Python distribution, or in any local Python, outside of accidental
juxtaposition in comment lines, like in the

# Shell quoting characters (\ " ' `) are protected by a backslash.

from Python/Lib/posixpath.py).

The fourth hand says "tough luck -- if you want any leading whitespace to
survive, you just can't use the tripled double-quote form".

I could live with any of those. Don?
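
To nail down the first of those variants, here is a sketch in present-day Python 3 of the rule applied to the raw text between the quotes: strip manifest leading blanks and tabs from each line, then turn '\ ' into a blank and '\t' into a tab. The function name strip_leading is hypothetical, and the sketch ignores every other escape (a doubled backslash before a blank would be mishandled).

```python
def strip_leading(raw):
    """Apply the proposed stripping rule to raw, unescaped string text."""
    out = []
    for line in raw.split("\n"):
        line = line.lstrip(" \t")          # manifest leading blanks/tabs go
        line = line.replace("\\ ", " ")    # '\ ' survives as a blank
        line = line.replace("\\t", "\t")   # '\t' survives as a tab
        out.append(line)
    return "\n".join(out)

# Raw strings keep the backslashes, as the tokenizer would see them.
raw = "\n".join([
    r"    \ \ \ for '%s'",
    "\t" + r"\ \ \ \ \ \ make sure: %d",
    r"  \ \ \ %s",
    "",
])

print(repr(strip_leading(raw)))
```

Whatever mix of blanks and tabs indents the source lines, the result keeps only the explicitly escaped whitespace. Present-day Python's textwrap.dedent applies a different rule: it removes only the leading whitespace common to all lines.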

> Inside such strings, backslash escapes still work, and
> <backslash><newline> is ignored, but unescaped <newline> is kept in the
> string (rather than being an error). Sequences of 1 or 2 quotes do not
> terminate the string. A sequence of three quotes can be enclosed by
> quoting at least one of them with a backslash.

All perfect!
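
These rules can be exercised one by one in a present-day Python 3, where triple-quoted strings behave as described. A small sketch, with the sample text invented for the purpose:

```python
# One literal that touches each rule in turn:
#  - sequences of one or two quotes do not end the string
#  - <backslash><newline> is ignored, joining two source lines
#  - an ordinary backslash escape (\t) still works
#  - an unescaped newline is kept
#  - three quotes in a row are allowed if one is backslash-quoted
s = """one "quote", two ""quotes"", \
joined line
tab:\there
three: \"""
"""

print(repr(s))
```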

> There is no variable substitution but of course you can use a
> triple-quoted string as format string or concatenate it with a
> back-ticked expression.

OK here too.
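
Both options look like this in present-day Python 3. It is only a sketch: the sample values are made up, print is now a function, and back-ticks are gone, so repr() stands in for the back-ticked expression.

```python
# Hypothetical sample values for the three fields.
name, size, origdef = "buf", 16, "buf: .space 16"

# A triple-quoted string used as a format string.
msg = """\
for '%s', please eyeball the original defn following, to
   make sure it's compatible with its deduced size %d:
%s
""" % (name, size, origdef)

# Concatenation with repr(), the Python 3 spelling of `size`.
note = """\
deduced size is """ + repr(size) + "\n"

print(msg, end="")
print(note, end="")
```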

> [steve]
> I don't see any of the extended string syntax ideas as being a major
> gain.

Steve, do you still believe that? I see it as aiming at the same kind of
"how can I create a documentation block in a pleasant way?" problem
you've been talking about all along:

> ...
> I would like a begin/end comment character similar to C's "/* */" .
> The topics of literate programming, and online self documentation
> have been raised in the past. The inability to have blocks of
> comments without leading "#" chars gets in the way of both of these.

Well, multi-line comments wouldn't address any of the concerns addressed
by multi-line strings, but it's not clear to me that holds in both
directions: if you _had_ multi-line strings, couldn't you put them to
good use in an online documentation system? I don't know beans about
literate programming, except that I see ten papers _about_ it for each
line of code that actually uses it <0.7 grin>.

> ...
> One idea I kicked around for a while, and then rejected, was to add
> an explicit program end statement, so that a file could have a data/
> text/documentation area at the end. ( The filename and line number
> are available as function attributes, so it's quite simple to make
> a function that reads in the text below itself. )

Since you're not known to be a full-fledged Perl weenie, you may not know
that in Perl you spell "explicit program end statement" as

__END__

and that all lines following that in the script are available via the
predefined file handle DATA. It's a neat hack, but I wouldn't put it in
Python either.

> ...
> In a compiled language where you are going to be running 'make' over
> the sources every once in a while, there is no problem with weaving
> the text and documentation together, and unentwining them just before
> compile, but an interpreted language like Python can't easily depend
> on an external program like make.

Suppose you gave "literate Python" files a .pyl suffix. Then (a) Python
wouldn't confuse them with real Python files, so wouldn't need to know
anything about them; (b) It should be easy to define a Make rule that
knows how to apply "unentwine" to a .pyl file to get a .py file; (c)
After changing a .pyl file, I'd just need to type "make". That doesn't
sound obnoxious or even unpleasant to me ... does it go deeper than that?

> [don]
> ...
> In fact, the reason I want multi-line strings is so that I can parse
> the strings and perform the substitutions myself. But that is another
> story.

The problem I hit is that I want to write a "substitute" function that
takes the string and does the substitution, but once I cross the caller/
callee boundary I can't get at the caller's namespaces. So substituting
for names of ordinary variables becomes a real puzzler, short of
explicitly building and passing a shadow dict of "interesting" names in
the caller. Got a better idea in mind?

I hate to mention this, but I have sometimes wished for functions along
the lines of

uplocals(i)
upglobals(i)

that return dicts holding the current bindings of the local & global
namespaces "i" levels up the call stack (and where i==0 means the current
function). The i==1 case in particular would be handy for writing a
pleasant-to-use "substitute" function. BTW, in all the uses I've ever
imagined for this, I never wanted to _change_ an up-level binding via
fiddling the returned dict; just wanted to get a current _snapshot_ of
the up-level name->value maps, for inspection only. Insisting on
returning the _actual_ namespaces would probably be an implementation
nightmare (for locals, since they're not ordinarily maintained in genuine
dicts anymore, & that's a feature).
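
For what it's worth, snapshot-style versions of these can be sketched in present-day CPython with sys._getframe, which is a CPython implementation detail rather than a guaranteed language feature. The names uplocals, upglobals and substitute are the wished-for ones from this message, not existing library functions, and the %(name)s template style is an assumption made for the example.

```python
import sys

def uplocals(i=0):
    """Snapshot of the local names i levels up (0 = the calling function)."""
    # Frame 0 is uplocals itself, so the caller is frame i + 1.
    return dict(sys._getframe(i + 1).f_locals)

def upglobals(i=0):
    """Snapshot of the global names i levels up (0 = the calling function)."""
    return dict(sys._getframe(i + 1).f_globals)

def substitute(template):
    """Fill %(name)s fields from the caller's globals, then its locals."""
    names = upglobals(1)        # level 1 = whoever called substitute
    names.update(uplocals(1))   # locals shadow globals
    return template % names

def report():
    name, size = "buf", 16
    return substitute("for '%(name)s', deduced size %(size)d\n")

print(report(), end="")
```

Because each function returns a copy, fiddling with the returned dict changes nothing up the stack, which matches the inspection-only use described above.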

planning-guido's-summer<grin>-ly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp