Re: Multi-line string extension

Tim Peters (tim@ksr.com)
Mon, 18 Apr 94 03:40:55 -0400

> [jim]
> ...
> I would argue that my proposal would allow for more readable code,

Unless I've gotten the threads mixed up, you only proposed to allow
eliding "+" in some contexts (& apologies in advance if that's wrong).
If so, I just don't believe that

msg = "string1" \
      "string2"

is any easier to read or write than the current

msg = "string1" + \
      "string2"

OTOH, Don's proposal did aim at what I thought were real improvements to
readability & writability (& more on that below).

> [nice variations on the "optimization" arguments]

Sorry, but nobody's gonna convince me I care how long it takes to
catenate string literals. Don correctly (IMO) identified that as a
side "bonus" at the start (or as a cup of soup <wink>).

> > [tim plays w/ the idea of liberalizing python's current quoting
> > methods, letting them span lines a la e.g. elisp & perl & shells]

> Hmmm... I thought you (Tim) had effectively discarded such ideas with
> your criticism of the original proposal.

Hey, if pointing out problems is rejecting, I'd never get out of bed in
the morning <wink>.

> ...
> Your example becomes:
>
> def main():
>     print "a long string that
>            spans three lines, and
>            this is the third"
>
> Alas, *I'm* not sure what the "unsurprising result" of this print
> would be.

Following any of the elisp/perl/shell precedents, or a WYSIWYG rule, it
would contain the leading whitespace. Following the rules _implicit_ in
Don's original proposal, it would not. I prefer the former, but
certainly see the attraction of the latter.

> ... If I grant you that the above program prints:
>
> a long string that
> spans three lines, and
> this is the third
>
> (which is one reasonable possibility, amongst the surprising choices),
> then I would be left to wonder what I should write when I *really*
> want to print out:
>
> a long string that
>     spans three lines, and
>     this is the third
>
> 'cause I happen to like continuations of messages to be indented ;-).

You better ask Don that one <wink>. It's precisely because of these
confusions that I prefer the WYSIWYG approach (and have already owned up
to it being ugly -- it's a tradeoff).

> The alternatives that I proposed do not seem to suffer from this
> "surprising result" syndrome, 'cause they just about (or actually)
> don't change the language.

Sure, but your proposal didn't address Don's original concerns (unless
you made another proposal to the newsgroup & that didn't get here yet).

Here's a thoroughly typical piece of "informative msg" code, from
Python/Demo/scripts/fixcid.py:

err('Each non-empty non-comment line in a substitution file must\n')
err('contain exactly two words: an identifier and its replacement.\n')
err('Comments start with a # character and end at end of line.\n')
err('If an identifier is preceded with a *, it is not substituted\n')
err('inside a comment even when -c is specified.\n')

Now that _could_ have been written today as

err('Each non-empty non-comment line in a substitution file must\n' +
    'contain exactly two words: an identifier and its replacement.\n' +
    'Comments start with a # character and end at end of line.\n' +
    'If an identifier is preceded with a *, it is not substituted\n' +
    'inside a comment even when -c is specified.\n' )

As I understand your proposal, it only goes so far as changing that to

err('Each non-empty non-comment line in a substitution file must\n'
    'contain exactly two words: an identifier and its replacement.\n'
    'Comments start with a # character and end at end of line.\n'
    'If an identifier is preceded with a *, it is not substituted\n'
    'inside a comment even when -c is specified.\n' )

And that leaves it just as irritating & error-prone to create, and
especially to _modify_, as before. Multi-line informative msgs
frequently need to be reformatted as programs change (to add, delete,
or rephrase information), and all the quotes, and backslashes (had it
not been inside an open paren, as the above is), and escaped newlines
make that a pain in the butt even for a measly 5-line example.
Even tools _designed_ for reformatting (like the Emacs fill-region) can't
cope with all the syntactic noise, in any of the 3 variations above.
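
That the three variations differ only in noise, not in result, can be checked with a short sketch. It assumes a Python in which adjacent literals are joined by the compiler (true of current Python 3), uses only the first two lines of the message for brevity, and collects the text in variables rather than calling err().

```python
# Variation 1: one piece per call, joined here so results can be compared.
separate_calls = "".join([
    'Each non-empty non-comment line in a substitution file must\n',
    'contain exactly two words: an identifier and its replacement.\n',
])

# Variation 2: explicit "+" inside an open paren.
with_plus = ('Each non-empty non-comment line in a substitution file must\n' +
             'contain exactly two words: an identifier and its replacement.\n')

# Variation 3: adjacent literals (assumed to be joined by the compiler).
adjacent = ('Each non-empty non-comment line in a substitution file must\n'
            'contain exactly two words: an identifier and its replacement.\n')

print(separate_calls == with_plus == adjacent)
```

All three still carry a quote pair and an escaped newline on every line, which is the part that gets in the way of reformatting.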

I would like something better than that, & I believe this kind of thing
was the actual thrust of Don's original proposal too. Ugly as it is, the
following would be a major improvement (as would be Don's suggestion):

err(
"Each non-empty non-comment line in a substitution file must
contain exactly two words: an identifier and its replacement.
Comments start with a # character and end at end of line.
If an identifier is preceded with a *, it is not substituted
inside a comment even when -c is specified.
" )

Entering & reformatting text blocks written in this style is pleasant;
_reading_ them is pleasant in the interior, but icky at the boundaries.

> ...but sometimes digraphs are not the problematic element of a proposal ;-).

True! But figuring out which problem to solve appears to be the hangup
in this thread <wink>.

> [guido]
> ...
> A compile time optimizer that folds constant expressions in general
> would be a valuable addition but would probably require a major
> rewrite of the parse tree allocation code (about the oldest code in
> the Python system).

Take it from a long-time Professional Optimizer <grin>: over the long
run, you'll be happier if you leave initial codegen as stupid as
possible, leaving even obvious optimizations to a later pass. E.g., the
byte-code level looks like a good one for tackling this; the constants
are even easy to find at that level, since they're accessed via
LOAD_CONST!
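
A minimal sketch of such a later pass, under loud assumptions: the instruction list is a toy (opcode, argument) representation rather than Python's real byte-code format, fold_constants is a hypothetical name, jump targets are ignored, and both operands are assumed to support "+" with each other.

```python
def fold_constants(code):
    """Fold LOAD_CONST a, LOAD_CONST b, BINARY_ADD into one LOAD_CONST.

    `code` is a toy list of (opcode, argument) pairs, not real byte code.
    """
    out = []
    for op, arg in code:
        if (op == "BINARY_ADD" and len(out) >= 2
                and out[-1][0] == "LOAD_CONST"
                and out[-2][0] == "LOAD_CONST"):
            # Both operands are constants: do the addition now.
            right = out.pop()[1]
            left = out.pop()[1]
            out.append(("LOAD_CONST", left + right))
        else:
            out.append((op, arg))
    return out

# Toy program for: msg = "string1" + "string2"
program = [
    ("LOAD_CONST", "string1"),
    ("LOAD_CONST", "string2"),
    ("BINARY_ADD", None),
    ("STORE_NAME", "msg"),
]
folded = fold_constants(program)
print(folded)
```

Because a folded result is itself a LOAD_CONST, chains such as "a" + "b" + "c" collapse in the same single pass.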

BTW, constant-folders in general don't buy much unless (A) as in C, the
input to the compiler proper typically goes thru macro expansion, or (B)
you have fancier optimizations in place already that move computations
around (thus creating opportunities for constant-folding that aren't
obvious in the original text -- e.g. nobody writes "i = 1+2" by hand,
but a number of other optimization techniques _create_ things like
that; & I note that in the Python library, most constant expressions have
already been dutifully assigned to module-level variables, so get
evaluated only at module initialization time).

> ...
> Unfortunately, the quest for more convenient multi-line string
> literals is not over now ... the candidates are:
>
> 1. allow <backslash><newline> in string literals as in C
>
> 2. allow unadorned <newline> in string literals as in Perl and sh
>
> 3. add a new type of string quote that allows strings to span lines
>
> 4. add one or more variants of Perl-style "here" documents (some with
> variable substitution?)
>
> To me, 1 seems the least controversial. The problem with 2 is that
> especially beginning users can be confused by the diagnostics for a
> missing quote. Number 3 will require us to invent new quotes and will
> still have the missing quote problem. And should the open and close
> quotes be different or not? Number 4 is actually not too hard to
> implement (by bypassing the tokenizer entirely) but << is already
> taken (and the games that Perl plays are off-limits for Python :) so
> would also require invention. In any case, 3 and 4 do more to make
> the language "bigger" than 1 or 2.

Agreed with all, but do look at the "err" example above: #1 doesn't
address the perceived problems there at all. You're about to have two
distinct ways to merely _continue_ long strings in Python (with and
without "+"), and surely don't need a third (heck, I doubt you needed the
second <wink>).

Donald was appalled that I threw variable-substitution into the pot, but
that was fresh on my mind because last week I wrote a multi-hundred line
script that spit out some assembly code (as part of prototyping a
compiler optimization). The _pleasant_ data-structuring methods in Perl4
weren't quite up to the task, so I really wanted to use Python. But it
contains lots of little output sections that (now that it's done) look
like this:

print <<PLEA unless $match;
for '$name', please eyeball the original defn following, to
make sure it's compatible with its deduced size $size:
$origdef
PLEA

and this:

print OUT <<DEFINE;
$name: $asmop $datalist{$name}
.def $name; .val $name; .scl 2; .endef
.globl $name
DEFINE

etc.

That kind of stuff was such a large percentage of the code, and is so
much easier to create and modify in Perl than in Python (or in Icon or
elisp or C or ...), that it drove the decision to use Perl.

Now the comment I'd make in response to that is "So what? So use Perl."
Well, I did. And once it got over 400 lines and _needed_ a dict of
lists, I really wished I'd used Python instead <grin/sigh>. The point is
that the ease of creating structured output _often_ leads me to use Perl
instead, and I'm starting to suspect that I'm not alone in that.

If you agree that prototyping is a strong natural use for Python, I'd
like to suggest that (a) prototyping often involves producing structured
output, the content and structure of which often changes rapidly and/or
massively as the prototype evolves, and (b) pasting together structured
output via catenating strings mixed with backticking variables, hand-
counting embedded blanks to get things to "line up", & explicitly
inserting escaped newlines, is doing it at a level no higher than C's.
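
For comparison, here is a sketch of the DEFINE block above in a present-day Python 3, assuming triple-quoted strings and the standard library's string.Template (neither existed in the Python under discussion). The placeholder $data stands in for Perl's $datalist{$name}, and the substituted values are made up for illustration.

```python
from string import Template

# Assumption: modern Python 3. The template text is laid out as it should
# appear in the output; the backslash suppresses the first newline.
DEFINE = Template("""\
$name: $asmop $data
.def $name; .val $name; .scl 2; .endef
.globl $name
""")

# Hypothetical values, chosen only to show the substitution.
text = DEFINE.substitute(name="counter", asmop=".word", data="0, 0, 0")
print(text, end="")
```

The template can be refilled with an editor's paragraph tools, with no quotes, "+" signs, or escaped newlines to repair afterwards.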

In most (all?) other respects, Python is a wonderful language for
prototyping already. I do think it falls short in this specific area,
though, and don't think it _wants_ to.

reifyingly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp