Re: Multi-line string extension

Tim Peters (tim@ksr.com)
Sun, 17 Apr 94 03:31:24 -0400

> [jim & john discuss the merits of extending the language to say
> that adjacent string literals must be catenated at translation time, a
> la ANSI-ISO C]

If Python really needs this (let's see the profiling data demonstrating
that run-time catenation of string literals is actually a bottleneck
<wink>), I think John's arguments are compelling (can be done without new
syntax, and doing so would benefit (assuming there is a real benefit ...)
existing programs too).

> ... The approach adopted in ANSI C (and certainly will be part of ANSI
> C++) is that adjacent string literals (with only white space separating
> them) are automagically concatenated at an early phase of translation.
> This approach allowed for continuations of strings to be placed on
> consecutive lines without disrupting the indentation level.

While this _is_ useful for continuing long strings in C, as I recall it,
X3J11 actually adopted this gimmick to solve an unrelated problem, namely
how to support macros that want to embed their arguments in strings.
E.g., how to write the body of

#define TAGGED_DUMP(x)

so that TAGGED_DUMP(a[4]) expands to printf("a[4] = %g\n", a[4]).

The plausible

#define TAGGED_DUMP(x) printf("x = %g\n", x)

did the trick under _some_ older C preprocessors, but not all, & the
committee was loath to allow argument substitutions inside string
literals.

The bitterly fought-over eventual solution was to introduce the
"stringization" preprocessor prefix operator "#", and then rely on
catenation of adjacent string literals to finish the job. I.e.,

#define TAGGED_DUMP(x) printf(#x " = %g\n", x)

Then TAGGED_DUMP(a[4]) expands to

printf("a[4]" " = %g\n", a[4])

in the preprocessor, and the automagic catenation of juxtaposed string
literals is necessary to complete the job. There's no other way to do it
(in std C), so this catenation business plays an essential role in C.
But ppp (the fabled Python preprocessor <wink>) doesn't have this problem
to begin with.

An interesting side note is that catenation of adjacent string literals
is _not_ a preprocessor task, so if you actually run the TAGGED_DUMP
example thru your favorite std C preprocessor, you'll find that adjacent
literals are not merged in its output.

This might be a (very) minor argument in favor of changing Python; i.e.,
without the change you can't run your Python source thru cpp and get
tricks like the TAGGED_DUMP trick to work.
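
The same two-phase split can be seen in Python itself, under a loud
assumption: the sketch below needs a Python 3 interpreter, which postdates
this message and which does join adjacent string literals. The tokenizer
still reports two separate literals (much as cpp's output does), and they
become one string only at a later stage. The `source` text is just the
TAGGED_DUMP expansion reused for illustration.

```python
import ast
import io
import tokenize

# Assumption: Python 3, where adjacent string literals are joined at
# compile time. The Python this message discusses did not do this.
source = '"a[4]" " = %g\\n"\n'

# Phase 1: the tokenizer sees two separate STRING tokens.
literal_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.STRING
]

# Phase 2: once the expression is parsed and evaluated, one string is left.
merged = ast.literal_eval(source.strip())

print(len(literal_tokens))
print(repr(merged))
```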

If people feel that Python _does_ need to change (can't say I'm going to
lose any sleep over it tonight ...), I'd rather see a more ambitious
change that addressed Donald's original points too (not just a marginally
less irritating way to create large blocks of intended-to-be-read-by-
humans text, but a _pleasant_ way).

One possibility we haven't discussed is "just" to liberalize Python's
current quoted strings, so that they can slobber over multiple lines,
sucking up any newlines embedded in them.

So, e.g.,

print "a long string that
spans three lines, and
this is the third"

would print the unsurprising

a long string that
spans three lines, and
this is the third

(& where 'print' itself supplied the final newline; if "print" were
"str =" instead, str[-1] == 'd' would hold).

This is essentially the rule followed in elisp, and if you peek in
python-mode.el you'll find that huge gobs of human-readable text can be
created under this rule quite pleasantly. You _do_ have to remember to
escape the quoting character inside the string, but at least the newlines
come for free & it's WYSIWYG. Also to its credit: it's compatible
with existing Python programs (since everything new that could be done
with it is a syntax error today, and it doesn't change the meaning of any
existing strings).
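
A rough way to see the intended behavior, assuming a later Python that has
triple-quoted strings (they did not exist when this was written, and they
are not the single-quote liberalization proposed here, only the nearest
runnable stand-in):

```python
# Assumption: a Python with triple-quoted strings. The proposal in the text
# would have allowed the same thing between ordinary "..." quotes.
text = """a long string that
spans three lines, and
this is the third"""

# print supplies the final newline; the literal itself does not end in one.
print(text)

lines = text.split("\n")
last_char = text[-1]
```

The embedded newlines come for free, and the last character of the string
is still 'd', as in the "str =" variant described above.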

Is it ugly? Oh yeah! But where the _functionality_ is desired, the
current alternatives are uglier.

On the down side, as soon as you _forget_ to escape a quoting character
in a multi-line string of this ilk, or forget the trailing quoting
character (and you will! with depressing regularity <0.9 sigh>),
arbitrarily large chunks of your program will get sucked up as if they
were part of the string. In elisp you usually get a gripe about
parentheses not balancing at the very end of the program; Perl usually
guesses that the problem is a runaway string, and can even usually tell
you the line where it started. I think the quality of error reporting is
the most important part of the implementation of this kind of gimmick.
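
To make the error-reporting point concrete, here is a minimal sketch of
how a scanner could remember where a runaway string started. The name
`find_runaway_string` is hypothetical, and the scanner is deliberately
naive: it assumes only '...' and "..." literals with backslash escapes, and
it does not understand comments. It is not how any real implementation
does it.

```python
def find_runaway_string(source):
    """Return the 1-based line where an unterminated string starts, or None.

    Hypothetical helper for illustration only: it knows about '...' and
    "..." literals with backslash escapes and nothing else (no comments).
    """
    quote = None        # quote character of the string we are inside, if any
    start_line = None   # line on which that string was opened
    line = 1
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line += 1
        if quote is None:
            if ch in "'\"":
                quote, start_line = ch, line
        elif ch == "\\":
            # Skip the escaped character, still counting an escaped newline.
            i += 1
            if i < len(source) and source[i] == "\n":
                line += 1
        elif ch == quote:
            quote = None
        i += 1
    return start_line if quote is not None else None


# The closing quote is missing on line 2, so everything after it is
# swallowed; the scanner can still point at line 2.
program = 'x = "ok"\ny = "oops\nz = 3\n'
print(find_runaway_string(program))
```

Note that a forgotten escape in the middle of a string gets blamed on
whichever quote ends up unmatched, which is not always the line the human
would point at.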

cleverly-getting-the-digraphs-out-of-don's-idea-so-it-has-a-chance-of-
sneaking-past-guido's-robo-censor<wink>-ly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp