Re: Multi-line string extension

Jim Roskind (jar@infoseek.com)
Sat, 16 Apr 1994 13:26:50 +0800

I probably shouldn't be in the mode of suggesting language extensions,
as it tends to go against my grain (I don't generally like the added
complexity that comes with extensions). In this case, I've decided to
bend my rules because I think that the proposal put forth was (as I
think was pointed out by Tim) fairly antithetical to the Python
indentation-based structure. Since I'm such a newbie at Python, I
won't feel bad if my proposal is also blasted away on some grounds.

C and C++ evolved away from the "backslash continuation of strings"
because it tended to wreak havoc on the code indentation. As a result,
it is conceivable that the approach finally taken in those languages
might be adaptable to Python. The approach adopted in ANSI C (and
certainly will be part of ANSI C++) is that adjacent string literals
(with only white space separating them) are automagically concatenated
at an early phase of translation. This approach allowed for
continuations of strings to be placed on consecutive lines without
disrupting the indentation level. For example, in ANSI C, you can
write:

    if (bad_error)
    {
        error_string = "A bad error was encountered during "
                       "the latter part of the sort and munge "
                       "procedure.\nPlease attempt the operation "
                       "in a different universe.\n";
        error_abort(error_string);
    }

Note that in ANSI C, this effective concatenation of the strings did
*NOT* use any sort of strcat() run-time function call. Similarly, it
would be hoped that in the Python adaptation, there would be no need
to call the string-concatenation operator "+" at run time.
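For comparison, here is a rough sketch of how the same fragment might
look in Python under the proposed rule. The function name build_error
is hypothetical and exists only to give the "if" block somewhere to
live. Python releases since this proposal do join adjacent literals
this way, so the sketch runs as written.

```python
def build_error(bad_error):
    # Hypothetical helper mirroring the C fragment above.
    if bad_error:
        # Adjacent literals are joined into one string; no "+" is used,
        # and the continuation lines keep the block's indentation.
        error_string = "A bad error was encountered during " \
                       "the latter part of the sort and munge " \
                       "procedure.\nPlease attempt the operation " \
                       "in a different universe.\n"
        return error_string
    return ""


message = build_error(True)
```

The backslashes are only line continuations; the joining itself is done
by placing the literals next to each other.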

The solution given above also has a nice impact in C/C++ involving the
cleanup of the embedded hex constants. Note that Python inherited
the silly (hard to terminate) hex constants in string literals. As a
result, the following C string:

"End of line can be written \xA" "as well as \n"

is hard to write directly in Python. The problem is that embedded
hex sequences such as "\xA" include arbitrarily many hex digits (see
the Python Reference Manual, section 2.4.1). If you tried to write
the above string in Python as:

"End of line can be written \xAas well as \n"

then you would get the wrong result, as the hex sequence "\xAa" would
be parsed as having value 10*16+10 = 170 (oops). To get the desired
result, you would have to use the run time concatenate operation:

"End of line can be written \xA" + "as well as \n"

or use the octal sequences, which terminate after 3 digits:

"End of line can be written \012as well as \n"

So there are two reasons for this syntax to be adopted. The first is
to allow strings to be cleanly broken across lines without trashing
indentation info, and without requiring run-time support. The second
is to clean up the silly hex sequence problem (without run-time cost).
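The hex-escape pitfall described above can be checked with a small
sketch. It assumes a current Python 3 interpreter, where "\x" reads
exactly two hex digits rather than arbitrarily many, so the split
version is written "\x0A" where the C example has "\xA". The variable
names are illustrative only.

```python
prefix = "End of line can be written "
n = len(prefix)

# "\xAa" is read as one escape: 10*16 + 10 = 170, not a newline.
wrong = "End of line can be written \xAas well as \n"

# Octal escapes stop after three digits, so "\012" is a newline.
octal = "End of line can be written \012as well as \n"

# Splitting the literal ends the hex escape before the letter "a".
split = "End of line can be written \x0A" "as well as \n"

wrong_code = ord(wrong[n])
```

Here wrong_code is 170, while the octal and split spellings produce the
same string with a newline at that position.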

From my experience with LALR(1) grammars, I think I can safely say
that modification of the grammar/parser to support this extension
would not be difficult. The rule (in Python) would be that adjacent
string literals in an expression are automatically concatenated at
compile time to form longer string constants. You could argue that
the "+" operator should be "optimized" to do more work at compile
time, but I kinda' like seeing the distinction between compile time
and run time operations. I could probably argue the other way equally
well :-(.
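One way to see whether the joining happens at compile time is to look
at the constants stored in a compiled code object. This is only a
sketch, and it assumes a CPython interpreter that implements the
adjacent-literal rule; the filename "<demo>" and the variable name s
are made up for the illustration.

```python
source = 's = "compile " "time"\n'
code = compile(source, "<demo>", "exec")

# The two adjacent literals are stored as a single constant, so no
# concatenation is left to perform when the code runs.
joined_at_compile_time = "compile time" in code.co_consts

namespace = {}
exec(code, namespace)
result = namespace["s"]
```

Neither of the two pieces needs to survive as a separate run-time
operand; the code object simply carries the finished string.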

A typical long string in Python would then look like:

    my_str = "I had a big error message to put here, " \
             "and it didn't all fit cleanly on one line"

This seems to stay in line with the use of backslash, make it easy to
read and enter long constants, and smells a lot like C (from whence a
lot of this lexical syntax came).

Just to make it clear, I think an optimization in the Python compiler
could also supply the desired results. In such a case, the Python way
of entering the above string in a readable fashion would (continue to)
be:

    my_str = "I had a big error message to put here, " \
             + "and it didn't all fit cleanly on one line"

Note that this approach does not require a language extension, but
does require a Python compiler optimization (which is a little harder
to quickly code).

As I said, I could argue either way, and perchance the latter approach
would win :-(.

Jim

Jim Roskind
408-982-4469
jar@infoseek.com