Re: python strings

Tim Peters (tim@ksr.com)
Tue, 26 Apr 94 22:12:22 -0400

> [jaap]
> [This response is not directed to Tim in particular.]

Nor is this to you <wink>.

> ...
> Unless you need three quotes, what is the escape sequence for that?

Under the rules Guido proposed, which are a generalization of those for
{',"} strings, <not-backslash><quote><quote><quote> is the only way to
close a triple-quoted string, where quote is in {',"}. So if you need
three quotes (you don't <grin>), any of these would do:

\"""
"\""
""\"
\"\""
\""\"
"\"\"
\"\"\"

The last one also has the advantage of working in regular strings.
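
For anyone who wants to check the seven spellings, here is a minimal sketch. It assumes the proposed rules behave the way current Python 3 triple-quoted strings do; the surrounding `a` and `b` are placeholders added only so the quotes do not touch the closing delimiter.

```python
# Seven ways to get three double quotes inside a triple-quoted string:
# at least one of the three quotes carries a backslash.
variants = [
    """a\"""b""",
    """a"\""b""",
    """a""\"b""",
    """a\"\""b""",
    """a\""\"b""",
    """a"\"\"b""",
    """a\"\"\"b""",
]

# Only the fully escaped form is also legal in an ordinary string.
regular = "a\"\"\"b"

for v in variants:
    print(v)
print(regular)
```

Every entry should print the same text, an `a`, three double quotes, and a `b`.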

> And what about 'A test called "test"' with triple double quotes?
>
> """A test called "test""""

"""A test called "test\""""

would do it. So would

"""A test called "test"
""" [:-1] # <wink>
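
A short sketch of both spellings, assuming the rules work as they do in current Python 3. The names `escaped` and `sliced` are illustrative only.

```python
# Escape the final inner quote so it cannot merge with the closing delimiter.
escaped = """A test called "test\""""

# Or end the text with a newline and slice that newline off again.
sliced = """A test called "test"
"""[:-1]

print(escaped)
print(sliced)
```

Both names should end up bound to the same text, ending in a single double quote.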

> (I can see the parser complain here...)

Well, the initial part
"""A test called "test"""

will get sucked up as a single string. That leaves a lone '"', which,
thanks to the also-new implied-by-juxtaposition string literal catenation
(don't blame me <0.9 grin>), will be taken as the start of another
string, to be catenated to the leading part. So, since a plain double-
quote string can't span lines, the parser would gripe about an unclosed
string if fed that fragment on the tail end of a line. OTOH, this
wouldn't gripe:

print """A test called "test"""" + "
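
The same fragment as a sketch in current Python 3 syntax, assuming adjacent string literals are concatenated as described above. The name `s` is illustrative.

```python
# The first literal ends at the first unescaped run of three quotes.
# The leftover quote then opens a second, ordinary literal: " + "
# The two adjacent literals are joined into one string.
s = """A test called "test"""" + "
print(s)
```

Note that the `+` is never an operator here; it ends up as text inside the second literal.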

> | A second feature is that runaway strings are very easy to find if they're
> | introduced by the obnoxious '"""'.
>
> Moot. If you have '"""' all over the place, it's not going to matter any
> more.

I believe that plain {',"} strings will remain overwhelmingly most
common, so disbelieve the "if" part.

The other half is that it's very easy to forget to escape a '"' in a
multi-line '"' string, but very unlikely that you'll forget to escape
'"""' in a triple-quoted string. Not because remembering to _do_ an
escape is any easier in the new form, but because '"""' almost never
occurs in strings to begin with. So the most frequent _cause_ of runaway
strings is virtually eliminated.

> | [the "strip leading whitespace" argument]

> Blech! I hate that feature.

I disliked it too at first sight, and you'll be happy to hear that Guido
still dislikes it. But write some Python _as if_ the new triple-quoted
strings were already implemented, and see how you like _that_!

The example of doc-string syntax Guido posted today illustrates (perhaps
unwittingly <wink>) the real practical problem here:

def system(s):
        """send a command to the shell

                This forks a child process which execs /bin/sh.
                The parent waits until the child exits or dies.
                The return value is either:
                256 times the child's exit code (if it exits)
                the signal number plus 128 (if it dies)
                127 (if the fork or exec fails)

        """
        pid = os.fork()
        if pid: return os.waitpid(pid, 0)[1]
        os.execv('/bin/sh', ['sh', '-c', s])

He's got the whole string (except for the first line) indented 16 spaces,
so that it doesn't interfere with reading the code (and should really
have the guts indented a bit more than that). That's fine -- Python
becomes an unreadable mess if these long strings are jammed against the
left margin (try it).

But what happens when the doc string is printed? Either:

A) It's printed just the way it is, so function and method doc msgs have
a bizarre amount of leading indentation, that varies according to
their nesting depth in the source code(!).

or

B) A function is developed to reformat doc strings, part of whose task
must be to answer your next questions:

> How are you going to determine the right amount of whitespace to trim?
> And what if you want whitespace?

Precisely. Refusing to answer those questions in the base language
doesn't make the _problem_ go away, it just means we'll all answer those
questions in different ways, and pay at runtime to boot.
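
As one illustration of option B, here is a sketch using `inspect.cleandoc` from the standard library of current Python, a function that postdates this message. It is one possible answer to the two questions, not the convention under discussion here. `RAW` is a hypothetical stand-in for an indented doc string, written as a plain string so the result does not depend on how a given Python version stores doc strings.

```python
import inspect

# Hypothetical stand-in for a doc string indented 16 spaces in the source.
RAW = """send a command to the shell

                This forks a child process which execs /bin/sh.
                The return value is either:
                    256 times the child's exit code (if it exits)

        """

# cleandoc strips the indentation common to every line after the first,
# and drops leading and trailing blank lines.
clean = inspect.cleandoc(RAW)
print(clean)
```

Under this convention the amount trimmed is the smallest indentation found after the first line, so whitespace beyond that (the extra four spaces on the last text line) survives.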

> No, better to do that in explicit functions, such as a rewritten
> string.strip() (rewritten to strip around newlines as well).

If you can suggest a good convention for the string.strip() variant to
follow, better to enforce it at translation time and skip the runtime
bother & expense.

> I guess the three double quotes were just arbitrary; you can use any
> unique delimiter.

So long as these things act exactly like the existing strings, except for
allowing manifest newlines, """ and ''' are kinda mnemonic. If they were
"smarter" than the existing strings, e.g. by virtue of stripping leading
whitespace, or as you suggested later by virtue of allowing a user-
specified trailing delimiter, """ and ''' might be actively misleading.

Give it all some thought!

cat <<-SIG
	only-thing-missing-here-is-the-will-to-try-to-address-the-problem-up-
	front-ly y'rs - tim

	Tim Peters tim@ksr.com
	not speaking for Kendall Square Research Corp
SIG

only-thing-missing-here-is-the-will-to-try-to-address-the-problem-up-
front-ly y'rs - tim

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp