Re: Conversion of http escape characters

Guido.van.Rossum@cwi.nl
Tue, 14 Jun 1994 20:58:46 +0200

> Is there a neat canned solution to the problem of converting the escaped
> characters used in http back to normal characters?
>
> e.g. The string 'foo%41bar' -> 'fooAbar' (0x41 == ASCII letter 'A')

The standard module urllib has a solution for this:

>>> import urllib
>>> urllib.unquote('foo%41bar')
'fooAbar'
>>>
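For readers on a current Python 3, the same function still exists but has moved into the urllib.parse module. A minimal sketch, assuming Python 3. Note that the Python 3 version also decodes the unescaped bytes as UTF-8 by default:

```python
# Python 3: unquote now lives in urllib.parse rather than urllib
from urllib.parse import unquote

print(unquote('foo%41bar'))    # fooAbar
print(unquote('caf%C3%A9'))    # café (the two escaped bytes are decoded as UTF-8)
```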

I'll copy its definition here so you can see how this kind of thing is
approached in Python:

import regex
_quoteprog = regex.compile('%[0-9a-fA-F][0-9a-fA-F]')
def unquote(s):
    import string
    i = 0
    n = len(s)
    res = ''
    while 0 <= i < n:
        j = _quoteprog.search(s, i)
        if j < 0:
            res = res + s[i:]
            break
        res = res + (s[i:j] + chr(eval('0x' + s[j+1:j+3])))
        i = j+3
    return res
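The regex module used above, which returns -1 from search() when nothing matches, has long since been replaced by re. Here is a minimal sketch of the same loop for Python 3, assuming the re module. A compiled pattern's search(string, pos) returns a Match object or None, so the position of the '%' comes from m.start(). This translation is illustrative and is not the urllib source:

```python
import re

# A '%' followed by exactly two hex digits, as in the pattern above.
_quoteprog = re.compile('%[0-9a-fA-F][0-9a-fA-F]')

def unquote(s):
    """Replace each %XX escape in s with the character whose code is 0xXX."""
    i = 0
    n = len(s)
    res = ''
    while 0 <= i < n:
        m = _quoteprog.search(s, i)   # Match object, or None if no escape remains
        if m is None:
            res = res + s[i:]
            break
        j = m.start()                 # index of the '%'
        res = res + s[i:j] + chr(int(s[j+1:j+3], 16))
        i = j + 3                     # skip past '%XX'
    return res
```

Malformed escapes such as '100%' or '%zz' do not match the pattern, so they pass through unchanged.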

Instead of the call to eval(), it is probably better nowadays to use
string.atoi() (assuming that this is really strop.atoi()): replace

eval('0x' + s[j+1:j+3])

by

string.atoi(s[j+1:j+3], 16)

Using regular expression groups, we could simplify things slightly:
change the definition of _quoteprog to

_quoteprog = regex.compile('%\([0-9a-fA-F][0-9a-fA-F]\)')

and then replace

s[j+1:j+3]

by

_quoteprog.group(1)

which doesn't actually make the program shorter but makes its meaning
clearer and depends less on how the regular expression is spelled.
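The group idea carries over to Python 3's re module, where groups are written with bare parentheses rather than \( and \). The sketch below goes one step further and swaps in re.sub with a replacement function, so the module drives the loop and each Match object supplies group(1). This is an alternative technique, not the approach described above, and the helper name _decode is hypothetical:

```python
import re

# The parenthesized group captures just the two hex digits after '%'.
_quoteprog = re.compile(r'%([0-9a-fA-F]{2})')

def _decode(m):
    # m.group(1) is the captured hex pair, e.g. '41'
    return chr(int(m.group(1), 16))

def unquote(s):
    """Replace every %XX escape in s; re.sub performs the scanning loop."""
    return _quoteprog.sub(_decode, s)
```

Because the replacement only ever sees group(1), changing how the pattern is spelled (for example, [0-9a-fA-F]{2} versus two separate character classes) does not affect _decode.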

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
URL: <http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>