The reason for this is to force the user to think about the word size
required, and not default to int which is often 4 bytes but only
guaranteed to be at least 2 bytes. Since the wrapping/unwrapping cost
completely dwarfs an extra conversion from short to long, the 50% gain
in space for an array of shorts sounds worth considering. But for
consistency, I've added 'i' to the list of possibilities in 0.9.9++.
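
To make the size argument concrete, here is a rough sketch using the
struct and array modules as they exist in current Python (an assumption;
the post itself concerns 0.9.9). The numbers it prints depend on the
platform and compiler, which is exactly why the format code has to be
chosen deliberately:

```python
import struct
from array import array

# Native sizes of the C types behind each format code. These vary by
# platform; C only guarantees short >= 2 bytes, int >= 2, long >= 4.
sizes = {code: struct.calcsize(code) for code in "hil"}
print(sizes)

# Storage used by the items of 1000-element arrays of shorts and of longs.
shorts = array("h", range(1000))
longs = array("l", range(1000))
print(shorts.itemsize * len(shorts), longs.itemsize * len(longs))
```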
> I would like to convert to/from network byte order, but what format character
> should we use ? The traditional unix ntohs/htons/ntohl/htonl are
> really a sort of misnomer - they refer to translating a host
> (long|short) to/from a network (long|short) where the host size is
> compiler and host variable, but a *network* long and short are assumed
> to be 32 and 16 bits respectively. i.e. they aren't really longs and
> shorts on the network end - they are octet strings of length 4 and 2.
I can think of four solutions here:
(1) Add four new format codes, meaning 2/4 bytes in little/big endian
byte order -- this is what Perl does. I can't think of any useful
mnemonics that are reasonably consistent and not already taken,
however...
(2) Add three "modifier" codes that change the semantics of following
format characters, e.g. '<' meaning "following items are in
little-endian mode", '>' meaning big-endian, and (for completeness)
'=' meaning "use host byte order". Apart from byte order, these
modifiers should also affect word size: after '<' or '>', it would be
guaranteed that 'h' is 2 bytes and 'l' is 4 bytes (and 'i', 'd' and
'f' should probably be outlawed).
(3) Like (2) but express the difference by using 3 different functions
instead of putting it all in the format. This looks cleaner but
perhaps less flexible since it makes it harder to choose dynamically
between byte orders. In reality it would mean more than three
functions: three variants of struct.pack, three of struct.unpack,
three of array.array. And array objects would need an extra attribute
'byteorder'.
(4) Have a global mode which can be selected by a separate function
call. This has the disadvantage that two independent modules may fight
over the mode (unless it's done in a more object-oriented fashion);
the advantage is that it would make code for reading e.g. TIFF
files easier (where there is a flag in the first few bytes of the file
telling whether the rest of the file is in little or big endian
order).
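
For illustration, the struct module in current Python works along the
lines of option (2): a leading '<', '>' or '=' in the format selects the
byte order and fixes 'h' at 2 bytes and 'l' at 4 (there, 'i', 'f' and 'd'
are allowed rather than outlawed, and the prefix must be the first
character of the format). A minimal sketch of the TIFF case mentioned in
(4), where read_tiff_header is a hypothetical helper and not part of any
library:

```python
import struct

def read_tiff_header(data):
    """Return (prefix, magic, first_ifd_offset) from the first 8 bytes."""
    mark = data[:2]
    if mark == b"II":        # "Intel" order: little-endian
        prefix = "<"
    elif mark == b"MM":      # "Motorola" order: big-endian
        prefix = ">"
    else:
        raise ValueError("not a TIFF byte-order mark: %r" % mark)
    # With a prefix, 'H' is always 2 bytes and 'L' always 4, unpadded.
    magic, offset = struct.unpack(prefix + "HL", data[2:8])
    return prefix, magic, offset

# Hand-built headers for the demonstration: magic number 42, offset 8.
little = b"II" + struct.pack("<HL", 42, 8)
big = b"MM" + struct.pack(">HL", 42, 8)
print(read_tiff_header(little))
print(read_tiff_header(big))
```

Because the prefix is just part of the format string, choosing the byte
order at run time is a matter of string concatenation, which is the
flexibility that option (3) would give up.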
Note that arrays currently have a totally different, efficient but
less clean solution: there's a method "byteswap" which byte-swaps the
entire array. The disadvantage is that it requires the Python code to
know whether it is necessary to byteswap or not. This is OK for
incidental code, but not for robust long-lived programs.
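
A small sketch of that burden, written against the array module in
current Python (shorts_from_big_endian is a hypothetical helper, and it
assumes 'h' is 2 bytes on the host): the caller has to compare the byte
order of the data with the host's own before deciding to swap.

```python
import sys
from array import array

def shorts_from_big_endian(raw):
    """Build an array of shorts from big-endian 16-bit data."""
    a = array("h")
    if a.itemsize != 2:
        raise RuntimeError("this sketch assumes 2-byte shorts")
    a.frombytes(raw)
    # The Python code itself must know whether a swap is needed.
    if sys.byteorder == "little":
        a.byteswap()
    return a

print(shorts_from_big_endian(b"\x00\x01\x00\x02").tolist())
```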
--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>