perl pack/unpack

Steven D. Majewski (sdm7g@aemsun.med.Virginia.EDU)
Thu, 23 Jan 92 14:00:04 EST

>
> > and the python equivalent of 'h2ph' to convert ".h" files into
> > a "record description".
> > ( I don't *think* there is a way to do this in python. Correct me
> > if I'm wrong. )
>
> What's h2ph? What's a "record description"? (You can see I'm not a
> Perl hacker :-)
>

h2ph converts C ".h" files into Perl ".ph" header files:
#defines and #ifdefs are converted into Perl subroutines.

pack/unpack convert to/from a perl-list <=> (binary) string
according to a TEMPLATE that is sort of like a C or FORTRAN
format string. Something like:

[ string, int, float1, float2, float3 ] = unpack( 'a20Lf3', binstr )
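For comparison, here is a rough sketch of that unpack call using Python's standard struct module. Several choices are assumptions. The letter mapping ('a20' to '20s', 'f3' to '3f') is one of them. Another is the '=' prefix, which selects native byte order, standard sizes and no alignment padding; that is the closest match to Perl's unpadded layout. unpack_record is a hypothetical name.

```python
import struct

# Assumed translation of the Perl template 'a20Lf3':
#   'a20' -> '20s' (20-byte null-padded string)
#   'L'   -> 'L'   (unsigned 32-bit integer in '=' mode)
#   'f3'  -> '3f'  (three single-precision floats)
# '=' means native byte order, standard sizes, no alignment padding.
TEMPLATE = '=20sL3f'

def unpack_record(binstr):
    """Split a binary record into (string, int, float1, float2, float3)."""
    name, count, f1, f2, f3 = struct.unpack(TEMPLATE, binstr)
    # Perl's 'a' does not strip nulls on unpack; stripping here is an extra.
    return name.rstrip(b'\0'), count, f1, f2, f3

binstr = struct.pack(TEMPLATE, b'hello', 42, 1.5, 2.25, -0.5)
print(len(binstr))             # 20 + 4 + 3*4 = 36
print(unpack_record(binstr))   # (b'hello', 42, 1.5, 2.25, -0.5)
```

Without the '=' prefix, struct would use native alignment and could insert padding before the 'L' field.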

I'm not much of a Perl hacker, either. I was under the mistaken
belief that h2ph also produced pack/unpack templates and/or subroutines
that converted binary C structs into Perl lists. I poked through the
output files in my /usr/local/lib/perl and could not find such a case.
[ I was just starting to learn Perl when I discovered Python. I have
only written a dozen or so simple Perl scripts ( enough to get annoyed
with some of Perl's "features" ). *Tim*: As a beginner in both languages,
I can assert that the PYTHON version is MUCH more readable - you are
obviously blinded by your experience with Perl! ]

>
> You can read binary integers using ord() and <<, like the example I
> gave in my mail about left shifts. We use this at CWI to read and
> write "AIFF" audio files (an EA-IFF subformat). What else do you
> need?

Thanks. I looked at lib/aiff.py.
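The ord()-and-<< technique quoted above can be sketched like this. It assumes big-endian data, as in IFF/AIFF chunk headers ("FORM" followed by a 32-bit big-endian length). read_be_uint and write_be_uint are hypothetical helpers for illustration, not the functions in lib/aiff.py.

```python
def read_be_uint(data, offset, size):
    """Assemble an unsigned big-endian integer from `size` bytes of `data`,
    starting at `offset`, one byte at a time with << and |."""
    value = 0
    for byte in data[offset:offset + size]:
        # Iterating over bytes yields ints in Python 3; with a string of
        # 8-bit characters you would write ord(byte) here instead.
        value = (value << 8) | byte
    return value

def write_be_uint(value, size):
    """Inverse: split `value` into `size` big-endian bytes with >> and &."""
    return bytes((value >> (8 * (size - 1 - i))) & 0xFF for i in range(size))

header = b'FORM' + write_be_uint(1234, 4)
print(read_be_uint(header, 4, 4))   # 1234
```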

I think pack/unpack is a convenient interface for the programmer.
( But: can anyone suggest a better way? )

I could wrap up your conversion routines into a higher-level
pack/unpack function. But it might be a reasonable thing to
implement at a lower level ( C ), both for efficiency and for
hiding some of the machine differences. [ I'm sure a major use
of this in Perl is to build network packets of various sorts:
you need to know if network byte order is the same as native
byte order. ]
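The byte-order question can be sketched with Python's struct prefixes. Network order is big-endian. The '!' prefix forces it on every host, much as Perl's 'n' and 'N' letters do, while '=' uses the host's native order. The pairing of '!H'/'!I' with 'n'/'N' is an approximation. native_is_network_order is a hypothetical helper.

```python
import struct
import sys

def native_is_network_order():
    """Network byte order is big-endian; compare it with this host's order."""
    return sys.byteorder == 'big'

# '!' = network (big-endian) order, standard sizes: like Perl's 'n' / 'N'.
packet = struct.pack('!HI', 80, 0xC0A80001)
print(packet.hex())   # '0050c0a80001' on every machine

# '=' = native order, so these bytes match only on big-endian hosts.
native = struct.pack('=H', 80)
print(native == packet[:2], native_is_network_order())
```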

The other point I wanted to raise was whether implementing a
perl-like pack/unpack in python ( IN python OR C ) required/
suggested any restrictions or enhancements. The obvious restriction
is to require the input to be a "flat" list: i.e. not containing
other lists, tuples or dictionaries and only containing numbers
and strings ( for this use, strings, although sequences, are
considered simple types - we will not consider lists of strings as
sequences of sequences in this instance. )
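The "flat list" restriction could be enforced with a check like the one below. This is a minimal sketch: it treats str and bytes as scalars rather than as sequences of characters, and is_flat is a hypothetical name.

```python
def is_flat(values):
    """True if every item is a number or a string, i.e. nothing nested.
    Strings (str/bytes) count as simple values, not as sequences."""
    return all(isinstance(v, (int, float, str, bytes)) for v in values)

print(is_flat([1, 2.5, 'abc', b'x']))   # True
print(is_flat([1, [2, 3]]))             # False
print(is_flat([1, {'a': 2}]))           # False
```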

[ It would not be difficult to make pack flatten its input list arg,
or perhaps better, to define a 'format' grammar that expresses
nested-ness, but I think that those "conveniences" might prove to
be error-prone shortcuts. ]
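A flattening pack front end might look like the hypothetical sketch below. It also shows the ambiguity that makes such a shortcut error-prone: differently nested inputs collapse to the same flat list, so pack can no longer tell which grouping the caller meant.

```python
def flatten(values):
    """Recursively flatten nested lists/tuples; str/bytes stay whole."""
    flat = []
    for v in values:
        if isinstance(v, (list, tuple)):
            flat.extend(flatten(v))
        else:
            flat.append(v)
    return flat

# Both produce [1, 2, 3]: the original grouping is lost.
print(flatten([1, [2, 3]]))
print(flatten([[1, 2], 3]))
```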

The other thing I believe Perl has is more elaborate output
formatting & report generation capabilities. But I haven't
actually ever got that far in my Perl programming, so I'm
taking that rumour at face value.

Some of the people in the comp.lang.perl discussion expressed a
wish for a "Perl-2". I would hope that some constructive criticism
from the hard core Perl faithful would be "this (x) is what is lacking
from Python for me to consider it a real substitute for Perl"
( and *NOT* just "I happen to like "@_", etc. - why use 8 chars when
2 will do!", and obviously not compatibility questions. )

The (LONG) Perl pack/unpack man page section follows:


pack(TEMPLATE,LIST)

Takes an array or list of values and packs it into a binary
structure, returning the string containing the structure. The
TEMPLATE is a sequence of characters that give the order and type
of values, as follows:

A An ascii string, will be space padded.
a An ascii string, will be null padded.
c A signed char value.
C An unsigned char value.
s A signed short value.
S An unsigned short value.
i A signed integer value.
I An unsigned integer value.
l A signed long value.
L An unsigned long value.
n A short in network order.
N A long in network order.
f A single-precision float in the native format.
d A double-precision float in the native format.
p A pointer to a string.
x A null byte.
X Back up a byte.
@ Null fill to absolute position.
u A uuencoded string.
b A bit string (ascending bit order, like vec()).
B A bit string (descending bit order).
h A hex string (low nybble first).
H A hex string (high nybble first).
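Many of these letters have close counterparts among Python's struct format codes. The sketch below covers only that subset, and the mapping table is an approximation, not an official correspondence. It leaves out 'A' (space padding) and 'n'/'N' (struct applies one byte order per format string). It also leaves out 'p', 'X', '@', 'u', 'b', 'B', 'h', 'H' and the '*' count. perl_template_to_struct is a hypothetical helper.

```python
import re
import struct

# Approximate Perl-letter -> struct-code table (subset only).
PERL_TO_STRUCT = {
    'a': 's', 'c': 'b', 'C': 'B', 's': 'h', 'S': 'H',
    'i': 'i', 'I': 'I', 'l': 'l', 'L': 'L',
    'f': 'f', 'd': 'd', 'x': 'x',
}

def perl_template_to_struct(template):
    """Translate a (subset) Perl pack TEMPLATE into a struct format string.
    Each letter may carry a numeric repeat count, e.g. 'c4' -> '4b'; for 'a'
    the count is the string length, which struct's 's' also uses."""
    parts = ['=']  # native byte order, standard sizes, no alignment padding
    for m in re.finditer(r'(\S)(\d*)', template):
        letter, count = m.group(1), m.group(2)
        if letter not in PERL_TO_STRUCT:
            raise ValueError(f"unsupported template letter: {letter!r}")
        parts.append(count + PERL_TO_STRUCT[letter])
    return ''.join(parts)

print(perl_template_to_struct('a20Lf3'))   # =20sL3f
print(struct.pack(perl_template_to_struct('ccxxcc'), 65, 66, 67, 68))
```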

Each letter may optionally be followed by a number which gives a
repeat count. With all types except "a", "A", "b", "B", "h" and
"H", the pack function will gobble up that many values from the
LIST. A * for the repeat count means to use however many items
are left. The "a" and "A" types gobble just one value, but pack
it as a string of length count, padding with nulls or spaces as
necessary. (When unpacking, "A" strips trailing spaces and
nulls, but "a" does not.) Likewise, the "b" and "B" fields pack a
string that many bits long. The "h" and "H" fields pack a string
that many nybbles long. Real numbers (floats and doubles) are in
the native machine format only; due to the multiplicity of floating
formats around, and the lack of a standard network representation,
no facility for interchange has been made. This means
that packed floating point data written on one machine may not be
readable on another - even if both use IEEE floating point arithmetic
(as the endian-ness of the memory representation is not
part of the IEEE spec). Note that perl uses doubles internally
for all numeric calculation, and converting from double -> float
-> double will lose precision (i.e. unpack("f", pack("f", $foo))
will not in general equal $foo).
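Both caveats in this paragraph can be shown with Python's struct. First, squeezing a double through a 4-byte float loses precision, just as unpack("f", pack("f", $foo)) does. Second, unlike Perl's 'f'/'d', struct's '<' and '>' prefixes use IEEE 754 with an explicit byte order. That gives IEEE-based machines a common interchange form, though the sketch below does not claim Perl offers the same.

```python
import struct

foo = 0.1
# Double -> 4-byte float -> double, like unpack("f", pack("f", $foo)).
roundtrip = struct.unpack('f', struct.pack('f', foo))[0]
print(roundtrip == foo)   # False: precision was lost
print(roundtrip)          # 0.10000000149011612

# '>' pins big-endian order and IEEE 754 binary64, so these 8 bytes
# decode to the same value on any host.
wire = struct.pack('>d', foo)
print(struct.unpack('>d', wire)[0] == foo)   # True
```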
Examples:

$foo = pack("cccc",65,66,67,68);
# foo eq "ABCD"
$foo = pack("c4",65,66,67,68);
# same thing

$foo = pack("ccxxcc",65,66,67,68);
# foo eq "AB\0\0CD"

$foo = pack("s2",1,2);
# "\1\0\2\0" on little-endian
# "\0\1\0\2" on big-endian

$foo = pack("a4","abcd","x","y","z");
# "abcd"

$foo = pack("aaaa","abcd","x","y","z");
# "axyz"

$foo = pack("a14","abcdefg");
# "abcdefg\0\0\0\0\0\0\0"

$foo = pack("i9pl", gmtime);
# a real struct tm (on my system anyway)

sub bintodec {
    unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}
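A few of these examples carry over to Python's struct module as sketched below. These are approximate translations, not exact equivalents. Unlike Perl, struct.pack raises struct.error when given more values than the format consumes, so the "a4" example cannot pass the extra "x", "y", "z". struct also has no bit-string code, so the bintodec translation uses int(bits, 2) for the "B32" step.

```python
import struct

# pack("s2",1,2): Perl uses host order; struct can name the order explicitly.
little = struct.pack('<2h', 1, 2)   # b'\x01\x00\x02\x00'
big = struct.pack('>2h', 1, 2)      # b'\x00\x01\x00\x02'

# pack("aaaa","abcd","x","y","z"): each 's' takes one value, keeps one byte.
axyz = struct.pack('ssss', b'abcd', b'x', b'y', b'z')   # b'axyz'

def bintodec(bits):
    """Mirror of the Perl bintodec: left-pad with zeros, keep the last 32
    bits, then read them back as a big-endian ("N") unsigned long."""
    bits = ('0' * 32 + bits)[-32:]
    return struct.unpack('>I', int(bits, 2).to_bytes(4, 'big'))[0]

print(bintodec('101'))   # 5
```

In Python the struct round trip inside bintodec is redundant, since int(bits, 2) already yields the number. It is kept only to parallel the Perl pack/unpack pair.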
The same template may generally also be used in the unpack function.

unpack(TEMPLATE,EXPR)

Unpack does the reverse of pack: it takes a string representing a
structure and expands it out into an array value, returning the
array value. (In a scalar context, it merely returns the first
value produced.) The TEMPLATE has the same format as in the pack
function. Here's a subroutine that does substring:

sub substr {
    local($what,$where,$howmuch) = @_;
    unpack("x$where a$howmuch", $what);
}

and then there's

sub ord { unpack("c",$_[0]); }
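The two Perl subs above can be approximated in Python as shown below. struct.unpack_from is used for substr because it, like Perl's unpack, tolerates data beyond the end of the template. perl_ord keeps Perl's signed 'c' semantics, so bytes of 0x80 and above come out negative, unlike Python's own ord(). Both names are illustrative.

```python
import struct

def substr(what, where, howmuch):
    """Skip `where` bytes, then take `howmuch` bytes: Perl's "x$where a$howmuch"."""
    return struct.unpack_from(f'{where}x{howmuch}s', what)[0]

def perl_ord(ch):
    """Like the Perl sub: 'c' ('b' in struct) is a signed char."""
    return struct.unpack('b', ch[:1])[0]

print(substr(b'hello world', 6, 5))   # b'world'
print(perl_ord(b'A'))                 # 65
print(perl_ord(b'\xff'))              # -1
```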

In addition, you may prefix a field with a %<number> to indicate
that you want a <number>-bit checksum of the items instead of the
items themselves. Default is a 16-bit checksum. For example,
the following computes the same number as the System V sum program:

while (<>) {
    $checksum += unpack("%16C*", $_);
}
$checksum %= 65536;
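The checksum loop above can be mirrored in Python by summing unsigned byte values modulo 2**bits, the way "%16C*" is described. This is a sketch of the Perl loop only; whether it matches every System V sum implementation is not checked here. unpack_checksum and perl_style_checksum are hypothetical names, and the input is any iterable of byte strings, such as a file opened in binary mode.

```python
def unpack_checksum(data, bits=16):
    """Like unpack("%<bits>C*", data): sum of unsigned bytes mod 2**bits."""
    return sum(data) % (1 << bits)

def perl_style_checksum(lines):
    """Mirror the Perl loop: accumulate per-line 16-bit sums, then reduce."""
    checksum = 0
    for line in lines:
        checksum += unpack_checksum(line, 16)
    return checksum % 65536

print(perl_style_checksum([b'abc\n']))   # 304
```

For a file, something like perl_style_checksum(open(path, 'rb')) would iterate over its lines as bytes.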