Introduction to libpack

I'm a big fan of doing things The Right Way. As such, I get rather annoyed by people doing things The Wrong Way. In this instance, The Wrong Way concerns how binary data gets handled in C programs.

There are at least 2 or 3 (depending on how you count) different situations in which binary data is handled. Each has its own pitfalls, and many people who try to deal with them aren't aware of those pitfalls.

In the first situation, code reads or writes binary data in a file on disk, over the network, etc. (generally via some FILE* or a raw int fd). Suppose we want to read a header which might look like this in a specification:

Name    Size            Function
Class   8 bits          Operation class number
Reg     8 bits          Register number
Value   16 bits         The value to store

A useful struct to store such a header in might be

struct FileHeader {
 char klass; /* Keep C++ programmers happy */
 char reg;
 short value;
};

So our naïve C programmer goes and does this:

struct FileHeader h;
fread(&h, sizeof(h), 1, file);

I quiver with fear when I read this. At least 3 obvious problems come to mind:

Size issues - is short 16 bits?
Padding issues - do we have any memory holes in the struct?
Endian issues - what sort of endian machine are we running on?

Technically there's a 4th consideration: what if CHAR_BIT is not 8 on this platform? However, for the time being, my library doesn't handle this situation either - glass houses, stones :). It's on the TODO list, but for now I'm dealing with the simpler bits.
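
To make those concerns concrete, here is a small sketch that asks the compiler and the host about each one. The helper names (short_bits, header_padding, host_is_little_endian) are made up for illustration and are not part of libpack; the static_assert just records the 8-bit-byte assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* This sketch, like the rest of the article, assumes 8-bit bytes. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

struct FileHeader {
    char klass;
    char reg;
    short value;
};

/* Size: how many bits a short really has on this platform. */
size_t short_bits(void)
{
    return sizeof(short) * CHAR_BIT;
}

/* Padding: bytes in the struct that belong to no member. */
size_t header_padding(void)
{
    return sizeof(struct FileHeader) - (2 * sizeof(char) + sizeof(short));
}

/* Endian: returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    /* Inspecting an object through unsigned char is always permitted. */
    return *(const unsigned char *)&probe == 1;
}
```

The naive fread() only works when short_bits() is 16, header_padding() is 0, and the host byte order happens to match the file's. None of those is guaranteed by the C standard.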

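So, our slightly more experienced C programmer looks at that and says "Ah, endian. I'd better fix that" and does something with ntohs(), and while he's at it uses uint8_t and uint16_t in the struct. That takes care of size and endian, but it still leaves the padding to whatever the compiler decides.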
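
For comparison, a fully careful hand-written version might look like the sketch below: read the four bytes into a plain buffer and assemble each field explicitly, so the struct layout and the host byte order never enter into it. This assumes the Value field is stored big-endian (network order), which is what reaching for ntohs() implies; decode_header and read_header are names invented here for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct FileHeader {
    uint8_t klass; /* Keep C++ programmers happy */
    uint8_t reg;
    uint16_t value;
};

/* Decode the 4-byte on-disk header (Value assumed big-endian). */
void decode_header(const unsigned char raw[4], struct FileHeader *h)
{
    assert(raw != NULL && h != NULL);
    h->klass = raw[0];
    h->reg = raw[1];
    /* High byte first: shift it up, then OR in the low byte. */
    h->value = (uint16_t)(((unsigned)raw[2] << 8) | raw[3]);
}

/* Read one header from file. Returns 0 on success, -1 on a short read. */
int read_header(FILE *file, struct FileHeader *h)
{
    unsigned char raw[4];

    if (fread(raw, 1, sizeof raw, file) != sizeof raw)
        return -1;
    decode_header(raw, h);
    return 0;
}
```

It works everywhere, but that is a lot of typing for three fields, and every new record type needs its own copy of it.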

Over in the perl camp, our seasoned hacker simply goes

my ( $class, $reg, $value ) = unpack( "CCn", $data );

And hey presto, all the values pop out correctly, with no need to even stop and think about any of the 3 issues above. So why does nothing like that exist for C?

Here's where my library comes in. Borrowing a few ideas of syntax from perl's pack/unpack, but largely reinventing the main idea, I implement a family of functions similar to the printf()/scanf() family, with the aim of handling binary data like this. So now our code looks more like

struct FileHeader h;
funpack(file, "bbw>", &h.klass, &h.reg, &h.value);

I'll admit it's not quite as neat as the perl case, but it's C; what do you expect? :) What it does have, though, is none of the 3 problems outlined above.

So what of the other case? That one concerns the handling of more architecture-dependent data. Usually this will be data read from or written to /dev nodes, or serialised over a UNIX socket or pipe. Here we don't care so much about endian or size issues, since both ends live on the same machine, but in structs we still care about padding/alignment.

Here again, libpack can help us, as in the following code to read a /dev/input/event* record:

struct evevent {
 int time_sec, time_nsec;
 short major, minor;
 int value;
};

funpack(f, "iissi", &ev->time_sec, &ev->time_nsec,
 &ev->major, &ev->minor, &ev->value);
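
To show what that one call is saving us, here is a rough hand-written equivalent working from a memory buffer. It assumes the record arrives as the five fields back to back with no padding, in the host's own sizes and byte order, which is presumably what the "iissi" format describes. The names take, EVEVENT_PACKED_SIZE and unpack_native_event are invented for this sketch and are not libpack functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct evevent {
    int time_sec, time_nsec;
    short major, minor;
    int value;
};

/* Bytes the five fields occupy when laid end to end with no padding. */
#define EVEVENT_PACKED_SIZE (3 * sizeof(int) + 2 * sizeof(short))

/* Copy one native-format field out of buf at *off, then advance *off. */
static void take(void *dst, const unsigned char *buf, size_t *off, size_t n)
{
    memcpy(dst, buf + *off, n);
    *off += n;
}

/* Returns 0 on success, -1 if buf is too short to hold a whole record. */
int unpack_native_event(const unsigned char *buf, size_t len,
                        struct evevent *ev)
{
    size_t off = 0;

    assert(buf != NULL && ev != NULL);
    if (len < EVEVENT_PACKED_SIZE)
        return -1;
    /* Each field is copied separately, so the struct's own padding
     * and alignment never affect where we read from. */
    take(&ev->time_sec, buf, &off, sizeof ev->time_sec);
    take(&ev->time_nsec, buf, &off, sizeof ev->time_nsec);
    take(&ev->major, buf, &off, sizeof ev->major);
    take(&ev->minor, buf, &off, sizeof ev->minor);
    take(&ev->value, buf, &off, sizeof ev->value);
    return 0;
}
```

Copying field by field is the whole trick: the layout of the data is described once, independently of however the compiler chose to lay out the struct.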

Of course, much like its perl cousin, this library isn't limited to handling simple integer types. Suppose we wish to read an array of integer values from a file.

int values[32];
funpack(f, "i[32]", values);

In some cases, of course, we don't know at compile time how many elements we'll get. We need to deal with it more dynamically.

int count = ...;
int *values = malloc(count * sizeof(int));
funpack(f, "i[]", count, values);

Or maybe the number is embedded in the stream itself.

int count = 32;
int values[32];
funpack(f, "i[c]", &count, values);

As an aid to avoiding silly things with pointers, buffer overflows, and other issues (see for example the eternal problem of scanf()'s %s format), we can even have *unpack() allocate the buffer for us and tell us how big it is.

uint32_t count;
uint32_t *values;
funpack(f, "d<[d]#", &count, &values);

Marvelous.
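
For a sense of what the allocating form has to get right, here is a sketch of the same job done by hand from a memory buffer. It assumes the data is a little-endian 32-bit count followed by that many little-endian 32-bit values, which is my reading of the format string rather than a statement of libpack's exact semantics. The names load_le32 and unpack_counted_u32 are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assemble a 32-bit value from 4 little-endian bytes. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 on truncated input or allocation failure.
 * On success *values is malloc()ed and the caller must free() it. */
int unpack_counted_u32(const unsigned char *buf, size_t len,
                       uint32_t *count, uint32_t **values)
{
    uint32_t n, i;
    uint32_t *out;

    assert(buf != NULL && count != NULL && values != NULL);
    if (len < 4)
        return -1;
    n = load_le32(buf);
    /* Never trust the count: it must fit in the bytes we actually have.
     * This also keeps n * sizeof *out from overflowing. */
    if (n > (len - 4) / 4)
        return -1;
    out = malloc(n ? n * sizeof *out : 1);
    if (out == NULL)
        return -1;
    for (i = 0; i < n; i++)
        out[i] = load_le32(buf + 4 + 4 * (size_t)i);
    *count = n;
    *values = out;
    return 0;
}
```

The check on the count is the part people forget, and it is exactly the sort of thing a library can do once so that callers don't have to.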