Tuesday, December 26, 2006

Lists and Queues

Linked lists are often among the first data structures we learn about in computer science. They're simple in concept as well as implementation; An implementation could be developed in the time allotted to a technical interview, for example. However, computer scientists and engineers are often lazy, meaning that they don't want to do more work than is needed. Thankfully, with 4.4BSD came the sys/queue.h header file which contains many useful functions for dealing with singly- and doubly-linked lists, tail queues, and circular queues. These functions (actually, they are implemented as C macros) are described in the QUEUE(3) man page.

In this post, we'll take a quick look at a few of the functions required to use these structures. We'll work through a small (and virtually useless) example that uses a doubly-linked list to store even numbers. The first thing we'll do is focus on the central data structure that we want to be "linked" together. In our example, we will create a simple data structure for holding one even number.

struct EvenNumber {
int val;
};


In order to add this structure to a list, we need to modify it slightly by adding an entry for holding pointers to the next and previous EvenNumber entries.

struct EvenNumber {
int val;
LIST_ENTRY(EvenNumber) numbers;
};


LIST_ENTRY is a macro that expands to a structure holding the pointers for the next and previous entries. We can see this by having gcc display the output from the preprocessor. (We won't look at the preprocessor output at every step in this post, but studying the output of gcc -E on your own can be very enlightening.)

$ gcc -E list.c | grep "int val" -A2 -B1
struct EvenNumber {
int val;
struct { struct EvenNumber *le_next; struct EvenNumber **le_prev; } numbers;
};


OK, we've now modified our EvenNumber structure so that it can be inserted into a list simply by adding a LIST_ENTRY field to our structure. Next, we need a way to reference the list as a whole. We do this using the LIST_HEAD macro that declares a new structure which holds a pointer to the first element in the list.

LIST_HEAD(EvenNumbersHead, EvenNumber) even_numbers_head;


The preprocessor will expand this to the following (reformatted for readability).

struct EvenNumbersHead {
struct EvenNumber *lh_first;
} even_numbers_head;


We can see that the first argument to LIST_HEAD is the name of the new structure that we're creating, and the second argument is the name of the structure that will be linked together in the list.

Before we can use the list, we need to initialize the head by calling LIST_INIT with a pointer to our "head" instance, which effectively creates an empty list.

LIST_INIT(&even_numbers_head);


At this point we have a doubly-linked list that is referenced by the variable even_numbers_head. We may add to the list instances of the EvenNumber structure using the macros LIST_INSERT_HEAD, LIST_INSERT_BEFORE, or LIST_INSERT_AFTER.

The basic structure of our program is outlined here.

struct EvenNumber {
int val;
LIST_ENTRY(...) numbers;
};

LIST_HEAD(...) even_numbers_head;

int main(void) {
LIST_INIT(&even_numbers_head);

for (int i = 0; i < 100; i++) {
...
LIST_INSERT_HEAD(&even_numbers_head, ...);
}

LIST_FOREACH(..., &even_numbers_head, numbers) {
...
LIST_REMOVE(theNum, numbers);
...
}

return 0;
}


The full source for this example can be seen in list.c. We can compile and run the program as follows.

$ gcc -o list list.c -std=c99 -Wall
$ ./list
[...output omitted...]
10
8
6
4
2
0


We see that the output is in reverse order. This is because we are using LIST_INSERT_HEAD, which adds the new entry at the beginning of the list. If we want to display the list in increasing order (ignoring that we could simply pipe the output through sort -n) we could add each new entry to the end (or tail) of the list rather than the head. However, for this we should use a different data structure; a TAILQ. TAILQs work very similarly to the LISTs already discussed, so here's the full program source using a TAILQ rather than a LIST.

For more information about these convenient functions, see the comments in /usr/include/sys/queue.h as well as the QUEUE(3) man page. Also note that these functions are available (and heavily used) within the kernel. However, when used in the kernel the header file to include is /System/Library/Frameworks/Kernel.framework/Headers/sys/queue.h.

...now, to see if my wife is finished getting ready yet...

Friday, December 15, 2006

A Kernel Extension... by hand

I generally like using Xcode. It typically does the job, and usually even makes the job easier. It shields the user from having to worry about mundane details that are all too common when building software. However, for the same reason it can increase a developer's productivity, it can also make it more difficult to understand what's actually going on. For example, Xcode has a template to create a Generic Kernel Extension. The template compiles with no modifications. The resulting kernel extension (KEXT) can then be loaded and unloaded without having to write one line of code. This is super cool, but it also hides some of the details of what a KEXT really is. So, today we'll write a "Hello, World" KEXT from scratch without the help of Xcode

(note the issues above do not apply to Xcode alone, rather they apply to almost all IDEs)

Well, let's just jump right in. Basically, a KEXT is a bundle on Mac OS X (it's also a "package"), which means it is a directory structure with some predefined form and an Info.plist file. Our sample KEXT will be named "MyKext.kext", and it will be in a directory structure that looks like this:

$ find MyKext.kext
MyKext.kext
MyKext.kext/Contents
MyKext.kext/Contents/Info.plist
MyKext.kext/Contents/MacOS
MyKext.kext/Contents/MacOS/MyKext


To start off, we need to make the basic directory structure.
$ cd
$ mkdir -p MyKext.kext/Contents/MacOS
$ cd MyKext.kext/Contents/MacOS


Now we can start writing our code. Our code will contain two main routines: MyKextStart() and MyKextStop(), which are called when the KEXT is loaded and unloaded respectively. It will also contain some required bookkeeping code that's needed in order to make our compiled binary proper. Our start and stop routines look like:
// File: mykext.c
#include <libkern/libkern.h>
#include <mach/mach_types.h>

kern_return_t MyKextStart(kmod_info_t *ki, void *d) {
printf("Hello, World!\n");
return KERN_SUCCESS;
}

kern_return_t MyKextStop(kmod_info_t *ki, void *d) {
printf("Goodbye, World!\n");
return KERN_SUCCESS;
}

... more to come in a minute


After these two methods (in the same mykext.c file) we need to put the required bookkeeping stuff.
extern kern_return_t _start(kmod_info_t *ki, void *data);
extern kern_return_t _stop(kmod_info_t *ki, void *data);

KMOD_EXPLICIT_DECL(net.unixjunkie.kext.MyKext, "1.0.0d1", _start, _stop)
__private_extern__ kmod_start_func_t *_realmain = MyKextStart;
__private_extern__ kmod_stop_func_t *_antimain = MyKextStop;
__private_extern__ int _kext_apple_cc = __APPLE_CC__;


This stuff basically declares some needed structures, and it also sets up our routines (MyKextStart() and MyKextStop()) so that they're called on load and unload, by assigning them to the _realmain and _antimain symbols respectively.

OK, now comes the tricky part: the compile. KEXTs are compiled statically, they can only use certain headers that are available in the kernel, and they can't link with the standard C library. These requirements basically translate into a gcc command like the following:

$ gcc -static mykext.c -o MyKext -fno-builtin -nostdlib -lkmod -r -mlong-branch -I/System/Library/Frameworks/Kernel.framework/Headers -Wall


If the planets are properly aligned, you won't get any errors or warnings, and you'll end up with a Mach-O object file in the current directory named MyKext. This is your actual compiled KEXT. (At this point, it's fun to inspect this file using otool. For example, otool -hV MyKext, and otool -l MyKext. Read the man page for otool(1) for more details here.)

Now, the last thing we need to do (before we actually load this thing up) is to give our KEXT an Info.plist. The easiest way to do this is to copy another KEXT's Info.plist file, and change the names of a few things. For this example, I'm going to copy /System/Library/Extensions/webdav_fs.kext/Contents/Info.plist.
$ cd ..
$ pwd
/Users/jgm/MyKext.kext/Contents
$ cp /System/Library/Extensions/webdav_fs.kext/Contents/Info.plist .


Now, you'll need to edit the file and change the value of the "CFBundleExecutable" key to MyKext, and the value of "CFBundleIdentifier" to net.unixjunkie.kext.MyKext (or whatever you set that value to in your mykext.c file).

Okay, it's show time. To load any KEXT, all files in the KEXT must be owned by root and be in group wheel. The files must also have certain permissions in order to load. Here's the steps to load the KEXT.

$ cd /tmp
$ sudo -s
# cp -rp ~/MyKext.kext .
# chown -R root:wheel MyKext.kext
# chmod -R 0644 MyKext.kext
# kextload -v MyKext.kext
kextload: extension MyKext.kext appears to be valid
kextload: loading extension MyKext.kext
kextload: sending 1 personality to the kernel
kextload: MyKext.kext loaded successfully
# tail -1 /var/log/system.log
Dec 15 20:15:47 jgm-mac kernel[0]: Hello, World!


We can see that our MyKextStart() was called. Now let's unload it and see what happens.

# kextunload -v MyKext.kext
kextunload: unload kext MyKext.kext succeeded


Wahoo! It looks like we made a kernel extension with our own 10 fingers, and it worked! :-)

That was fun. Check out Amit Singh's Mac OS X Internals book for more cool bits about what that required "bookkeeping" stuff was in our source file, and why it's required.