Monday, October 29, 2007

Better compile-time assertions

In a previous post I talked about one way to do compile-time assertions in C and Objective-C. The example used there works fine, but it has a drawback: each call to COMPILE_ASSERT* within a given scope needs a unique message, because the message is pasted into the name of a typedef. Reusing a message produces an error about an attempt to redefine that typedef.

$ cat -n compile_assert.c 
1 #define COMPILE_ASSERT(test, msg) typedef char _COMPILE_ASSERT_ ## msg [ ((test) ? 1 : -1) ]
2 int main(void) {
3     COMPILE_ASSERT(1 == 1, blah);
4     COMPILE_ASSERT(2 == 2, blah);
5     return 0;
6 }
$ gcc -Wall compile_assert.c
compile_assert.c: In function ‘main’:
compile_assert.c:4: error: redefinition of typedef ‘_COMPILE_ASSERT_blah’
compile_assert.c:3: error: previous declaration of ‘_COMPILE_ASSERT_blah’ was here


One obvious and easy solution to this problem is to put each typedef in its own lexical scope by wrapping it in a do { ... } while (0). This would work, but a do/while is a statement, so we would lose the ability to use compile-time assertions at global scope or in header files. With regular runtime assertions this probably isn't a big deal, but having compile-time assertions in a header can be incredibly useful. For example, your code may expose some tweakable knobs as #define'd constants, and it might be important that one of the constants is always less than another. This is a perfect place to use a compile-time assertion. Having the assertion right in the header file helps ensure the code's correctness and also serves as a form of documentation.
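As a rough sketch of the header use case described above, something like the following would refuse to compile if someone tweaked the knobs into an inconsistent state. The knob names MIN_BUFFER_SIZE and MAX_BUFFER_SIZE and the helper clamp_buffer_size are hypothetical, made up for illustration rather than taken from real code.

```c
#include <stddef.h>

/* The single-level macro from the listing above. */
#define COMPILE_ASSERT(test, msg) \
    typedef char _COMPILE_ASSERT_ ## msg [ ((test) ? 1 : -1) ]

/* Hypothetical tweakable knobs, as they might appear in a header. */
#define MIN_BUFFER_SIZE 64
#define MAX_BUFFER_SIZE 4096

/* Global scope: compilation fails if the knobs are ever inverted. */
COMPILE_ASSERT(MIN_BUFFER_SIZE < MAX_BUFFER_SIZE, min_buffer_below_max);

/* Hypothetical helper that silently relies on MIN < MAX. */
size_t clamp_buffer_size(size_t requested) {
    if (requested < MIN_BUFFER_SIZE) return MIN_BUFFER_SIZE;
    if (requested > MAX_BUFFER_SIZE) return MAX_BUFFER_SIZE;
    return requested;
}
```

Changing MIN_BUFFER_SIZE to, say, 8192 makes the array size -1, and the compiler rejects the typedef before clamp_buffer_size ever gets a chance to misbehave.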

Since we want to retain the ability to use these assertions anywhere, including in headers, we need another solution: make sure the typedef'd identifier is unique. We could simply put this burden on the caller and tell them that their messages must be unique within a given scope (which probably isn't that big of a burden in reality), but we can do better.

We can use the predefined preprocessor macro __LINE__ to include the current line number in the typedef identifier name. That makes the identifiers unique in most cases. It is not a complete guarantee: two assertions that use the same message and land on the same line number, for example in two different headers included into the same file, will still collide. The only trick here is rigging up the preprocessor macros to do what we want. Here are the macros that I came up with:

$ cat -n compile_assert_NEW.c 
1 #define _COMPILE_ASSERT_SYMBOL_INNER(line, msg) __COMPILE_ASSERT_ ## line ## __ ## msg
2 #define _COMPILE_ASSERT_SYMBOL(line, msg) _COMPILE_ASSERT_SYMBOL_INNER(line, msg)
3 #define COMPILE_ASSERT(test, msg) \
4 typedef char _COMPILE_ASSERT_SYMBOL(__LINE__, msg) [ ((test) ? 1 : -1) ]
5
6 COMPILE_ASSERT(1 == 1, foo);
7 COMPILE_ASSERT(2 == 2, foo);
8
9 int main(void) {
10     return 0;
11 }
$ gcc -Wall compile_assert_NEW.c


We can see that the usage of COMPILE_ASSERT worked two times in a row, with the exact same message string, and it worked in the global scope. This is just what we wanted.

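The weird part is that we need three levels of macros, and one of them (the one on line 2) doesn't look like it does anything at all. It is needed because of the way the preprocessor expands macro arguments. Normally an argument is fully macro-expanded before it is substituted into the macro body, but an argument that is an operand of ## (or #) is substituted exactly as written. If COMPILE_ASSERT passed __LINE__ straight to the macro on line 1, the ## operator would paste the literal token __LINE__ into the identifier rather than the line number. The macro on line 2 has no ## in its body, so its arguments are expanded first, and __LINE__ has already become a number by the time it reaches the macro on line 1. This is explained in section A12.3 of The C Programming Language, Second Edition.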
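To see the difference the middle macro makes, here is a small sketch that stringifies the identifier produced with and without the extra level. All of the macro and function names in it (PASTE_DIRECT, PASTE_EXPANDED, STRINGIFY, direct_name, expanded_name) are hypothetical helpers invented for this illustration.

```c
#include <string.h>

/* One level: `line` is an operand of ##, so it is pasted unexpanded. */
#define PASTE_DIRECT(prefix, line) prefix ## line

/* Two levels: the outer macro has no ##, so its arguments are
 * macro-expanded before the inner macro pastes them together. */
#define PASTE_INNER(prefix, line) prefix ## line
#define PASTE_EXPANDED(prefix, line) PASTE_INNER(prefix, line)

/* The same two-level trick applies to the # (stringify) operator. */
#define STRINGIFY_INNER(x) #x
#define STRINGIFY(x) STRINGIFY_INNER(x)

/* Yields "sym___LINE__": the token __LINE__ was pasted literally. */
const char *direct_name(void) {
    return STRINGIFY(PASTE_DIRECT(sym_, __LINE__));
}

/* Yields "sym_" followed by the number of the source line below. */
const char *expanded_name(void) {
    return STRINGIFY(PASTE_EXPANDED(sym_, __LINE__));
}
```

Printing the two return values with printf shows the literal __LINE__ in the first and an actual line number in the second, which is exactly the difference between a single-level COMPILE_ASSERT and the three-level version.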

Also, when writing and debugging macros, it's very useful to run gcc -E, which stops after the preprocessing stage and writes the preprocessed source to standard output.

$ gcc -E compile_assert_NEW.c 
# 1 "compile_assert_NEW.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "compile_assert_NEW.c"

typedef char __COMPILE_ASSERT_6__foo [ ((1 == 1) ? 1 : -1) ];
typedef char __COMPILE_ASSERT_7__foo [ ((2 == 2) ? 1 : -1) ];

int main(void) {
    return 0;
}


* In the previous blog post I called it "STATIC_ASSERT", but I'm now calling it "COMPILE_ASSERT" because the word "static" is too overloaded.
