Sunday, February 26, 2006

Google Page Creator (pages.google.com)

Wow! If you haven't checked out Google's new Page Creator product, drop what you're doing and check it out now. It's a WYSIWYG web page creator, done online in AJAX, that generates XHTML strict sites very easily. It's really cool and fun to play with. And again, it amazes me what Google can do with JavaScript.

Sunday, February 19, 2006

Change __MyCompanyName__ in Xcode

If you're sick of seeing __MyCompanyName__ in the header comments of all your Xcode files, you can set the default company name in Xcode with:


defaults write com.apple.Xcode PBXCustomTemplateMacroDefinitions -dict ORGANIZATIONNAME "Blah, Inc"

Thursday, February 16, 2006

Squashing a Real Bug on Darwin

I previously talked about a really cool bash trick called process substitution, which allows you to use a process almost anywhere you can use a file. For example, diff <(ls dir1) <(ls dir2) would allow me to diff the contents of dir1 and dir2.

The Problem



For the most part, process substitution works great on Mac OS X, but there are cases where it doesn't work. For example:

$ diff <(echo foo) <(echo bar)

produces no output, when clearly the string "foo" differs from the string "bar". But why?


Troubleshooting



Well, let's check out the tools in our toolbox: gdb, gcc, vm_stat, vmmap, nm, otool, stat, etc. Hmm, let's try a few more experiments first.

$ diff <(echo foo) <(echo bar)
$ diff <(echo foo) <(echo barX)
1c1
< foo
---
> barX

$ diff <(echo foo) <(echo bar)
$ diff <(echo foo) <(sleep 1; echo bar)
1c1
< foo
---
> bar

Interesting...

$ stat <(echo foo) <(echo bar)
520093697 0 prw-rw---- 1 jgm jgm 0 4 "Feb 16 21:12:21 2006" "Feb 16 21:12:21 2006" "Feb 16 21:12:21 2006" 512 8 0 /dev/fd/63
520093697 0 prw-rw---- 1 jgm jgm 0 4 "Feb 16 21:12:21 2006" "Feb 16 21:12:21 2006" "Feb 16 21:12:21 2006" 512 8 0 /dev/fd/62

Very interesting! According to stat(1), the two named pipes have identical attributes except for the file name. What's more, the two files have the same inode number!? But what's that you say? How can two different files (i.e. not hard links) on the same filesystem have the same inode number? According to POSIX this isn't allowed. So maybe diff is taking a shortcut and saying "hey, the files have the same inode number and are on the same filesystem, so they must be the same". Maybe.

Well, let's get the source code for diff and check it out.

$ curl http://darwinsource.opendarwin.org/tarballs/other/gnudiff-13.tar.gz > gnudiff-13.tar.gz
$ tar -zxvf gnudiff-13.tar.gz
$ cd gnudiff-13
$ make

OK, now let's test our freshly built diff.

$ /tmp/gnudiff/Build/src/diff <(echo foo) <(echo bar)
$

Yep, our new version has the same problem that we want to fix.

OK, so let's take a look at some source. gnudiff/src/diff.c looks like a good place to start. Just search for the word "main" and we can quickly check out the main function to get an idea of how diff starts to do what it does. Around line 713 we see the call
int status = compare_files ((struct comparison *) 0, from_file, argv[optind]);
which looks very promising. We find the definition of this function at line 1047. Now just skim through this function to get an idea what it does. Around line 1214 we see a comment that looks very promising!

if ((same_files = (cmp.file[0].desc != NONEXISTENT
                   && cmp.file[1].desc != NONEXISTENT
                   && 0 < same_file (&cmp.file[0].stat, &cmp.file[1].stat)
                   && same_file_attributes (&cmp.file[0].stat, &cmp.file[1].stat)))
    && no_diff_means_no_output) {
    /* The two named files are actually the same physical file.
       We know they are identical without actually reading them. */

}

Oh, I bet this has something to do with the problem! What do those two "same_file*" functions do?

Armed with grep we find them defined as macros in gnudiff/src/system.h around line 361. These two macros basically check some file data returned by stat(2) to see if two files are identical. The attributes checked are things like inode number, uid, gid, size, mtime, ctime, etc. All attributes that were identical when we checked the stat output of our two named pipes. Take a second and glance back up at the output from stat <(echo foo) <(echo bar). I'll wait... back? OK. So, it sorta makes sense why that diff may have failed. And it also makes sense why diff <(echo foo) <(sleep 1; echo bar) would have worked. Can you guess why? (hint: think about the modification times for each fifo)


A Fix



What's the best fix for this? One could argue that the problem is that the HFS+ filesystem allows two files on the same filesystem to have the same inode number, but HFS+ really isn't an inode based filesystem. On HFS+, inode numbers are really just the volume's catalog node ID. Plus, it's probably a big pain to modify the Darwin Kernel or the HFS+ filesystem code.

Maybe we can instead fix the problem in diff. According to this technote a CNID of zero is never used and indicates nil, so maybe diff should not shortcut any files with an inode of 0? Let's try it. In gnudiff/src/system.h, make the following modification to the same_file(s, t) macro:

# define same_file(s, t) \
  (((((s)->st_ino == (t)->st_ino) \
     && ((s)->st_dev == (t)->st_dev)) \
    && ((s)->st_ino != 0) && ((t)->st_ino != 0)) \
   || same_special_file (s, t))

Then recompile with make (if necessary, type make clean; make). Now, let's see if we fixed the problem:

$ /tmp/gnudiff/Build/src/diff <(echo foo) <(echo bar)
1c1
< foo
---
> bar


YAY! That seems to have fixed the problem!


Conclusion



I possibly skipped the most important first step here, which is to use Google to see if someone else has already figured out my problem! I'll probably go do that now! ;-)

In the meantime, I haven't tested this solution thoroughly, but I imagine it's safe. Hopefully, this (or some other) fix will make it into the diff code soon. G'nite.

Tuesday, February 14, 2006

The char *apple[] Argument Vector

We're all familiar with the arguments passed to the main function by the OS:


  1. int argc

  2. char *argv[]

  3. char *envp[]


But programs started on Mac OS X (i.e. Darwin) actually have access to another argument - the apple vector. The apple vector is defined as char *apple[] and it's passed as the 4th argument to the main() function (it's actually stored right after envp on the stack).

But what is it used for? Well, Apple can use the apple vector to pass whatever "hidden" parameters they want to any program. And they do actually use it, too. Currently, apple[0] contains the path where the executing binary was found on disk. What's that you say? How is apple[0] different from argv[0]? The difference is that argv[0] can be set to any arbitrary value when execve(2) is called. For example, shells often differentiate a login shell from a regular shell by starting login shells with the first character in argv[0] being a -. For example:

$ ps aux | grep -- -bash
jgm 262 0.0 0.1 27820 752 p1 S 5Feb06 0:01.58 -bash

So, we can see that the bash login shell on my Mac was started with a dash in its name. In this example, bash's argv[0] would equal -bash, but its apple[0] would contain the path to where the bash binary was actually found (likely apple[0] would be /bin/bash).

Let's write a simple program to see all this in action:

// Compile with: gcc -o apple apple.c
#include <stdio.h>
int main(int argc, char *argv[], char *envp[], char *apple[]) {
    printf("argv[0] = %s\n", argv[0]);
    printf("apple[0] = %s\n", apple[0]);
    return 0;
}

And here's a few runs:

$ ./apple
argv[0] = ./apple
apple[0] = ./apple


$ PATH=. apple
argv[0] = apple
apple[0] = ./apple


$ PATH=/Users/jgm apple
argv[0] = apple
apple[0] = /Users/jgm/apple


So, we can see that apple[0] is not the same as argv[0] and that it contains the path to where the executing image was found on disk (taking into account the $PATH).

Now, if we want to test the bash example above (where argv[0] doesn't match the binary name), we can write another small test program:

// Compile with: gcc -o exec_apple exec_apple.c
#include <unistd.h>
int main(void) {
    char *theArgv[] = {"-apple", NULL};
    execve("./apple", theArgv, NULL);
    return 1; /* only reached if execve fails */
}

And a run:

$ ./exec_apple
argv[0] = -apple
apple[0] = ./apple

So, just as we expected: argv[0] can really be set to anything by execve(2), but apple[0] should always contain the real path to the executing binary image.

Pretty neat huh?

UPDATE 10/30/2006 here

Monday, February 13, 2006

Nil and nil

Objective-C has some very interesting data types that often are misunderstood. Many of them can be found in /usr/include/objc/objc.h, or other files in that same directory. Below is a snippet taken from objc.h that shows the declaration of some of these types:


// objc.h
#import <objc/objc-api.h>

typedef struct objc_class *Class;

typedef struct objc_object {
Class isa;
} *id;

typedef struct objc_selector *SEL;
typedef id (*IMP)(id, SEL, ...);
typedef signed char BOOL;

#define YES (BOOL)1
#define NO (BOOL)0

#ifndef Nil
#define Nil 0 /* id of Nil class */
#endif

#ifndef nil
#define nil 0 /* id of Nil instance */
#endif


Let's cover some of them in a little more detail here:



id

This is not equivalent to void *. As the snippet from the header above indicates, id is a pointer to a struct objc_object, which is basically a pointer to an instance of any class derived from the Object (or NSObject) base class. Notice that id is a pointer, so you do not need the asterisk when using id. For example: id foo = nil declares a nil pointer to an instance of any subclass of NSObject, whereas id *foo = nil declares a pointer to a pointer to an instance of a subclass of NSObject.


nil

This is equivalent to the C language's NULL value. It is defined in objc/objc.h and is used to refer to an Objective-C object instance pointer that points to nothing.


Nil

Yes, this is sort-of different than nil but they're defined in the same file. Nil (with a capital 'N') is used to define a pointer to an Objective-C class (type Class) that points to nothing.


SEL

Now this one is fun and interesting. SEL is the type of a "selector" which identifies the name of a method (not the implementation). So, for example, the methods -[Foo count] and -[Bar count] both share a selector, namely the selector "count". A SEL is a pointer to a struct objc_selector, but what the heck is an objc_selector? Well, it's defined differently depending on if you're using the GNU Objective-C runtime, or the NeXT Objective-C Runtime (like Mac OS X). Well, it ends up that Mac OS X maps SELs to simple C strings. For example, if we define a Foo class with a - (int)blah method, the code NSLog(@"SEL = %s", @selector(blah)); would output SEL = blah.


IMP

From the header above IMP is declared as id (*IMP)(id, SEL, ...), so it's a pointer to a function that takes an id (the "self" pointer), the SEL that was called, and some other variable arguments.


Method

The Method type is defined in objc/objc-class.h as:

typedef struct objc_method *Method;
struct objc_method {
    SEL method_name;
    char *method_types;
    IMP method_imp;
};

This ties together some of the other types that we talked about: a Method relates a selector (the name) to an implementation (the function that runs), along with a string describing the method's argument and return types.


Class

From above, Class is defined to be a pointer to a struct objc_class, which is declared in objc/objc-class.h as:

struct objc_class {
    struct objc_class *isa;
    struct objc_class *super_class;
    const char *name;
    long version;
    long info;
    long instance_size;
    struct objc_ivar_list *ivars;
    struct objc_method_list **methodLists;
    struct objc_cache *cache;
    struct objc_protocol_list *protocols;
};

I'm not going to get into much detail here, other than to show the declaration. We'll talk more about this in a future post.





Well, that's about it for now. These are all important types and concepts in Objective-C and I thought they would be good to talk about. More later...

Saturday, February 11, 2006

Messaging nil in Objective-C

Sending a message to a nil object doesn't make much sense in many programming languages. For example, if you do this in Java you'll get the dreaded NullPointerException. But sending a message ("sending a message" in Objective-C is similar to "calling a method" in other OO languages) to a nil object is defined, okay, and incredibly useful in Objective-C. Actually, one of the most common coding idioms in Objective-C is:


Foo *foo = [[Foo alloc] init];

which creates a Foo instance by sending the +alloc message to the Foo class, then sending the -init message to the returned instance. However, if +alloc fails and returns nil, the -init message will be sent to a nil object, which simply ends up setting foo to nil (which is probably exactly what we'd want to happen anyway).


I'd like to see an example


OK, let's write some sample code to test this.

// Compile with: gcc -o nil nil.m -framework Foundation
#import <Foundation/Foundation.h>

@interface Foo : NSObject
- (NSString *)sayHi;
@end

@implementation Foo
- (NSString *)sayHi {
    return @"Hello, World!";
}
@end

int main() {
    Foo *foo = nil;
    NSLog(@"Greeting = %@", [foo sayHi]);
    return 0;
}

2006-02-11 20:49:26.372 nil[3406] Greeting = (null)

So, we can see that when we send the message -sayHi to a nil pointer the return value is nil.


How does this work?


The compiler turns message calls like [targetObject someSelector] into a C function call like objc_msgSend(targetObject, someSelector). So, to figure out what this returns we simply need to figure out what objc_msgSend() does when its first argument is nil. Well, we can download the source for the Objective-C runtime from Apple here. The file we're interested in is objc-msg-ppc.s (yes, it's in PPC assembly). If we search for "ENTRY _objc_msgSend" we'll see the function we're looking for. The comments are very useful in this file and we can pretty easily see that it checks if its first argument (passed in register r3), which happens to be the target object, is nil and if so it does a few other things and eventually returns nil. And since C functions on PowerPC chips return integer and pointer values in register r3 nothing needs to be done; the function simply returns and the result is that the caller thinks the function (or "message") returned nil. And since integers are returned the same way as pointers, sending a message that returns an int will return 0, simply because nil is #define'd to be 0 (/usr/include/objc/objc.h).


But what if the method returns a float?


Let's see...

#import <Foundation/Foundation.h>

@interface Foo : NSObject
- (float)blah;
@end

@implementation Foo
- (float)blah {
    return 5.0;
}
@end

int main() {
    Foo *foo = nil;
    NSLog(@"blah = %f", [foo blah]);
    return 0;
}

2006-02-11 21:20:20.948 nil[3441] blah = 0.000000

So, it looks like messages that return a float return 0.0 like we'd expect. Wrong! Change the test code as indicated:

void g(float f) {}
int main() {
g(2.0);

2006-02-11 21:22:47.094 nil[3452] blah = 2.000000

Ah-ha! Now the return value for messaging our nil object was 2.0! So, it looks like the return value in this case is whatever value happens to be in the appropriate floating point register.


Interesting! So what does it mean?


All this neat stuff means that it *is* safe to send a message to nil when:

  • The method is declared to return a pointer

  • The method is declared to return an integer type whose size is less than or equal to sizeof(void *) (4 bytes, or 32 bits, on a 32-bit machine)



and it is NOT safe when

  • The method returns any floating point value

  • The method returns an integer type larger than sizeof(void *)



Also, it's usually *not* safe to message nil when the message returns a structure.

Conclusion



The ability to send messages to nil is an incredibly cool and powerful feature of Objective-C, but it may not always do what you intend. I've read that Apple is trying to standardize the behavior of messaging nil (they'll likely guarantee that it will "always" return a zero value), but this is currently not the case.

For more info check out these docs.

*DISCLAIMER: I've simplified a few things here to make this more understandable. I also did not cover issues related to messaging nil on Intel chips. Maybe I'll leave some of these things for future posts. If you have questions about any of this, or simply think I'm wrong about something, please post a comment. I'll get back to you as soon as possible. I love to discuss this stuff :-)