Sunday, August 20, 2006

APUE2e Acknowledgement

It's no secret that Advanced Programming in the UNIX Environment, by W. Richard Stevens and Stephen A. Rago, is a staple for all developers who write code for any flavor of Unix. If you don't have at least one copy already, I'd strongly encourage you to pick one up.

Also, don't forget to visit the book's website at apuebook.com. While you're there, take a look at item 18 on the "Additional Acknowledgements" page ;-) Here's a little more detail about that particular issue.

SUSv3 (the Single UNIX Specification, Version 3) states: "If connect() fails, the state of the socket is unspecified. Conforming applications should close the file descriptor and create a new socket before attempting to reconnect." For example, retrying connect() on the same socket doesn't always work on Darwin 8.6.0 or FreeBSD 6.0-RC1 (the only versions of these OSes that I checked).

The case I found where retrying connect() doesn't work is connecting to a port that isn't listening. The client (the side calling connect()) sends a SYN, receives an RST, and connect() returns -1 with errno set to ECONNREFUSED, all as expected. However, if the same socket is used for a second connect(), no packets are sent and connect() immediately fails with EINVAL. This code illustrates:

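#include <arpa/inet.h>   /* htons(), inet_pton() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdio.h>       /* perror() */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* socket(), connect() */
#include <unistd.h>      /* sleep() */

int main(void) {
    struct sockaddr_in remote_addr;

    /* 127.0.0.1:3333, a port with nothing listening on it */
    memset(&remote_addr, 0, sizeof(remote_addr));
    remote_addr.sin_family = AF_INET;
    remote_addr.sin_port = htons(3333);
    inet_pton(AF_INET, "127.0.0.1", &remote_addr.sin_addr);

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        perror("socket");
        return 1;
    }

    /* Retries connect() on the SAME socket: not portable per SUSv3 */
    while (connect(sock, (struct sockaddr *)&remote_addr,
                   sizeof(remote_addr)) == -1) {
        perror("failed to connect");
        sleep(2);
    }
    /* ... */
}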


Again, on Darwin and FreeBSD, the second time through the while loop connect() fails immediately with EINVAL. And since no packets are actually sent, if something does start listening on 127.0.0.1:3333, the loop will never detect it.

On the 2.4 Linux kernel I tested, the code does what I initially expected: connect() fails with ECONNREFUSED every time.
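To compare platforms without an endless loop, a small probe can call connect() exactly twice on one socket and record errno after each attempt. This is a sketch of my own, not code from the book; probe_reconnect() is a hypothetical name. Based on the results above, one would expect ECONNREFUSED followed by EINVAL on Darwin 8.6.0 and FreeBSD 6.0-RC1, and ECONNREFUSED twice on the 2.4 Linux kernel.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: calls connect() twice on the SAME socket and
 * stores errno after each attempt (0 means that attempt succeeded).
 * Returns 0 when both attempts were made, or -1 if socket() failed.
 */
int probe_reconnect(const struct sockaddr_in *addr,
                    int *first_err, int *second_err) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1)
        return -1;

    *first_err = 0;
    if (connect(sock, (const struct sockaddr *)addr, sizeof(*addr)) == -1)
        *first_err = errno;

    /* Reuses the socket whose state SUSv3 leaves unspecified. */
    *second_err = 0;
    if (connect(sock, (const struct sockaddr *)addr, sizeof(*addr)) == -1)
        *second_err = errno;

    close(sock);
    return 0;
}
```

Point it at a closed port such as 127.0.0.1:3333 and print both values with strerror() to see which behavior a given system has.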

Since SUSv3 says that the state of the socket is unspecified after a failed connect(), I don't think this is actually a bug. But it does mean that the connect_retry() code in Figure 16.9 of APUE2e is not portable.

So, to summarize: if a connect() call fails for any reason, the state of the socket is unspecified. To be portable, you must close the socket and create a new one before calling connect() again.
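A minimal sketch of a portable retry loop follows. It is not the book's connect_retry() from Figure 16.9; connect_retry_fresh() is a hypothetical name, and the fixed attempt count and fixed delay are my own simplifying assumptions. Because every attempt gets a brand-new socket, the function takes the socket() arguments instead of an existing descriptor.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not APUE's connect_retry()): tries up to
 * max_tries times (max_tries >= 1), creating a fresh socket for each
 * attempt as SUSv3 recommends. Returns a connected descriptor, or -1
 * with errno set from the last failure.
 */
int connect_retry_fresh(int domain, int type, int protocol,
                        const struct sockaddr *addr, socklen_t alen,
                        int max_tries, unsigned int delay_secs) {
    int saved_errno = 0;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int fd = socket(domain, type, protocol);
        if (fd == -1)
            return -1;              /* cannot create a socket; give up */

        if (connect(fd, addr, alen) == 0)
            return fd;              /* connected; caller owns fd */

        /* The failed socket's state is unspecified, so discard it.
           Save errno first because close() may overwrite it. */
        saved_errno = errno;
        close(fd);

        if (attempt + 1 < max_tries)
            sleep(delay_secs);
    }
    errno = saved_errno;
    return -1;
}
```

Taking domain, type, and protocol rather than a descriptor is the key design choice: the caller can't hand in a socket left in an unspecified state by an earlier failure, so a port that opens up later is actually detected.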

When I emailed Stephen Rago about this, he was very responsive and nice. He feels that the bug lies with the sockets implementation, but he added an FAQ about it to the book's website anyway.

Again, if you're reading this blog, and you don't already have a copy of this book, you should probably go get one now.
