First, benchmarking a web server is not easy. When benchmarking a web server, the time it takes to serve a single page is not what matters: you do not care whether a user gets his page in 0.1 ms or 0.05 ms, since nobody sees such delays over the Internet.
What matters is the average time it takes to serve a page when the maximum number of users are on your site simultaneously. Another important thing is how much longer it takes when there are twice as many users: a server that takes twice as long with twice as many users is better than one that takes four times as long with the same increase in users. If you run more than just a web server on your machine, you will also want to look at the load average and CPU time of your system. Here is a typical output of the command uptime:
22:39:49 up 2:22, 5 users, load average: 0.01, 0.01, 0.00
And an extract from the top(1) man page:
" The load averages are the average number of process ready to run during the last 1, 5 and 15 minutes "
So the lower your load average, the better for the other programs running on your machine.
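If you want to record the load average from a script rather than reading the output of uptime by hand, here is a minimal sketch in Python, assuming a Unix-like system where os.getloadavg() is available; it logs the same three numbers you could note next to each benchmark run:

import os
import time

# os.getloadavg() returns the 1-, 5- and 15-minute load averages,
# the same numbers that uptime(1) and top(1) report.
one, five, fifteen = os.getloadavg()
print(f"{time.strftime('%H:%M:%S')} load average: {one:.2f}, {five:.2f}, {fifteen:.2f}")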
Now comes the next problem: how can you stress your web server with the maximum number of connections when your client (the machine making the requests) will usually not be able to keep up with the server or simulate the number of users you need?
To do this, first increase the number of sockets available on your system: on some systems the default is 1024, which is too low (see Section 5.3 for more information). Next, use a good client program written with threads and non-blocking sockets; a multi-fork program running on a single client machine will never keep up with any web server. It also helps to have several client machines stressing the server together.
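To give an idea of what such a client looks like, here is a minimal sketch in Python using asyncio, which multiplexes non-blocking sockets inside a single process, and which raises the per-process file descriptor limit before starting. The target host, port, path and the number of simulated users are placeholders you would adapt to your own setup; a real benchmarking tool would add error handling, warm-up runs and timeouts.

import asyncio
import resource
import statistics
import time

HOST, PORT, PATH = "localhost", 80, "/"   # placeholder target, adapt to your server
USERS = 200                               # simulated simultaneous users
REQUESTS_PER_USER = 50

async def one_user(latencies):
    # Each "user" issues a series of HTTP/1.0 requests over non-blocking
    # sockets multiplexed by the asyncio event loop.
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        reader, writer = await asyncio.open_connection(HOST, PORT)
        writer.write(f"GET {PATH} HTTP/1.0\r\nHost: {HOST}\r\n\r\n".encode())
        await writer.drain()
        await reader.read()               # HTTP/1.0: read until the server closes
        writer.close()
        await writer.wait_closed()
        latencies.append(time.perf_counter() - start)

async def main():
    # Raise the soft file descriptor limit up to the hard limit so the
    # client itself does not run out of sockets (see Section 5.3).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

    latencies = []
    wall_start = time.perf_counter()
    await asyncio.gather(*(one_user(latencies) for _ in range(USERS)))
    wall = time.perf_counter() - wall_start

    total = USERS * REQUESTS_PER_USER
    print(f"{total} requests from {USERS} concurrent users in {wall:.1f} s")
    print(f"average response time: {statistics.mean(latencies) * 1000:.1f} ms")

asyncio.run(main())

Running the same sketch twice, once with USERS = 200 and once with USERS = 400, gives you the scaling comparison described above: the better server is the one whose average response time grows more slowly when the number of users doubles.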
Finally, if you want to compare two web servers, make sure they run on the same hardware, OS, and network. The same holds for the client(s).