Scaling Drupal: HTTP pipelining and benchmarking revisited
UPDATE: I've updated some of the numbers below to correct a testing error. In short: be sure not to benchmark with any external links in your test pages. If you do, you'll also be benchmarking the external server, which is not what we want in this case. To summarize the effect of these corrections: having lighttpd in front of Apache and using pipelining actually provide a substantially larger performance boost than I had indicated before. Other than that, the results are the same.
So my first attempt at benchmarking HTTP pipelining did not go exactly as planned. Based on my testing, it looks like browsers (at least Firefox) will not pipeline requests (i.e., make more concurrent requests to your site) when the two domains/subdomains you use for your site's content point to the same IP, because the browser considers those requests to be from the same origin. For a browser to pipeline requests at all, it seems to require two domains/subdomains that use two separate IPs. If you read the Wikipedia entry for hostnames this makes sense, as it describes domains as being associated with IPs, and browserscope's browser tests check for "Connections per Hostname", not "Connections per Domain".
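The observation above lends itself to a quick check before benchmarking. Below is a minimal Python sketch, assuming the rule of thumb from this testing (hostnames that resolve to the same IP add no extra parallel connections); that rule is an empirical observation, not a browser specification. The hostnames, the 192.0.2.x documentation addresses, the helper names and the per-host limit are all hypothetical, and `resolve` is injectable so the sketch can be tried without DNS.

```python
import socket
from typing import Callable, Dict, Iterable, List

Resolver = Callable[[str], str]


def group_by_ip(hostnames: Iterable[str],
                resolve: Resolver = socket.gethostbyname) -> Dict[str, List[str]]:
    """Map each resolved IPv4 address to the hostnames that point at it."""
    groups: Dict[str, List[str]] = {}
    for host in hostnames:
        groups.setdefault(resolve(host), []).append(host)
    return groups


def effective_parallel_connections(hostnames: Iterable[str], per_host_limit: int,
                                   resolve: Resolver = socket.gethostbyname) -> int:
    """Estimate how many connections a browser may open in parallel.

    Assumption (from the testing described above, not a browser spec):
    hostnames sharing an IP count once, so only distinct IPs add parallelism.
    """
    return per_host_limit * len(group_by_ip(hostnames, resolve))


if __name__ == "__main__":
    # Hypothetical hostnames and IPs; drop the `resolve` argument to use
    # socket.gethostbyname against live DNS instead.
    same_ip = {"www.example.com": "192.0.2.10", "static.example.com": "192.0.2.10"}
    split_ip = {"www.example.com": "192.0.2.10", "static.example.com": "192.0.2.20"}
    for label, table in (("same IP", same_ip), ("separate IPs", split_ip)):
        n = effective_parallel_connections(table, per_host_limit=2,
                                           resolve=table.__getitem__)
        print(f"{label}: {n} parallel connections")
```

With both names on one IP the estimate stays at the per-host limit; giving the static hostname its own IP doubles it, which is the condition the benchmarks below depend on.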
After figuring out how to get requests to pipeline correctly, I re-benchmarked all the configurations from the first article. Everything from that article regarding lighttpd still holds true, so without covering those aspects again, here are the updated benchmarks and notes on browser request pipelining:
- Once the conditions for request pipelining were set up correctly, there were discernible performance effects, some of which I definitely wasn't expecting. At one end of the spectrum, with browser pipelining working (via string replacement of domains within the rendered HTML) and lighttpd serving the static files, throughput was 11% higher than without pipelining. So static file serving = good, and static file serving + HTTP pipelining = a little better.
This is not where the story ends with pipelining, however: enabling pipelining caused a net performance decrease in every configuration that did not use a separate static file server (in my case, lighttpd on the same machine)!
Particularly in the configuration with Apache only (i.e., no static file server/lighttpd), Apache 'KeepAlive On', and pipelined static content, the performance hit was a crushing 67% less throughput than even the best non-lighttpd configuration (which used no pipelining and Apache KeepAlive off). Intrigued, I considered what could cause this result and hypothesized that the increased number of requests had overwhelmed Apache's 20 available clients (the MaxClients value I used for testing), and they simply locked up. To test this idea I increased MaxClients to 40 and re-benchmarked this configuration. Sure enough, throughput went up by 60% compared with 20 MaxClients (but was still not higher than running without pipelining/string replacement). So pipelining can truly provide too much of a good thing in some cases.
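The MaxClients hypothesis comes down to simple arithmetic. Below is a rough Python sketch, assuming that every hostname points at the same Apache instance (the Apache-only setups), that each browser opens a fixed number of connections per hostname, and that with KeepAlive On each open connection keeps its worker until KeepAliveTimeout expires. The user count, per-host limit and function names are hypothetical; they are not the parameters of these benchmarks.

```python
def peak_connections(users: int, conns_per_host: int, hostnames: int) -> int:
    """Worst-case connections a burst of page loads can open at once,
    assuming every hostname ends up at the same Apache instance."""
    return users * conns_per_host * hostnames


def waiting_connections(users: int, conns_per_host: int, hostnames: int,
                        max_clients: int) -> int:
    """Connections that find no free Apache worker slot.

    With KeepAlive On a worker stays bound to its connection until
    KeepAliveTimeout expires, so open connections (not individual requests)
    consume MaxClients slots; the surplus waits in the listen queue.
    """
    demand = peak_connections(users, conns_per_host, hostnames)
    return max(0, demand - max_clients)


if __name__ == "__main__":
    # Hypothetical load: 8 concurrent browsers, 2 connections per hostname.
    for hostnames in (1, 2):
        for max_clients in (20, 40):
            waiting = waiting_connections(8, 2, hostnames, max_clients)
            print(f"hostnames={hostnames} MaxClients={max_clients}: {waiting} waiting")
```

This ignores request timing entirely, so it only shows why spreading content over a second hostname can push an Apache-only setup past MaxClients, and why raising MaxClients relieves it; it does not predict the measured throughput.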
- 74% = the increase in throughput from using the all-around best configuration, lighttpd + pipelining, versus the best-performing Apache-only scenario (no pipelining, KeepAlive off).
- 534% = the increase in throughput from using the all-around best configuration, lighttpd + pipelining, versus the all-around worst-scoring scenario (Apache-only, pipelining on, KeepAlive on).
- Regarding Apache KeepAlive: the highest-performing configuration (Apache with lighttpd serving static files, using a combination of mod_proxy, mod_rewrite, and string replacement to change the domains in the rendered HTML) had Apache KeepAlive on, but Apache KeepAlive off actually performed better in every case where lighttpd was not used alongside Apache.
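The string-replacement step mentioned above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the code used for these benchmarks (in Drupal it would live in the PHP layer), and it does not cover the mod_proxy/mod_rewrite side. The hostnames, the extension list, the helper names and the regex are all assumptions; it also assumes attributes are written as `src="..."`/`href="..."` with no spaces around `=`.

```python
import re

# Hypothetical hostnames: pages are served from PAGE_HOST, and the static
# file server answers on STATIC_HOST (which should resolve to a different IP).
PAGE_HOST = "www.example.com"
STATIC_HOST = "static.example.com"

# File extensions treated as static assets (an assumption; adjust as needed).
STATIC_EXT = ("css", "js", "png", "gif", "jpg", "jpeg", "ico")

_ASSET_RE = re.compile(
    r"(?<![\w-])(?P<attr>src|href)=(?P<q>[\"'])"               # attribute + opening quote
    r"(?:https?://" + re.escape(PAGE_HOST) + r")?"             # optional absolute URL on PAGE_HOST
    r"(?P<path>/(?!/)[^\"'?#]*\.(?:" + "|".join(STATIC_EXT) + r"))"  # root-relative asset path
    r"(?=[?#\"'])"                                             # extension must end the path
    r"(?P<rest>[^\"']*)(?P=q)",                                # query string and closing quote
    re.IGNORECASE,
)


def _to_static(m: re.Match) -> str:
    """Rebuild one matched attribute so it points at STATIC_HOST."""
    q = m.group("q")
    return f"{m.group('attr')}={q}http://{STATIC_HOST}{m.group('path')}{m.group('rest')}{q}"


def rewrite_static_urls(html: str) -> str:
    """Point local static-asset URLs at STATIC_HOST. External URLs,
    protocol-relative URLs and non-asset links are left untouched."""
    return _ASSET_RE.sub(_to_static, html)


if __name__ == "__main__":
    page = ('<link href="/misc/style.css" rel="stylesheet">'
            '<a href="/node/1">x</a>')
    print(rewrite_static_urls(page))
```

Leaving external URLs alone matters here for the same reason as the UPDATE note: rewriting or fetching third-party assets would fold another server into the measurements.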
There are a lot of considerations to take into account before enabling pipelined requests for a site. If you want a net gain from it (and what other reason would there be to do it?), you'll need either a static file server of some kind (my conclusion/recommendation) or, without a static file server, some seriously beefed-up CPU/memory and a lot of testing and tinkering with httpd.conf.
Thanks to everyone who left comments on the earlier article. They helped me think through what I needed to in order to get request pipelining working.