Scaling Drupal: Benchmarking static file serving with lighttpd and browser pipelining
I finally had a chance to investigate an optimization which I've been wondering about for a while now - serving static files of a site from somewhere else. As a side, but related, experiment I also tested the claim that serving files from a static file server/separate domain/subdomain will speed things up because it results in browsers opening more concurrent requests than they would from a single domain.
Methods tested and benchmarked
I implemented and benchmarked the following methods of path modification in order to enable static file serving:
- For the first test method I modified the virtualhost in httpd.conf with some statements which take advantage of mod_rewrite and mod_proxy (this method leaves the image paths themselves the same, but the requests are served by lighttpd). You can test that lighttpd is serving the files by turning off lighttpd - if everything is working correctly, your page will show up only as unformatted text when viewed in a browser.
- For the next test method, I undid the mod_rewrite and mod_proxy stuff in Apache, and simply did string replacement within Drupal to change the paths for static resources so that they pointed to a separate domain which lighttpd served, so as to test the 'pipelining' claim (again the idea is that using a different domain/subdomain than your site's standard one will force browsers to open additional connections to your server which they wouldn't otherwise).
- Lastly, and just to satisfy my curiosity about pipelining, I tried taking lighttpd out of the mix altogether: I did string replacement in Drupal and created some subdomain hostnames which merely pointed to the docroot. This let me test the pipelining claim without lighttpd involved, using just Apache itself.
Just like the previous pipeline test, this didn't seem to produce any more concurrent connections than just serving everything from one domain. (See updated info/benchmarks for pipelining here.)
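As a rough illustration of the string-replacement approach described above, here is a minimal sketch written in Python rather than Drupal's PHP. It is not the code used for these tests: the hostnames, the list of file extensions, and the function names are all made up for the example. It rewrites root-relative src/href values for static files so that they point at a separate hostname, and picks the hostname deterministically so that a given file always gets the same URL.

```python
import re
import zlib

# Hypothetical hostnames; the article does not name the ones actually used.
STATIC_HOSTS = ["static1.example.com", "static2.example.com"]

# File extensions treated as static resources (an assumption for this sketch).
STATIC_EXTENSIONS = ("css", "js", "png", "jpg", "jpeg", "gif", "ico")

# Matches src="..." / href="..." values that are root-relative (start with a
# single "/") and end in one of the static extensions, optionally followed by
# a query string or fragment.
_ATTR_RE = re.compile(
    r'(?P<attr>\b(?:src|href)=")(?P<path>/(?!/)[^"?#]+\.(?:%s))(?P<rest>(?:[?#][^"]*)?")'
    % "|".join(STATIC_EXTENSIONS),
    re.IGNORECASE,
)


def pick_host(path, hosts=STATIC_HOSTS):
    """Map a path to one host deterministically so its URL stays stable."""
    return hosts[zlib.crc32(path.encode("utf-8")) % len(hosts)]


def rewrite_static_urls(html, hosts=STATIC_HOSTS):
    """Point root-relative static file URLs at one of the static hosts."""
    def _replace(match):
        path = match.group("path")
        return (match.group("attr") + "http://" + pick_host(path, hosts)
                + path + match.group("rest"))
    return _ATTR_RE.sub(_replace, html)
```

Passing a single-element list as hosts corresponds to the "one separate domain" test; passing several corresponds to the "some subdomain hostnames" test. Links to non-static paths (for example /node/1) are left untouched.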
Observations / Results
- Keepalive on for Apache was always faster in every scenario, with or without lighttpd (the updated info/benchmarks for pipelining added a caveat to this observation). Note, and this is completely beside the point of this article: if you're getting Dugg, you can sometimes save a dying server by turning keepalive off. Having keepalive off is 'slower' on a non-overwhelmed server, but on one that is struggling, turning it off can possibly let you serve *something* to many more people than it could with keepalive on.
- With all CSS and JS files un-aggregated there was a distinct and significant advantage to lighttpd serving the static content - a 21% increase in throughput in load testing with 10 concurrent users in JMeter. I also tried the everything-non-aggregated test with 300 concurrent users and got a 19.7% increase with lighttpd in front.
With only CSS aggregated and the JS left un-aggregated the improvement was 14.4%, and with both CSS and JS aggregated, taking into account margin of error, I couldn't reliably produce an advantage with lighttpd serving the static files.
- At least based on using Firefox 3.5.7, using domains/subdomains to force browsers into pipelining requests didn't seem to work (verified via Firebug's 'Net' panel). Firefox did not seem to open any new concurrent connections that were not there before, and the benchmarks in JMeter and/or YSlow did not seem to improve by having the lighttpd-served images come from a completely different domain and/or subdomain (I tried both). (See updated info/benchmarks for pipelining here.)
- In all cases, making sure there were enough spare Apache threads was a big factor in the throughput score inside JMeter. Like memory, the more the better, even with lighttpd in front. I tested using 10 concurrent users (in JMeter) and set MaxClients in Apache to 20. I also benchmarked with normal caching on so that I could throw a larger number of concurrent requests at my MacBook - after all, what I was trying to test was Apache/lighttpd/page-load-times, not MySQL.
- Perhaps not surprisingly, the more static files you have, the larger the percentage improvement you'll see by using a static file server. This means that if you're aggregating your CSS and JS already (and don't happen to have a couple hundred images per page) there may not be much of a gain to be had with a static file server. Of course, performance considerations aside, you may still have a motivation to use a static file server if your current environment simply cannot handle the number of requests being sent to it, or perhaps you're just trying to lower the bandwidth usage on your main server(s).
- It is important to note that static file serving is closely related to, but not the same thing as, a CDN (a CDN provides static file serving, but a static file server is not necessarily part of a CDN). A well-functioning CDN adds another aspect to the performance equation that a copy of lighttpd running on your server(s) won't: geography, and its effect on latency and download times (the idea of a CDN is to provide a download source to client browsers which is physically closer to them, which, all other things being equal, should result in faster response/download times). It may very well be that, combined with the advantages of a CDN, static file serving would be a performance gain even for sites that aggregate their CSS and JS. This is an aspect that deserves its own testing (something I did not have time to do for purposes of this article).
- Even when the load testing showed big performance gains with static file serving (in the case of non-aggregated CSS and/or JS files), I couldn't measure any difference in individual page load times with YSlow. It's easy to imagine that this is because an Apache/lighttpd that only has to fulfill a single page request, instead of a bunch of concurrent page requests, is going to handle most anything thrown at it for that single page without breaking a sweat (i.e., for this reason, doing benchmarks by loading single pages is probably pretty pointless for many things).
- lighttpd itself is compelling enough for me to want to try it as a replacement for Apache (something for another day, perhaps).
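To make the difference between a load test and a single page load concrete, here is a minimal Python sketch of the kind of measurement a tool like JMeter performs: several concurrent "users" each issue a number of requests, and throughput is the total number of requests divided by the elapsed time. It is only a stand-in for illustration, not what was used for the numbers above; the function names and the URL in the usage note are assumptions.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_http_fetch(url):
    """Return a callable that downloads `url` once each time it is called."""
    def fetch():
        with urllib.request.urlopen(url) as response:
            response.read()
    return fetch


def measure_throughput(fetch, users=10, requests_per_user=20):
    """Call `fetch` from `users` concurrent workers; return requests per second."""
    def worker(_user_index):
        for _ in range(requests_per_user):
            fetch()

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(worker, range(users)))
    elapsed = time.perf_counter() - start
    return users * requests_per_user / elapsed


def percent_increase(baseline, candidate):
    """Relative change of `candidate` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) * 100.0 / baseline
```

For example, measure_throughput(make_http_fetch("http://localhost/"), users=10) could be run once with Apache serving everything and once with lighttpd serving the static files, and the two results passed to percent_increase. Note that this sketch fetches a single URL; a real page load also pulls in every CSS, JS, and image file, which is where the static file server matters.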
Notes about benchmarking
- I used JMeter, YSlow, Firebug, and Firefox for my benchmarking tasks.
- I haven't included specific benchmark numbers here other than percentages, because there are so many variables when benchmarking a given site/setup that the numbers wouldn't mean much for anyone else's situation.
- I came to the conclusions above after some fairly exhaustive testing. If you run your own benchmarks, be sure to run them several times and control any variable conditions to the best of your ability. JMeter benchmark numbers (like those from any benchmarking tool) can fluctuate wildly for any number of reasons, some of which may have nothing to do with what you're actually trying to benchmark.
- I welcome anyone who wants to benchmark their own stuff and share their results, particularly if yours differ.
- Please take all the above with a grain of salt.
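One simple way to act on the "run them several times" advice is to record the throughput from each run and compare the averages against the run-to-run spread. The sketch below is a crude heuristic under stated assumptions, not a proper statistical test and not the procedure used for this article; the function names and the numbers in the usage note are invented for illustration.

```python
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of throughput numbers."""
    return statistics.mean(runs), statistics.stdev(runs)


def clearly_faster(baseline_runs, candidate_runs):
    """Crude check: is the gain in mean throughput larger than the combined spread?

    Each list needs at least two runs, because a spread cannot be
    estimated from a single measurement.
    """
    base_mean, base_sd = summarize(baseline_runs)
    cand_mean, cand_sd = summarize(candidate_runs)
    return (cand_mean - base_mean) > (base_sd + cand_sd)
```

With made-up numbers, clearly_faster([100, 102, 98, 101, 99], [120, 122, 118, 121, 119]) reports a clear gain, while two sets of runs that both hover around 100 do not, which is the "couldn't reliably produce an advantage" situation described above.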
Links that may be helpful
MacPorts (where to get lighttpd for OS X)
Optimizing performance in Lighttpd
Using lighttpd as a static file server for Drupal (note I did not use the patch here, and did a few other things differently - maybe helpful as background and for setting lighttpd up though)
Optimizing Page Load Time (talks about pipelining)