Scaling Drupal: Benchmarking static file serving with lighttpd and browser pipelining

I finally had a chance to investigate an optimization which I've been wondering about for a while now - serving the static files of a site from somewhere else. As a related side experiment, I also tested the claim that serving files from a static file server/separate domain/subdomain will speed things up because it results in browsers opening more concurrent requests than they would for a single domain.

For my tests I used lighttpd (pron. lighty) as a static file server alongside Apache. The idea is that lighttpd, which is acclaimed as being fast and light on memory, will serve the non-dynamic parts of the site (images, CSS, JavaScript, etc.), which should thereby relieve Apache of some of its workload. This arrangement involves changing the paths to these static resources, either on the backend or frontend, so that they no longer get served by Apache.

The pieces
All tests took place on my MacBook Pro and involved two pages on a large Drupal 5 site running Pressflow. For the static file server itself, I installed lighttpd using MacPorts. Of the two pages tested, the smaller one had a number of static files that would be 'average' for most sites. The larger of the two was very large - 39 CSS files, 23 JavaScript files, and 46 image files.
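
To give an idea of the lighttpd side of the setup, a minimal configuration can look something like the sketch below. This is a hedged example, not the exact configuration I used - the document root, port, and expire rule are placeholders:

```
# Minimal lighttpd.conf sketch -- paths and port are placeholders,
# not the exact configuration used for these tests.
server.document-root = "/path/to/drupal/docroot"
server.port          = 8081
server.modules       = ( "mod_expire" )

# Map common static file extensions to MIME types.
mimetype.assign = (
  ".css" => "text/css",
  ".js"  => "application/javascript",
  ".png" => "image/png",
  ".jpg" => "image/jpeg",
  ".gif" => "image/gif"
)

# Far-future Expires headers for static assets.
expire.url = ( "/files/" => "access plus 7 days" )
```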

Methods tested and benchmarked
I implemented and benchmarked the following methods of path modification in order to enable static file serving:

  • For the first test method I modified the virtualhost in httpd.conf with some statements which take advantage of mod_rewrite and mod_proxy (this method leaves the image paths themselves the same, but the request is served by lighttpd). You can verify that lighttpd is serving the files by turning lighttpd off - if everything is working correctly, your page will show up only as unformatted text when viewed in a browser.
  • For the next test method, I undid the mod_rewrite and mod_proxy changes in Apache and simply did string replacement within Drupal to change the paths for static resources so that they pointed to a separate domain served by lighttpd. This was to test the 'pipelining' claim (again, the idea is that using a different domain/subdomain than your site's standard one will force browsers to open additional connections to your server which they wouldn't otherwise).
  • Lastly, and just to satisfy my curiosity about pipelining, I tried taking lighttpd out of the mix altogether: I did the string replacement in Drupal and created some subdomain hostnames which merely pointed at the docroot. This let me test the pipelining claim without lighttpd involved, using just Apache itself. Just like the previous pipelining test, this didn't seem to produce any more concurrent connections than just serving everything from one domain. (see updated info/benchmarks for pipelining here)
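
For the first method, the virtualhost statements can look something like the sketch below. This is a hedged example, not the exact rules I used - the port number, paths, and extension list are assumptions:

```
# Sketch of the mod_rewrite/mod_proxy approach -- port, paths, and
# extension list are assumptions, not the exact rules used here.
# Requires mod_rewrite and mod_proxy (plus mod_proxy_http) loaded.
<VirtualHost *:80>
  ServerName example.com
  DocumentRoot /path/to/drupal/docroot

  RewriteEngine On
  # Hand any request for a static file over to lighttpd on port 8081.
  # The [P] flag proxies the request via mod_proxy, so the URL the
  # browser sees never changes.
  RewriteRule \.(css|js|png|jpe?g|gif|ico)$ http://127.0.0.1:8081%{REQUEST_URI} [P,L]
</VirtualHost>
```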
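
The string-replacement idea for the second and third methods can be sketched as follows. In Drupal this would be done in PHP (for example in a theme or output filter); the Python below is just a language-neutral illustration, and the hostname is made up:

```python
# Language-neutral sketch of rewriting static resource paths so they
# point at a separate host (in Drupal this would live in PHP).
# STATIC_HOST is a made-up hostname for illustration.
import re

STATIC_HOST = "http://static.example.com"

def rewrite_static_paths(html: str) -> str:
    """Point root-relative static resource paths at a separate host,
    so the browser opens additional connections for them."""
    return re.sub(
        r'(src|href)="(/[^"]+\.(?:css|js|png|jpe?g|gif|ico))"',
        rf'\1="{STATIC_HOST}\2"',
        html,
    )

html = '<link href="/misc/drupal.css" rel="stylesheet">'
print(rewrite_static_paths(html))
# -> <link href="http://static.example.com/misc/drupal.css" rel="stylesheet">
```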

Observations / Results

  • Keepalive on for Apache was always faster in every scenario, with or without lighttpd (updated info/benchmarks for pipelining added a caveat to this observation). (Note: this is completely beside the point of this article, but if you're getting Dugg you can sometimes save a dying server by turning keepalive off. Having keepalive off is 'slower' on a non-overwhelmed server, but on one that is struggling, turning it off can possibly let you serve *something* to many more people than you could with keepalive on.)
  • With all CSS and JS files un-aggregated there was a distinct and significant advantage to lighttpd serving the static content - a 21% increase in throughput in load testing with 10 concurrent users in JMeter. I also tried the everything-un-aggregated test with 300 concurrent users and got a 19.7% increase with lighttpd in front.

    With only CSS aggregated and the JS left un-aggregated, the improvement was 14.4%. With both CSS and JS aggregated, taking into account margin of error, I couldn't reliably produce an advantage with lighttpd serving the static files.

  • At least based on using Firefox 3.5.7, using domains/subdomains to force browsers into pipelining requests didn't seem to work (verified via Firebug's 'Net' panel). Firefox did not seem to open any new concurrent connections that were not there before, and the benchmarks in JMeter and YSlow did not improve by having the lighttpd-served images come from a completely different domain or subdomain (I tried both). (see updated info/benchmarks for pipelining here)
  • In all cases, making sure there were enough spare Apache threads was a big factor in the throughput score inside JMeter. Like memory, the more the better, even with lighttpd in front. I tested using 10 concurrent users (in JMeter), and set MaxClients in Apache to 20. I also benchmarked with normal caching on so that I could crank a larger number of concurrent requests at my MacBook - after all, what I was trying to test was Apache/lighttpd/page-load-times, not MySQL.

Conclusions

  • Perhaps not surprisingly, the more static files you have, the larger the percentage of improvement you'll see by using a static file server. This means that if you're already aggregating your CSS and JS (and don't happen to have a couple hundred images per page) there may not be much of a gain to be had from a static file server. Of course, performance considerations aside, you may still have a motivation to use a static file server if your current environment simply cannot handle the number of requests being sent to it, or perhaps you're just trying to lower the bandwidth usage on your main server(s).
  • It is important to note that static file serving and CDNs are closely related, but not the same thing (a CDN provides static file serving, but a static file server is not necessarily part of a CDN). A well-functioning CDN adds an aspect to the performance equation that a copy of lighttpd running on your server(s) won't - the geographical implications for latency/download times (the idea of a CDN is to provide a download source which is physically closer to client browsers, which, all other things being equal, should result in faster response/download times). It may very well be that, combined with the advantages of a CDN, static file serving would be a performance gain even for sites that aggregate their CSS and JS. That aspect deserves its own testing (something I did not have time to do for purposes of this article).
  • Even when the load testing showed big performance gains with static file serving (in the case of non-aggregated CSS and/or JS files), I couldn't benchmark any difference in individual page load times with YSlow. It's easy to imagine that this is because an Apache/lighttpd that only has to fulfill a single page request, instead of a bunch of concurrent page requests, is going to handle most anything thrown at it for that single page without breaking a sweat (for this reason, doing benchmarks by loading single pages is probably pretty pointless for many things).
  • lighttpd itself is compelling enough for me to want to see how it would do as a full replacement for Apache. (something for another day, perhaps)

Notes about benchmarking

  • I used JMeter, YSlow, Firebug, and Firefox for my benchmarking tasks.
  • I haven't included specific benchmark numbers here other than percentages, because there are so many variables when benchmarking a given site/setup that raw numbers don't mean much to anyone else's situation.
  • I came to the conclusions above after some fairly exhaustive testing. If you run your own benchmarks, be sure to run them several times and control any variable conditions to the best of your ability. JMeter benchmark numbers (as with any benchmarking tool) can fluctuate wildly for any number of reasons, some of which may have nothing to do with what you're actually trying to benchmark.
  • I welcome anyone who wants to benchmark their own stuff and share their results, particularly if yours differ.
  • Please take all the above with a grain of salt.

Links that may be helpful
Lighttpd
MacPorts (where to get lighttpd for OS X)
Optimizing performance in Lighttpd
Pressflow
Firefox
YSlow
Firebug
JMeter
Using lighttpd as a static file server for Drupal (note I did not use the patch here, and did a few other things differently - maybe helpful as background and for setting lighttpd up though)
Optimizing Page Load Time (talks about pipelining)

25 January, 2010

Comments

Actually, I got the inspiration for the string replacement code for testing pipelining in the tests above from the parallel module.

When I benchmarked the parallel module (which I didn't document in the article above), as well as my own string replacement code and all the other browser pipelining methods I tested (e.g., using mod_proxy, mod_rewrite), I didn't get any benchmarkable differences versus non-pipelining setups (i.e., just serving everything from a single domain).

The fact that you saw no improvement with multiple domains indicates that there may be something wrong with your setup, because FF 3.0 and up definitely supports 6 connections per hostname and 30 connections max:

http://www.browserscope.org/?c...

I'm aware of how many connections FF is capable of and how browser pipelining is supposed to work in theory. What the data I got back from testing told me, though, was that despite the theory there were no benchmarkable differences in either YSlow or JMeter load tests from using subdomains and/or other domains for static content in order to trigger the pipelining effect. Until/unless I see benchmarks from someone else showing something to the contrary, I've got to conclude that despite the promise of pipelining, the results don't amount to anything measurable.

Updated: I've gotta say that I am very intrigued by the possibility of missing something really fundamental here. For instance, perhaps anytime two or more domains resolve to the same IP as each other (e.g., 127.0.0.1 in this case), to Firefox they would still count as a single domain, and hence wouldn't trigger more open connections? I came up with this after testing my browser at Browserscope and seeing "Connections per Hostname". Looking up hostname at Wikipedia, it seems that a domain *and* its IP are intrinsically tied. So I guess the way to test this would be to set up another local IP, assign it some domains, and then test things again with the multiple-IP setup. I will try to do that when I have some time, unless someone else wants to beat me to it.

I don't know off the top of my head if it'll make a difference, but technically there are a lot of distinct IP addresses in 127.x.x.x. So you could try one of these as a second host address without having to configure additional interfaces.
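
For what it's worth, pointing a second hostname at a second loopback IP can look something like the sketch below. The hostnames are made up, and the ifconfig step is an assumption about what OS X needs (on Linux, all of 127.0.0.0/8 usually answers as-is):

```
# Sketch: a second hostname on a different loopback IP.
# On OS X the extra loopback address may first need an alias:
#
#   sudo ifconfig lo0 alias 127.0.0.2 up
#
# Then in /etc/hosts (hostnames made up for illustration):
127.0.0.1   www.example.test
127.0.0.2   static.example.test
```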

Changing the ip of the other domain I used for the static files made all the difference. Will update the post soon with more information and new benchmarks.

One thing that seems to be missing in this test is latency. You state that all your tests took place on your MBP, which implies that latency is very close to zero.

The main advantage of pipelining and using multiple domains is that it reduces the impact of latency. If a page contains 30 resources (CSS/JS/images), there's 50ms of latency between your browser and the server, and your browser can open 3 connections per domain, it will take at least 500ms before your browser has fetched everything needed for that page. That is, if things such as keepalive are disabled.

This number drops to 250ms if you manage to serve half of the resources from another server. If you have 0ms latency (everything served from localhost) your performance gains are approx 0ms because of these changes :)
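
The arithmetic in the comment above can be sketched with a simple model: resources are fetched one round trip at a time over a fixed pool of connections, and each round costs one latency interval. This is a back-of-the-envelope lower bound, not a real network simulation:

```python
# Back-of-the-envelope model of the comment's arithmetic: rounds of
# parallel requests, each round costing one latency interval.
import math

def fetch_time_ms(resources, connections_per_domain, domains, latency_ms):
    """Lower bound on fetch time: rounds of parallel requests * latency."""
    total_connections = connections_per_domain * domains
    rounds = math.ceil(resources / total_connections)
    return rounds * latency_ms

# 30 resources, 3 connections per domain, 50ms latency:
print(fetch_time_ms(30, 3, 1, 50))  # one domain  -> 500
print(fetch_time_ms(30, 3, 2, 50))  # two domains -> 250
print(fetch_time_ms(30, 3, 1, 0))   # localhost (0ms latency) -> 0
```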

So if you really want to test the impact of these changes you should test this on a remote server, not locally...

That is an interesting point, but even locally it seems to me that the pipelined tests should have resulted in more lighttpd/apache processes getting used concurrently, which if true seems like it should still have increased throughput and/or sped things up...

Perhaps along with testing the idea of using multiple ip's for the subdomains/domains (see comment above), it might be worth taking a look at some remote tests some day, too.