Scaling Drupal: HTTP pipelining and benchmarking revisited

UPDATE: I've updated some of the numbers below to reflect corrections for a testing error. Let's just say to be sure not to benchmark with any external links in your test pages (because if you do use external links you'll obviously be benchmarking the external server too, which is not what we want in this case). To summarize the effect of these corrections - having lighttpd in front of Apache and pipelining actually provide a substantially larger boost in performance than I had indicated before. Other than that the results are the same.

So things with my first attempt at benchmarking HTTP pipelining did not go exactly as planned. It turns out that if two different domains/subdomains you are using for content on your site are pointing to the same IP, based on previous testing, it looks like browsers (at least FireFox) will not pipeline requests (e.g., create more concurrent requests to your site) because it considers the requests as being from the same origin. In order for a browser to pipelining requests at all, they seem to require two domains/subdomains which are using two separate/unique IPs. If you read the Wikipedia entry for hostnames this all makes sense, as it indicates domains are associated with IP's, and browserscope's testing of browsers checks for "Connections per Hostname", not "Connections per Domain".

After figuring out how to get requests to pipeline correctly, I re-benchmarked all the configurations from the first article . Everything from that article regarding lighttpd is still holds true, so without covering those aspects again, here's the updated benchmarks and notes for browser request pipelining:

  • Once the conditions for request pipelining was setup correctly there were discernable performance implications. Some of them I definitely wasn't expecting. On the one end of the spectrum, with browser pipelining working (via string replacement of domains within the rendered HTML) and lighttpd serving the static files there was an 11% increase in throughput vs not using the pipelining methods. So static file serving ='s good, and static file serving + HTTP pipelining ='s a little better.

    This is not where the story ends with pipelining however, as there was a net performance decrease by enabling pipelining with all configurations which did not use a separate static file server! (in my case lighttpd on the same machine)

27 January, 2010

Drupal/server optimization may matter little if you've got the leeches

During the course of administering a server full of various sites which have been Farked, Dugg, and StumbledUpon'd, I've learned first hand the value of optimizing a Drupal site/server to handle large amounts of traffic. I've also learned that eventually it's likely that the level of optimization for one's Drupal site/server will be rendered mostly irrelevant by frequent, and (mostly) malicious, circumstances.

Malicious and/or misdirected requests
Compared to more popular subjects such as Drupal optimization, Apache, MySQL, and/or PHP optimization - the subject of malicious requests gets a rare mention. Despite losing the popularity contest to those sexier subjects, rest assured, there is a lot more to running a site than just tuning the former items. All of those things can be working really, really well and your site/server can still be hammered to a state of dysfunction - even with very few users coming through the site.

Welcome to the wonderful world of denial of service attacks and/or server spam. Broadly defined this includes anything that is requesting something from your server that is of a malicious (usually the case) and/or misguided origin. Make no mistake it's a problem which anyone who is running a medium to large size web site will contend with, whether they know it or not.

Server spam/DDS attacks are NOT uncommon problems limited only to large or 'unlucky' websites. If you have a server and/or VPS with a frequently trafficked site, or especially one with several frequently trafficked sites you will be amazed at just how much processor, memory, and bandwidth can be allocated to malicious and/or wrongly directed requests at any given point in time. The exact of amount of resources that these leeches suck up varies greatly depending on many factors, but on the high-end I can share that several times over the last 6 months alone I've personally witnessed, and fixed, crippling issues stemming from server spam for sites that I system admin for. Likewise several months ago I was happy to be able to help resurrect Drupal.org one day when it was suffering from a particularly nasty malbot. (How many people has this happened to that never realize what was going on, and who in turn just blamed their host and/or Drupal??)

So once one realizes that server spam is a real, and not theoretical, problem how does one confirm the issue(s) on their own site/server and what can they do about it?

23 September, 2007
Subscribe to RSS - optimization