Blog

Scaling Drupal: HTTP pipelining and benchmarking revisited

UPDATE: I've updated some of the numbers below to correct a testing error. The short version: be sure not to benchmark with any external links in your test pages, because if you do you'll be benchmarking the external server too, which is not what we want here. The effect of these corrections is that putting lighttpd in front of Apache and enabling pipelining actually provide a substantially larger performance boost than I had indicated before. Other than that, the results are the same.

So things with my first attempt at benchmarking HTTP pipelining did not go exactly as planned. It turns out that if two different domains/subdomains you are using for content on your site point to the same IP, browsers (at least Firefox, based on my previous testing) will not pipeline requests (i.e., open more concurrent requests to your site), because they consider the requests to be from the same origin. For a browser to pipeline requests at all, the two domains/subdomains apparently need to use two separate/unique IPs. If you read the Wikipedia entry for hostnames this all makes sense, as it indicates that hostnames are associated with IPs, and Browserscope's browser testing checks for "Connections per Hostname", not "Connections per Domain".

After figuring out how to get requests to pipeline correctly, I re-benchmarked all the configurations from the first article. Everything from that article regarding lighttpd still holds true, so without covering those aspects again, here are the updated benchmarks and notes for browser request pipelining:

  • Once the conditions for request pipelining were set up correctly, there were discernible performance implications, some of which I definitely wasn't expecting. At one end of the spectrum, with browser pipelining working (via string replacement of domains within the rendered HTML, sketched below) and lighttpd serving the static files, there was an 11% increase in throughput versus not using the pipelining methods. So static file serving = good, and static file serving + HTTP pipelining = a little better.

    This is not where the story ends with pipelining, however: enabling pipelining produced a net performance decrease in every configuration that did not use a separate static file server (in my case, lighttpd on the same machine)!
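
For reference, here's a minimal sketch of the string-replacement method mentioned above, assuming a hypothetical static.example.com subdomain backed by lighttpd; it is not the exact code used for these benchmarks. In Drupal 5 this kind of rewrite can live wherever the final page HTML is available (a small custom module or a phptemplate preprocess function, for example):

    <?php
    // Rewrite static asset URLs in the rendered HTML so browsers fetch them
    // from the lighttpd-backed subdomain. The domain names are placeholders.
    // Note: per the findings above, static.example.com must resolve to a
    // different IP than www.example.com or browsers won't open the extra
    // connections.
    function example_rewrite_static_paths($html) {
      $replacements = array(
        'http://www.example.com/sites/' => 'http://static.example.com/sites/',
        'http://www.example.com/files/' => 'http://static.example.com/files/',
      );
      return strtr($html, $replacements);
    }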

27 January, 2010

Scaling Drupal: Benchmarking static file serving with lighttpd and browser pipelining

I finally had a chance to investigate an optimization I've been wondering about for a while now - serving a site's static files from somewhere else. As a related side experiment, I also tested the claim that serving files from a static file server on a separate domain/subdomain speeds things up because it results in browsers opening more concurrent requests than they would against a single domain.

For my tests I used lighttpd (pronounced "lighty") as a static file server for Apache. The idea is that lighttpd, which is acclaimed as being fast and light on memory, serves the non-dynamic parts of the site (images, CSS, JavaScript, etc.), which should help relieve Apache of some of its workload. This arrangement involves changing the paths to these static resources, either on the backend or the frontend, so that they no longer get served by Apache.
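
To give a sense of the lighttpd side of the arrangement, here is a minimal lighttpd.conf sketch; the port, paths, and cache rule are assumptions for illustration, not the configuration used in these tests:

    # Serve the Drupal docroot's static files on an alternate port, alongside Apache.
    server.port          = 81
    server.document-root = "/var/www/drupal"   # same docroot Apache serves
    server.modules       = ( "mod_access", "mod_expire" )

    # Map the common static asset extensions to MIME types. Dynamic URLs are
    # never pointed at lighttpd, so it only ever sees requests for these assets.
    mimetype.assign = (
      ".css" => "text/css",
      ".js"  => "application/javascript",
      ".png" => "image/png",
      ".jpg" => "image/jpeg",
      ".gif" => "image/gif"
    )

    # Let browsers cache uploaded files aggressively.
    expire.url = ( "/files/" => "access plus 7 days" )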

The pieces
All tests took place on my MacBook Pro and involved two pages on a large Drupal 5 site running Pressflow. For the static file server itself, I installed lighttpd using MacPorts. Of the two pages tested, the smaller page's number of static files was 'average' for most sites, while the larger page was very large indeed: 39 CSS files, 23 JavaScript files, and 46 image files.

Methods tested and benchmarked
I implemented and benchmarked the following methods of path modification in order to enable static file serving:

25 January, 2010

Convert your MySQL database from MyISAM to InnoDB, and get ready for Drupal 7 at the same time

If you haven't already heard, Drupal 7 will default to using the InnoDB storage engine instead of MyISAM for MySQL (though a MyISAM database will continue to work just fine in Drupal 7). This is a fairly substantial change within Drupal core, and as the thread in the issue queue I linked to shows, there were a lot of questions and apprehension about it. However...

...we are going to just skip over a lot of that apprehension and get down to the point of this article - there's no good reason not to hop right into using InnoDB today on your Drupal 5 or Drupal 6 site. The rewards are: a possibly significant improvement in performance, a definite improvement in scalability (most highly trafficked Drupal sites have been using InnoDB for some time now because of this), and a head start on working with what will become more and more common in your Drupal life: InnoDB.

My experience
I came to the conclusion about how great InnoDB is after researching the experiences of others, and after converting a large Pressflow-driven Drupal 5 site from MyISAM to InnoDB. This change resulted in a 14% increase in throughput during load tests performed in JMeter. That's a very substantial increase, and while everyone's mileage will vary based on their own site, server, and any number of other variables, it's clear enough to me that there's nothing to be afraid of as far as InnoDB goes (quite the contrary).

Converting your database to InnoDB
Before you go any further, back up your database. If you 'splode it for any reason during the steps below, you'll need that backup.

Here are the steps:

1. Shut down MySQL

2. Move/copy/rename the ib_logfile0 and ib_logfile1 files (find where MySQL's data directory lives on your system - locations can vary greatly). MySQL will recreate these files when you start it up again. Note: anytime you change the innodb_log_file_size parameter you will need to shut down MySQL, move these files, and start MySQL up again.

3. Tune it up a bit
Based on a lot of searching around and benchmarking with JMeter, I arrived at the settings below for running on my MacBook Pro. See the links at the end of this post for articles that can help you determine how to adjust these numbers for other machines (ones with more RAM/CPU, for instance; the production server for this particular site ended up with a 5000M innodb_buffer_pool_size setting, so settings will, and should, vary greatly depending on the hardware).
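
As for converting the tables themselves, one common approach (sketched below as an illustration, with placeholder credentials and database name - not necessarily the exact method from this post) is to generate the ALTER TABLE statements from information_schema and review them before running them against MySQL:

    <?php
    // Emit an ALTER TABLE statement for every MyISAM table in the Drupal
    // database so the list can be reviewed and then fed back into MySQL.
    // Host, credentials, and database name are placeholders.
    $db = new mysqli('localhost', 'drupal_user', 'password', 'drupal');

    $result = $db->query(
      "SELECT TABLE_NAME FROM information_schema.TABLES
       WHERE TABLE_SCHEMA = 'drupal' AND ENGINE = 'MyISAM'"
    );

    while ($row = $result->fetch_assoc()) {
      echo "ALTER TABLE `{$row['TABLE_NAME']}` ENGINE = InnoDB;\n";
    }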

18 January, 2010

"I have an older version Drupal site and I feel stuck between a rock and hard place..."

"Official support status" for various versions of Drupal core:

So, official Drupal.org policy is no support for more than one version back. Many Drupal developers/shops have the same policy as Drupal.org itself - no services, development, or maintenance for legacy Drupal versions.

In a perfect world free of money and time limitations, the benefits of staying with the most current version of Drupal make enough sense. However, the reality is that there are practical reasons to try and stick it out with an older site sometimes. These reasons can include (but are probably not limited to):

  • Older sites which are merrily chugging along. For these sites, "upgrading" or "migrating" simply translates into having to pay big money just to have the same exact site they have now, only with a newer Drupal version number next to it.
  • Older Drupal sites of such customization, scope, and/or scale that upgrading them would be unmanageable in the near term. I worked on such a site for 4 years. With over 30 custom-coded modules (not just modules downloaded from Drupal.org), upgrading to a new version of Drupal would have taken hundreds if not thousands of hours.

Drupal project lead Dries Buytaert recently called for more developers to go after niches within the Drupal ecosphere. I believe serving users of older Drupal sites could be a valuable niche, both for those users and for the Drupal project itself.

As a fan of older Drupal versions and as someone who has spent a lot of time working with them (and yes, plenty of time working with newer Drupal versions), I'm throwing my hat into the ring to try and assist these particular Drupal users. If you have an older Drupal site (or know someone with one) which needs development, support, maintenance, or server administration drop me a line.

9 November, 2009

Misinformation: A response to Chris Wilson's Whitehouse.gov article on Slate

Yesterday, I came across one of the most fabricated and agenda-laden articles I've ever seen in the world of software and open source: "Why running the White House Web site on Drupal is a political disaster waiting to happen" (link no-followed).

Despite wishing "Drupal and the White House nothing but happiness" at the outset, Chris Wilson quickly moves to scare the bejeezus out of you about Drupal and make sure everyone understands that the thousands of people coming together to provide really awesome free software are actually all user-hating Nazis and that Drupal is a REALLY. BAD. THING.

Unfortunately, the thing about misinformation is that it often does cause a stir. As one can see from this comment left on another article about Whitehouse.gov, a well-known and curious Joomla developer is linking to the propaganda piece and referring to it as a "very different view". So the misinformation succeeded: it's now a 'point of view' whether Drupal folk are all user-hating Nazis or not.

The software world is not generally the hack-political world where all one needs is an implication and a "reliable source" to start a false "debate" on whether something is true or not. But the Slate article, and the reactions to it, do demonstrate the point that the Drupal community needs to be prepared to address misinformation. This is a (large) annoyance, of course (more time fighting propaganda = less time coding or helping newcomers), but as a great person once said, "With great power comes great responsibility".

UPDATE: Informationweek.com weighs in with some sense on this issue:

"The news that WhiteHouse.gov relaunched this week running open source Drupal software raised eyebrows and hackles among knee-jerk anti-Obama types and a small cadre of ignorant bloggers."

28 October, 2009

Whitehouse.gov now powered by Drupal

Word is out that Whitehouse.gov is now powered by Drupal. The Washington Post has the details of this big win for Drupal and Open Source:

"The online-savvy administration on Saturday switched to open-source code for http://www.whitehouse.gov meaning the programming language is written in public view, available for public use and able for people to edit.

"We now have a technology platform to get more and more voices on the site," White House new media director Macon Phillips told The Associated Press hours before the new site went live on Saturday. "This is state-of-the-art technology and the government is a participant in it.

Under the open-source model, thousands of people pick it apart simultaneously and increase security. It comes more cheaply than computer coding designed for a single client, such as the Executive Office of the President. It gives programmers around the world a chance to offer upgrades, additions or tweaks to existing programs that the White House could - or could not - include in daily updates.

Yet the system - known as Drupal - alone won't make it more secure on its own, cautioned Ari Schwartz of the Center for Democracy and Technology.

"The platform that they're moving to is just something to hang other things on," he said. "They need to keep up-to-date with the latest security patches."

UPDATE: More coverage of this choice along with information on who worked on the site here.

UPDATE II: Dries (Drupal founder and project lead) blogs about it.
UPDATE III: MSNBC has an article about Whitehouse.gov. Slightly misleading title, though.

24 October, 2009
