Apache

Scaling Drupal: Benchmarking static file serving with lighttpd and browser pipelining

I finally had a chance to investigate an optimization I've been wondering about for a while now - serving a site's static files from somewhere else. As a related side experiment, I also tested the claim that serving files from a separate static file server/domain/subdomain speeds things up because browsers will open more concurrent requests than they would against a single domain.

For my tests I used lighttpd (pronounced "lighty") as a static file server alongside Apache. The idea is that lighttpd, which is acclaimed for being fast and light on memory, serves the static resources of the site (images, CSS, JavaScript, etc.), thereby relieving Apache of some of its workload. This arrangement involves changing the paths to these static resources, either on the backend or the frontend, so that they are no longer served by Apache.
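As a rough illustration of the arrangement, here is a minimal lighttpd configuration sketch for this kind of setup - the port, document root, and file types are assumptions for the example, not my actual test configuration:

```
# lighttpd.conf sketch: serve static assets on an alternate port,
# leaving port 80 to Apache for the dynamic Drupal pages
server.port          = 81
server.document-root = "/var/www/drupal"   # hypothetical docroot shared with Apache

# Static asset types lighttpd is expected to handle
mimetype.assign = (
  ".css" => "text/css",
  ".js"  => "text/javascript",
  ".png" => "image/png",
  ".jpg" => "image/jpeg",
  ".gif" => "image/gif"
)
```

The site's asset paths (or an Apache rewrite) would then point image/CSS/JavaScript URLs at port 81 or a static subdomain.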

The pieces
All tests took place on my MacBook Pro and involved two pages on a large Drupal 5 site running Pressflow. For the static file server itself, I installed lighttpd using MacPorts. Of the two pages tested, the smaller one had a number of static files that would be 'average' for most sites. The larger page was very large indeed - 39 CSS files, 23 JavaScript files, and 46 image files.

Methods tested and benchmarked
I implemented and benchmarked the following methods of path modification in order to enable static file serving:

25 January, 2010

High traffic: Serving 500,000+ unique visitors in 24 hours and over 9,000,000 a month

Recently, a client whom we helped get started with Drupal, and whose backend systems we administer, had their first 500,000+ unique-visitor day.

Everything went well: the lessons we've learned over the years of Drupal performance tuning (Part I, Part II), combined with well-planned Apache/MySQL/PHP settings, provided an event-free day (other than watching the hit counter go through the roof!).

Showing that it was no fluke, the site has had numerous 500,000+ days since then and has settled into an average of 150,000-200,000 unique visitors a day / 6,000,000-9,000,000 a month.

If you have a Drupal site that you need to prepare for high traffic and high availability, give us a shout. We can help you handle it.

12 March, 2007

Prepare your Drupal site to be Slashdotted, Dugg, and Farked - Part II

Published in: 

It's been a couple of weeks since we posted part one of our look at optimizing a Drupal site to withstand large amounts of traffic,
and since that time it has happened again - a site we host got "Farked" (an inbound link from Fark.com), even bigger than it did last time. In the 8 short hours since the link to the client's site went up, and as I write this, the site has received 27,000+ unique visitors. When I logged in to the site there were actually 1,850 users online at the same time.

We just about fell out of our chair when we saw that...

...after all, this is a site that's on a shared server - not a dedicated one. And those kinds of numbers would give even some dedicated servers a thorough workout. In the meantime, it was operation 911. Forget the long-term issue of finding a larger home for this site, which is clearly outgrowing its environment - that could be handled afterwards. Right now, we had to get the server and site back up pronto.

So we quickly posted a "We'll be back soon" message to the deluged site so that the rest of the sites on the server would work again, and then set out to get the Farked site back up. Now, 27,000 visitors in 8 hours would mean over 75,000 in 24. There's no shared hosting environment we know of that can handle a dynamic, PHP-driven website with that kind of load... so what to do... what to do.

Well, in short order we made an HTML file out of the page that Fark.com linked to and put a 301 redirect in the site's .htaccess file, so that anyone visiting that page would be redirected to the much lighter HTML file and thereby bypass Drupal and all of its bootstrapping and database overhead. The rest of the pages on the site would function just as they always do, of course. This allowed us to take the site live again, serve wayyyy more people than should be possible on a $20-a-month hosting plan, and keep everyone else's sites happy and screaming along.
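The .htaccess rule for this is a one-liner. The path and filename below are placeholders for illustration, not the client's actual URLs:

```apache
# Send the deluged URL to a pre-rendered static copy, bypassing
# Drupal's bootstrap and database entirely. Every other path on the
# site still goes through Drupal as usual.
Redirect 301 /node/123 http://www.example.com/static/node-123.html
```

The static copy is just a "save page as" snapshot of the rendered page, so to the Fark crowd it looks identical - it's simply served at file-read speed instead of PHP-plus-database speed.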

Two other things we did since the time we wrote the first article helped a lot too:

Condensed the number of, and compressed, the CSS files for the site. Simply put, by aggregating all the CSS into one file we cut the number of files requested on each page visit by six. That's a lot of requests when you multiply it by thousands of visits.

Disabled the statistics.module. This is a no-brainer. Apache is logging everything already, and there are more robust tools to process and interpret those logs than what comes with Drupal (no offense, Drupal) - so this is a LOT of overhead that the site/server doesn't need to deal with. Besides the input/output processing, not having such a large access log table hanging around also speeds up the database.

So there you go - how to make lemonade out of more hits than you should be able to handle. 

15 February, 2007

Blocking referrer spam, mal-bots, and other malicious weasels with htaccess

Anyone who runs a site/server for very long will likely find out about the gruesome underbelly of the online world - spammers. They come in many shapes and sizes (most are bots), each with a different purpose, but they all have this in common - they hurt your site/server and its available resources.

Below are some things to look out for and some methods to take care of one particular type of spam, referrer spam, which can cripple a site/server in no time. With enough referrer spam you'll have what amounts to a denial of service attack (e.g., so many junk requests that the server can't even tend to the real ones).

Example of how serious this can be
Recently one of the sites we host had a big traffic day thanks to being front-paged at Fark.com and Foobies.com: 18,000+ unique visitors in 18 hours. Suffice it to say that put quite a load on the shared environment they were hosted in. Well, guess what - the (unrelated) spam attack the site received a few days later actually created more than twice the load on the server that the huge amount of legitimate traffic did!

Identifying the problem

The first step in fixing a problem is, of course, knowing you have one! Referrer spam can be tricky because, without knowing where to look, you may never realize what is happening in the dark corners of your webserver - you'll just see the symptoms (a slow site, or one that is down completely).

Where to look
If you've got performance issues with your site that you can't tie to an increase in visits, then it might be worth a look. The places where you can track referrer spam are a) your server logs, and b) your site/cPanel statistics pages.

What you'll want to look at are your most recent hits and your most frequently requested pages. If you see something that surprises you (e.g., an invalid URL, or a URL that you don't think should be that busy), note the IP address(es) and/or domain(s) of whoever is requesting it. If you see pages continually requested by only one IP address/domain, or by numerous IPs within the same range, that's not a good sign. Grab the IP address, do a whois lookup on it, and try to find out more. There are certain countries, for instance, where spam often originates.
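If you have shell access, you don't even need a stats page - a quick pipeline over the access log will surface the top referrers. A sketch, assuming Apache's standard "combined" log format (the log entries below are made-up sample data for the demo; point the awk at your real access log instead):

```shell
# Build a tiny sample access log in Apache combined format
# (fabricated entries, purely for illustration)
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [15/Feb/2007:10:00:00 -0500] "GET /node/1 HTTP/1.1" 200 512 "http://spam.example/" "Mozilla"
1.2.3.4 - - [15/Feb/2007:10:00:01 -0500] "GET /node/1 HTTP/1.1" 200 512 "http://spam.example/" "Mozilla"
5.6.7.8 - - [15/Feb/2007:10:00:02 -0500] "GET /about HTTP/1.1" 200 512 "http://fark.com/" "Mozilla"
EOF

# Split on double quotes: field 4 is the referrer in combined format.
# Count, then sort descending - spam referrers float to the top.
awk -F'"' '{print $4}' /tmp/sample_access.log | sort | uniq -c | sort -rn | head
```

A referrer you've never heard of sitting at the top of that list, hammering one URL from one IP range, is exactly the pattern described above.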

Block that spammer
OK, so now you're sure. Your site is being taken apart by a rogue bot, and you've identified the fixed IP or defined range of IPs that it's coming from. Now it's time to block this vermin using a little .htaccess magic:

To block a single IP address (substituting the real IP for the placeholder x's, of course):

order allow,deny
deny from xxx.xxx.xx.x
allow from all
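To block a whole range rather than a single address, Apache's deny directive also accepts a partial address or CIDR notation (the addresses below are placeholders):

```apache
order allow,deny
# A partial address blocks the entire 192.168.1.* range
deny from 192.168.1.
# CIDR notation works too
deny from 10.0.0.0/24
allow from all
```

With `order allow,deny`, the `allow from all` admits everyone and the matching `deny` entries override it, so only the listed vermin get shut out.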

10 February, 2007