LAMP

Prepare your Drupal site to be Slashdotted, Dugg, and Farked - Part II

Published in: 

It's been a couple weeks since we posted part one of our look at optimizing a Drupal site to withstand large amounts of traffic,
and since that time it happened again - a site we host, got "Farked" (an inbound link from Fark.com) even bigger than it did last time. In the 8 short hours since the link to the client's site went up, and as I write this - the site has received 27,000 + unique viewers. When I logged in there to the site there actually were 1850 users online at the same time.

We just about fell out of our chair when we saw that...

...after all this is a site is that's on a shared server - not a dedicated one. And those kind of numbers would even give some dedicated servers a thorough workout. In the meantime, it was operation 911. Forget the long term issue of finding a larger server space for this site which clearly is outgrowing it's enviroment - that could be handled afterwards. Right now, we had to get the server and site back pronto.

So quickly we posted a "We'll be back soon" message to the deluged site so that the rest of the sites on the server would work again, and then set out to get Farked site back up. Now 27,000 visitors in 8 hours would mean over 75,000 in 24. There's no shared hosting enviroment we know of that can handle a dynamic, php-driven, website with kind of load...so what to do...what to do.

Well, in short order we made an html file out of the page which Fark.com linked to and put a 301 redirect in the site .htaccess file so that anyone visiting that page would get redirected to the much lighter html file and thereby bypass Drupal and all of the bootstrapping and database overhead. The rest of the pages on the site would function just as they always do, of course. What this did was allow us to take the site live again, serve wayyyy more people than should be possible for on a $20 a month hosting plan, and keep everyone else's site happy and screaming along.

Two other things we did since the time we wrote the first article helped a lot too:

condensed the number of, and compressed, the css files for the site. Simply put, we cut the number of files requested on each page visit by 6 by aggregating all the css into one file. That's a lot of requests when you multiply it by thousands.

Disabled the statistics.module. This is a no-brainer. Apache is logging eveything already and there are more robust tools to process and interpret the logs with than what comes with Drupal (no offense Drupal) - so this is a LOT of overhead that the site/server doesn't need to deal with. Besides the input/output processesing, it also speeds up the database to not have such a larger access log file hangin around.

So there you go - how to make lemonade out of more hits than you should be able to handle. 

15 February, 2007

Blocking referrer spam, mal-bots, and other malicious weasels with htaccess

Anyone who runs a site/server for very long will likely find out about the gruesome underbelly of the online work - spammers. They come in many shapes and sizes (most are bots), and with different purposes each, but they each have this in common - they hurt your site/server and it's available resources.

Below are some things to look out for and some methods to take care of one particular type of spam, referrer spam, which can cripple a site/server in no time. With enough referrer spam you'll have what amounts to a denial of service attack (e.g., so many junk requests that the server can't even tend to the real ones).

Example of how serious this can be
Recently one of the sites we host had a big traffic day thanks to being front paged at Fark.com and Foobies.com. 18,000+ unique visitors in 18 hours. Suffice it to say that put quite a load on the shared environment they were hosted in. Well, guess what - the (unrelated) spam attack the site received a few days later actually created more than twice the load on the server that the huge amounts of legitmate traffic did!

Identifying the problem

The first step in fixing a problem is, of course, to know you have one! Referrer spam can be tricky because without knowing where to look you may never realize what is happening in the dark corners of your webserver - you'll just see the symptoms. (a slow site or one that is down completely)

Where to look
If you've got performance issues with your site that you can't tie to an increase in visits then it might be worth a look. The places where you can track referrer spam are a) in your server logs, b) in your site/cpanel statistics pages.

What you'll want to look at is your most recent hits, and the most frequently requested pages. If you see something that surprises you (e.g., an invalid url, or a url that you don't think should be that busy) then note the ip address(es) and/or domain(s) of the who is requesting it. If you ever see pages continually requested by only one ip address/domain or numerous ip's within the same range, then that's not a good sign. Grab the ip address and do a whois lookup on it and try and find out more. There are certain countries, for instance, where spam often orignates from.

Block that spammer
Ok, so now you sure. Your site is being taken apart by a rougue bot. You've identified a fixed ip or defined range of ip that it's coming from. Now it's time to block this vermin using a little .htaccess magic:

To block a single ip address:
(substituting the real ip for the placeholders x's, of course):

order allow,deny
deny from xxx.xxx.xx.x
allow from all

10 February, 2007

Prepare your Drupal site to be Slashdotted, Dugg, and Farked

Slashdotted, Dugg, Farked. These are all terms that site operators, bloggers, and web developers are very familiar with. They imply having a site 'front paged' at a website that drives a LOT of traffic to your own site.

Over the past week one of the sites we host, ended up on the front page of Fark.com and Foobies.com at the same exact time. It added up to some very busy days for a site which is hosted in a shared environment (meaning that it has to share resources of a server with other sites) as well as some useful knowledge concerning:

  • what kind of load a Drupal powered site can handle when in a shared enviroment
  • how to optimize Drupal's capability to handle a large number of visitors

To begin, it need to be understood that overall optimization for site traffic is going to depend on a gazillion different factors. If you don't have a reliable server stack which is already optimimized this article will only do you so much good. Apache, MySQL, and PHP need to be running reliably, and well tuned.

Assuming you have a well tuned server, then how much traffic your Drupal powered site can handle will depend on:

The amount of resources it has available (cpu and memory particullarly)
If your site is on a fully dedicated server that has 4GB's of ram and 4 CPU's, it's obviously going to make a tremendous difference in what the site can handle, in comparison to a site which exists in a shared enviroment and only gets a fraction of those resources to use. This is common sense, of course. Eventually, if your server stack is fully optimized and your Drupal installation is fully optimized and your site still can't handle the load then mo' better hardmare is your only long term choice.

How many features are enabled on the site, and which ones

One of the rather fun aspects of watching the site receive so much traffic was having a chance to test real world cause and effect with a number of Drupal/site features. Some of them make a very big difference in how much work needs to be done to generate a page view, and thereby how many people the site and server can reliably and consistently handle.

10 February, 2007

Mysql tuning. Tools, tips, and links on optimizing mysql for Drupal

Published in: 

Here are some basic, but high impact ways to optimize MySQL for Drupal (there are much more sophisticated and expensive ways to speed up your database of course):

Am not sure if these tips do any good for someone on a shared hosting plan or not (do shared plans have access to a my.cnf file?). Also, I can only confirm these setting for MySQL 4.0.2 thru the latest 4.0.x version, but I think it would work for 5.x (maybe someone can confirm this and leave a comment...).

Actually, it will work for below 4.0.2 I think as long as you add set-variable = before each line (see this page for more on set-variable)

1. Get this script, upload it, unzip it, and install it in your /etc folder (at the root of your server, not your Drupal install, right). Then run it from the command line by entering sh /path-to-file/tuning-primer.sh

The script will run and what you'll be left with is an output with some info and suggestions about your MySQL settings. Was shocked to learned that on my VPS the cache was not even enabled - very helpful to know!

2. Next open your my.conf file in pico or some kind of proper code/text editor:

Depending on the memory resources you have available you'll want to paste in something like these examples (adjust up or down depending on how your system differs, of course):

For a setup with 500mb or ram paste this in your my.cnf file:

[mysqld]
max_connections = 800
max_user_connections = 800
key_buffer = 36M
myisam_sort_buffer_size = 64M
join_buffer_size = 2M
read_buffer_size = 2M
sort_buffer_size = 3M
table_cache = 1024
thread_cache_size = 286
interactive_timeout = 25
wait_timeout = 1800
connect_timeout = 10
max_allowed_packet = 1M
max_connect_errors = 999999
query_cache_limit = 1M
query_cache_size = 16M
query_cache_type = 1
tmp_table_size = 16M

For a system with 256mb of ram:

[mysqld]
max_connections=500
max_user_connections = 500
key_buffer = 16M
myisam_sort_buffer_size = 32M
join_buffer_size = 1M
read_buffer_size = 1M
sort_buffer_size = 2M
table_cache = 1024
thread_cache_size = 286
interactive_timeout = 25
wait_timeout = 1000
connect_timeout = 10
max_allowed_packet = 1M
max_connect_errors = 999999
query_cache_limit = 1M
query_cache_size = 16M
query_cache_type = 1
tmp_table_size = 16M

3. Save your my.cnf file and restart mysql. This can be done via WHM or the command line (not sure what that command is - sorry)

21 January, 2007
Subscribe to RSS - LAMP