It's been a couple of weeks since we posted part one of our look at optimizing a Drupal site to withstand large amounts of traffic, and since then it has happened again - a site we host got "Farked" (an inbound link from Fark.com), even bigger than it did last time. In the 8 short hours between the link to the client's site going up and my writing this, the site has received 27,000+ unique visitors. When I logged in to the site there were actually 1,850 users online at the same time.
We just about fell out of our chairs when we saw that...
...after all, this is a site that's on a shared server - not a dedicated one. And those kinds of numbers would give even some dedicated servers a thorough workout. In the meantime, it was operation 911. Forget the long-term issue of finding a larger server space for this site, which is clearly outgrowing its environment - that could be handled afterwards. Right now, we had to get the server and site back pronto.
So we quickly posted a "We'll be back soon" message to the deluged site so that the rest of the sites on the server would work again, and then set out to get the Farked site back up. Now, 27,000 visitors in 8 hours would mean over 75,000 in 24 (about 81,000 if the rate held). There's no shared hosting environment we know of that can handle a dynamic, PHP-driven website with that kind of load...so what to do...what to do.
Well, in short order we made an HTML file out of the page which Fark.com linked to and put a 301 redirect in the site's .htaccess file, so that anyone visiting that page would get redirected to the much lighter HTML file and thereby bypass Drupal and all of its bootstrapping and database overhead. The rest of the pages on the site would function just as they always do, of course. What this did was allow us to take the site live again, serve wayyyy more people than should be possible on a $20 a month hosting plan, and keep everyone else's sites happy and screaming along.
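As a rough sketch of the kind of redirect described above, assuming Apache with mod_rewrite enabled and Drupal's clean URLs turned on: the path node/123 and the file name farked-page.html are hypothetical placeholders, not the actual names used on the site. A rule like this would need to sit above Drupal's own rewrite rules in the .htaccess file so that it matches before the request is handed to index.php.

```apache
# Assumption: mod_rewrite is available (Drupal's clean URLs already rely on it).
RewriteEngine on

# Hypothetical example: send the one heavily linked page to a static copy.
# "node/123" and "/static/farked-page.html" are placeholders.
RewriteRule ^node/123$ /static/farked-page.html [R=301,L]
```

Because the static file exists on disk, Drupal's stock rewrite conditions (which skip existing files) should leave it alone, so Apache serves it directly without bootstrapping Drupal.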
Two other things we did since the time we wrote the first article helped a lot too:
Reduced the number of CSS files for the site and compressed them. Simply put, we cut the number of files requested on each page visit by 6 by aggregating all the CSS into one file. That's a lot of requests when you multiply it by thousands.
Disabled the statistics.module. This is a no-brainer. Apache is logging everything already, and there are more robust tools for processing and interpreting the logs than what comes with Drupal (no offense, Drupal) - so this is a LOT of overhead that the site/server doesn't need to deal with. Besides saving on input/output processing, it also speeds up the database to not have such a large access log hanging around.
So there you go - how to make lemonade out of more hits than you should be able to handle.
Anyone who runs a site/server for very long will likely find out about the gruesome underbelly of the online world - spammers. They come in many shapes and sizes (most are bots), each with a different purpose, but they all have this in common - they hurt your site/server and its available resources.
Below are some things to look out for and some methods to take care of one particular type of spam, referrer spam, which can cripple a site/server in no time. With enough referrer spam you'll have what amounts to a denial of service attack (i.e., so many junk requests that the server can't even tend to the real ones).
Example of how serious this can be
Recently one of the sites we host had a big traffic day thanks to being front paged at Fark.com and Foobies.com: 18,000+ unique visitors in 18 hours. Suffice it to say that put quite a load on the shared environment it was hosted in. Well, guess what - the (unrelated) spam attack the site received a few days later actually created more than twice the load on the server that the huge amount of legitimate traffic did!
Identifying the problem
The first step in fixing a problem is, of course, to know you have one! Referrer spam can be tricky because without knowing where to look you may never realize what is happening in the dark corners of your webserver - you'll just see the symptoms (a slow site, or one that is down completely).
Where to look
If you've got performance issues with your site that you can't tie to an increase in visits, then referrer spam might be worth a look. The places where you can track it are a) your server logs, and b) your site/cPanel statistics pages.
What you'll want to look at is your most recent hits and your most frequently requested pages. If you see something that surprises you (e.g., an invalid URL, or a URL that you don't think should be that busy), then note the IP address(es) and/or domain(s) of whoever is requesting it. If you ever see pages continually requested by only one IP address/domain, or by numerous IPs within the same range, then that's not a good sign. Grab the IP address, do a whois lookup on it, and try to find out more. There are certain countries, for instance, where spam often originates.
Block that spammer
OK, so now you're sure. Your site is being taken apart by a rogue bot. You've identified a fixed IP or a defined range of IPs that it's coming from. Now it's time to block this vermin using a little .htaccess magic:
To block a single IP address (substituting the real IP for the placeholder x's, of course):
order allow,deny
deny from xxx.xxx.xx.x
allow from all
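For a range rather than a single address, the same directives also accept a partial address or CIDR notation. A minimal sketch, assuming the Apache 2.2-style access directives used above; the 192.0.2.x addresses are reserved documentation placeholders, not a real spammer's range:

```apache
order allow,deny
# Partial address: blocks every IP beginning with 192.0.2.
deny from 192.0.2
# Equivalent CIDR form (only one of the two deny lines is needed)
deny from 192.0.2.0/24
allow from all
```

On Apache 2.4 and later these directives are only available through mod_access_compat, so it is worth checking which version your host runs before relying on them.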
Slashdotted, Dugg, Farked. These are all terms that site operators, bloggers, and web developers are very familiar with. They imply having a site 'front paged' at a website that drives a LOT of traffic to your own site.
Over the past week one of the sites we host ended up on the front page of Fark.com and Foobies.com at exactly the same time. It added up to some very busy days for a site which is hosted in a shared environment (meaning that it has to share the resources of a server with other sites), as well as some useful knowledge concerning:
what kind of load a Drupal powered site can handle when in a shared environment
how to optimize Drupal's capability to handle a large number of visitors
To begin, it needs to be understood that overall optimization for site traffic is going to depend on a gazillion different factors. If you don't have a reliable server stack which is already optimized, this article will only do you so much good. Apache, MySQL, and PHP need to be running reliably and be well tuned.
Assuming you have a well tuned server, then how much traffic your Drupal powered site can handle will depend on:
The amount of resources it has available (CPU and memory particularly)
If your site is on a fully dedicated server that has 4GB of RAM and 4 CPUs, it's obviously going to make a tremendous difference in what the site can handle, compared to a site which exists in a shared environment and only gets a fraction of those resources to use. This is common sense, of course. Eventually, if your server stack is fully optimized and your Drupal installation is fully optimized and your site still can't handle the load, then mo' better hardware is your only long-term choice.
How many features are enabled on the site, and which ones
One of the rather fun aspects of watching the site receive so much traffic was having a chance to test real world cause and effect with a number of Drupal/site features. Some of them make a very big difference in how much work needs to be done to generate a page view, and thereby in how many people the site and server can reliably and consistently handle.
Recently we had the pleasure of developing a very cool intranet for a group associated with the United Nations. They wanted an online space within which they could privately share articles, comments, and files with each other.
Our mission was to make a site that would:
Not let anonymous users view any content
Enable varying levels of viewing, adding, and editing rights across differing authenticated user roles - on a per page/node basis
Enable different/custom menu configurations based upon user role
Redirect users after a successful login attempt to a front page which is unique to user role
If you have never used Drupal before you may not know that the above functionality is not available out-of-the-box. However, with a little research we found some contributed modules which helped us to achieve a totally customizable intranet:
Welcome to our first newsletter. It's been six months since we launched our first site, Bloggyland.com. With another hosting site and many more ideas/plans in the making, we're very pleased with the foundation and progress we've achieved so far. Since last July we've had customers not just from the U.S., but also from as far away as Thailand and the United Kingdom sign up with us.
2007 begins in earnest
Drupal 5.0 was released the first week of January, a product which we personally contributed to during its development cycle. As such, we are pleased and proud to announce our first Drupal 5.0 site - a much improved version of HigherVisibilityWebsites.com - our storefront for 100% custom website design and development services, and parent site of Drupal-CMS Hosting and Bloggyland.
Of particular interest to you may be the all-new Template Library, which includes 1000+ super-high quality designs which can be Drupalized and utilized for your site.
What's ahead
We're nearly ready to roll out a new version of our hosted platform based on Drupal 5, and are looking to build on our early experiences of the past six months to make this version even better. We expect to release our new host package within the next several weeks.
Ideas, requests?
A large part of our motivation for this newsletter came from a desire to hear from you - our client! We would love to hear from you if you have any ideas about making any of our products more useful, convenient, or appealing to yourself or others.
Refer a friend and get free stuff
If someone you refer signs up with Drupal-CMS Hosting or Bloggyland, we'll credit you two free months of hosting. If someone you refer hires us to do custom work for them via HigherVisibilityWebsites.com, you'll receive four free months of hosting.
We appreciate your business
I for one am grateful for the opportunity to communicate with you, and I wish everyone a happy and prosperous new year from HigherVisibility.