Example Varnish VCL for a Drupal / Pressflow site

A few months ago I set up Varnish on my MacBook Pro, and I have since deployed it on a production site that serves anonymous and (a lot of) authenticated users. Initially, I spent a couple of months just running it in my local environment, including backporting the varnish.module to Drupal 5. In retrospect, I'm glad I took the time to learn how Varnish and its configuration file work before deploying it. It has paid off in a big way: our production site now has the equivalent of:

  • ...an in-memory static file server for all users (e.g., the equivalent of putting something like nginx or lighttpd in front of Apache, or whatever you're using).
  • ...an in-memory boost.module in terms of database relief for anonymous users.

Contrary to popular belief, the two items above are in no way an automatic benefit of simply installing Varnish. If the configuration file and the Drupal installation are not massaged with care, you won't get the database relief from anonymous page caching, and Varnish as a static file server will be far from optimized. Bottom line: Varnish can be a temperamental piece of software. It only gives back what you put into it.

To this end, the settings in the Varnish VCL file can make or break whether you get a substantial benefit from it. Below is an example VCL file, which was formed from a good amount of research and a lot of trial and error:

# If you're running a single site on a server, or else want all sites
# on a server to go through Varnish you'd only need one of the following backends.
# Showing different possibilities for those who have sites that they
# don't want to run Varnish on. In this example file, Varnish is assumed to
# be running on port 80, and Apache (or whatever) on port 8080.

backend default {
.host = "127.0.0.1";
.port = "8080";
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
.max_connections = 800;
}

backend sitea {
.host = "xxx.xx.xxx.103";
.port = "8080";
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
.max_connections = 800;
}

backend siteb {
.host = "xxx.xx.xxx.104";
.port = "8080";
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
.max_connections = 800;
}

sub vcl_recv {

# Now we pick a backend based on the Host header of the request. Again, this is
# not needed if you're running a single site on a server
if (req.http.host ~ "sitea.com$") {
set req.backend = sitea;
} else if (req.http.host ~ "siteb.com$") {
set req.backend = siteb;
} else {
# Use the default backend for all other requests
set req.backend = default;
}

# Allow a grace period for offering "stale" data in case backend lags
set req.grace = 5m;

remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;

# Properly handle different encoding types
if (req.http.Accept-Encoding) {
if (req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$") {
# No point in compressing these
remove req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate") {
set req.http.Accept-Encoding = "deflate";
} else {
# unknown algorithm
remove req.http.Accept-Encoding;
}
}

# Force lookup if the request is a no-cache request from the client
if (req.http.Cache-Control ~ "no-cache") {
return (pass);
}

## Default request checks
if (req.request != "GET" &&
req.request != "HEAD" &&
req.request != "PUT" &&
req.request != "POST" &&
req.request != "TRACE" &&
req.request != "OPTIONS" &&
req.request != "DELETE") {
# Non-RFC2616 or CONNECT which is weird.
return (pipe);
}
if (req.request != "GET" && req.request != "HEAD") {
# We only deal with GET and HEAD by default
return (pass);
}

## Modified from default to allow caching if cookies are set, but not http auth
if (req.http.Authorization) {
/* Not cacheable by default */
return (pass);
}
## This would make varnish skip caching for this particular site
if (req.http.host ~ "internet-safety.yoursphere.com$") {
return (pass);
}

# This makes varnish skip caching for every site except this one
# Commented out here, but shown for sake of some use cases
#if (req.http.host != "sitea.com") {
#   return (pass);
#}

## Remove has_js and Google Analytics cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");

## Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

## Remove empty cookies.
if (req.http.Cookie ~ "^\s*$") {
unset req.http.Cookie;
}

## Pass cron jobs
if (req.url ~ "cron.php") {
return (pass);
}

# Pass server-status
if (req.url ~ ".*/server-status$") {
return (pass);
}

# Don't cache install.php
if (req.url ~ "install.php") {
return (pass);
}
 
# Cache things with these extensions
if (req.url ~ "\.(js|css|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$") {
    return (lookup);
}

# Don't cache Drupal logged-in user sessions
# LOGGED_IN is the cookie that earlier versions of Pressflow set
# VARNISH is the cookie which the varnish.module sets
if (req.http.Cookie ~ "(VARNISH|DRUPAL_UID|LOGGED_IN)") {
return (pass);
}

}

sub vcl_fetch {

  # Grace to allow varnish to serve content if backend is lagged
  set obj.grace = 5m;

# These status codes should always pass through and never cache.
if (obj.status == 404 || obj.status == 503 || obj.status == 500) {
set obj.http.X-Cacheable = "NO: obj.status";
set obj.http.X-Cacheable-status = obj.status;
return (pass);
}
   
  if (req.url ~ "\.(js|css|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf)$") {
   unset obj.http.set-cookie;
}

if (req.url ~ "^/files/") {
# Set-Cookie is a response header, so it is removed from obj, not req
unset obj.http.Set-Cookie;
set obj.cacheable = true;
}

if (req.url ~ "^/sites/") {
unset obj.http.Set-Cookie;
set obj.cacheable = true;
}

if (!obj.cacheable) {
set obj.http.X-Cacheable = "NO: !obj.cacheable";
return (pass);
}
else {
# From http://varnish-cache.org/wiki/VCLExampleLongerCaching
/* Remove Expires from backend, it's not long enough */
unset obj.http.expires;

# These TTLs are based on the specific paths and may not apply to your site.
# You could just set a single default TTL if you want.
if (req.url ~ "(.js|.css)$") {
set obj.ttl = 60m; // js and css files ttl 60 minutes
}
else if (req.url ~ "(^/articles/)|(^/tags/)|(^/taxonomy/)") {
set obj.ttl = 10m; // list page ttl 10 minutes
}
else if (req.url ~ "^/article/") {
set obj.ttl = 5m; // article ttl 5 minutes
}
else {
set obj.ttl = 45m; // default ttl 45 minutes
}

/* marker for vcl_deliver to reset Age: */
set obj.http.magicmarker = "1";

# All tests passed, therefore item is cacheable
set obj.http.X-Cacheable = "YES";
}

return (deliver);
}

sub vcl_deliver {

  # From http://varnish-cache.org/wiki/VCLExampleLongerCaching
  if (resp.http.magicmarker) {
     /* Remove the magic marker */
     unset resp.http.magicmarker;

     /* By definition we have a fresh object */
     set resp.http.age = "0";
   }

   #add cache hit data
   if (obj.hits > 0) {
     #if hit add hit count
     set resp.http.X-Cache = "HIT";
     set resp.http.X-Cache-Hits = obj.hits;
   }
else {
     set resp.http.X-Cache = "MISS";
   }

}

sub vcl_error {

if (obj.status == 503 && req.restarts < 5) {
set obj.http.X-Restarts = req.restarts;
return (restart);
}

}

# Added to let users force refresh
sub vcl_hit {

if (!obj.cacheable) {
return (pass);
}

if (req.http.Cache-Control ~ "no-cache") {
# Ignore requests via proxy caches,  IE users and badly behaved crawlers
# like msnbot that send no-cache with every request.
if (! (req.http.Via || req.http.User-Agent ~ "bot|MSIE")) {
set obj.ttl = 0s;
return (restart);
}
}

return (deliver);

}
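
The cookie handling in vcl_recv is the part that most often decides whether anonymous pages get cached, so it can help to try the patterns outside Varnish first. Below is a minimal Python sketch of the same steps: strip has_js and the double-underscore (Google Analytics) cookies, drop a leading ";", discard an empty header, then run the logged-in check. The function names and sample cookie values are made up for illustration, and Python's re module is not the regex engine Varnish uses, so treat this as an approximation rather than a reference.

```python
import re

# Same patterns as the regsuball()/regsub() calls and the logged-in
# check in vcl_recv above.
STRIP_COOKIES = re.compile(r"(^|;\s*)(__[a-z]+|has_js)=[^;]*")
LEADING_SEMICOLON = re.compile(r"^;\s*")
LOGGED_IN = re.compile(r"(VARNISH|DRUPAL_UID|LOGGED_IN)")


def clean_cookie(cookie):
    """Return the Cookie header after stripping, or None if nothing is left."""
    cookie = STRIP_COOKIES.sub("", cookie)               # like regsuball(): every match
    cookie = LEADING_SEMICOLON.sub("", cookie, count=1)  # like regsub(): first match only
    if re.match(r"^\s*$", cookie):
        return None                                      # like "unset req.http.Cookie"
    return cookie


def is_logged_in(cookie):
    """True if the cleaned header carries one of the logged-in cookies."""
    return cookie is not None and LOGGED_IN.search(cookie) is not None


# Hypothetical sample headers.
anonymous = clean_cookie("__utma=123.456; has_js=1; __utmz=1.2.3")
member = clean_cookie("has_js=1; DRUPAL_UID=42; __utmb=9")
print(anonymous)             # None
print(member)                # DRUPAL_UID=42
print(is_logged_in(member))  # True
```

In this sketch the anonymous visitor ends up with no Cookie header at all, while the logged-in cookie survives the stripping and still triggers the pass.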

Some links from places which helped me arrive at this VCL:
* VCL Examples
* Lazy sessions with PF5 and Varnish problem
* Configure Varnish for Pressflow
* Varnish config - default.vcl

18 May, 2010

Comments

Caleb, thanks for posting your varnish config file. It's quite helpful and I've added a few lines to my own file.

One line that looked incorrect is:

if (req.url ~ "(.js|.css)$") {
set obj.ttl = 60m; // js and css files ttl 60 minutes
}

I believe the dots need to be escaped

if (req.url ~ "\.(js|css)$") {
set obj.ttl = 60m; // js and css files ttl 60 minutes
}

I'll have to play with this, but I think the periods inside the parentheses mean "Matches any one character" (source: Mastering Regular Expressions, 3rd edition), in which case the original way will work fine.

If I'm looking at the expressions correctly, both would work in many cases, with the top one possibly picking up some things that the bottom one would not (e.g., if a file was called mycss.file.css for example).

Yes, the dot means "any character". Sure, "mycss.file.css" would match, because the dot is a character. The "$" means "this is the end of the string", so mycss.file.txt wouldn't match.

The concern is that it might incorrectly grab some URIs; for example, the "friendly" URI:

http://www.blackmesh.com/ten-cool-tricks-with-css

would get caught by the rule, since "-" would match an un-escaped ".", and the remainder of the path matches "css$".

No, the backslash before the dot escapes it so that it is a literal dot. It will not match anything except a dot.
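
The difference between the two patterns is easy to check outside Varnish. Here is a quick Python sketch, assuming both patterns behave the same way in Varnish's regex engine as in Python's re module for a case this simple. Apart from the friendly URL quoted above, the sample paths are made up.

```python
import re

unescaped = re.compile(r"(.js|.css)$")  # pattern from the VCL in the post
escaped = re.compile(r"\.(js|css)$")    # pattern with a literal dot

paths = [
    "/misc/drupal.js",
    "/themes/garland/style.css",
    "/mycss.file.css",
    "/ten-cool-tricks-with-css",  # "friendly" URL with no file extension
    "/mycss.file.txt",
]

# search() is used because the VCL "~" operator matches anywhere in the string.
for path in paths:
    print(path, bool(unescaped.search(path)), bool(escaped.search(path)))
```

Only the friendly URL differs: the unescaped pattern matches it because "." accepts the "-" before "css", while the escaped pattern does not. Both patterns match the real .js and .css files, and neither matches the .txt file.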

Hello,

Can I ask for your backported Varnish module for D5? I have a website running D5 and I can't find any steps for getting Varnish working on my site. I plan to migrate to D6, but there are too many things to work out, so my first priority is to reduce my site's downtime before migrating.

I haven't heard back yet, but the maintainer of the varnish.module has mentioned giving me commit rights on Drupal.org to post the D5 backport. If I don't hear back from the maintainer in the next couple weeks I'll go ahead and post, but would really rather not release in that way.

You can download a zipped copy of the Varnish module I backported for Drupal 5 here. When/if I hear from the maintainer of the Drupal 6 version, I'll move this into the Drupal.org repo. Disclaimer: this module is presented as is (though it works well for me and has been running on a production site for a few months now).

Hello Caleb G,

This article helps a lot. Could you please let me know how to configure Varnish with Drupal 5 without using Pressflow, the way you have done so far?

Thanks
Xavier

Hi Xavier - you cannot configure non-pressflow Drupal 5 to work with Varnish. Pressflow makes a number of changes that are critical for getting reverse proxies (like Varnish) to work.

# Force lookup if the request is a no-cache request from the client
if (req.http.Cache-Control ~ "no-cache") {
return (pass);
}

How does that result in a lookup, i.e. doesn't the 'return (pass)' push the request onto the backend (proxies the request)?

There aren't many examples of setting up Varnish in a mixed environment that I've found thus far.

I'm just not sure in reviewing your backend installs for default, sitea & siteb what the differences are. Other than the IP addresses they seem pretty much identical.

If you're not using IP addresses to differentiate your domains, is there a way to do this? I'm just trying to set this up on a dev server, and we have a whole bunch of sites on it that we don't want to run through Varnish at this time. The server's just got the one IP.
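
On the single-IP question: the backend selection and the skip-caching checks in the VCL above test the Host header with a regex, not the IP address, so those checks behave the same when every site shares one address. Below is a small Python sketch of how a pattern like "sitea.com$" behaves, again assuming Python's re module is a fair stand-in for Varnish's engine here. The hostnames are placeholders, and the stricter pattern is a hypothetical variant that is not in the original VCL.

```python
import re

loose = re.compile(r"sitea.com$")           # as written in the VCL
strict = re.compile(r"(^|\.)sitea\.com$")   # hypothetical tighter variant

hosts = ["sitea.com", "www.sitea.com", "notsitea.com", "sitea-com", "siteb.com"]

# Print each host with the result of the loose and the strict match.
for host in hosts:
    print(host, bool(loose.search(host)), bool(strict.search(host)))
```

The loose pattern also accepts "notsitea.com" and "sitea-com", which may or may not matter on a dev server. The stricter variant only accepts the bare domain and its subdomains.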