
Monday, 7 January 2013

Varnish and Autoscaling... a love story


While working on a cool project at Cloudreach, I stumbled upon Varnish, and fell in love with it instantly. The first thing I tried to do was to combine Varnish with the awesomeness provided by AWS Elastic Load Balancer (ELB), in a setup which looks like:

ELB -> VARNISH -> ELB -> AUTOSCALING GROUP

While the frontend ELB works out of the box with Varnish (no surprises here), the backend ELB doesn’t work as expected. The problem lies in the fact that Varnish resolves the name assigned to the ELB once, when the VCL is loaded, and keeps using the resulting IP addresses until the VCL gets reloaded. Because of the dynamic nature of the ELB, the IPs behind its CNAME can change at any time, so Varnish can end up routing traffic to an IP which no longer belongs to the correct ELB.

The problem is discussed here and here but after Googling around I couldn't find any solution which didn’t involve doing:

ELB -> VARNISH -> NGINX (or HAProxy) -> ELB -> AUTOSCALING GROUP

Going through so many layers seemed too much, given that Varnish can load balance requests and perform health checks on the backend nodes itself, without the need for an internal ELB. The more I thought about it, the more I realised how simple it would be to implement a solution... so I did it. Using Varnish to perform the load balancing removes the overhead of going through an internal ELB, and the backend nodes only need to be reloaded when an autoscaling activity takes place.


The solution I've implemented uses the varnishadm command line tool, boto, and some bash scripting to glue it all together.

First of all, we need to get the backend nodes currently configured in Varnish and store them in a file:


varnishadm -T $HOSTPORT -S $SECRET backend.list > varnish_ips

Then, we will have to query the autoscaling group, and update the backends if any instance has been added/terminated. The following Python code does most of the job:


Let’s break it down:

  • get_autoscaling_ips gets the IPs associated with instances added to a specific autoscaling group.
  • get_varnish_ips loads the backend IPs into a Python list.
  • update_vlc_file compares the two lists of IPs. If there is any difference between them (you might want to reconsider this aspect), it creates a new VCL file containing the IPs retrieved from the autoscaling group.
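
As a rough illustration of the first two functions and of the comparison step, here is a minimal sketch. It is not the script described above: it swaps boto for boto3-style clients, which are passed in as arguments, and it assumes the Varnish 3.x backend.list format, where each backend is printed as name(ip,,port). The backends_changed helper is a hypothetical name used only for this example.

```python
import re

# Matches the IPv4 address inside a Varnish 3.x backend.list entry such as
# "node0(10.0.1.15,,80)". This output format is an assumption.
BACKEND_IP = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}),")


def get_varnish_ips(backend_list_text):
    """Return the set of backend IPs found in `varnishadm backend.list` output."""
    return set(BACKEND_IP.findall(backend_list_text))


def get_autoscaling_ips(asg_client, ec2_client, group_name):
    """Return the private IPs of the in-service instances of one autoscaling group.

    asg_client and ec2_client are expected to behave like
    boto3.client("autoscaling") and boto3.client("ec2").
    """
    groups = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"]
    instance_ids = [
        inst["InstanceId"]
        for group in groups
        for inst in group["Instances"]
        if inst["LifecycleState"] == "InService"
    ]
    if not instance_ids:
        # An empty InstanceIds list would match every instance in the account.
        return set()
    reservations = ec2_client.describe_instances(
        InstanceIds=instance_ids
    )["Reservations"]
    return {
        inst["PrivateIpAddress"]
        for res in reservations
        for inst in res["Instances"]
        if "PrivateIpAddress" in inst
    }


def backends_changed(varnish_ips, autoscaling_ips):
    """True when Varnish's backends no longer match the autoscaling group."""
    return set(varnish_ips) != set(autoscaling_ips)
```

With the file from the previous step, get_varnish_ips(open("varnish_ips").read()) gives the current backends, and a new VCL only needs to be generated when backends_changed returns True.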

The VCL section which defines request handling and document caching policies is unlikely to change when the autoscaling group does. To decouple it from the section which configures the backends, the Python script outputs the new VCL in the following format:

include "/etc/varnish/healthcheck.vcl";

node definitions

director definitions

include "/etc/varnish/use.vcl";

The node definitions and the director definition are dynamically generated by the script, while healthcheck.vcl is a static file where the health check conditions are defined (what a surprise :) and use.vcl is another static Varnish config file, which makes use of the director definition.
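
To make the layout concrete, here is a minimal sketch of how the dynamic part could be rendered. It assumes Varnish 3.x syntax (the director keyword and named probes), a probe called healthcheck defined in healthcheck.vcl, and backends listening on port 80. The function name, the nodeN backend names, the director name and the port are all hypothetical choices for this example. Note that VCL's include statement takes a quoted path and a trailing semicolon.

```python
def render_backend_vcl(ips, port=80, director="autoscaling", probe="healthcheck"):
    """Build the VCL text: static includes wrapped around generated backends."""
    ips = sorted(ips)  # stable ordering, so identical input gives identical VCL
    lines = ['include "/etc/varnish/healthcheck.vcl";', ""]

    # Node definitions: one backend per autoscaling instance.
    for i, ip in enumerate(ips):
        lines += [
            "backend node%d {" % i,
            '  .host = "%s";' % ip,
            '  .port = "%d";' % port,
            "  .probe = %s;" % probe,
            "}",
            "",
        ]

    # Director definition: round-robin over every generated backend.
    lines.append("director %s round-robin {" % director)
    for i in range(len(ips)):
        lines.append("  { .backend = node%d; }" % i)
    lines += ["}", "", 'include "/etc/varnish/use.vcl";', ""]
    return "\n".join(lines)
```

In this sketch use.vcl would select the director with something like set req.backend = autoscaling; in vcl_recv, which is again Varnish 3.x syntax.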

Once the new VCL is generated, it’s just a matter of loading and activating it by running:

varnishadm -T $HOSTPORT -S $SECRET vcl.load $NAME $FILE
varnishadm -T $HOSTPORT -S $SECRET vcl.use $NAME


Something I noticed when creating the script is that backend.list returns the list of all configured backends, regardless of whether the VCL which defines them is in use or not. This behaviour makes the whole exercise of comparing VCL backends with autoscaling IPs useless, so we need to remove all the previous VCL configs by running:

varnishadm -T $HOSTPORT -S $SECRET vcl.discard $OLD_VCL
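
The load, use and discard steps can also be driven from Python. The sketch below is one possible way to do it, assuming vcl.list prints one config per line with the status ("active" or "available") in the first column and the config name in the last one, as Varnish 3.x does. All three function names are hypothetical.

```python
import subprocess


def varnishadm(hostport, secret, *args):
    """Run one varnishadm command and return its output as text."""
    cmd = ["varnishadm", "-T", hostport, "-S", secret] + list(args)
    return subprocess.check_output(cmd).decode()


def discardable_vcls(vcl_list_text):
    """Names of loaded configs that are not active, from `vcl.list` output."""
    names = []
    for line in vcl_list_text.splitlines():
        fields = line.split()
        # Expected shape (an assumption): "<status> <count> <name>".
        if len(fields) >= 3 and fields[0] == "available":
            names.append(fields[-1])
    return names


def reload_and_clean(hostport, secret, name, vcl_file):
    """Load and activate a new VCL, then discard every inactive one."""
    varnishadm(hostport, secret, "vcl.load", name, vcl_file)
    varnishadm(hostport, secret, "vcl.use", name)
    listing = varnishadm(hostport, secret, "vcl.list")
    for old in discardable_vcls(listing):
        varnishadm(hostport, secret, "vcl.discard", old)
```

Discarding only after vcl.use has succeeded means the previously active config is still there to fall back on if the new one fails to compile.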

The three scripts can be glued together in a bash script which runs as a cron job on each Varnish server. The code above has not been used in production yet, so please do test thoroughly before usage. I’m always curious to hear any feedback, so get in touch if you have any comments on this.

As usual, please reach out to us if you need any help or advice using AWS!


Nicola Salvo
System Developer

4 comments:

Ian McDonald said...

Great stuff! I love Varnish for the ability to override the nocache directives. Used with great effort on older Drupal and other web servers to massively take the load off the back end.

Keith said...

Great article. Another viable option would be to have auto scaling post to an SNS topic with listeners that rewrite and reload vcl on message received.

Unknown said...

which version of varnish did you write the script for?

Unknown said...

Seems like this solution works only for Varnish 3.0.3+ because of the bug https://www.varnish-cache.org/trac/ticket/1141 in 3.0.2 and below.
