Setting up keepalived with Chef on Ubuntu 12.04

By Fri, Apr 19 2013 at 04:26PM EDT From Agile Testing
We have 2 servers running HAProxy on Ubuntu 12.04. We want to set them up in an HA configuration, and for that we chose keepalived.

The first thing we did was look for an existing Chef cookbook for keepalived -- luckily, @jtimberman already wrote it. It's a pretty involved cookbook, probably one of the most complex I've seen. The usage instructions are pretty good though. In any case, we ended up writing our own wrapper cookbook on top of keepalived -- let's call it frontend-keepalived.

The usage documentation for the Opscode keepalived cookbook contains a role-based example and a recipe-based example. We took inspiration from both. In our frontend-keepalived/recipes/default.rb file we have:


include_recipe 'keepalived'

node[:keepalived][:check_scripts][:chk_haproxy] = {
  :script => 'killall -0 haproxy',
  :interval => 2,
  :weight => 2
}
node[:keepalived][:instances][:vi_1] = {
  :ip_addresses => '172.30.10.10',
  :interface => 'frontend_if',
  :track_script => 'chk_haproxy',
  :nopreempt => false,
  :advert_int => 1,
  :auth_type => :pass, # :pass or :ah
  :auth_pass => 'mypass'
}


This code overrides the default values for many of the attributes defined in the Opscode keepalived cookbook. It specifies the floating IP address that will be common between the 2 servers that will each run HAProxy (:ip_addresses). It also specifies the network interface where the multicast-based keepalived protocol (:interface) and the 'check script' which tests whether HAProxy is still running on each server.

However, we still needed a way to specify which of the 2 servers is the master and which is the backup (in keepalived parlance), as well as indicating priorities for each server. The usage document in the keep alived cookbook shows this as an example of using a single role to define the master and the backup:


override_attributes(
:keepalived => {
:global => {
:router_ids => {
'node1' => 'MASTER_NODE',
'node2' => 'BACKUP_NODE'
}
}
}
)

We couldn't get this to work (if somebody who did reads this, please leave a comment and tell me how you did it!). Instead, we defined 2 roles, one for the master and one for the backup. Here's the master role:


$ cat frontend-keepalived-master.rb
name "frontend-keepalived-master"
description "install keepalived and set state to MASTER"

override_attributes(
    "keepalived" => {
      "instance_defaults" => {
        "state" => "MASTER",
        "priority" => "101"
}
      }
)

run_list(
  "recipe[frontend-keepalived]"
)

Here's the backup role:


$ cat frontend-keepalived-backup.rb
name "frontend-keepalived-backup"
description "install keepalived and set state to BACKUP"

override_attributes(
    "keepalived" => {
      "instance_defaults" => {
        "state" => "BACKUP",
        "priority" => "100"
}
    }
)

run_list(
  "recipe[frontend-keepalived]"
)

Notice that we override 2 attributes, the state and the priority. The defaults for these are in the Opscode keepalived cookbook, under attributes/default.rb


default['keepalived']['instance_defaults']['state'] = 'MASTER'
default['keepalived']['instance_defaults']['priority'] = 100

This was useful in determining how to specify the stanza overriding them in our roles -- it made us see that we needed to specify the instance_defaults key under keepalived in the role files.

At this point, we added the master role to the Chef run_list of server #1 and the backup role to the Chef run_list of server #2. We had to do one more thing on each server (which we'll add to the default recipe of our frontend-keepalived cookbook): per this very helpful blog post on setting up HAProxy and keepalived, we edited /etc/systctl.conf and added:

net.ipv4.ip_nonlocal_bind=1
then applied it via 'sysctl -p'. This was needed so that HAProxy can listen on the keepalived-created 'floating IP' common to the 2 servers, which is not a real IP tied to an existing local network interface.

Once we ran chef-client on each of the 2 servers, we were able to verify that keepalived does its job by pinging the common floating IP from a 3rd server, then shutting down the network interface 'frontend_if' on each server, with no interruption in the ICMP responses sent from the floating IP. Our next step is to do some heavy-duty testing involving HTTP requests handled by HAProxy, and see that there is no interruption in service when we fail over from one HAProxy server to the other.





Using wrapper cookbooks in Chef

By Tue, Apr 02 2013 at 06:58PM EDT From Agile Testing
Not sure if this is considered a Chef best practice or not -- I would like to get some feedback, hopefully via constructive comments on this blog post. But I've started to see this pattern when creating application-specific Chef cookbooks: take a community cookbook, include it in your own, and customize it for your specific application.

A case in point is the haproxy community cookbook. We have an application that needs to talk to a Riak cluster. After doing some research (read 'googling around') and asking people on Twitter (because Tweeps are always right), it looks like the preferred way of putting a load balancer in front of Riak is to run haproxy on each application server that needs to talk to Riak, and have haproxy listen on 127.0.0.1 on some port number, then load balance those requests to the Riak backend. Here is an example of such an haproxy.cfg file.

So what I did was to create a small cookbook called haproxy-riak, add a default.rb recipe file that just calls

include_recipe haproxy

and then customize the haproxy.cfg file via a template file in my new cookbook (I actually didn't do it via the template yet, only via a hardcoded cookbook file, but my colleague and Chef guru Jeff Roberts is working on templatizing it).

I also added

depends haproxy

to metadata.rb.

I think this is pretty much all that's needed in order to create this 'wrapper' cookbook. This cookbook can then be used by any application that needs to talk to Riak. As a matter of fact, we (i.e. Jeff) are thinking that we should have such a cookbook per application (so we would call it app1-haproxy-riak for example) just so we can do things like search for different Riak clusters that we may have for different types of applications, and populate haproxy.cfg with the search results.

In any case, I would be curious to find out if other people are using this 'pattern', or if they found other ways to apply the DRY principle in their Chef cookbooks. Please leave a comment!

Installing ruby 2.0 on Ubuntu 12.04

By Mon, Mar 18 2013 at 12:55PM EDT From Agile Testing
I wanted to find a way to install ruby 2.0 on Ubuntu 12.04 via Chef, without going the rvm route, which is harder to automate. After several tries, I finally found a list of .deb packages which are necessary and sufficient as pre-requisites. Writing it down here for my future reference, and maybe it will prove useful to others out there.

I downloaded most of these packages from packages.ubuntu.com (if you google their names you'll find them), then I installed them via 'dpkg -i'. Here they are:


libffi5_3.0.9-1_amd64.deb
libssl0.9.8_0.9.8o-7ubuntu3.1_amd64.deb
zlib1g-dev_1.2.3.4.dfsg-3ubuntu4_amd64.deb

The actual ruby .deb package was built with fpm by my colleague Jeff Roberts courtesy of this blog post.




No snowflakes allowed

By Wed, Mar 06 2013 at 05:44PM EST From Agile Testing
"Snowflake" is a term I learned from my colleague Jeff Roberts. It is used in the Chef community (maybe in the configuration management community at large as well) to designate a server/node that is 'unique', i.e. not in configuration management control. In a Chef environment, it means that the node in question was never added to Chef and never had chef-client run on it.

We've all been in situations where it seems overkill to go through the effort of automating the setup of a server. Maybe the server has a unique purpose within our infrastructure. Maybe we didn't feel like spending the time to create Chef recipes for that server. Whatever the reasoning, it seemed low-risk at the time.

Well, I am here to tell you there is danger in this way of thinking. Example: we deployed a server in EC2 manually. We installed the Sensu client on it manually and pointed it at our Sensu server. Everything seemed fine. Then one day we updated our Sensu configuration (via Chef) both on the Sensu server and on all the Sensu clients. Of course, the Sensu configuration on our snowflake server never got updated, since chef-client wasn't running on that server. As a result, the Sensu client wasn't checking in properly with the Sensu server, and the snowflake behaved as if it was falling off the map as far as our monitoring system was concerned. We had to manually update Sensu on the snowflake to bring it in sync with our configuration changes.

Basically, the result of having snowflake servers is that they do fall off the map as far as the overall automation of your infrastructure is concerned. They suffer bitrot, and you end up spending lots of time on their care and feeding, thus defeating the purpose of saving the time to automate them in the first place.

This being said, it's hard to be disciplined enough to run chef-client periodically on every single server in your infrastructure. I've never been able to do that before, but we are doing it now, mostly because of the insistence of Jeff. I do see the advantages of this discipline, and I do recommend it to everybody.

Video and slides for 'Five years of EC2 distilled' talk

By Tue, Mar 05 2013 at 08:49PM EST From Agile Testing
On February 19th I gave a talk at the Silicon Valley Cloud Computing Meetup about my experiences and lessons learned while using EC2 for the past 5 years or so. I posted the slides on Slideshare and there's also a video recording of my presentation. I think the talk went pretty well, judging by the many questions I got at the end. Hopefully it will be useful to some people out there who are wondering if EC2 or The Cloud in general is a good fit for their infrastructure (short answer: it very much depends).

I want to thank Sebastian Stadil from Scalr for inviting me to give the talk (he first contacted me about giving a talk like this in -- believe it or not -- early 2008!). I also want to thank Adobe for hosting the meeting, and CDNetworks for sponsoring the meeting.