Project: Terraform Swarm mode cluster

Using Terraform to provision a swarm cluster on DigitalOcean

posted 2017-10-06 by Thomas Kooi

Terraform Orchestration DigitalOcean Swarm mode

I recently started playing around with Terraform. Since I often manually spin up a couple of droplets on DigitalOcean to set up a Swarm mode cluster to try out projects or setups, I figured I should put all of that into code as my first Terraform project. During this project I created a Terraform module for provisioning a basic Swarm mode cluster, ideal for labs or development clusters.

This entire setup can be found on GitHub.

Goals

  • Set up an HA Docker Swarm mode cluster on DigitalOcean to use in a lab setup

Prerequisites

  • Terraform >= 0.10.6
  • DigitalOcean account / API token with write access (wired into the provider as sketched below).
  • SSH Keys added to your DigitalOcean account
  • jq for parsing the output of the Swarm join tokens.
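
Terraform talks to DigitalOcean through its provider, which needs the API token. As a minimal sketch, the provider configuration could look something like this, passing the token in as a variable rather than hard-coding it:

# DigitalOcean API token, for example exported as TF_VAR_do_token
variable "do_token" {}

provider "digitalocean" {
  token = "${var.do_token}"
}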

Terraform set up

Creating a multi-master cluster

First, a single Swarm mode manager is provisioned; this is the leader node. Any additional manager nodes are provisioned after this first step. Once the manager nodes have been provisioned, Terraform initializes the Swarm on the first manager node and retrieves the join tokens. It then has all the remaining managers join the cluster.

If the cluster is already up and running, Terraform will check with the first leader node to refresh the join tokens. Any additional manager nodes that are provisioned will then automatically be joined to the Swarm.

The worker nodes are provisioned after all the managers are up and running. Once they are created, they run a docker swarm join command with the worker token retrieved from the managers.

Creating the managers

Here is my Terraform code for provisioning the managers. It runs a provisioner on the first manager to have it initialize the Swarm. All of it is available on GitHub in my DigitalOcean Swarm managers Terraform module.

resource "digitalocean_droplet" "manager" {
  ssh_keys           = "${var.ssh_keys}"
  image              = "${var.image}"
  region             = "${var.region}"
  size               = "${var.size}"
  private_networking = true
  backups            = "${var.backups}"
  ipv6               = false
  tags               = ["${var.tags}"]
  user_data          = "${var.user_data}"
  count              = "${var.total_instances}"
  name               = "${format("%s-%02d.%s.%s", var.name, count.index + 1, var.region, var.domain)}"

  connection {
    type        = "ssh"
    user        = "${var.provision_user}"
    private_key = "${file("${var.provision_ssh_key}")}"
    timeout     = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! $(docker info) ]; do sleep 2; done",
      # Init the swarm on the first manager
      "if [ ${count.index} -eq 0 ]; then sudo docker swarm init --advertise-addr ${digitalocean_droplet.manager.0.ipv4_address_private}; exit 0; fi",
    ]
  }
}

Once the Swarm has been initialized, the join tokens can be retrieved. I do this using the external data source.

data "external" "swarm_tokens" {
  program = ["bash", "${path.module}/scripts/get-swarm-join-tokens.sh"]

  query = {
    host        = "${element(digitalocean_droplet.manager.*.ipv4_address, 0)}"
    user        = "${var.provision_user}"
    private_key = "${var.provision_ssh_key}"
  }
}

Here is the script to fetch the join tokens:

#!/usr/bin/env bash

# Processing JSON in shell scripts
# https://www.terraform.io/docs/providers/external/data_source.html#processing-json-in-shell-scripts
# Credits to https://github.com/knpwrs/docker-swarm-terraform for inspiration on how to do this

set -e
eval "$(jq -r '@sh "HOST=\(.host) USER=\(.user) PRIVATE_KEY=\(.private_key)"')"

# Fetch the manager join token
MANAGER=$(ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i $PRIVATE_KEY \
    $USER@$HOST docker swarm join-token manager -q)

# Fetch the worker join token
WORKER=$(ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i $PRIVATE_KEY \
    $USER@$HOST docker swarm join-token worker -q)

# Produce a JSON object containing the tokens
jq -n --arg manager "$MANAGER" --arg worker "$WORKER" \
    '{"manager":$manager,"worker":$worker}'

scripts/get-swarm-join-tokens.sh

This will SSH to the leader node, fetch the join tokens for managers and workers, and use jq to output them as a JSON object.
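
The resulting JSON object looks roughly like this (the token values here are placeholders; real tokens start with SWMTKN-1- followed by a long random string):

{
  "manager": "SWMTKN-1-...",
  "worker": "SWMTKN-1-..."
}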

With these join tokens, the rest of the managers can be added to the cluster. The following Terraform resource joins them to the existing Swarm through the first created manager.

resource "null_resource" "bootstrap" {
  count = "${var.total_instances}"

  triggers {
    cluster_instance_ids = "${join(",", digitalocean_droplet.manager.*.id)}"
  }

  connection {
    host        = "${element(digitalocean_droplet.manager.*.ipv4_address, count.index)}"
    type        = "ssh"
    user        = "${var.provision_user}"
    private_key = "${file("${var.provision_ssh_key}")}"
    timeout     = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! $(docker info) ]; do sleep 2; done",
      "if [ ${count.index} -gt 0 ] && [! sudo docker info | grep -q \"Swarm: active\" ]; then sudo docker swarm join --token ${lookup(data.external.swarm_tokens.result, "manager")} ${element(digitalocean_droplet.manager.*.ipv4_address_private, 0)}:2377; exit 0; fi",
    ]
  }
}

Joining the workers

Joining worker nodes to the Swarm is easy. You only need to know one or more of the manager IP addresses and the worker join token. I’ve put this in its own Terraform module, which can also be found on GitHub.

It will create a droplet, followed by a provisioner that joins it to the Swarm. When a droplet gets destroyed, it runs the docker swarm leave command first.


resource "digitalocean_droplet" "node" {
  ssh_keys           = "${var.ssh_keys}"
  image              = "${var.image}"
  region             = "${var.region}"
  size               = "${var.size}"
  private_networking = true
  backups            = "${var.backups}"
  ipv6               = false
  user_data          = "${var.user_data}"
  tags               = ["${var.tags}"]
  count              = "${var.total_instances}"
  name               = "${format("%s-%02d.%s.%s", var.name, count.index + 1, var.region, var.domain)}"

  connection {
    type        = "ssh"
    user        = "${var.provision_user}"
    private_key = "${file("${var.provision_ssh_key}")}"
    timeout     = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! $(docker info) ]; do sleep 2; done",
      "sudo docker swarm join --token ${var.join_token} ${element(digitalocean_droplet.manager.*.ipv4_address_private, 0)}:2377",
    ]
  }

  provisioner "remote-exec" {
    when = "destroy"

    inline = [
      "docker swarm leave",
    ]

    on_failure = "continue"
  }
}

Firewall

When running a cluster on DigitalOcean for an extended period, please consider adding firewall rules for the cluster droplets. You will need to allow the following ports:

  • TCP port 2377 for cluster management communications
  • TCP and UDP port 7946 for communication among nodes
  • UDP port 4789 for overlay network traffic

You can use the DigitalOcean Swarm mode firewall rules Terraform module as an example.
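
As a minimal sketch (not the full module), a digitalocean_firewall resource covering these ports could look something like the following, assuming all cluster droplets carry a shared "swarm" tag:

resource "digitalocean_firewall" "swarm" {
  name = "swarm-cluster"

  # Apply the firewall to every droplet carrying the (assumed) swarm tag
  tags = ["swarm"]

  # Cluster management communications
  inbound_rule {
    protocol    = "tcp"
    port_range  = "2377"
    source_tags = ["swarm"]
  }

  # Communication among nodes
  inbound_rule {
    protocol    = "tcp"
    port_range  = "7946"
    source_tags = ["swarm"]
  }

  inbound_rule {
    protocol    = "udp"
    port_range  = "7946"
    source_tags = ["swarm"]
  }

  # Overlay network traffic
  inbound_rule {
    protocol    = "udp"
    port_range  = "4789"
    source_tags = ["swarm"]
  }
}

In practice you will also want a rule for inbound SSH, rules for whatever traffic your services expose, and outbound rules; the module linked above is a more complete starting point.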

Using Terraform modules

I’ve combined all of these into a single Terraform module.

module "swarm-cluster" {
  source           = "github.com/thojkooi/terraform-digitalocean-docker-swarm-mode"
  domain           = "do.example.com"
  total_managers   = 3
  total_workers    = 2
  do_token         = "${var.do_token}"
  manager_ssh_keys = [1234, 1235, ...]
  worker_ssh_keys  = [1234, 1235, ...]
}

There are also multiple examples available for running with a firewall, adding different droplets to the cluster, or including user data. Feel free to use or adapt any of these modules. Please let me know if you have any improvements or feedback.