In 2023, virtually every software product designed to run on cloud infrastructure supports some form of High Availability deployment.
In the best scenarios these depend on an external data store, and we can keep the application servers truly immutable (or even with read-only disks!). Sometimes, though, the application we are trying to set up is the data store itself, as with MongoDB, Influx, or Prometheus, or it is some poorly designed (sorry, I meant non-cloud-native) software that just wants a disk to write stuff on.
For example, if we are running a Jenkins or a WordPress instance, we will need a disk attached to the instance, and we will have to make sure that disk is not lost when the instance dies.
The solution, depending on the specific software, can be an NFS (or EFS, on AWS) volume, or a more advanced Active/Passive setup with automatic failover.
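For reference, the EFS route is only a couple of Terraform resources. This is a minimal sketch, assuming a subnet and a security group (aws_subnet.private_1a, aws_security_group.efs) already exist in your configuration:

resource "aws_efs_file_system" "shared" {
  encrypted = true
}

resource "aws_efs_mount_target" "shared" {
  file_system_id  = aws_efs_file_system.shared.id
  subnet_id       = aws_subnet.private_1a.id       # assumed pre-existing subnet
  security_groups = [aws_security_group.efs.id]    # assumed pre-existing SG allowing NFS (port 2049)
}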
These setups, however, can get very expensive very quickly, and they are not always necessary. Sometimes you might even need to purchase a license for the software you are running, which can be a deal breaker for a cash-strapped startup.
So, at the beginning, and for a while, you might have to make do with single-node configurations: Mostly-Available™ services.
Anatomy of a Mostly-Available service
Let’s talk about that Jenkins instance that your developers (or you, dear 10x full-stack developer + DevOps) want. You can probably afford to lose it for a few minutes, or even a few hours, and no one would notice.
That’s your MA service: something you do want online at least 99% of the time, but for which a clustered setup is overkill, as is paying ~$25 every month for an ALB.
It obstinately writes to a disk, so we will have to preserve that disk. And if the instance dies, we want it to recover automatically, thanks to an Autoscaling Group.
Setting up the EBS volume
EBS volumes are AZ-local: a volume created in eu-west-1a cannot be attached to an instance in eu-west-1b or eu-west-1c.
So, choose an AZ and remember it, because we will have to configure the ASG to launch instances in a subnet that lives in the same AZ as our Volume.
resource "aws_ebs_volume" "jenkins" {
availability_zone = "eu-west-1a"
size = 32
}
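If you do not already have a subnet pinned to that AZ, a minimal sketch could look like the following; aws_vpc.main and the CIDR block are assumptions, adapt them to your network:

resource "aws_subnet" "private_1a" {
  vpc_id            = aws_vpc.main.id   # assumed pre-existing VPC
  availability_zone = "eu-west-1a"      # must match the Volume's AZ
  cidr_block        = "10.0.1.0/24"     # example CIDR
}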
Setting up the Autoscaling Group
Provided that you have already built an Amazon Machine Image (AMI) for your service, we are going to create the ASG, specifying a Userdata script that will mount the volume at boot. For this purpose you can use this Terraform code:
data "aws_ami" "jenkins" {
most_recent = true
owners = ["self"]
filter {
name = "name"
values = ["jenkins-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
resource "aws_launch_template" "jenkins" {
name = "jenkins"
image_id = data.aws_ami.jenkins.id
instance_type = var.instance_type
user_data = base64encode(data.template_file.jenkins_userdata.rendered)
update_default_version = true
}
resource "aws_autoscaling_group" "jenkins" {
name = "jenkins"
max_size = 1
min_size = 1
desired_capacity = 1
# Make sure the subnet specified here lives in the same AZ as the Volume.
vpc_zone_identifier = [aws_subnet.private_1a.id]
launch_template {
id = aws_launch_template.jenkins.id
version = "$Latest"
}
}
data "template_file" "jenkins_userdata" {
template = file("${path.module}/userdata.sh")
vars = {
region = var.region
volume_id = aws_ebs_volume.jenkins.id
}
}
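As a side note, on Terraform 0.12 and later you can skip the template provider entirely and use the built-in templatefile() function directly in the launch template:

user_data = base64encode(templatefile("${path.module}/userdata.sh", {
  region    = var.region
  volume_id = aws_ebs_volume.jenkins.id
}))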
You will also need to give the instances a role that allows the ec2:AttachVolume action.
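A minimal sketch of that role could look like this (the names are illustrative, and you may want to scope the policy down to the specific volume and instance ARNs instead of "*"):

resource "aws_iam_role" "jenkins" {
  name = "jenkins"

  # Let EC2 instances assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "jenkins_attach_volume" {
  name = "attach-volume"
  role = aws_iam_role.jenkins.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ec2:AttachVolume"]
      Resource = "*"
    }]
  })
}

resource "aws_iam_instance_profile" "jenkins" {
  name = "jenkins"
  role = aws_iam_role.jenkins.name
}

Then reference the instance profile from the launch template by adding an iam_instance_profile { name = aws_iam_instance_profile.jenkins.name } block to it.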
Finally, the Userdata script (this is tuned for Debian 11, feel free to adapt):
#!/bin/bash
# Configure awscli to use the current region
export AWS_REGION="${region}"
mkdir -p /root/.aws
cat <<EOF > /root/.aws/config
[default]
region = $AWS_REGION
EOF
# Attach the EBS volume.
# NB: if your instances enforce IMDSv2 you will need to request a session
# token first; this plain request relies on the IMDSv1 fallback.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 attach-volume --volume-id ${volume_id} --instance-id $INSTANCE_ID --device /dev/xvdb
# Wait for the device to show up: the attachment is asynchronous.
# NB: on Nitro-based instance types the volume appears as an NVMe device
# (e.g. /dev/nvme1n1) regardless of the requested device name.
while [ ! -e /dev/xvdb ]; do sleep 1; done
# Ensure an ext4 filesystem exists on the volume.
# We use blkid to detect the current filesystem; if none is found
# (i.e. the very first boot with a blank volume), we run mkfs.
VOLUME_FS=$(blkid -o value -s TYPE /dev/xvdb)
if [ "$VOLUME_FS" != "ext4" ]; then
  mkfs -t ext4 /dev/xvdb
fi
# Mount the Volume to the Service data directory
mkdir -p /opt/jenkins_data
mount /dev/xvdb /opt/jenkins_data
# Allow Jenkins to use the volume (or it will default to root:root)
chown -R jenkins: /opt/jenkins_data
# And, run!
systemctl enable jenkins
systemctl start jenkins
Yeah, I know, it looks more complicated than it actually is. There are a few lines in that script that are effectively only needed the very first time a volume is used.
If your Volume has already been used for a while, or you have initialized it on another instance, you can drop the filesystem check (the blkid / mkfs block) and the chown.
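Once the ASG has brought the instance up, a quick sanity check from the instance itself might look like this:

lsblk                      # the 32G volume should show up as a block device
df -h /opt/jenkins_data    # ...mounted on the data directory
systemctl status jenkins   # ...with the service running on top of it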