Using Amazon S3 to Store Database Backups

Using S3 to store your MySQL or PostgreSQL Backups

"I'd like to restore a Database Backup" said no one ever. When you have to do such a thing, it means that your production system is pretty much screwed up and your company is probably losing thousands if not millions of revenue per hour. pr It is clear that storing your Database Backups correctly is of extreme importance. You always hope that you will not need it, but you want to be safe that in case of need, the backup will be there, ready to be restored.

Now:

If you are using Amazon RDS you are pretty much sorted out as RDS will create and store backups for you every night automatically.

But if you are not in the Cloud, or running a Database server on an EC2 instance to retain more control, or on a third-party VPS to save some cents, you will have to manage this your self. In this case, you can use Amazon S3.

AWS Cloud A safe place for your backups

Amazon S3 has many nice features that make it a great place to save your backups. In this tutorial we are going to use:

  • The AWS CLI: a command line interface to the AWS API, to manage the bucket and to upload the backups;
  • Object Lifecycle Management: reducing costs by moving old backups to Infrequent Access Storage Class;
  • Server-Side Encryption: make sure that data is encrypted at rest;
  • MFA Delete: avoid accidental or malicious backup deletion with a Two Factor Authentication token;
  • Logging: keep track of who, where and why uploads and downloads.

If you have some, even tiny, experience with AWS, it will make your life much easier. This guide, however, should be easy for you even if it is your first time in the Cloud.

Create an AWS account

If you have an AWS account, you can skip this section

Open the AWS Homepage and click on "Create an AWS Account":

Create an AWS Account

Insert your email address, select "I am a new user" and click on "Sign in using our secure server":

AWS Login

At this point you will be asked to fill in some informations about you or your company and a telephone number that must be valid (they will make an automated call to verify the number).

Finally they will ask you for credit card details. But don't worry, you don't pay a penny for having an AWS accounts. All AWS services are priced to the hour or to the GB of storage. An empty AWS account is free.

Get API Credentials

If you have some experience with AWS you should already know how this works. Do not use Root Credentials, use an IAM user with restricted access. Do not leak the keys.

For all others, some more details:

When you create an AWS Account you login using what is called the "Root Account". This Root Account is pretty much the Unix root user. He is almighty, powerful and pays the bills. Its credentials should be kept with the maximum secrecy.

Every User in AWS may have Login credentials - that is, username and password for the AWS Console -, API credentials that you use to make API calls from your applications, and more, but this is enough for this post.

Just as you would never disclose your Root Login Credentials, you should never disclose the API keys. They are too powerful, and if they end up in the wrong hands, someone could setup a botnet on your name and on your wallet. So the best way to protect your Root Account is to not generate this credentials at all.

What we want to use instead is an IAM User (think of it as an AWS Subaccount), with very limited powers, just to do what its needed. Now the credentials for the user can still do some damage, but we greatly limit the scope.

Open the IAM Console. On the left sidebar select "Users", then, "New User":

Create an IAM User

Give the user a name (for example backups), and check the "Programmatic Access" checkbox. This will enable API Access for the user and generate its credentials. Proceed to the next page.

Here you have to choose permissions for the user. If you are an experienced AWS user, you may want to write your own policy to pick only the exact permissions needed for the job. In this tutorial we will use the AWS managed S3FullAccess Policy, which grants S3 superpowers to the user:

IAM User - S3FullAccess

Review the settings and on the final page you will receive your secret credentials. Keep them safe! The safety of your backups and of your S3 account in general depends on it.

First rule: do not push them to repositories, whether public or private.

From now on, I will assume the user executing the commands has all the right permissions to do so. In a production environment, however, you should always follow the Principle of Least Privilege, which in this case means that the user creating the bucket should not be the same uploading the backups.

Install the AWS CLI

The Command Line Interface is a great tool to experiment and to interface with the AWS Platform. Most of the steps listed in this tutorial could be manually applied from the AWS Console.

But you don't want to upload backups manually every day. You will need the CLI anyway.

If you are lucky enough to work with a Mac, and already use Homebrew, install the CLI using:

brew install awscli

On Linux and on Windows, ensure that you have Python and pip installed, then run:

pip install awscli

Before running any commands with the CLI, you must set the access credentials that the console uses to authenticate the API requests. Run aws configure and set the appropriate values for Access Key Id and Secret Access Key.

You will also be asked for a region. In this case you should probably use the region that is geographically close to you. You can find a list of available regions on the AWS Documentation.

Setting up the Bucket

At this point you have to create a bucket where to save backups. You could use one that you already own, but the purpose of buckets is to separate objects of different nature. Also some of the customizations that we are going to apply may not work for your other objects.

Time to choose your bucket name. Remember that bucket names must be globally unique, so "database-backups" is not going to work for everyone of you. My suggestion for non-public buckets is to suffix them with an UUID to ensure randomness. Once you have the name:

aws s3api create-bucket --bucket <bucketname>

Since we are going to enable logging, we'll use the backups/ key prefix (or directory if you insist) to store the actual database dumps, and the logs/ prefix for S3 access logs.

If you prefer using your mouse, open the S3 Console, and click on "Create Bucket":

Create a new S3 bucket

When the popup appears, insert the name for your bucket and the region. Then leave all other settings to defaults and continue. By default your bucket is only accessible by you and to authorized IAM Users.

Containing backup costs

Using Object Lifecycle Management we are going to move objects older than 30 days to the Infrequent Access Storage Class (pay less for storage, more for requests). After 6 months the backup are probably going to be so old that would have no real benefit, so we are going to expire them.

Copy the following JSON Lifecycle Configuration to a file (I will name mine lifecycle.json) and feel free to make the appropriate edits for your case:

{
  "Rules": [
    {
      "ID": "Backups Lifecycle Configuration",
      "Status": "Enabled",
      "Prefix": "backups/",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        "Expiration": {
          "Days": 180
        }
      ],
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 2
      }
    }
  ]
}

This configuration contains a single "Rule" definition, which is applied to objects prefixed with backups/. We are instructing the bucket to move objects to the STANDARD_IA Storage Class, to expire them after 180 days (roughly 6 months). Finally we are making sure that incomplete Multipart Uploads are aborted after 2 days, which is a best practice to apply to any bucket.

Run the following command to apply this configuration to your newly created bucket:

aws s3api put-bucket-lifecycle-configuration \
          --bucket <bucketname> \
          --lifecycle-configuration file://lifecycle.json

The same can be easily achieved on the S3 Console too. Select your bucket Management tab and click on the "Add Lifecycle rule" button. Get yourself confident with the wizard and fill all the required fields.

Protecting against accidental deletion

The last thing you want is to accidentally delete your precious backups. Or even worse to have some malicious actor trying to ruin the company.

MFA Delete protects against this scenario. All delete requests must be further authenticated with a Two Factor Authentication token, like the one used to protect your Gmail, Facebook or Bank account, to put it simply. There are two requirements to enable it:

  • First, your AWS Root account must have MFA enabled. Head over to your IAM Dashboard and enable it with either a Virtual or Physical device.
  • Second, MFA Delete requires Bucket Versioning to be enabled. We don't really need it as we are not going to overwrite our objects, and it may add some costs should it happen...

Note: this step is optional. If you it feels an overkill for your Kitten Blog Wordpress database, feel free to skip to the next section.

Once you have MFA Enabled on your account proceed to enable MFA Delete on the bucket. This is done using the same command as to enable versioning, which comes handy:

aws s3api put-bucket-versioning \
          --mfa <otp> \
          --bucket <bucketname> \
          --versioning-configuration Status=Enabled,MFADelete=Enabled

You will notice that this command line requires a One Time Password. Yes, AWS authenticates the request to enable MFA with MFA. In this way they are assured that your root account has MFA enabled and that you are authorized to make such change.

Unfortunately this operation cannot be completed with the S3 Console. You'll have to use the CLI to enable MFA Delete.

Logging

By enabling Logging on the bucket we keep track of all uploads and downloads, authorized or malicious. This can be required for compliance reasons or just to have an IP trace in case of a data leak. This step is optional too.

We are going to prefix log objects with logs/.

Create a second JSON file (logging.json) with this content:

{
    "LoggingEnabled": {
        "TargetBucket": "<bucketname>",
        "TargetPrefix": "logs/"
    }
}

And execute:

aws s3api put-bucket-logging \
          --bucket <bucketname> \
          --bucket-logging-status file://logging.json

Again, to do the same using the friendlier AWS Console, select your bucket from the S3 console, click on the Properties tab then on the Logging box:

Enabling Logging on an S3 Bucket

Select the current bucket as target and logs/ as prefix and Save.

Great! Your bucket is all set to receive your DB Backups!

Generating a backup

The first step to saving backups is of course creating them.

If you are running a MySQL server, you can backup any entity with this single line:

mysqldump -u [user] \
          -p [password] \
          -h [host] \
          --single-transaction \
          --routines --triggers \
          --all-databases

That will write a huge blob of SQL to the stdout, so you may want to compress it on the fly and save to a file:

mysqldump -u [user] [...] | gzip > mysql_backup.sql.gz

If you are instead using PostgreSQL you can use pg_dumpall:

pg_dumpall -h [host] \
           -U [user] \
           --file=postgresql_backup.sql
gzip postgres_backup.sql

If you are backing up a single database, you can exploit the Postgres "Custom" dump format, which is an already compressed and optimized backup format:

pg_dump -U [user] \
        -h [host] -Fc \
        --file=postges_db.custom [database_name]

Naturally, you can follow this tutorial with any other database engine, like Oracle or SQL Server, but you'll have to figure out how to take a database snapshot yourself.

Storing the Backup in the Bucket

We're almost there. Now we have a snapshot of the whole database into a single file. The last step is to actually upload the file to the bucket.

If you can use standard uploads, the next command will do the job:

S3_KEY=<bucketname>/backups/$(date "+%Y-%m-%d")-backup.gz
aws s3 cp <backupfile> s3://$S3_KEY --sse AES256

For the first time in this tutorial we used aws s3 instead of aws s3api. The latter is the "official" S3 client, supporting all API operations. While the s3 client is a useful abstraction on top of s3api, which supports way less operations with less options. But in this case it makes our life easier: if your backups are larger than 5GB you are forced to use the Multipart Upload process. Actually S3 suggests to use them for any file larger than 100MB.

Using Multipart Uploads with the s3api is a real pain. The s3 client takes care of all the nitty-gritty details for us and it just works nicely.

By using --sse AES256 we are asking S3 to perform encryption for data at rest. This is usually only needed for compliance reasons, unless you're scared that an AWS employee may steal your data.

You made it this far! If you are enjoying the article, do not miss out the next!
* I promise to keep your email safe and to not send spam

So you're looking for a script to automate this?

Once you have set up the bucket, it's very easy to script this and run it daily:

#!/bin/bash

AWS_ACCESS_KEY_ID=<iam_user_access_key>
AWS_SECRET_ACCESS_KEY=<iam_user_secret_key>
BUCKET=<bucketname>

MYSQL_USER=<user>
MYSQL_PASSWORD=<password>
MYSQL_HOST=<host>

mysqldump -u $MYSQL_USER \
          -p $MYSQL_PASSWORD \
          -h $MYSQL_HOST \
          --single-transaction \
          --routines --triggers \
          --all-databases | gzip > backup.gz

S3_KEY=$BUCKET/backups/$(date "+%Y-%m-%d")-backup.gz
aws s3 cp backup.gz s3://$S3_KEY --sse AES256

rm -f backup.gz

Save this to a file somewhere on your server, for example in your home, and make it executable:

chmod +x ./backup.sh

of course replace the <placeholders> with your actual values. Also if you're not using MySQL replace the snapshot line with the appropriate command.

Setting up a cron job

To run this every day, at 12pm for example, run crontab -e and add the following line:

0 12 * * * /home/<youruser>/backup.sh

Save and celebrate. 🎉

Bonus points

For you, loyal reader, that got to this point, have a nice CloudFormation stack script to generate your bucket in a automated and repeatable fashion: download.

Using the S3 Console or the CLI are great ways to get yourself confident, and to be fair, a lot of infrastructure I saw just did this: building it all manually. When you're ready to grow up, move to automation, whether it is using CloudFormation or Terraform or anything else... they're life savers.

Unfortunately MFA Delete and Incomplete Multipart Upload Expiration cannot be enabled with CloudFormation and you will have to resort to the CLI for these two.

Congratulations

You are now generating and storing backups in an nice and secure way.

Naturally, your legal or compliance requirements may influence what and how you actually do much of what is explained. You may not need Server Side Encryption or Logging for example. Or you may be required to never expire Database backups. In that case, take a look at Amazon Glacier for long term storage of cold data.