Using Amazon S3 to Store Database Backups

Using S3 to store your MySQL or PostgreSQL Backups

"I'd love to restore a Database Backup" said no one ever. When you're forced to do so, that means that the production system you're maintaining is going legs up and your company (or the company you're working for) is probably losing an outrageous amount of money for every second of downtime. If can happen for a variety of reasons:

  • Someone running a DELETE query with a WHERE clause broader than intended, or worse, with no WHERE clause at all.
  • TRUNCATE and DROP queries being executed on the wrong tables.
  • A faulty Hard Disk that suddenly stops working.
  • A hacker decided to ruin your company and deleted everything, or worse, you were hit by ransomware that is now demanding some Bitcoin to give your data back.

It should be clear that storing your Database Backups properly is extremely important. We live in the hope that we won't need them, but when the emergency comes, you'll be glad you went the extra mile to protect your data and to have that database backup ready to be restored.

In these situations you, the DevOps engineer, or SysAdmin, or Cloud expert, or simply "the only developer in the company", can either be the hero that saved the day or the one who didn't care about taking backups.

Now:

If you are using Amazon RDS you are pretty much covered, as RDS will automatically create and store backups for you every night.

But if you are not in the Cloud, or you are running a database server on an EC2 instance to retain more control, or on a third-party VPS to save a few cents, you will have to manage this yourself.

In this article, we will use Amazon S3 as a safe haven for one of our most precious assets.

AWS Cloud: a safe place for your backups

Amazon S3 has many nice features that make it a great place to save your backups. In this tutorial we are going to use:

  • The AWS CLI: a command line interface to the AWS API, to manage the bucket and to upload the backups;
  • Object Lifecycle Management: reduce costs by moving old backups to Infrequent Access Storage Class;
  • Server-Side Encryption: make sure that data is encrypted at rest;
  • MFA Delete: avoid accidental or malicious backup deletion with a Multi Factor Authentication token;
  • Logging: keep track of who uploads and downloads what, and from where.

Having some experience with AWS, even a tiny bit, will make your life much easier. This guide, however, should be easy to follow even if this is your first time in the Cloud.

Create an AWS account

If you already have an AWS account, you can skip this section.

Open the AWS Homepage and click on "Create an AWS Account":

Create an AWS Account

Insert your email address, select "I am a new user" and click on "Sign in using our secure server":

AWS Login

At this point you will be asked to fill in some information about you or your company, and a telephone number that will be verified with a phone call.

Finally they will ask you for credit card details. But don't worry, you don't pay a penny just for having an AWS account. All AWS services are priced by the hour or by the GB of storage, and an empty AWS account is free. By the way, the first year of AWS comes with plenty of freebies to let you experiment.

Get API Credentials

If you do have some experience with AWS you should already know how this works. Avoid using Root Credentials and prefer using an IAM user with restricted access. Keep the keys as secret as your bank account credentials.

For everyone else yet to be initiated into the amazing world of Amazon Web Services, here are some more details:

When you create an AWS account you log in using what is called the "Root Account". The Root Account is pretty much the Unix root user: almighty, powerful, and it pays the bills. Its credentials should be kept with maximum secrecy, as they have the power to start multiple $9,600/month instances on your card.

Aside from the login credentials you just created, you can also generate a pair of API credentials that applications and CLIs use to interact with AWS. Needless to say, API credentials for the Root Account are just as dangerous as the login ones, and you should avoid generating them entirely.

Countless botnets have been built using stolen AWS credentials leaked by someone naïve enough to push them to GitHub. Save yourself a world of pain and stay away from root user API credentials.

Instead learn to use AWS Identity and Access Management (IAM) and IAM Users.

You can think of IAM Users as subaccounts you would typically hand out to employees in your company, with well defined powers and capabilities, letting them do only what they strictly need to do. By default a new IAM User has no permissions at all, and we use policies to declare what they should be allowed to do.

As the root user, we can generate login credentials and/or API credentials for them. In this case we will only need the latter. These keys can still do damage, but with a much more limited scope.

If your database server is running on an AWS EC2 instance, you may not need an IAM User at all. You can assign a credential-less IAM Role to the EC2 instance with the permissions needed to save backups to S3. This is by far the safest solution, as it doesn't involve handling dangerous secrets. IAM Roles can be a bit hard to grasp at first; ask your AWS expert for more details, or see the sketch below.
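
If you want to try the role-based route, here is a rough sketch of the CLI steps under a few assumptions of mine: the role name backup-role and the instance id are placeholders I made up, and I'm attaching the broad AmazonS3FullAccess managed policy only for brevity (a narrower custom policy would be better):

cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and let EC2 assume it
aws iam create-role --role-name backup-role \
    --assume-role-policy-document file://trust.json
aws iam attach-role-policy --role-name backup-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# Wrap the role in an instance profile and attach it to the instance
aws iam create-instance-profile --instance-profile-name backup-role
aws iam add-role-to-instance-profile --instance-profile-name backup-role \
    --role-name backup-role
aws ec2 associate-iam-instance-profile --instance-id <instance_id> \
    --iam-instance-profile Name=backup-role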

Back to the original route. Open the IAM Console. On the left sidebar select "Users", then, "New User":

Create an IAM User

Give the user a name (for example backups), and check the "Programmatic Access" checkbox. This will enable API Access for the user and generate its credentials. Proceed to the next page.

Here you have to choose permissions for the user. If you are an experienced AWS user, you may want to write your own policy to grant only the exact permissions needed for the job. In this tutorial we will use the AWS managed AmazonS3FullAccess policy, which grants S3 superpowers to the user:

IAM User - S3FullAccess

Review the settings and on the final page you will receive your secret credentials. Please, remember to keep them safe! The safety of your backups, and of your AWS account in general, depends on it.

First rule: do not push them to repositories, whether public or private.

From now on, I will assume the user executing the commands has all the permissions needed to do so. In a production environment remember to follow the Principle of Least Privilege, which in this case means that the user creating the bucket should not be the same one uploading the backups. To do that you'll have to learn to write IAM Policies; a minimal example for the uploading user is sketched below.
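
For reference, a minimal inline policy for the uploading user might look like this sketch. It is an assumption of mine, not a prescription: the policy name backup-writer is made up, and you should adjust the resource ARN to your own bucket and prefix:

cat > backup-writer-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<bucketname>/backups/*"
    }
  ]
}
EOF

aws iam put-user-policy --user-name backups \
    --policy-name backup-writer \
    --policy-document file://backup-writer-policy.json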

Install the AWS CLI

The Command Line Interface is a great tool to experiment and to interface with the AWS Platform. Most of the steps listed in this tutorial could be manually applied from the AWS Console.

But you don't want to upload backups manually every day. You will need the CLI anyway.

If you are lucky enough to work with a Mac, and already use Homebrew, install the CLI using:

brew install awscli

On Linux and on Windows, ensure that you have Python and pip installed, then run:

pip install awscli

Before running any commands, you must set the access credentials that the CLI uses to authenticate its API requests. Run aws configure and set the appropriate values for Access Key Id and Secret Access Key.

You will also be asked for a region. You should probably pick the region that is geographically closest to you. You can find the list of available regions in the AWS Documentation.
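
To give you an idea of what to expect, this is roughly how the configuration session looks (the values are placeholders, and eu-west-1 is just an example region), followed by a quick sanity check that asks AWS "who am I?" and should print the ARN of your backups user:

aws configure
# AWS Access Key ID [None]: <iam_user_access_key>
# AWS Secret Access Key [None]: <iam_user_secret_key>
# Default region name [None]: eu-west-1
# Default output format [None]: json

aws sts get-caller-identity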

Setting up the Bucket

At this point you have to create a bucket in which to save the backups. You could reuse one that you already own, but the purpose of buckets is to separate objects of a different nature, and some of the customizations we are about to apply may not play well with your other objects.

Time to choose your bucket name. Remember that bucket names must be globally unique, so "database-backups" is not going to work for every one of you. My suggestion for non-public buckets is to suffix them with a UUID to ensure uniqueness (like backups-b3bd1643-8cbf-4927-a64a-f0cf9b58dfab). Once you have the name:

aws s3api create-bucket --bucket <bucketname>
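
One caveat worth mentioning: as written, that command creates the bucket in us-east-1. If your region is a different one, S3 wants an explicit location constraint, something along these lines (eu-west-1 is just an example):

aws s3api create-bucket --bucket <bucketname> \
          --create-bucket-configuration LocationConstraint=eu-west-1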

Since we are going to enable logging, we'll use the backups/ key prefix (or directory if you insist) to store the actual database dumps, and the logs/ prefix for S3 access logs.

If you prefer using your mouse and a UI, open the S3 Console and click on "Create Bucket":

Create a new S3 bucket

When the popup appears, insert the name of your bucket and the region. Then leave all the other settings at their defaults and continue. By default your bucket is accessible only by the Root Account and authorized IAM Users.

Containing backup costs

Using Object Lifecycle Management we are going to move objects older than 30 days to the Infrequent Access Storage Class (pay less for storage, a bit more for downloads). After 6 months the backups are probably going to be so old that they have no real use, so we are going to expire them.

Copy the following JSON Lifecycle Configuration to a file (I will name mine lifecycle.json) and feel free to make the appropriate edits for your case:

{
  "Rules": [
    {
      "ID": "Backups Lifecycle Configuration",
      "Status": "Enabled",
      "Prefix": "backups/",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "Expiration": {
        "Days": 180
      },
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 2
      }
    }
  ]
}

This configuration contains a single "Rule" definition, which is applied to objects prefixed with backups/. We are instructing the bucket to move objects to the STANDARD_IA Storage Class after 30 days and to expire them after 180 days (roughly 6 months). Finally, we are making sure that incomplete Multipart Uploads are aborted after 2 days, which is a best practice for any bucket.

Run the following command to apply this configuration to your newly created bucket:

aws s3api put-bucket-lifecycle-configuration \
          --bucket <bucketname> \
          --lifecycle-configuration file://lifecycle.json
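
If you want to double check that the rules were applied, you can read the configuration back:

aws s3api get-bucket-lifecycle-configuration --bucket <bucketname>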

The same can easily be achieved from the S3 Console too. Select your bucket's Management tab and click on the "Add Lifecycle rule" button. Get comfortable with the wizard and fill in all the required fields.

Protecting against accidental deletion

The last thing you want is to accidentally delete your precious backups. Or even worse to have some malicious actor trying to ruin the company.

MFA Delete protects against this scenario: all delete requests must be further authenticated with a Two Factor Authentication token, like the one you use to protect your Gmail, Facebook or bank account. There are two requirements to enable it:

  • First, your AWS Root account must have MFA enabled. Head over to your IAM Dashboard and enable it with either a Virtual or Physical device.
  • Second, MFA Delete requires Bucket Versioning to be enabled. We don't really need versioning, since we are not going to overwrite our objects, but keep in mind it may add some costs should that happen.

Note: this step is optional. If it feels like overkill for your Kitten Blog WordPress database, feel free to skip to the next section.

Once MFA is enabled on your account, proceed to enable MFA Delete on the bucket. This is done with the same command used to enable versioning, which comes in handy:

aws s3api put-bucket-versioning \
          --mfa "<mfa_device_serial> <otp>" \
          --bucket <bucketname> \
          --versioning-configuration Status=Enabled,MFADelete=Enabled

You will notice that this command requires a One Time Password, preceded by the serial number (or ARN) of the root account's MFA device. Yes, AWS authenticates the request to enable MFA Delete with MFA: this way they are sure that your root account has MFA enabled and that you are authorized to make such a change. Note that this is one of the rare operations that must be performed with the Root Account's credentials.

Unfortunately this operation cannot be completed with the S3 Console. You'll have to use the CLI to enable MFA Delete.
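
To verify that both Versioning and MFA Delete ended up enabled, you can read the bucket's versioning status back:

aws s3api get-bucket-versioning --bucket <bucketname>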

Logging

By enabling Logging on the bucket we keep track of all uploads and downloads, authorized or malicious. This can be required for compliance reasons, or just to have an IP trail in case of a data leak. This step is optional too.

We are going to prefix log objects with logs/.

Create a second JSON file (logging.json) with this content:

{
    "LoggingEnabled": {
        "TargetBucket": "<bucketname>",
        "TargetPrefix": "logs/"
    }
}

And execute:

aws s3api put-bucket-logging \
          --bucket <bucketname> \
          --bucket-logging-status file://logging.json
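
As usual, you can read the setting back to make sure it took effect:

aws s3api get-bucket-logging --bucket <bucketname>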

Again, to do the same using the friendlier AWS Console, select your bucket from the S3 console, click on the Properties tab then on the Logging box:

Enabling Logging on an S3 Bucket

Select the current bucket as target and logs/ as prefix and Save.

Great! Your bucket is all set to receive your DB Backups!

Generating a backup

The first step to saving backups is of course creating them.

If you are running a MySQL server, you can back up everything with this single command:

mysqldump -u [user] \
          -p[password] \
          -h [host] \
          --single-transaction \
          --routines --triggers \
          --all-databases

That will write a huge blob of SQL to stdout, so you may want to compress it on the fly and save it to a file:

mysqldump -u [user] [...] | gzip > mysql_backup.sql.gz
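
And since the whole point is being able to restore, here is, roughly, the reverse pipe for the day you'll need it (same placeholders as above):

gunzip < mysql_backup.sql.gz | mysql -u [user] -p -h [host]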

If you are instead using PostgreSQL you can use pg_dumpall:

pg_dumpall -h [host] \
           -U [user] \
           --file=postgresql_backup.sql
gzip postgresql_backup.sql

If you are backing up a single database, you can exploit the Postgres "Custom" dump format, which is an already compressed and optimized backup format:

pg_dump -U [user] \
        -h [host] -Fc \
        --file=postgres_db.custom [database_name]
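
A custom-format dump is restored with pg_restore rather than psql. A minimal sketch, assuming the target database already exists:

pg_restore -U [user] \
           -h [host] \
           --dbname=[database_name] postgres_db.custom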

Naturally, you can follow this tutorial with any other database engine, like Oracle or SQL Server, but you'll have to figure out how to take a database snapshot yourself.

Storing the Backup in the Bucket

We're almost there. We now have a snapshot of the whole database in a single file. The last step is to actually upload that file to the bucket.

If you can use standard uploads, the next command will do the job:

S3_KEY=<bucketname>/backups/$(date "+%Y-%m-%d")-backup.gz
aws s3 cp <backupfile> s3://$S3_KEY --sse AES256

For the first time in this tutorial we used aws s3 instead of aws s3api. The latter is the low-level client that maps directly to the S3 API operations, while the s3 client is a higher-level abstraction on top of it, supporting fewer operations and options. In this case, though, it makes our life easier: if your backups are larger than 5GB you are forced to use the Multipart Upload process, and AWS actually suggests using it for any file larger than 100MB.

Using Multipart Uploads with the s3api is a real pain. The s3 client takes care of all the nitty-gritty details for us and it just works nicely.
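
If you want, you can also tune when and how the s3 client switches to Multipart Uploads. The thresholds below are just example values, not recommendations:

aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 50MB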

By using --sse AES256 we are asking S3 to perform encryption for data at rest. This is usually only needed for compliance reasons, unless you're scared that an AWS employee may steal your data.
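
If your compliance people want encryption with a key you manage, a variant of the same command can use SSE-KMS instead of the S3 managed keys; <kms_key_id> is a placeholder for your own key:

aws s3 cp <backupfile> s3://$S3_KEY --sse aws:kms --sse-kms-key-id <kms_key_id>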

So you're looking for a script to automate this?

Once you have set up the bucket, it's very easy to script this and run it daily:

#!/bin/bash

# Export the credentials so the AWS CLI can pick them up
export AWS_ACCESS_KEY_ID=<iam_user_access_key>
export AWS_SECRET_ACCESS_KEY=<iam_user_secret_key>
BUCKET=<bucketname>

MYSQL_USER=<user>
MYSQL_PASSWORD=<password>
MYSQL_HOST=<host>

# Dump everything and compress on the fly
mysqldump -u "$MYSQL_USER" \
          -p"$MYSQL_PASSWORD" \
          -h "$MYSQL_HOST" \
          --single-transaction \
          --routines --triggers \
          --all-databases | gzip > backup.gz

# Upload with a date-stamped key, encrypted at rest
S3_KEY=$BUCKET/backups/$(date "+%Y-%m-%d")-backup.gz
aws s3 cp backup.gz "s3://$S3_KEY" --sse AES256

rm -f backup.gz

Save this to a file somewhere on your server, for example in your home, and make it executable:

chmod +x ./backup.sh

Of course, replace the <placeholders> with your actual values. Also, if you're not using MySQL, replace the dump line with the appropriate command.
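
A small variation, if you'd rather not hardcode the keys in the script: store them in a named CLI profile once, drop the two exported keys from backup.sh, and pass the profile to the upload command. The profile name backups is just my example:

aws configure --profile backups

# ...then in backup.sh:
aws s3 cp backup.gz "s3://$S3_KEY" --sse AES256 --profile backups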

Setting up a cron job

To run this every day, at 12pm for example, run crontab -e and add the following line:

0 12 * * * /home/<youruser>/backup.sh
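
If you also want a trace of each run, redirect the script output to a log file (the path is just an example):

0 12 * * * /home/<youruser>/backup.sh >> /home/<youruser>/backup.log 2>&1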

Save and celebrate. 🎉

Bonus: infrastructure as code

For you, loyal reader, who got this far, here's a nice CloudFormation stack to create your bucket in an automated and repeatable fashion: download.

Using the S3 Console or the CLI is a great way to get comfortable, and to be fair, a lot of the infrastructure I've seen was built just like this: by hand. When you're ready to grow up, move to automation, whether it is CloudFormation, Terraform or anything else... they're life savers.

Unfortunately MFA Delete and Incomplete Multipart Upload Expiration cannot be enabled with CloudFormation and you will have to resort to the CLI for these two.

Or using Terraform

Lately I've fallen in love with Terraform and I don't use the AWS Console at all anymore. The following script is all you need to create a bucket configured exactly as we discussed so far.

variable "bucket_name" {}
variable "region" {}

provider "aws" {
    version = "~> 1.2"
    region = "${var.region}"
}

resource "aws_s3_bucket" "backup" {
  bucket = "${var.bucket_name}"
  acl    = "private"

  versioning {
    enabled = true
    mfa_delete = true
  }

  logging {
    target_bucket = "${var.bucket_name}"
    target_prefix = "logs/"
  }

  lifecycle_rule {
    id      = "backups"
    enabled = true

    prefix  = "backups/"

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    expiration {
      days = 180
    }

    abort_incomplete_multipart_upload_days = 2
  }
}
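
To apply it, something like the following should do (eu-west-1 is just an example region):

terraform init
terraform apply -var 'bucket_name=<bucketname>' -var 'region=eu-west-1'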

Congratulations

If you made it this far, you are now generating and storing backups in a proper and secure way.

If your job is to maintain the infrastructure for a company (or your own company), you probably cannot just copy and paste whatever I did here. Your legal or compliance requirements will affect what you actually do and how you do most of this. You may not need Server Side Encryption or Logging, for example, or you may be asked to never expire database backups; in that case, have a look at Amazon Glacier for long term storage of cold data. It's your job as the "AWS expert" to find and customize the solution that best fits your use case.