This post is about the cloud's superpowers. You see, most of us use S3 as dumb, cheap, durable storage, and it’s perfectly fine as such. But if that is all you use it for, you are missing out, big time.

One of my favourite tricks with S3 is its ability to trigger a Lambda function whenever a file is uploaded, usually to automatically transform it, or notify a third party.

Story time

In one of my previous jobs, we dealt with PDF files. Many of them, and big ones.

Mainly, we wanted to convert every single page of each file to a JPG image to show on our website, and to extract whatever textual information was available to feed into Elasticsearch.

We were dealing with a few thousand PDF files per day, some of them over a gigabyte in size. We were using a mix of Ghostscript and another custom tool to do all of this on a small group of machines that we would scale up during the day and down at night.

This setup was flaky, expensive, and often not enough. Some files would fill up the memory and crash the instance.

Rebuilding with S3 and Lambda

We were already using S3 for long-term storage of our original PDF files and as the origin for the rendered JPG files. AWS Lambda had been out for a few years already.

So I started building a small proof of concept. The idea was that we would upload the source PDF files to our S3 bucket. This would automatically invoke a function to split the PDF file into N smaller files, one per page. We did this using pdftk.

The individual single-page PDF files would then be uploaded again, triggering a second function that would convert each page to a JPG image and upload it to S3 once more. This was done with Ghostscript.
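To make the two steps a bit more concrete, here is a minimal Python sketch of what each function needs: a command line to run and a key to upload the result under. It is not the code we ran. The `uploads/`, `pages/` and `images/` prefixes, the function names and the 150 dpi default are all assumptions for illustration; pdftk's `burst` operation and Ghostscript's `jpeg` device are the usual ways to split and render with those tools.

```python
import posixpath


def split_command(source_pdf, output_dir):
    """argv for pdftk's `burst` operation: one output PDF per input page."""
    pattern = posixpath.join(output_dir, "page_%04d.pdf")
    return ["pdftk", source_pdf, "burst", "output", pattern]


def render_command(page_pdf, output_jpg, dpi=150):
    """argv for Ghostscript rendering a single-page PDF to a JPEG."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=jpeg", f"-r{dpi}",
        f"-sOutputFile={output_jpg}",
        page_pdf,
    ]


def page_key(source_key, page_number):
    """Key for one split page, e.g. uploads/report.pdf -> pages/report/0003.pdf.

    Simplification: only the file name is used, so two sources with the
    same name in different folders would collide.
    """
    stem = posixpath.splitext(posixpath.basename(source_key))[0]
    return f"pages/{stem}/{page_number:04d}.pdf"


def image_key(page_pdf_key):
    """Key for the rendered JPG of a page stored under the pages/ prefix."""
    _prefix, rest = page_pdf_key.split("/", 1)
    return "images/" + posixpath.splitext(rest)[0] + ".jpg"
```

Keeping each stage under its own prefix matters: if the split pages landed next to the originals with the same `.pdf` suffix, they would trigger the split function again. A `filter_prefix` on each bucket notification, next to the `filter_suffix`, keeps the stages apart.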

This was not exactly easy to set up (or to coordinate once the conversion had completed), but it worked so beautifully we could not believe it.

  • First of all, the processing times for very large PDF files went from many minutes to a few seconds. See, it didn’t matter if the PDF was made of one page or a thousand: Lambda functions are invoked with a high level of parallelism, so all the pages are converted at roughly the same time.
  • The conversion process became almost free, thanks to the generous Lambda free tier.
  • While serverless is just someone else’s servers, it was a relief not having to manage one more (flaky) thing.
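The coordination part, knowing when every page of a document has been rendered, can be approached in several ways, and I am not describing our exact solution here. One simple option, sketched below in Python, assumes that the page count is known after the split and that rendered images are stored under keys ending in `<page number>.jpg` (a hypothetical layout). Given the list of keys found so far, it reports which pages are still missing.

```python
def missing_pages(page_count, image_keys):
    """Return the page numbers (1-based) whose JPG has not appeared yet."""
    done = set()
    for key in image_keys:
        name = key.rsplit("/", 1)[-1]          # e.g. "0003.jpg"
        stem, _, ext = name.rpartition(".")
        if ext == "jpg" and stem.isdigit():
            done.add(int(stem))
    return [n for n in range(1, page_count + 1) if n not in done]


def is_complete(page_count, image_keys):
    """True once every page of the document has a rendered image."""
    return not missing_pages(page_count, image_keys)
```

Because the pages are rendered in parallel and finish in no particular order, a check like this would have to run after each image is uploaded, or on a timer, rather than once at the end.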

Other use cases

This same technique can be used, for example, for:

  • Cropping, resizing and compressing images
  • Putting an image in an SQS queue for AI-based object / face detection
  • Analysing CSV files or log files looking for specific patterns or contents
  • Scanning uploaded files using antivirus software
  • Accounting for data sizes uploaded and billing your customers accordingly
  • Generating hashes and/or signatures for uploaded files for integrity checks

The sky, and your imagination, are the limit here. Some tasks, of course, might be a better fit for your traditional “monolithic” workflow, so don’t just use Lambda for the sake of it.
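As a taste of how small these functions can be, here is a minimal Python sketch of the hashing use case. It reads any file-like object in chunks so a large upload never has to fit in memory at once. The function name and the 1 MiB chunk size are my own choices for illustration; inside Lambda, the stream would typically be the `Body` returned by boto3's `get_object`, which is not shown here.

```python
import hashlib


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a binary file-like object."""
    digest = hashlib.sha256()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest could be stored as object metadata or in a database, then compared later to detect corruption or tampering.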

Implementation

First, let’s create an S3 bucket:

resource "aws_s3_bucket" "my_pdf_files" {
  bucket = "my-pdf-files-example"
}

And a Lambda function to be invoked whenever PDF files are uploaded:

resource "aws_lambda_function" "convert" {
  function_name = "convert-pdf"

  # [... the rest of your lambda function configuration ...]
}

Set Lambda permissions so that S3 can invoke the function:

resource "aws_lambda_permission" "test" {
  statement_id  = "UploadFromS3"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.convert.arn
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.my_pdf_files.arn
}

Now create an S3 Bucket Notification setting to invoke the Lambda function once uploads are finalized:

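resource "aws_s3_bucket_notification" "my-trigger" {
  bucket = aws_s3_bucket.my_pdf_files.bucket

  lambda_function {
    lambda_function_arn = aws_lambda_function.convert.arn
    events              = ["s3:ObjectCreated:*"]
    filter_suffix       = ".pdf"
  }

  # S3 checks that it is allowed to invoke the function when the
  # notification is saved, so the permission must exist first.
  depends_on = [aws_lambda_permission.test]
}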
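On the receiving end, the function gets an event with one or more `Records`, each carrying the bucket name and the object key. Below is a minimal Python sketch of a handler that unpacks them; the helper name `uploaded_objects` and the returned dictionary are made up for illustration, and the actual conversion is left as a comment. One detail worth knowing: S3 URL-encodes keys in these events, so a space in a file name arrives as `+` and needs decoding before you fetch the object.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Yield (bucket, key, size) for every record in an S3 notification event."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys are URL-encoded in the event payload; decode before use.
        key = unquote_plus(s3["object"]["key"])
        size = s3["object"].get("size", 0)
        yield bucket, key, size


def handler(event, context):
    """Lambda entry point: collect the PDFs this invocation should process."""
    processed = []
    for bucket, key, _size in uploaded_objects(event):
        # Defensive check; the bucket notification already filters on ".pdf".
        if not key.lower().endswith(".pdf"):
            continue
        # A real function would download the object here and convert it.
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}
```

The handler can be exercised locally by passing it a hand-written event dictionary and `None` as the context, which makes it easy to test before deploying.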

Conclusion

When I first started using S3 Events to trigger Lambda functions, I realised for the first time what “using the cloud” actually means.

If you are setting up EC2 instances for all your tasks, with little use of other AWS services, ask yourself if you might be better off, and spending less, with other hosting providers. Every cloud provides you with a large number of services and little gems that can completely change how you run your business, saving you time, effort, and money.