IaC for S3 Event Notifications with AWS SAM, AWS CDK, and Terraform

S3 Event Notification using AWS Lambda, SNS and SQS

Introduction

Monorepo: https://github.com/adityawdubey/IaC-for-S3-Event-Notifications-with-AWS-SAM-AWS-CDK-and-Terraform

Infrastructure as Code (IaC) is a crucial concept for every tech professional. It is like having a detailed blueprint for your digital infrastructure: instead of manually configuring servers and networks, you use code to define and manage your IT resources.

With IaC, you can rapidly deploy consistent infrastructure, detect errors early, and easily track changes. It's a key part of modern DevOps practices, helping teams work more efficiently and reliably.

Project Overview

Amazon S3 Event Notifications: You can use Amazon S3 Event Notifications to get alerts when specific events occur in your S3 bucket. To set up notifications, add a configuration that specifies the events you want Amazon S3 to track. Also, specify where you want Amazon S3 to send these notifications.

Notification types and destinations: Amazon S3 can publish notifications for the following events: new object created events, object removal events, restore object events, Reduced Redundancy Storage (RRS) object lost events, replication events, S3 Lifecycle expiration events, S3 Lifecycle transition events, S3 Intelligent-Tiering automatic archival events, object tagging events, and object ACL PUT events. You can send these notifications to several destinations, including SQS, SNS, and Lambda, as well as EventBridge.

In this project, we will focus on creating a notification system that triggers on s3:ObjectCreated:* events. This system will monitor an S3 bucket for new file uploads, trigger a Lambda function to process the uploaded files, send a notification using Simple Notification Service (SNS), and store metadata in a Simple Queue Service (SQS) queue for further processing.

The s3:ObjectCreated:* event type requests notifications for object creation regardless of which API was used to create the object (PUT, POST, COPY, or completion of a multipart upload).
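
Under the hood, all three IaC tools end up attaching a notification configuration of this kind to the bucket. Purely for illustration, here is roughly what that configuration looks like when applied with boto3; the bucket name and function ARN are placeholders, and the templates also grant S3 permission to invoke the function separately:

# Illustration only: the equivalent bucket notification configuration via boto3.
# The bucket name and Lambda ARN are placeholders, not values from this project.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-upload-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:s3_event_handler",
                # s3:ObjectCreated:* covers PUT, POST, COPY, and multipart uploads
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)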

To follow the project step-by-step using the AWS Management Console, check out this post on my blog: https://adityadubey.cloud/s3-file-upload-notification-system

There are multiple IaC tools you can use, but this article mainly covers AWS CloudFormation (with AWS SAM), the AWS Cloud Development Kit (CDK), and Terraform. My goal is to explore different IaC tools, and I believe this journey will be both enlightening and rewarding.

  1. CloudFormation

GitHub: https://github.com/adityawdubey/S3-File-Upload-Notification-using-AWS-SAM

AWS CloudFormation is a service that sets up cloud infrastructure automatically. It uses templates written in JSON or YAML to define resources and their settings: you describe your configuration, submit the template, and CloudFormation creates or updates your infrastructure to match. You can use Change Sets to review changes before applying them, estimate costs, roll back failed updates, and detect manual changes with Drift Detection.

Writing CloudFormation

The CloudFormation template templates/main.yaml, written with AWS SAM, sets up the infrastructure for handling file uploads to an S3 bucket. It creates an SNS topic that sends notifications to a specified email address when files are uploaded, and an SQS queue that receives these notifications.

The Lambda function lambda/s3_event_handler is triggered by new file uploads to the S3 bucket and can perform actions based on these events.

AWS SAM

AWS SAM templates provide a simple syntax for defining Infrastructure as Code (IaC) for serverless applications. As an extension of AWS CloudFormation, these templates let you deploy serverless resources using CloudFormation, taking advantage of its powerful IaC capabilities.

From CloudFormation's perspective, SAM acts as a transform. This means SAM templates, although simpler in syntax, are converted into standard CloudFormation templates during deployment. The main benefit is that SAM templates require less code to define serverless applications compared to traditional CloudFormation templates.

Each SAM template includes a Transform statement at the top, telling CloudFormation to process the SAM-specific syntax and convert it into a full CloudFormation template. This transformation allows users to enjoy the simplicity of SAM templates while still using CloudFormation's features.

AWS::Serverless::Function is the resource type used with AWS SAM, whereas AWS::Lambda::Function is used with plain CloudFormation.

Create Lambda Functions

Define your Lambda functions in the /lambda directory, ensuring that each specific Lambda directory contains a requirements.txt file. This configuration enables our deployment script, scripts/deploy.sh, to automatically package the Lambda functions located in this directory and upload them to S3.

Environment Variables in Lambda

The SAM template sets environment variables for the Lambda function using the Environment property. These variables are accessible within the Lambda function via os.environ in Python.

templates/main.yaml

....
s3_event_handler:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handler.lambda_handler
    Runtime: python3.12
    CodeUri: lambda/s3_event_handler
    MemorySize: 256
    Timeout: 30
    Environment:
      Variables:
        SNS_TOPIC_ARN: !Ref S3NotificationSNSTopic
        SQS_QUEUE_URL: !GetAtt MyQueue.QueueUrl
  ....
  ....

Accessing Environment Variables in Lambda Function

lambda/s3_event_handler/handler.py

import os
....
sns_topic_arn = os.environ['SNS_TOPIC_ARN']
sqs_queue_url = os.environ['SQS_QUEUE_URL']
....
....
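
Beyond reading these variables, the handler publishes to SNS and enqueues metadata to SQS. The repository's handler may differ in detail, but a minimal sketch with boto3 (message fields are illustrative) looks like this:

# Minimal sketch of an S3 event handler using boto3; field names are illustrative.
import json
import os

import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

SNS_TOPIC_ARN = os.environ['SNS_TOPIC_ARN']
SQS_QUEUE_URL = os.environ['SQS_QUEUE_URL']

def lambda_handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)

        # Notify subscribers via SNS
        sns.publish(
            TopicArn=SNS_TOPIC_ARN,
            Subject="New file uploaded",
            Message=f"File s3://{bucket}/{key} ({size} bytes) was uploaded.",
        )

        # Store metadata in SQS for further processing
        sqs.send_message(
            QueueUrl=SQS_QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key, "size": size}),
        )

    return {"statusCode": 200}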

Deploying SAM

Make sure you install and configure the AWS CLI and install the SAM CLI by running pip install awscli, aws configure, and pip install aws-sam-cli.

Deployment Script

scripts/deploy.sh

#!/bin/bash

# Ensure the script stops on errors
set -e

STACK_NAME=$1
TEMPLATE_FILE=$2
PARAMETERS_FILE=$3
BUCKET_FOR_PACKAGING=$4

if [ -z "$STACK_NAME" ] || [ -z "$TEMPLATE_FILE" ] || [ -z "$PARAMETERS_FILE" ] || [ -z "$BUCKET_FOR_PACKAGING" ]; then
  echo "Usage: $0 <stack-name> <template-file> <parameters-file> <bucket-for-packaging>"
  exit 1
fi

# Temporary directory for packaging
PACKAGE_TEMP_DIR=$(mktemp -d)

# Find all subdirectories under the lambda directory
LAMBDA_DIRS=($(find lambda -maxdepth 1 -mindepth 1 -type d))

# Prepare the SAM template for dynamic updates
PACKAGED_TEMPLATE_FILE="packaged-template.yaml"
cp $TEMPLATE_FILE $PACKAGED_TEMPLATE_FILE

# Package each Lambda function
for LAMBDA_DIR in "${LAMBDA_DIRS[@]}"; do
  LAMBDA_NAME=$(basename "$LAMBDA_DIR")
  PACKAGE_FILE="$PACKAGE_TEMP_DIR/lambda_package.zip"

  echo "Packaging Lambda function in $LAMBDA_DIR"

  # Copy the lambda function code to the temporary directory
  cp -r "$LAMBDA_DIR/"* "$PACKAGE_TEMP_DIR/"

  # Install dependencies into the temporary directory
  if [ -f "$LAMBDA_DIR/requirements.txt" ]; then
    pip install -r "$LAMBDA_DIR/requirements.txt" -t "$PACKAGE_TEMP_DIR/"
  fi

  # Create a ZIP file for the Lambda function
  (cd "$PACKAGE_TEMP_DIR" && zip -r "$PACKAGE_FILE" .)

  # Upload the package to S3
  S3_URI="s3://$BUCKET_FOR_PACKAGING/$LAMBDA_NAME/lambda_package.zip"
  aws s3 cp "$PACKAGE_FILE" "$S3_URI"

  # Update the SAM template with the new S3 URL
  sed -i.bak "s|lambda/$LAMBDA_NAME/|$S3_URI|g" $PACKAGED_TEMPLATE_FILE

  # Clean up temporary files
  rm -rf "$PACKAGE_TEMP_DIR"/*
done

# Convert parameters file to SAM parameter overrides format
PARAMETER_OVERRIDES=$(jq -r '[ .[] | "ParameterKey=\(.ParameterKey),ParameterValue=\(.ParameterValue)" ] | join(" ")' "$PARAMETERS_FILE")

# Package the application using AWS SAM CLI
sam package \
  --template-file $PACKAGED_TEMPLATE_FILE \
  --output-template-file $PACKAGED_TEMPLATE_FILE \
  --s3-bucket $BUCKET_FOR_PACKAGING

# Deploy the packaged application
sam deploy \
  --template-file $PACKAGED_TEMPLATE_FILE \
  --stack-name $STACK_NAME \
  --parameter-overrides $PARAMETER_OVERRIDES \
  --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
  --no-fail-on-empty-changeset

# Clean up
rm -rf "$PACKAGE_TEMP_DIR"

echo "Deployment complete"

This deployment script automates the process of packaging and deploying AWS Lambda functions using the AWS SAM CLI. It takes care of preparing the Lambda function code, uploading it to S3, updating the SAM template with the new S3 URLs, and deploying the stack with the specified parameters.

  • The deployment script takes the TEMPLATE_FILE argument, which should be the path to your main.yaml CloudFormation template.

  • The PARAMETERS_FILE argument is the path to a JSON file containing values for the parameters defined in the template. This allows you to customize the deployment (e.g., specifying the email endpoint and S3 bucket name).

  • If the parameters file contains sensitive information such as passwords, secret keys, or any other credentials, it should be included in .gitignore.

parameters/dev-parameters.json

[
  {
    "ParameterKey": "EmailSubscriptionEndpoint",
    "ParameterValue": "your_email@gmail.com"
  },
  {
    "ParameterKey": "S3BucketName",
    "ParameterValue": "your-unique-bucket-name"
  }
]

Run the Deployment Script

Open a terminal, navigate to your project directory, and run the deployment script. For example:

Make the script executable:

chmod +x ./scripts/deploy.sh

Run the script

./scripts/deploy.sh <stack_name> templates/main.yaml parameters/dev-parameters.json <bucket-for-packaging>

This should correctly package your Lambda functions, update the SAM template, and deploy your stack with the specified parameters.

Overcoming Challenges

Specify SAM template parameters using parameters.json

One challenge I faced during the deployment process was that the AWS SAM CLI doesn't accept a JSON file for parameters like CloudFormation does. To get around this, I used jq in the deployment script to convert the parameters from a parameters.json file into a format that the AWS SAM CLI can understand.

The SAM CLI needs parameters to be passed as a string in the format ParameterKey=key,ParameterValue=value. By using jq, I can transform the JSON structure of the parameters file into this required format.

Here’s how jq is used in the deployment script:

# Convert parameters file to SAM parameter overrides format
PARAMETER_OVERRIDES=$(jq -r '[ .[] | "ParameterKey=\(.ParameterKey),ParameterValue=\(.ParameterValue)" ] | join(" ")' "$PARAMETERS_FILE")

This command reads the JSON file specified by PARAMETERS_FILE and processes each element to create a string in the required format. The final result is a single string with all parameters properly formatted for the SAM CLI.
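
If you would rather not depend on jq, the same conversion can be done with a few lines of Python. This is just a sketch of an alternative, not part of the repository's script, and the file name convert_params.py is hypothetical:

# convert_params.py (hypothetical): convert a CloudFormation-style parameters
# JSON file into the SAM CLI's parameter-overrides string.
import json
import sys

with open(sys.argv[1]) as f:  # e.g. parameters/dev-parameters.json
    params = json.load(f)

print(" ".join(
    f"ParameterKey={p['ParameterKey']},ParameterValue={p['ParameterValue']}"
    for p in params
))

In the deployment script it would be captured the same way: PARAMETER_OVERRIDES=$(python3 convert_params.py "$PARAMETERS_FILE").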


  2. CDK

GitHub: https://github.com/adityawdubey/S3-File-Upload-Notification-using-AWS-CDK

The AWS Cloud Development Kit (CDK) lets you build cloud infrastructure using familiar programming languages such as Python, TypeScript, Java, and C#. CDK makes Infrastructure as Code (IaC) easier and more fun, letting you write neat and tidy code for even the most complex cloud setups.

CDK offers support for various languages and IDEs, so you can use cool features like syntax highlighting and autocompletion to make your life easier :)

Constructs

In the world of CDK, everything is a construct. Think of constructs as the LEGO blocks of AWS CDK apps. They bundle AWS resources together in a reusable and customizable way.

For beginners, check out this video to understand the basic Python CDK project structure.

Writing CDK

Create a CDK app

From a starting directory of your choice, create and navigate to a project directory on your machine. Then use the cdk init command to create a new project in your preferred programming language:

cdk init app --language python
source .venv/bin/activate
python3 -m pip install -r requirements.txt

In the initial stack file main_stack.py (which I renamed; it is usually <project_name>_stack.py), the AWS CDK does the following: your stack class is derived from the Stack class, and the Construct base class is imported and used as the scope or parent of your stack instance.
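
As a quick sketch, the generated stack file looks roughly like this (the class name MainStack is my own choice here):

# main_stack.py: the skeleton produced by cdk init, lightly renamed.
from aws_cdk import Stack
from constructs import Construct

class MainStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Resources (constructs) for this stack are defined here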

Create your Lambda function

Within your CDK project, create a lambda/s3_event_handler directory. This directory should include a new handler.py file and a requirements.txt file. The handler.py file will contain the Lambda function logic, and the requirements.txt file will list the necessary packages.

When defining your Lambda function resource in AWS CDK, you can utilize the aws-lambda L2 construct from the AWS Construct Library. This simplifies the process of creating and managing Lambda functions.

Lambda Layers are optional but highly useful for managing dependencies and keeping deployment packages smaller. Instead of including all dependencies in each Lambda function's deployment package, you can create a layer with the shared dependencies and attach it to multiple functions.

Automatic Packaging for AWS Lambda with CDK

Developing serverless applications with AWS Lambda can be quite a headache when managing functions and their dependencies. Before, I had to manually package and deploy everything, which was tedious and error-prone. Then I discovered that the AWS Cloud Development Kit (CDK) has a fantastic module, aws-lambda-python-alpha (currently in alpha), which makes the entire process much smoother, especially for Python functions and layers. Now, I can focus on writing code instead of dealing with zip files :)

Automatic Packaging with aws-lambda-python-alpha

With aws-lambda-python-alpha you don't need to manually create deployment packages or handle the packaging and uploading of your Python code and dependencies.

In my project, I used the aws-lambda-python-alpha module to create the layer and the usual aws_cdk.aws_lambda.Function construct to create the Lambda function.
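
To make this concrete, here is a condensed sketch of how the stack might wire everything together. Construct IDs, directory paths, and the email address are illustrative, not copied from the repository:

# Sketch: aws-lambda-python-alpha layer + aws_lambda.Function + S3 event notification.
from aws_cdk import Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_lambda_python_alpha as python_alpha
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_notifications as s3n
from aws_cdk import aws_sns as sns
from aws_cdk import aws_sns_subscriptions as subs
from aws_cdk import aws_sqs as sqs
from constructs import Construct

class MainStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        bucket = s3.Bucket(self, "FileUploadBucket")
        topic = sns.Topic(self, "S3NotificationTopic")
        topic.add_subscription(subs.EmailSubscription("your_email@gmail.com"))
        queue = sqs.Queue(self, "MetadataQueue")

        # Dependencies listed in lambda_layer/requirements.txt are bundled automatically
        deps_layer = python_alpha.PythonLayerVersion(
            self, "DepsLayer",
            entry="lambda_layer",  # illustrative path containing only requirements.txt
            compatible_runtimes=[_lambda.Runtime.PYTHON_3_12],
        )

        handler = _lambda.Function(
            self, "S3EventHandler",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="handler.lambda_handler",
            code=_lambda.Code.from_asset("lambda/s3_event_handler"),
            layers=[deps_layer],
            environment={
                "SNS_TOPIC_ARN": topic.topic_arn,
                "SQS_QUEUE_URL": queue.queue_url,
            },
        )

        topic.grant_publish(handler)
        queue.grant_send_messages(handler)

        # Trigger the function on every object-created event in the bucket
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED, s3n.LambdaDestination(handler)
        )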

What if the dependencies in requirements.txt exceed 250 MB?

If the dependencies in your requirements.txt make the Lambda layer larger than the deployment package limits of 50 MB (zipped, for direct upload) or 250 MB (unzipped), the deployment will fail; see Lambda quotas. This can be a problem with large libraries like pandas or numpy. To solve it, you have a few options:

  1. Split the dependencies into multiple layers: Instead of having a single layer with all dependencies, you can create multiple layers, each containing a subset of the dependencies. This way, you stay within the 250 MB limit for each layer. However, you will need to manage and attach multiple layers to your Lambda functions.

  2. Use AWS Lambda container images: Instead of using Lambda layers, package your dependencies and code into a container image and deploy it as an AWS Lambda container image. Container images have a larger size limit of 10 GB.

    To create a Lambda function from a container image, build your image locally and upload it to an Amazon Elastic Container Registry (Amazon ECR) repository. Then, specify the repository URI when you create the function.

  3. Use Amazon S3 or Amazon EFS for larger dependencies: If you have large dependencies that cannot fit in a Lambda layer or container image, you can store them in Amazon S3 or Amazon EFS and download them at runtime in your Lambda function (see the sketch after this list).

  4. Use AWS Lambda Extensions: AWS Lambda Extensions let you run additional processes alongside your function code. You can use an extension to download and install dependencies at runtime, effectively bypassing the layer size limit.
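
A rough sketch of option 3, downloading a dependency bundle from S3 at cold start (the bucket, key, and environment variable names are hypothetical):

# Hypothetical sketch: pull a zipped site-packages bundle from S3 into /tmp at cold start.
import os
import sys
import zipfile

import boto3

DEPS_BUCKET = os.environ["DEPS_BUCKET"]            # hypothetical env var
DEPS_KEY = os.environ.get("DEPS_KEY", "deps.zip")  # hypothetical object key
DEPS_DIR = "/tmp/deps"

if not os.path.isdir(DEPS_DIR):  # runs once per container, then reused while warm
    boto3.client("s3").download_file(DEPS_BUCKET, DEPS_KEY, "/tmp/deps.zip")
    with zipfile.ZipFile("/tmp/deps.zip") as zf:
        zf.extractall(DEPS_DIR)

sys.path.insert(0, DEPS_DIR)  # make the bundled packages importable

# Heavy imports (e.g. pandas) can now resolve from /tmp/deps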

These workarounds can help you get past the 250 MB limit for Lambda layers, but they may add extra complexity and could affect performance. Make sure to weigh the pros and cons and pick the solution that best suits your needs.

Managing Environment Variables

Local Development

To manage environment variables for local development, create a .env file at the root of your project. This file should contain key-value pairs representing your environment variables.

.env file:

EMAIL_SUBSCRIPTION_ENDPOINT=luffy@gmail.com
FILE_UPLOAD_BUCKET=bucket_name

To keep sensitive information safe and out of your GitHub repository, make sure to add .env to your .gitignore file.
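
If you load these values with python-dotenv (a common choice; the repository may read them differently), it looks like this:

# Sketch: load local environment variables with python-dotenv (pip install python-dotenv).
# In CI there is no .env file, so the values come from the workflow's environment instead.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root if present

email_endpoint = os.getenv("EMAIL_SUBSCRIPTION_ENDPOINT")
upload_bucket = os.getenv("FILE_UPLOAD_BUCKET")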

CI/CD (GitHub Actions)

For CI/CD (.github/workflows/dev.yaml), add the environment variables and secrets to your GitHub repository's environment settings so the workflow can access them.

Deploying CDK

Before deploying your CDK stack, you must perform a one-time bootstrapping of your AWS environment. This sets up necessary resources used by the CDK toolkit. This is included in the deployment script deploy.sh.

After bootstrapping, you can synthesize and deploy your CDK stack:

cdk synth: generates the CloudFormation template for your stack.

cdk deploy: deploys the synthesized resources to your AWS environment.

Running the Deployment Script

Make the script executable and run the script: scripts/deploy.sh

chmod +x ./scripts/deploy.sh
./scripts/deploy.sh

This script will: bootstrap the CDK environment, synthesize the CloudFormation templates, and deploy the CDK stack.

Once deployed, any file uploaded to the specified S3 bucket will trigger the Lambda function. The Lambda function processes the file and sends a notification via SNS.


  3. Terraform

GitHub: https://github.com/adityawdubey/S3-File-Upload-Notification-using-Terraform

Terraform can manage infrastructure on multiple cloud platforms like AWS, Google Cloud Platform, and Microsoft Azure. Its main advantage is that it allows developers to use a wide range of modules and providers.

Terraform State

When you run Terraform, it creates a terraform.tfstate file in your working directory. This file is crucial as it keeps track of the real state of your infrastructure. If multiple team members are working on the same Terraform codebase, it's essential to use a shared, remote state file to avoid conflicts. Typically, this remote state is stored in an S3 or GCS bucket. This approach ensures everyone has a consistent view of the infrastructure, preventing issues like trying to recreate existing resources.

https://developer.hashicorp.com/terraform/language/state/remote

Writing Terraform

Automatic Packaging of Lambda Dependencies with Terraform

Managing dependencies for AWS Lambda functions can be cumbersome, especially when dealing with multiple functions and third-party libraries. Traditionally, you might bundle dependencies manually, but this becomes unmanageable as the project scales.

Terraform does not have built-in automatic packaging for Lambda function dependencies like the AWS CDK's aws_lambda_python_alpha module. However, there are several ways to manage and package dependencies when using Terraform to create and manage AWS Lambda functions:

Manual Packaging: You can manually create a ZIP or Docker container image with your Lambda function code and dependencies, and then upload it to an Amazon S3 bucket or Amazon Elastic Container Registry (ECR). Terraform can then reference the uploaded artifact when creating or updating the Lambda function.

Using the archive_file Data Source: Terraform offers an archive_file data source that lets you create a ZIP archive from a directory containing your Lambda function code and dependencies. This ZIP archive can then be used as the deployment package for the Lambda function. Typically, Lambda deployments bundle the function code and dependencies into one archive, which can lead to larger deployments and versioning issues. This is the approach I'll be using.

Third-Party Providers or Modules: There are several third-party Terraform providers and modules that can help with packaging and deploying Lambda functions with dependencies. For example, the terraform-aws-lambda module (https://github.com/terraform-aws-modules/terraform-aws-lambda) can build and package function code and dependencies for you.

Creating a Lambda Layer

Creating a Lambda layer in Terraform involves a few steps to ensure the dependencies are properly installed and packaged. A null_resource installs the dependencies, the archive_file data source creates a ZIP file from them, and an aws_lambda_layer_version resource references that ZIP file.

...
data "archive_file" "lambda_layer" {
  type        = "zip"
  source_dir  = "${path.module}/lambda_layer"
  output_path = "${path.module}/lambda_layer.zip"
}

resource "null_resource" "install_layer_dependencies" {
  triggers = {
    requirements_md5 = filemd5("${path.module}/lambda_layer/requirements.txt")
  }

  provisioner "local-exec" {
    command = "pip install -r ${path.module}/lambda_layer/requirements.txt -t ${path.module}/lambda_layer/python"
  }
}

resource "aws_lambda_layer_version" "lambda_layer" {
  filename         = data.archive_file.lambda_layer.output_path
  layer_name       = "my_lambda_layer"
  source_code_hash = data.archive_file.lambda_layer.output_base64sha256

  compatible_runtimes = ["python3.9"]

  depends_on = [null_resource.install_layer_dependencies]
}
...

The null_resource is a special type of resource in Terraform that doesn't manage any specific infrastructure. Instead, it's a flexible and powerful tool that can be used to execute arbitrary commands or scripts during your Terraform runs. This makes it perfect for tasks such as installing dependencies, running custom scripts, or performing other operations that don't directly correspond to a specific infrastructure resource.

The null_resource is used to install the Python dependencies listed in the requirements.txt file into a directory that will later be packaged into a Lambda layer.

The triggers block is a map of key-value pairs that Terraform uses to determine when the null_resource should be recreated. Any change to the values in this map will cause the null_resource to be recreated.

requirements_md5 = filemd5("${path.module}/lambda_layer/requirements.txt"): This line calculates the MD5 checksum of the requirements.txt file. If the contents of the requirements.txt file change, the MD5 checksum will also change, which in turn will trigger the recreation of the null_resource.

Creating the Lambda Function

After creating the Lambda layer, you can create your Lambda function and attach the layer to it. This ensures that your function code remains lightweight and only includes the necessary dependencies through the layer.

...
data "archive_file" "lambda_function" {
  type        = "zip"
  source_dir  = "${path.module}/lambda_functions/s3_event_handler"
  output_path = "${path.module}/lambda_functions/s3_event_handler.zip"
}

resource "aws_lambda_function" "lambda_function" {
  filename         = data.archive_file.lambda_function.output_path
  function_name    = "s3_event-handler"
  role             = aws_iam_role.lambda_role.arn
  handler          = "handler.handler"
  runtime          = "python3.9"
  source_code_hash = data.archive_file.lambda_function.output_base64sha256
  layers           = [aws_lambda_layer_version.lambda_layer.arn]

  environment {
    variables = {
      SNS_TOPIC_ARN = aws_sns_topic.sns_topic.arn
      SQS_QUEUE_URL = aws_sqs_queue.sqs_queue.id
    }
  }
}
...

Terraform creates the zip file for us, since a deployment package is required for the AWS Lambda function. We then reference the output_path of the archive_file data source as the filename of the function.

The source_code_hash argument prevents redeployment of the Lambda function when the hash of the zip file is unchanged: if there is no change in the code, there is no new deployment. This is particularly useful when deploying multiple functions, as it ensures that only those with relevant changes are redeployed.

Deploying Terraform

Install Terraform

The easiest way to install Terraform on macOS is by using Homebrew.

brew tap hashicorp/tap
brew install hashicorp/tap/terraform

Initialize Terraform

Initialize Terraform to download the necessary providers

terraform init

Plan Terraform Deployment

Before applying the Terraform configuration, it's good practice to review the changes that will be made. Run the following command to generate an execution plan:

terraform plan -out=tfplan

This creates a plan file named tfplan, which you can review to understand what changes will be applied.

Apply Terraform Configuration

Apply the Terraform configuration to create the resources:

terraform apply tfplan

Conclusion

Implementing the S3 event notification system with AWS Lambda, SNS, and SQS was a success. It also highlighted the strengths and differences between Terraform, AWS CloudFormation, and AWS CDK.

Terraform is known for its explicit state management: each plan compares the state file with your real infrastructure and proposes changes to reconcile any drift. CloudFormation has a related feature called Drift Detection, which reports changes made outside your stack.

CloudFormation, being AWS-native, manages your resources directly. Terraform, which works across multiple cloud providers, builds your infrastructure by making API calls to AWS through its provider. This makes Terraform a good fit for multi-cloud setups.

In terms of supported languages, Terraform uses HashiCorp Configuration Language (HCL) and JSON, while CloudFormation supports JSON and YAML. Additionally, CloudFormation handles state management automatically, whereas Terraform requires you to configure state storage yourself but offers greater control.

Ultimately, choosing the right tool depends on your project's specific needs. This project demonstrated how each tool can effectively manage infrastructure in different ways.

Further Exploration

  • Input Validation: Implement input validation in your Lambda function to ensure data integrity and prevent unexpected behavior. Explore libraries like marshmallow or pydantic for data validation in Python. Check out this guide on input validation.

  • Batch processing: How could you send a single email notification after multiple files are uploaded to S3, instead of one per file?

  • Explore using Amazon EventBridge for event routing instead of the traditional SQS/SNS combination.

  • Explore using Pulumi for infrastructure as code, which allows you to use familiar programming languages to define your infrastructure. Pulumi offers cross-cloud support, combining the strengths of both CDK and Terraform.
