This folder contains code and instructions to deploy the platform infrastructure illustrated in the main README.md.
```
.
├── .github/              // GitHub Actions definitions
├── images/               // Docker images' definitions
├── infra/                // CDK project
├── scripts/              // Automation scripts
├── .env                  // Environment variables
├── Makefile              // Make rules for automation
└── requirements-dev.txt  // Python packages for automation scripts
```
This project is defined and deployed using the AWS Cloud Development Kit (AWS CDK). CDK stack definitions are located in the `infra/stacks` folder.
Make rules are used to automate deployment steps. Available rules are covered in the Deployment section.
You need to log in with user credentials when accessing the Airflow web UI:
- username: `user`
- password: `bitnami`
You can alter these credentials by setting environment variables for the Apache Airflow webserver Fargate task in `infra/stacks/fargate_services/airflow.py`:
```python
environment={
    "AIRFLOW_USER": "<YOUR_USERNAME>",
    ...
},
secrets={
    "AIRFLOW_PASSWORD": ecs.Secret.from_secrets_manager(
        <YOUR_USER_PASSWORD_SECRET>
    ),
    ...
}
```

You need to perform a few steps to set up the local environment.
Before moving on with the project deployment, complete the following checks:
- Install `npm` on your machine
- Install `Python` on your machine
- Ensure that the AWS CLI is installed and configured on your machine
- Ensure that CDK is installed and configured on your machine

NOTE: this project was developed with AWS CDK version 1.90.0, hence the same version or higher is required.
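The CDK version requirement above can be checked programmatically. A small sketch, assuming `cdk --version` prints output of the form `1.90.0 (build abc123)` (an assumption about the CLI's output format):

```python
MINIMUM_CDK_VERSION = (1, 90, 0)

def parse_cdk_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `cdk --version` output, e.g. '1.90.0 (build abc123)'."""
    return tuple(int(part) for part in output.split()[0].split("."))

def is_supported(output: str) -> bool:
    """Return True if the installed CDK meets the minimum required version."""
    return parse_cdk_version(output) >= MINIMUM_CDK_VERSION
```

Tuple comparison handles the ordering correctly (so `1.102.0` is accepted even though `102 > 90` would fail a naive string comparison).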
To create a virtual environment run the following make rule:
```
# from the root directory
$ make venv
```

This rule will create a virtual environment in `infra/venv` and install all necessary dependencies for the project.
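The Makefile internals are not shown here, but a rule like this typically wraps Python's built-in `venv` module (treat that as an assumption). A minimal stdlib sketch of the same idea, targeting a throwaway directory instead of `infra/venv`:

```python
import tempfile
import venv
from pathlib import Path

# Create a virtual environment in a temporary directory; the real rule targets
# infra/venv and additionally installs the packages from requirements-dev.txt.
target = Path(tempfile.mkdtemp()) / "venv"
venv.create(target, with_pip=False)  # with_pip=False keeps the sketch fast
```

Every virtual environment contains a `pyvenv.cfg` marker file at its root, which is a quick way to verify the environment was created.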
Airflow uses Fernet to encrypt passwords in the connection configuration and the variable configuration. To generate a new Fernet key for the project run:
```
# from the root directory
$ make generate_fernet

This is your Fernet key: <fernet_key>
```

Store your `fernet_key` in AWS Secrets Manager:

```
$ aws secretsmanager create-secret --name fernetKeySecret --description "Fernet key for Airflow" --secret-string YOUR_FERNET_KEY
```

Once you have created the `fernet_key` secret, you can set the environment variables in the `.env` file.
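As an aside on what gets stored: a Fernet key is simply 32 random bytes encoded as URL-safe base64 (44 characters). The `make generate_fernet` rule presumably delegates to the `cryptography` package's `Fernet.generate_key()` (an assumption, since the rule's internals are not shown); a dependency-free sketch of the same shape:

```python
import base64
import os

# 32 random bytes, URL-safe base64-encoded: the exact shape Fernet expects.
# This mirrors what cryptography's Fernet.generate_key() produces.
key = base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")
```

This is also a handy sanity check: if the value you are about to store does not decode to exactly 32 bytes, it is not a valid Fernet key.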
- `AWS_REGION`: AWS region to which you wish to deploy this project
- `BUCKET_NAME`: choose a unique name for an Amazon S3 bucket that will host artifacts for Airflow and dbt DAGs
- `FERNET_SECRET_ARN`: ARN of the secret with the `fernet_key`
- `ECR_URI`: a unique identifier for the Amazon ECR repository. It can be easily composed with your AWS Account ID and AWS region: `<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com`
Assuming that the project will be deployed in the eu-west-1 region, the `.env` file will look like this:

```
AWS_REGION=eu-west-1
BUCKET_NAME=my-unique-dataops-bucket-name
FERNET_SECRET_ARN=arn:aws:secretsmanager:eu-west-1:123456789012:secret:airflow/fernet_key-AbCdEf
ECR_URI=123456789012.dkr.ecr.eu-west-1.amazonaws.com
```

If you've performed all steps from the Prerequisites, you can now deploy the project.
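Before deploying, it can be worth sanity-checking the `.env` contents, since a mismatched region or account ID in `ECR_URI` will only surface later as a push failure. A hypothetical stdlib helper (not part of the project's scripts) that parses the file and cross-checks `ECR_URI` against `AWS_REGION`:

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """\
AWS_REGION=eu-west-1
BUCKET_NAME=my-unique-dataops-bucket-name
FERNET_SECRET_ARN=arn:aws:secretsmanager:eu-west-1:123456789012:secret:airflow/fernet_key-AbCdEf
ECR_URI=123456789012.dkr.ecr.eu-west-1.amazonaws.com
"""

env = parse_env(sample)
# ECR_URI should follow <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com
account_id, suffix = env["ECR_URI"].split(".", 1)
assert suffix == f"dkr.ecr.{env['AWS_REGION']}.amazonaws.com"
```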
The deployment process is divided into three make rules:
- the `bootstrap` rule deploys infrastructure components which are not frequently updated (VPC, S3, ECR, Redis, RDS, Redshift)
- the `push_images` rule uploads Airflow and dbt Docker images to Amazon ECR
- the `deploy` rule deploys the ECS cluster, and the Airflow and dbt services
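These three rules are order-dependent: the baseline resources must exist before images can be pushed, and the images must exist before the ECS services can reference them. A hypothetical Python sketch of running the steps in sequence with a fail-fast guard (the real project drives this through Make, not Python):

```python
import subprocess

# Order matters: bootstrap -> push_images -> deploy
DEPLOY_STEPS = ["bootstrap", "push_images", "deploy"]

def run_steps(steps, runner=subprocess.run):
    """Run `make <step>` for each step in order, stopping at the first failure."""
    for step in steps:
        result = runner(["make", step])
        if result.returncode != 0:
            raise RuntimeError(f"step '{step}' failed; later steps were skipped")
```

The injectable `runner` parameter is just a convenience for testing the ordering without invoking `make` for real.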
Let's bootstrap an AWS CDK environment and deploy baseline resources:
```
# from the root directory
$ make bootstrap
```

NOTE: you will be asked to confirm the changes before they are deployed. To proceed, type y and press Enter.
Now that the baseline resources are created, let's upload Docker images for Airflow and dbt to Amazon ECR, which will be used in ECS task definitions later on.
Docker needs to be installed and running on your machine in order to upload images to Amazon ECR. To install and configure Docker please refer to the official documentation.
Make sure that Docker is running on your machine and then execute the push_images rule:
```
# from the root directory
$ make push_images
```

Finally, let's deploy the ECS cluster, and the Airflow and dbt services. To do that, execute the deploy rule:
```
# from the root directory
$ make deploy
```

NOTE: you will be asked to confirm the changes before they are deployed. To proceed, type y and press Enter.
Follow this tutorial to load example data into an Amazon Redshift cluster using the Query Editor. To log in to the Query Editor, use the following:
- Database name: `redshift-db`
- Database user: `redshift-user`
For uploading the sample data into Amazon S3, use the bucket that was created during deployment.
To copy data from Amazon S3 into Redshift, the `copy` command needs the ARN of the Redshift IAM role that was created during deployment. Execute the following command to retrieve the ARN:
```
$ aws redshift describe-clusters --query 'Clusters[*].IamRoles[*].IamRoleArn'
```

To destroy all resources created for this project, execute the destroy rule:

```
# from the root directory
$ make destroy
```

NOTE: you will be asked to confirm the deletion of the resources. To proceed, type y and press Enter.
We have also provided preconfigured GitHub Actions workflows to automate the upload of new versions of Docker images to Amazon ECR, and the deployment of Fargate tasks.
These workflows are designed to work in conjunction with AWS CodeBuild using the `aws-actions/aws-codebuild-run-build` action. Build specification files are located in `images/airflow_buildspec.yml` and `images/dbt_buildspec.yml`, respectively.
To use the provided GitHub Actions workflows, you need to create an AWS CodeBuild project and connect it with your GitHub repository. You can follow this documentation page to do that from your AWS Console. Note that a GitHub personal access token needs to be generated and added to the CodeBuild Source in order to configure the GitHub repository as a source for the project.
When creating the AWS CodeBuild project, pay attention to the following:
- add necessary IAM policies to the CodeBuild service IAM Role to grant access to Amazon ECR
- when creating the project choose these settings:
- Ubuntu, for the Operating system
- Standard, for the Runtime
- aws/codebuild/standard:4.0, for the Image
- enable Privileged mode
All these details can be found in the Docker sample section of AWS CodeBuild documentation.
Finally, to use provided GitHub actions workflows in this project do the following:
- replace
<AWS_ACCOUNT_ID>with your AWS Account ID - replace
<AWS_REGION>with your AWS region - replace
<CODEBUILD_PROJECT_NAME>with the name of AWS CodeBuild project that you created - update the trigger rule based on on preferred events
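To illustrate how those placeholders fit together, a minimal workflow might look like the sketch below. This is not one of the provided workflow files: the workflow name, trigger paths, and credential secret names are assumptions, while the two `aws-actions` steps are the real actions referenced above.

```yaml
name: push-airflow-image
on:
  push:
    branches: [main]
    paths:
      - "images/airflow/**"   # assumed trigger; adjust to your preferred events
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: <AWS_REGION>
      - uses: aws-actions/aws-codebuild-run-build@v1
        with:
          project-name: <CODEBUILD_PROJECT_NAME>
```

The CodeBuild step then executes the buildspec file configured in the project (e.g. `images/airflow_buildspec.yml`), which performs the actual Docker build and push to Amazon ECR.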