Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

README.md

amideploy

Manage AMI Connect infrastructure and deploy new code.

Structure:

  • infrastructure: Code to create AWS resources using terraform
  • deploy: Code to stand up AMI Connect on AWS resources using Docker

This README describes how to create an AMI Connect deployment from scratch. But you may also use it to set up your local environment to work on an existing AMI Connect deployment.

Prerequisites

AMI Connect is built to run on AWS infrastructure. You'll need an AWS account with credentials that permit you to create AWS resources.

We recommend you use the AWS CLI and its configuration pattern to store your AWS credentials. Specifically, you should have an ~/.aws/credentials file with an access key and secret access key.

We use terraform to manage infrastructure resources. Here are installation instructions.

(If you are creating a new AMI Connect site) Pick a hostname for your Airflow application

AMI Connect will build an Airflow application accessible on the public internet (but protected by a username and password). You'll want to pick a domain name for this site, then register a domain with Route 53 in your AWS account. The Route53 domain is left out of our terraform code because we don't want anyone to accidentally create multiple domain names. Terraform will link the hosted zone for your domain name to your Airflow webserver.

As an example: CaDC's AMI Connect deployment uses the cadc-ami-connect.com domain.

Get started with terraform

Go to the ./amideploy/infrastructure directory:

cd amideploy/infrastructure

We expect that you'll create a terraform workspace here. This will isolate your terraform state and allow you to create a *.tfvars file for your environment. Using a workspace adds complexity, but it allows us to deploy multiple instances of AMI Connect from this single open source repo. If you know that you won't deploy to multiple environments, you may skip this step but the instructions below might not work for you.

From amideploy/infrastructure, run:

terraform init
terraform workspace new <your workspace>

mkdir environments/<your workspace>
touch environments/<your workspace>/<your workspace>.tfvars

<your workspace>.tfvars should contain your own AMI Connect infrastructure configuration. If your teammates have already created an environment, you should copy their <your workspace>.tfvars file.

Otherwise, you'll create your own. See variables.tf for the variables you'll specify. Here's an example <your workspace>.tfvars file:

# cat ./environments/my-workspace/my-workspace.tfvars 
aws_profile = "my-aws-profile-name-associated-with-local-credentials"
aws_region = "us-west-2"
airflow_db_password = "myairflowdbpwd"
airflow_hostname = "my-ami-connect-domain.com"
ami_connect_s3_bucket_name = "my-s3-bucket-name"

The value in aws_profile should match the AWS profile name in your ~/.aws/credentials file, where an AWS access key's details give access to your AWS account.

Copy teammate's local terraform state

If you're setting up an environment that already exists, you should copy your teammate's local terraform state (i.e. the entire terraform.tfstate.d directory) to your ./amideploy/infrastructure directory.

Run terraform to create the infrastructure

Each terraform command will need to reference the *.tfvars file you created above. Run this to make sure everything is working:

terraform plan -var-file="./environments/$(terraform workspace show)/$(terraform workspace show).tfvars" 

This command should exit without error. It should describe a number of resources we'll create:

  • An EC2 that will host our Airflow application
  • A Postgresql database for Airflow's metastore
  • An Elastic IP assigned to our EC2
  • An "A" record that connects your Route53 instance to that Elastic IP
  • Security groups to make all of the networking work
  • An S3 bucket
  • An IAM role that grants permission for resources to access S3
  • More!

If that looks good, create the infrastructure:

terraform apply -var-file="./environments/$(terraform workspace show)/$(terraform workspace show).tfvars" 

Create Terraform output file

Our terraform code specifies outputs that include things like the IP address of the Airflow server. These outputs can be useful as you develop. They're also used by our deploy.sh script to connect to the Airflow server. You should tell terraform to make a file with the outputs by running a command like this:

mkdir ../configuration
terraform output -json > ../configuration/$(terraform workspace show)-output.json
terraform output -raw airflow_server_private_key_pem > ../configuration/$(terraform workspace show)-airflow-key.pem
chmod 600 ../configuration/$(terraform workspace show)-airflow-key.pem

Configure the infrastructure

Now that your servers are created, you'll need to configure them to run Airflow.

Access the Airflow EC2's private key, store it on your machine, and prep it for SSH:

terraform output -raw airflow_server_private_key_pem > ../configuration/airflow-key.pem
chmod 600 ../configuration/airflow-key.pem

Never check these files into version control!

Go to the ./amideploy/configuration directory:

cd ../configuration

SSH into your EC2 server

Find the public hostname of your EC2 server. SSH into it with:

ssh -i ./airflow-key.pem ec2-user@<public hostname>

Install and run Airflow on the EC2

On the EC2, install software we'll need to deploy the system:

sudo yum install git
sudo yum install docker
sudo systemctl start docker
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64" -o /usr/libexec/docker/cli-plugins/docker-compose
sudo chmod +x /usr/libexec/docker/cli-plugins/docker-compose

Back on your laptop, run the deploy.sh script to start Airflow and your pipeline.

Now Airflow is running inside Docker on the server. You should see a response from Airflow if you run curl localhost:8080 on the server.

Inside the docker container, you can create an admin user with the following:

airflow users create   --username admin   --firstname Admin   --lastname User   --role Admin   --email <pick an email address>   --password <pick a password>

Create nginx reverse proxy

We use nginx to create a reverse proxy on the server. Our security groups are configured to allow HTTP traffic on port 80, and nginx will forward that traffic to Airflow.

Install and run nginx:

sudo yum install nginx -y
sudo systemctl start nginx
sudo systemctl enable nginx

Gather your AMI Connect domain name, then add following to /etc/nginx/nginx.conf inside the http block:

server {
    listen 80;
    server_name <your AMI connect domain, e.g. cadc-ami-connect.com>;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Restart nginx:

sudo systemctl restart nginx

Now you should be able to access your Airflow site in your browser using the domain name you picked. Login with the username and password we created above. Use "http", not "https".