Commit 1b86971

Airflow integration (#39)

* Airflow eks template for terraform.
* rename `eks` folder to `eks_argo`

1 parent 2e78a28 · 18 files changed: +445 −1 lines
examples/eks_airflow/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# An example of deploying Metaflow with Airflow on an EKS cluster

This example will create Metaflow infrastructure from scratch, with a Kubernetes cluster using Amazon EKS. It uses the [`datastore`](../../modules/datastore/) and [`metadata-service`](../../modules/metadata-service/) submodules to provision an S3 bucket, an RDS database, and the Metaflow Metadata service running on AWS Fargate.

To run Metaflow jobs, it provisions an EKS cluster using [this popular open source terraform module](https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest). In that cluster, it also installs [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) and [Airflow](https://airflow.apache.org/) using Helm.

Specifically, it'll create the following resources in your AWS account:
* General networking infra:
    * AWS VPC
    * NAT gateway for private subnets in the VPC
* For storing data artifacts:
    * S3 bucket
* For Metaflow metadata:
    * RDS database instance (on-demand, Multi-AZ, db.t2.small)
    * ECS service for the Metaflow Metadata service
    * Network load balancer
    * API Gateway
* For executing Metaflow tasks:
    * Autoscaling EKS cluster with at least one instance running

Note that all this infrastructure costs a non-trivial amount even at rest, up to $400/month, and more if actively used.

## Instructions

0. Run `terraform init`.
1. Run `terraform apply` to create the infrastructure. This command will typically take ~20 minutes to execute.
2. Make note of the EKS cluster name (it is a short string that starts with `mf-`). Use the AWS CLI to generate the cluster configuration:
    ```bash
    aws eks update-kubeconfig --name <CLUSTER NAME>
    ```
3. Copy `config.json` to `~/.metaflowconfig/`.
4. You should be ready to run Metaflow flows using `@kubernetes`, and to deploy them to Airflow (see the sketch below).
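
As a sketch of that last step (the flow file name `hello_flow.py` is hypothetical, and this assumes the `metaflow` Python package is installed locally with the generated `config.json` in place):

```bash
# Run the flow directly on the EKS cluster.
python hello_flow.py run --with kubernetes

# Compile the same flow into an Airflow DAG file, to be placed
# wherever the Airflow scheduler picks up DAGs.
python hello_flow.py airflow create hello_flow_dag.py
```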

The Airflow UI is not accessible from outside the cluster, but you can use port forwarding to see it. Run
```bash
kubectl port-forward -n airflow deployment/airflow-deployment-webserver 8080:8080
```
...and you should be able to access it at `localhost:8080`.
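
If the port-forward fails because the pods aren't ready yet, you can check the state of the release first:

```bash
kubectl -n airflow get pods
```

Unless the chart's default user settings were changed, the UI login should be the Airflow Helm chart's default `admin` / `admin`.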

## Destroying the infrastructure

Note that this will destroy everything, including the S3 bucket with artifacts!

Run `terraform destroy`

## What's missing

⚠️ This is meant as a reference example, with many things omitted for simplicity, such as a proper RBAC setup, production-grade autoscaling, and a UI. For example, all workloads running in the cluster use the same AWS IAM role. We do not recommend using this as a production deployment of Metaflow on Kubernetes.

To learn more about production-grade deployments, you can talk to us on [the Outerbounds slack](http://slack.outerbounds.co). We are happy to help you there!

examples/eks_airflow/airflow.tf

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
provider "helm" {
  kubernetes {
    # `data.aws_eks_cluster.cluster` and `data.aws_eks_cluster_auth.cluster`
    # are defined alongside the EKS module elsewhere in this example.
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}

resource "helm_release" "cluster_autoscaler" {
  name = "autoscaler"

  depends_on = [module.eks]

  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"

  set {
    name  = "autoDiscovery.clusterName"
    value = local.cluster_name
  }

  set {
    name  = "awsRegion"
    value = data.aws_region.current.name
  }
}
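Once `terraform apply` finishes, a quick sanity check that the autoscaler release came up (a sketch; the exact deployment name depends on the chart version):

```bash
kubectl -n kube-system get deployments | grep -i autoscaler
```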
resource "kubernetes_namespace" "airflow" {
31+
metadata {
32+
name = "airflow"
33+
}
34+
}
35+
36+
data "aws_region" "current" {}
37+
38+
variable "airflow_webserver_secret" {
39+
type = string
40+
default = "mysupersecr3tv0lue"
41+
}
42+
43+
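Since the default value is only a placeholder, you'd typically override it at apply time, e.g.:

```bash
terraform apply -var="airflow_webserver_secret=$(openssl rand -hex 16)"
```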
# This secret is what the Airflow webserver uses to sign session cookies.
# https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key
resource "kubernetes_secret" "airflow-webserver-secret" {
  metadata {
    name      = "airflow-webserver-secret"
    namespace = kubernetes_namespace.airflow.metadata[0].name
  }
  type = "Opaque"
  data = {
    webserver-secret-key = var.airflow_webserver_secret
  }
}
locals {
  airflow_values = {
    "executor"                     = "LocalExecutor"
    "defaultAirflowTag"            = "2.3.3"
    "airflowVersion"               = "2.3.3"
    "webserverSecretKeySecretName" = kubernetes_secret.airflow-webserver-secret.metadata[0].name
    "env" = [
      {
        "name"  = "AIRFLOW_CONN_AWS_DEFAULT"
        "value" = "aws://"
      },
      {
        "name"  = "AIRFLOW__LOGGING__REMOTE_LOGGING"
        "value" = "True"
      },
      {
        "name"  = "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER"
        "value" = "s3://${module.metaflow-datastore.s3_bucket_name}/airflow-logs"
      },
      {
        "name"  = "AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID"
        "value" = "aws_default"
      }
    ]
  }
}
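To see the YAML that actually gets passed to the chart, you can render these values from this example's directory (after `terraform init`), e.g.:

```bash
echo 'yamlencode(local.airflow_values)' | terraform console
```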
resource "helm_release" "airflow" {
85+
86+
depends_on = [module.eks]
87+
88+
name = "airflow-deployment"
89+
90+
repository = "https://airflow.apache.org"
91+
chart = "airflow"
92+
93+
namespace = kubernetes_namespace.airflow.metadata[0].name
94+
95+
timeout = 1200
96+
97+
wait = false # Why set `wait=false`
98+
#: Read this (https://github.com/hashicorp/terraform-provider-helm/issues/683#issuecomment-830872443)
99+
# Short summary : If this is not set then airflow doesn't end up running migrations on the database. That makes the scheduler and other containers to keep waiting for migrations.
100+
101+
values = [
102+
yamlencode(local.airflow_values)
103+
]
104+
}
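Because `wait = false`, `terraform apply` returns before the Airflow pods are actually ready; you can watch the rollout separately, e.g.:

```bash
kubectl -n airflow get pods -w
```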
File renamed without changes.
File renamed without changes.
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
data "aws_api_gateway_api_key" "metadata_api_key" {
  id = module.metaflow-metadata-service.api_gateway_rest_api_id_key_id
}

resource "local_file" "foo" {
  content = jsonencode({
    "METAFLOW_SERVICE_AUTH_KEY"     = data.aws_api_gateway_api_key.metadata_api_key.value
    "METAFLOW_DATASTORE_SYSROOT_S3" = module.metaflow-datastore.METAFLOW_DATASTORE_SYSROOT_S3,
    "METAFLOW_DATATOOLS_S3ROOT"     = module.metaflow-datastore.METAFLOW_DATATOOLS_S3ROOT,
    "METAFLOW_SERVICE_URL"          = module.metaflow-metadata-service.METAFLOW_SERVICE_URL,
    "METAFLOW_KUBERNETES_NAMESPACE" = "airflow",
    "METAFLOW_DEFAULT_DATASTORE"    = "s3",
    "METAFLOW_DEFAULT_METADATA"     = "service"
  })
  filename = "${path.module}/config.json"
}
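This writes the Metaflow client configuration next to the module after `terraform apply`; installing it locally is then just a copy, e.g.:

```bash
mkdir -p ~/.metaflowconfig
cp config.json ~/.metaflowconfig/config.json
```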
File renamed without changes.
File renamed without changes.
File renamed without changes.

examples/eks/README.md renamed to examples/eks_argo/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# An example of deploying Metaflow with a EKS cluster
+# An example of deploying Metaflow with Argo on an EKS cluster
 
 This example will create Metaflow infrastructure from scratch, with a Kubernetes cluster using Amazon EKS. It uses [`datastore`](../../modules/datastore/) and [`metadata-service`](../../modules/metadata-service/) submodules to provision S3 bucket, RDS database and Metaflow Metadata service running on AWS Fargate.
