- I. Project Overview
- II. Architecture / Design
- III. Prerequisites
- IV. Installation / Setup
- V. Usage
- VI. Infrastructure
- VII. Configuration
- VIII. Project Structure
- IX. Limitations / Assumptions
This project deploys Metabase on AWS using Terraform. Metabase is an open-source business intelligence and analytics platform that allows users to query and visualize data.
The infrastructure provisions a fully managed, containerized Metabase instance running on AWS ECS Fargate with an Aurora PostgreSQL database as the application backend. The deployment includes integration with AWS Athena and Glue to enable querying data sources.
Intended users: Data analysts, data engineers, and business users who need to explore and visualize organizational data stored in AWS.
The solution uses a containerized microservices architecture deployed on AWS:
- **Compute Layer (ECS Fargate)**
- Metabase runs as a Docker container on AWS ECS Fargate
- Container image is pulled from Docker Hub (`metabase/metabase:v0.54.x`) and pushed to a private ECR repository
- ECS task definition configures Metabase with 512 CPU units and 2048 MB memory
- Health checks monitor the `/api/health` endpoint
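The health check described above could be declared at the container level roughly as follows (a sketch; resource names, the `family`, and the exact check values are illustrative assumptions, not necessarily what this repo uses):

```hcl
# Sketch only: names and values here are illustrative assumptions.
resource "aws_ecs_task_definition" "metabase" {
  family                   = "metabase"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "512"
  memory                   = "2048"

  container_definitions = jsonencode([{
    name         = "metabase"
    image        = "metabase/metabase:v0.54.x"
    portMappings = [{ containerPort = 3000 }]

    # ECS marks the task unhealthy if /api/health stops responding
    healthCheck = {
      command     = ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
      interval    = 30
      timeout     = 5
      retries     = 3
      startPeriod = 60
    }
  }])
}
```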
- **Database Layer (Aurora PostgreSQL)**
- Aurora Serverless v2 cluster for Metabase application metadata
- PostgreSQL engine version 14.12
- Auto-scaling between 0 and 1 ACU (Aurora Capacity Units)
- Database credentials stored in AWS Systems Manager Parameter Store (SSM)
- Snapshot Management: On database destruction, a snapshot is automatically created with a timestamp-based identifier
- Snapshot Restoration: The database can be restored from the most recent snapshot on creation
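As a sketch, the Serverless v2 scaling bounds described above map to the provider's `serverlessv2_scaling_configuration` block (the resource name and cluster identifier here are hypothetical):

```hcl
# Illustrative fragment; the repo's actual resource lives in rds_aurora.tf.
resource "aws_rds_cluster" "example" {
  cluster_identifier = "poc-metabase-prod" # hypothetical
  engine             = "aurora-postgresql"
  engine_mode        = "provisioned" # required for Aurora Serverless v2
  engine_version     = "14.12"

  serverlessv2_scaling_configuration {
    min_capacity = 0.0 # allows the cluster to pause when idle
    max_capacity = 1.0
  }
}
```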
- **Load Balancer**
- Application Load Balancer (ALB) exposes Metabase on HTTP ports 80 and 3000
- Access is restricted to whitelisted CIDR blocks
- Access logs are stored in S3 with a 31-day retention policy
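The 31-day retention could be enforced with an S3 lifecycle rule along these lines (bucket and resource names are assumptions):

```hcl
# Sketch: expire ALB access logs after 31 days.
resource "aws_s3_bucket_lifecycle_configuration" "alb_logs" {
  bucket = aws_s3_bucket.alb_logs.id # hypothetical bucket resource

  rule {
    id     = "expire-alb-access-logs"
    status = "Enabled"

    expiration {
      days = 31
    }
  }
}
```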
- **Data Integration**
- AWS Athena workgroup for querying data sources
- IAM permissions to access AWS Glue catalog for data discovery
- Metabase can connect to data sources via Athena
- **Networking**
- Deployed in an existing VPC with public, private, and intra subnets
- Security groups control traffic between ALB, ECS, and RDS
- NAT Gateway required for outbound internet access from private subnets
- Load Balancer → ECS Service: Forwards HTTP traffic to the Metabase container
- ECS Task → Aurora RDS: Metabase stores its metadata in PostgreSQL
- ECS Task → Athena/Glue: Metabase queries data via Athena
- ECS Task → S3: Writes Athena query results to S3
- Terraform → ECR: Automated script pulls the official Metabase image from Docker Hub and pushes it to ECR
- Terraform: >= 1.5.4
- AWS CLI: Configured with credentials for the target AWS account
- Docker: Required for pulling and pushing the Metabase container image
- Bash: The ECR image upload script requires a Bash-compatible shell
- An existing VPC with the naming convention `{project_name}_network_platform_prod`
- Subnets tagged with `Tier: Public`, `Tier: Private`, and `Tier: Intra`
- NAT Gateway deployed and available in the VPC
- An S3 bucket and DynamoDB table for Terraform state backend (provided at runtime)
The AWS credentials or assumed role must have permissions to:
- Create and manage ECS clusters, services, and task definitions
- Create and manage RDS Aurora clusters and instances
- Create and manage Application Load Balancers and target groups
- Create and manage ECR repositories
- Create and manage S3 buckets and policies
- Create and manage IAM roles and policies
- Create and manage security groups
- Create and manage CloudWatch log groups
- Create and manage SSM parameters
- Create and manage Athena workgroups
- Access Glue catalog resources
- Assume roles (if `role_to_assume_arn` is provided)
- AWS Region: `eu-west-1` (Ireland), as specified in the Terraform S3 backend configuration
```shell
git clone <repository-url>
cd metabase
```

Create a `terraform.tfvars` file in the `iac/` directory with the required variables:

```hcl
project_name           = "your-project-name"
git_repository         = "https://gitlab.com/your-org/metabase"
cidr_list_to_whitelist = "10.0.0.0/8,172.16.0.0/12"
role_to_assume_arn     = "" # Optional: IAM role ARN to assume
```

Create a `backend.hcl` file in the `iac/` directory (not tracked in Git):

```hcl
bucket         = "your-terraform-state-bucket"
dynamodb_table = "your-terraform-lock-table"
```

Alternatively, provide the backend configuration via environment variables:

```shell
export TERRAFORM_BACKEND_BUCKET="your-terraform-state-bucket"
export TERRAFORM_BACKEND_DYNAMODB="your-terraform-lock-table"
```

Initialize Terraform:

```shell
cd iac
terraform init -backend-config="bucket=$TERRAFORM_BACKEND_BUCKET" -backend-config="dynamodb_table=$TERRAFORM_BACKEND_DYNAMODB"
```

The Terraform workspace determines the environment name (`stage_name`):

```shell
# Create a new workspace for a new environment
terraform workspace new dev

# Or select an existing workspace
terraform workspace select prod
```

Important: If no workspace is selected, the `default` workspace is used, resulting in resources named with `stage_name=default`.
- Open `iac/data.tf` and comment out the data source that retrieves the latest snapshot:

  ```hcl
  # data "aws_db_cluster_snapshot" "latest" {
  #   most_recent           = true
  #   db_cluster_identifier = replace(local.environment_name, "_", "-")
  # }
  ```

- Open `iac/rds_aurora.tf` and comment out the snapshot restoration line:

  ```hcl
  resource "aws_rds_cluster" "main" {
    # ... other configuration ...
    # snapshot_identifier = data.aws_db_cluster_snapshot.latest.id
    # ... rest of configuration ...
  }
  ```

- Deploy the infrastructure:

  ```shell
  terraform plan
  terraform apply
  ```

- After the first successful deployment, you can uncomment these lines for future deployments to enable snapshot restoration.
Once a snapshot exists (after at least one successful deployment and destruction), you can deploy with snapshot restoration enabled:
```shell
terraform plan
terraform apply
```

The deployment process will:
- Create the ECR repository
- Pull the Metabase Docker image from Docker Hub and push it to ECR
- Create the Aurora PostgreSQL database (restoring from the most recent snapshot if available)
- Create the ECS cluster, task definition, and service
- Create the Application Load Balancer
- Configure security groups, IAM roles, and other supporting resources
Once deployed, retrieve the load balancer's public IP address:
```shell
cd iac
terraform output load_balancer_ip_address
```

Access Metabase in your browser:

```
http://<load_balancer_ip>:80
# or
http://<load_balancer_ip>:3000
```
Note: Access is restricted to IP addresses in the whitelisted CIDR ranges specified in `cidr_list_to_whitelist`.
On first access, Metabase will prompt you to:
- Create an admin account
- Configure data sources (e.g., PostgreSQL, MySQL, Athena, etc.)
Metabase is pre-configured with IAM permissions to query Athena. To add an Athena data source:
- In Metabase, go to Settings → Admin → Databases
- Add a new database and select Amazon Athena
- Configure:
  - AWS Region: `eu-west-1`
  - Workgroup: `{project_name}_metabase_{stage_name}`
  - S3 Output Location: Managed automatically by the Athena workgroup
  - Authentication: Use IAM role-based authentication (no credentials needed)
Application logs are available in CloudWatch:
```shell
aws logs tail {project_name}_metabase_{stage_name} --follow
```

To troubleshoot or inspect the running container:
```shell
aws ecs execute-command \
  --cluster {project_name}_metabase_{stage_name} \
  --task <task-id> \
  --container {project_name}_metabase_{stage_name} \
  --interactive \
  --command "/bin/sh"
```

When the Metabase instance is not actively used, you can significantly reduce AWS costs by destroying the ECS service and Application Load Balancer while preserving the database and its data:
```shell
cd iac
terraform destroy -target aws_ecs_service.main -target aws_lb.main --auto-approve
```

This command:
- Destroys the ECS service (stopping the Metabase container)
- Destroys the Application Load Balancer
- Preserves the Aurora database, ECR repository, S3 bucket, and all other resources
- Reduces costs by eliminating ECS Fargate compute charges and ALB hourly charges
To resume the service, simply run `terraform apply` again:

```shell
terraform apply
```

This will recreate the ECS service and load balancer, restoring full access to Metabase with all data intact.
Note: This cost-optimization strategy is also available as a manual job in the GitLab CI pipeline under the `destroy-service` job.
The infrastructure is organized into logical Terraform files by resource type:
- `terraform.tf`: Provider configuration, backend configuration, and required versions
- `locals.tf`: Local variables including environment naming conventions
- `variables.tf`: Input variables
- `data.tf`: Data sources (VPC, subnets, existing resources, database snapshots)
- `outputs.tf`: Output values (load balancer IP)
| Resource | Purpose |
|---|---|
| `ecr.tf` | ECR repository for the Metabase Docker image |
| `upload_image_to_ecr.tf` | Null resource to trigger the image upload script |
| `upload_image_to_ecr.sh` | Bash script to pull and push the Metabase image to ECR |
| `ecs_cluster.tf` | ECS Fargate cluster |
| `ecs_task_definition.tf` | ECS task definition for the Metabase container |
| `ecs_service.tf` | ECS service with desired count and load balancer config |
| `rds_aurora.tf` | Aurora PostgreSQL Serverless v2 cluster and instance |
| `load_balancer.tf` | Application Load Balancer, listeners, and target group |
| `s3.tf` | S3 bucket for ALB logs and Athena query results |
| `athena_workgroup.tf` | Athena workgroup for Metabase queries |
| `cloudwatch_log_group.tf` | CloudWatch log group for ECS logs |
| Resource | Purpose |
|---|---|
| `iam_role_ecs_execution.tf` | Execution role for ECS tasks (pull images, write logs, access SSM) |
| `iam_role_ecs_service.tf` | Task role for the ECS service (application-level permissions) |
Permissions include:
- ECR image pull
- CloudWatch Logs write
- SSM Parameter Store read (for database password)
- S3 read/write access to Metabase bucket
- Glue catalog read access (default database and catalog)
- Athena query execution
| Resource | Purpose |
|---|---|
| `security_group_load_balancer.tf` | ALB ingress from whitelisted CIDRs, unrestricted egress |
| `security_group_ecs_service.tf` | ECS ingress from public subnets, unrestricted egress |
| `security_group_rds.tf` | RDS ingress from public subnets, unrestricted egress |
The Aurora RDS cluster is configured with automatic snapshot management:
When the database cluster is destroyed (via terraform destroy), a final snapshot is automatically created with the naming format:
```
{project_name}-metabase-{stage_name}-YYYYMMDDhhmmss
```

Example: `poc-metabase-prod-20240315143022`
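The timestamped identifier can be reproduced in shell for quick sanity checks; a sketch with hypothetical project/stage values (not taken from a real deployment):

```shell
# Hypothetical illustration of the final snapshot identifier format.
project_name="poc"
stage_name="prod"
environment_name="${project_name}_metabase_${stage_name}"

# Terraform's replace(local.environment_name, "_", "-"):
cluster_identifier="$(printf '%s' "$environment_name" | tr '_' '-')"

# formatdate("YYYYMMDDhhmmss", timestamp()) roughly corresponds to:
snapshot_identifier="${cluster_identifier}-$(date +%Y%m%d%H%M%S)"
echo "$snapshot_identifier"
```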
This is configured in `rds_aurora.tf`:

```hcl
skip_final_snapshot       = false
final_snapshot_identifier = "${replace(local.environment_name, "_", "-")}-${formatdate("YYYYMMDDhhmmss", timestamp())}"
```

When creating a new database cluster, Terraform can restore from the most recent snapshot. This is controlled by:
- Data source in `data.tf`: Retrieves the latest snapshot for the cluster identifier:

  ```hcl
  data "aws_db_cluster_snapshot" "latest" {
    most_recent           = true
    db_cluster_identifier = replace(local.environment_name, "_", "-")
  }
  ```

- Snapshot identifier in `rds_aurora.tf`: References the snapshot during cluster creation:

  ```hcl
  snapshot_identifier = data.aws_db_cluster_snapshot.latest.id
  ```
- Comment out the `data "aws_db_cluster_snapshot" "latest"` block in `data.tf`
- Comment out the `snapshot_identifier` line in `rds_aurora.tf`
After the first deployment and subsequent destruction, a snapshot will exist, and these lines can be uncommented for future deployments.
Resources follow the organizational standard:
`{project_name}_{domain_name}_{stage_name}_resource_name`
- `project_name`: Defined in `terraform.tfvars` (e.g., `poc`)
- `domain_name`: Fixed as `metabase` in `locals.tf`
- `stage_name`: Derived from the active Terraform workspace
The .gitlab-ci.yml pipeline includes the following stages:
- init: Terraform initialization
- format: Terraform formatting validation
- security: Security scanning (inherited from shared templates)
- deploy: Terraform apply
- destroy: Manual terraform destroy (for cleanup)
- mirror_to_github: Mirror repository to GitHub
Additional manual jobs:
- destroy-service: Destroys only the ECS service and load balancer to reduce costs while preserving data
Key pipeline variables:
- `PROJECT_NAME`: `poc`
- `DOMAIN_NAME`: `metabase`
- `STAGE_NAME`: Derived from the Git branch name (`$CI_COMMIT_REF_SLUG`)
Backend configuration is provided via CI/CD environment variables:
- `TERRAFORM_BACKEND_BUCKET`
- `TERRAFORM_BACKEND_DYNAMODB`
For local execution:
- Ensure AWS credentials are configured (via `aws configure` or environment variables)
- Initialize Terraform with backend configuration
- Select the appropriate workspace to set the environment (dev, prod, etc.)
- Run `terraform apply`
Important: Terraform uses the default AWS credentials configured locally. Verify that the credentials correspond to the intended AWS account before applying changes.
To tear down the entire infrastructure:
```shell
cd iac
terraform destroy
```

A final database snapshot will be created automatically with a timestamp-based identifier, allowing you to restore the database state later.
To destroy only the ECS service and ALB (for cost savings while preserving data):

```shell
terraform destroy -target aws_ecs_service.main -target aws_lb.main --auto-approve
```

| Variable | Type | Description | Required | Default |
|---|---|---|---|---|
| `project_name` | string | Project identifier for resource naming and cost allocation | Yes | - |
| `git_repository` | string | Git repository URL (used for tagging resources) | Yes | - |
| `cidr_list_to_whitelist` | string | Comma-separated list of CIDR blocks allowed to access the ALB | Yes | - |
| `role_to_assume_arn` | string | IAM role ARN to assume for resource creation | No | `""` |
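Since `cidr_list_to_whitelist` is a single comma-separated string, the configuration presumably splits it before use; a minimal sketch under that assumption (resource names are hypothetical):

```hcl
# Sketch: allow the whitelisted CIDRs through the ALB security group.
resource "aws_security_group_rule" "alb_http_ingress" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = split(",", var.cidr_list_to_whitelist)
  security_group_id = aws_security_group.load_balancer.id # hypothetical
}
```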
| Variable | Description | Value |
|---|---|---|
| `domain_name` | Fixed domain name for the project | `metabase` |
| `stage_name` | Environment name (from Terraform workspace) | `terraform.workspace` |
| `environment_name` | Full resource naming prefix | `{project_name}_{domain_name}_{stage_name}` |
| `metabase_docker_image_tag` | Metabase Docker image version | `v0.54.x` |
The environment (stage_name) is determined differently based on execution context:
- GitLab CI: Derived from the Git branch name via `$CI_COMMIT_REF_SLUG`
- Local execution: Derived from the active Terraform workspace
To manage environments locally:
```shell
# List workspaces
terraform workspace list

# Create a new environment
terraform workspace new staging

# Switch environments
terraform workspace select prod
```

The ECS task definition configures the following environment variables for Metabase:

| Variable | Value | Description |
|---|---|---|
| `AWS_REGION` | `eu-west-1` | AWS region |
| `MB_DB_TYPE` | `postgres` | Database type |
| `MB_DB_DBNAME` | `metabase` | Database name |
| `MB_DB_PORT` | `5432` | PostgreSQL port |
| `MB_DB_HOST` | Aurora cluster endpoint | Database host |
| `MB_DB_USER` | `root` | Database username |
| `MB_DB_PASS` | Retrieved from SSM Parameter Store (secure) | Database password |
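Inside the container definition, these variables would look roughly like the fragment below: non-secret values go in `environment`, while `MB_DB_PASS` is injected via `secrets` from SSM. Resource names here are assumptions:

```hcl
# Illustrative fragment of a container definition (inside jsonencode([...])).
environment = [
  { name = "AWS_REGION",   value = "eu-west-1" },
  { name = "MB_DB_TYPE",   value = "postgres" },
  { name = "MB_DB_DBNAME", value = "metabase" },
  { name = "MB_DB_PORT",   value = "5432" },
  { name = "MB_DB_HOST",   value = aws_rds_cluster.main.endpoint },
  { name = "MB_DB_USER",   value = "root" },
]
secrets = [
  # hypothetical SSM parameter resource name
  { name = "MB_DB_PASS", valueFrom = aws_ssm_parameter.db_password.arn },
]
```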
All resources are tagged with the following cost allocation tags:
- `Appli`: `{project_name}`
- `Component`: `metabase`
- `Env`: `{stage_name}`
- `git_repository`: `{git_repository_url}`
These tags enable cost tracking and FinOps reporting in AWS Cost Explorer.
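One common way to apply such tags to every resource is the provider's `default_tags` block; a sketch under that assumption (this repo may instead tag resources individually):

```hcl
provider "aws" {
  region = "eu-west-1"

  default_tags {
    tags = {
      Appli          = var.project_name
      Component      = "metabase"
      Env            = terraform.workspace
      git_repository = var.git_repository
    }
  }
}
```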
The Athena workgroup is configured to:
- Output query results to `s3://{project_name}-metabase-{stage_name}/athena_results/`
- Encrypt results using SSE-S3
- Publish CloudWatch metrics
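These three settings map to an `aws_athena_workgroup` resource roughly as follows (the workgroup and bucket names are illustrative; in practice they would be built from `local.environment_name`):

```hcl
resource "aws_athena_workgroup" "metabase" {
  name = "poc_metabase_prod" # hypothetical

  configuration {
    publish_cloudwatch_metrics_enabled = true

    result_configuration {
      output_location = "s3://poc-metabase-prod/athena_results/"

      encryption_configuration {
        encryption_option = "SSE_S3"
      }
    }
  }
}
```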
```
iac/
├── terraform.tf                    # Provider and backend configuration
├── variables.tf                    # Input variable definitions
├── locals.tf                       # Local variables and naming conventions
├── data.tf                         # Data sources (VPC, subnets, snapshots)
├── outputs.tf                      # Output values
├── ecr.tf                          # ECR repository for Metabase image
├── upload_image_to_ecr.tf          # Null resource to trigger image upload
├── upload_image_to_ecr.sh          # Bash script to pull/push Docker image
├── ecs_cluster.tf                  # ECS Fargate cluster
├── ecs_task_definition.tf          # ECS task definition
├── ecs_service.tf                  # ECS service configuration
├── rds_aurora.tf                   # Aurora PostgreSQL database with snapshot config
├── load_balancer.tf                # Application Load Balancer
├── s3.tf                           # S3 bucket for logs and Athena results
├── athena_workgroup.tf             # Athena workgroup
├── cloudwatch_log_group.tf         # CloudWatch log group
├── iam_role_ecs_execution.tf       # IAM execution role for ECS tasks
├── iam_role_ecs_service.tf         # IAM task role for ECS service
├── security_group_ecs_service.tf   # Security group for ECS tasks
├── security_group_load_balancer.tf # Security group for ALB
└── security_group_rds.tf           # Security group for RDS
.gitlab-ci.yml                      # GitLab CI pipeline definition
.releaserc.json                     # Semantic release configuration for versioning
```
The `.gitlab-ci.yml` references shared CI/CD templates from the GitLab project `erwan.simon/devops-platform-ci-templates` (v2.0.3), which are not available in the GitHub mirror.
- **Existing Network Infrastructure**: The deployment assumes an existing VPC with specific naming conventions and subnet tags. The VPC must be named `{project_name}_network_platform_prod` and contain subnets tagged with `Tier: Public`, `Tier: Private`, and `Tier: Intra`.
- **NAT Gateway**: A NAT Gateway must be deployed and available for outbound internet access from private subnets.
- **Metabase Version**: The Metabase version is hard-coded to `v0.54.x` in `locals.tf`. To upgrade, modify the `metabase_docker_image_tag` variable and re-run `terraform apply`.
- **No HTTPS**: The load balancer is configured for HTTP only (ports 80 and 3000). HTTPS/TLS termination is not configured.
- **Database Snapshot Management**:
  - A snapshot is automatically created when the database is destroyed
  - On first deployment (when no snapshot exists), the snapshot data source and restoration configuration must be commented out
  - Subsequent deployments will restore from the most recent snapshot
- **AWS Region**: The infrastructure is deployed in `eu-west-1` (Ireland) as specified in the Terraform backend configuration.
- **GitLab as Source of Truth**: This repository is mirrored from GitLab to GitHub. GitLab is the authoritative source, and GitHub should be treated as read-only.
- **No Auto-Scaling**: The ECS service is configured with a fixed desired count of 1 task. Auto-scaling is not configured.
- **No Multi-AZ Redundancy**: While Aurora Serverless v2 supports multi-AZ, the ECS service runs only 1 task, creating a single point of failure.
- **No Custom Domain**: The deployment does not configure a custom domain name or Route 53 DNS records. Users must access Metabase via the ALB's public IP address.
- **IP Whitelisting Required**: Access to Metabase is restricted to the CIDR blocks specified in `cidr_list_to_whitelist`. Users outside these ranges cannot access the application.
- **Manual Image Updates**: Updating the Metabase version requires modifying the Terraform configuration and re-running `terraform apply`. Automatic updates are not configured.
- **S3 Log Retention**: ALB access logs are retained for 31 days and then automatically deleted.
- **Serverless Aurora Scaling**: The Aurora cluster can scale down to 0 ACUs, which may result in cold-start delays after periods of inactivity.
- **Snapshot Restoration Requires Manual Configuration**: On the first deployment of a new environment, the snapshot restoration configuration must be commented out manually to avoid Terraform errors.
- **Shared CI/CD Templates Not Available**: The GitLab CI pipeline uses shared templates from a private GitLab project (`erwan.simon/devops-platform-ci-templates`) that are not available in the GitHub mirror. GitHub Actions are not implemented.
- **Local Terraform Execution**: When running Terraform locally, it uses the default AWS credentials configured on the machine. There is no automated credential switching or environment validation.