Metabase on AWS

I. Project Overview

This project deploys Metabase on AWS using Terraform. Metabase is an open-source business intelligence and analytics platform that allows users to query and visualize data.

The infrastructure provisions a fully managed, containerized Metabase instance running on AWS ECS Fargate, with an Aurora PostgreSQL database as the application backend. The deployment also integrates with AWS Athena and Glue so that Metabase can query data catalogued in AWS.

Intended users: Data analysts, data engineers, and business users who need to explore and visualize organizational data stored in AWS.

II. Architecture / Design

The solution uses a containerized architecture deployed on AWS:

Core Components

  1. Compute Layer (ECS Fargate)

    • Metabase runs as a Docker container on AWS ECS Fargate
    • Container image is pulled from Docker Hub (metabase/metabase:v0.54.x) and pushed to a private ECR repository
    • ECS task definition configures Metabase with 512 CPU units and 2048 MB memory
    • Health checks monitor the /api/health endpoint (see the task-definition sketch after this list)
  2. Database Layer (Aurora PostgreSQL)

    • Aurora Serverless v2 cluster for Metabase application metadata
    • PostgreSQL engine version 14.12
    • Auto-scaling between 0.0 and 1 ACU (Aurora Capacity Unit)
    • Database credentials stored in AWS Systems Manager Parameter Store (SSM)
    • Snapshot Management: On database destruction, a snapshot is automatically created with a timestamp-based identifier
    • Snapshot Restoration: The database can be restored from the most recent snapshot on creation
  3. Load Balancer

    • Application Load Balancer (ALB) exposes Metabase on HTTP ports 80 and 3000
    • Access is restricted to whitelisted CIDR blocks
    • Access logs are stored in S3 with a 31-day retention policy
  4. Data Integration

    • AWS Athena workgroup for querying data sources
    • IAM permissions to access AWS Glue catalog for data discovery
    • Metabase can connect to data sources via Athena
  5. Networking

    • Deployed in an existing VPC with public, private, and intra subnets
    • Security groups control traffic between ALB, ECS, and RDS
    • NAT Gateway required for outbound internet access from private subnets
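
To make these pieces concrete, here is a minimal sketch of how the compute layer (item 1 above) plausibly looks in ecs_task_definition.tf. The referenced resource names (aws_iam_role.ecs_execution, aws_ssm_parameter.db_password, and so on) are illustrative assumptions, not identifiers confirmed from the repository:

resource "aws_ecs_task_definition" "main" {
  family                   = local.environment_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512   # CPU units
  memory                   = 2048  # MB

  # Assumed role names; the repo defines them in iam_role_ecs_*.tf
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.ecs_service.arn

  container_definitions = jsonencode([{
    name         = local.environment_name
    image        = "${aws_ecr_repository.main.repository_url}:${local.metabase_docker_image_tag}"
    portMappings = [{ containerPort = 3000 }]

    # Health check against Metabase's status endpoint
    healthCheck = {
      command  = ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
      interval = 30
      timeout  = 5
      retries  = 3
    }

    # Database wiring mirrors the environment-variable table in section VII;
    # the password is injected from SSM rather than hard-coded
    environment = [
      { name = "MB_DB_TYPE", value = "postgres" },
      { name = "MB_DB_HOST", value = aws_rds_cluster.main.endpoint },
      { name = "MB_DB_USER", value = "root" },
    ]
    secrets = [
      { name = "MB_DB_PASS", valueFrom = aws_ssm_parameter.db_password.arn }
    ]
  }])
}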

Component Interactions

  • Load Balancer → ECS Service: Forwards HTTP traffic to the Metabase container
  • ECS Task → Aurora RDS: Metabase stores its metadata in PostgreSQL
  • ECS Task → Athena/Glue: Metabase queries data via Athena
  • ECS Task → S3: Writes Athena query results to S3
  • Terraform → ECR: Automated script pulls the official Metabase image from Docker Hub and pushes it to ECR

III. Prerequisites

Required Tools

  • Terraform: >= 1.5.4
  • AWS CLI: Configured with credentials for the target AWS account
  • Docker: Required for pulling and pushing the Metabase container image
  • Bash: The ECR image upload script requires a Bash-compatible shell

AWS Resources

  • An existing VPC with the naming convention {project_name}_network_platform_prod (see the data-source sketch after this list)
  • Subnets tagged with Tier: Public, Tier: Private, and Tier: Intra
  • NAT Gateway deployed and available in the VPC
  • An S3 bucket and DynamoDB table for Terraform state backend (provided at runtime)
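
A sketch of how data.tf plausibly discovers these pre-existing network resources (the data-source names are assumptions):

data "aws_vpc" "main" {
  tags = {
    Name = "${var.project_name}_network_platform_prod"
  }
}

# One lookup per tier; Public and Intra follow the same pattern
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
  tags = {
    Tier = "Private"
  }
}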

AWS Permissions

The AWS credentials or assumed role must have permissions to:

  • Create and manage ECS clusters, services, and task definitions
  • Create and manage RDS Aurora clusters and instances
  • Create and manage Application Load Balancers and target groups
  • Create and manage ECR repositories
  • Create and manage S3 buckets and policies
  • Create and manage IAM roles and policies
  • Create and manage security groups
  • Create and manage CloudWatch log groups
  • Create and manage SSM parameters
  • Create and manage Athena workgroups
  • Access Glue catalog resources
  • Assume roles (if role_to_assume_arn is provided)

Cloud Provider

  • AWS Region: eu-west-1 (Ireland) - specified in the Terraform S3 backend configuration
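
The region is pinned in the S3 backend block of iac/terraform.tf, roughly as follows; the state key is an assumption, and the bucket and lock table are injected at init time (see section IV):

terraform {
  backend "s3" {
    key    = "metabase/terraform.tfstate" # hypothetical state key
    region = "eu-west-1"
  }
}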

IV. Installation / Setup

1. Clone the Repository

git clone <repository-url>
cd metabase

2. Configure Variables

Create a terraform.tfvars file in the iac/ directory with the required variables:

project_name             = "your-project-name"
git_repository           = "https://gitlab.com/your-org/metabase"
cidr_list_to_whitelist   = "10.0.0.0/8,172.16.0.0/12"
role_to_assume_arn       = ""  # Optional: IAM role ARN to assume

3. Create Backend Configuration

Create a backend.hcl file in the iac/ directory (not tracked in Git):

bucket         = "your-terraform-state-bucket"
dynamodb_table = "your-terraform-lock-table"

Alternatively, provide backend configuration via environment variables:

export TERRAFORM_BACKEND_BUCKET="your-terraform-state-bucket"
export TERRAFORM_BACKEND_DYNAMODB="your-terraform-lock-table"

4. Initialize Terraform

cd iac
terraform init -backend-config="bucket=$TERRAFORM_BACKEND_BUCKET" -backend-config="dynamodb_table=$TERRAFORM_BACKEND_DYNAMODB"

5. Select or Create Terraform Workspace

The workspace determines the environment name (stage_name):

# Create a new workspace for a new environment
terraform workspace new dev

# Or select an existing workspace
terraform workspace select prod

Important: If no workspace is selected, the default workspace is used, resulting in resources named with stage_name=default.

6. First-Time Deployment (No Existing Snapshot)

⚠️ Important: When deploying the stack for the first time (or when no database snapshot exists), you must temporarily disable the snapshot restoration functionality:

  1. Open iac/data.tf and comment out the data source that retrieves the latest snapshot:
# data "aws_db_cluster_snapshot" "latest" {
#   most_recent           = true
#   db_cluster_identifier = replace(local.environment_name, "_", "-")
# }
  2. Open iac/rds_aurora.tf and comment out the snapshot restoration line:
resource "aws_rds_cluster" "main" {
  # ... other configuration ...
  # snapshot_identifier = data.aws_db_cluster_snapshot.latest.id
  # ... rest of configuration ...
}
  3. Deploy the infrastructure:
terraform plan
terraform apply
  4. After the first successful deployment, you can uncomment these lines for future deployments to enable snapshot restoration.

7. Deploy Infrastructure (Subsequent Deployments)

Once a snapshot exists (after at least one successful deployment and destruction), you can deploy with snapshot restoration enabled:

terraform plan
terraform apply

The deployment process will:

  1. Create the ECR repository
  2. Pull the Metabase Docker image from Docker Hub and push it to ECR
  3. Create the Aurora PostgreSQL database (restoring from the most recent snapshot if available)
  4. Create the ECS cluster, task definition, and service
  5. Create the Application Load Balancer
  6. Configure security groups, IAM roles, and other supporting resources

V. Usage

Accessing Metabase

Once deployed, retrieve the load balancer's public IP address:

cd iac
terraform output load_balancer_ip_address

Access Metabase in your browser:

http://<load_balancer_ip>:80
# or
http://<load_balancer_ip>:3000

Note: Access is restricted to IP addresses in the whitelisted CIDR ranges specified in cidr_list_to_whitelist.

Initial Metabase Setup

On first access, Metabase will prompt you to:

  1. Create an admin account
  2. Configure data sources (e.g., PostgreSQL, MySQL, Athena, etc.)

Connecting Metabase to Athena

Metabase is pre-configured with IAM permissions to query Athena. To add an Athena data source:

  1. In Metabase, go to Settings → Admin → Databases
  2. Add a new database and select Amazon Athena
  3. Configure:
    • AWS Region: eu-west-1
    • Workgroup: {project_name}_metabase_{stage_name}
    • S3 Output Location: Managed automatically by the Athena workgroup
    • Authentication: Use IAM role-based authentication (no credentials needed)

Monitoring and Logs

Application logs are available in CloudWatch:

aws logs tail {project_name}_metabase_{stage_name} --follow

ECS Exec (SSH-like access to container)

To troubleshoot or inspect the running container:

aws ecs execute-command \
  --cluster {project_name}_metabase_{stage_name} \
  --task <task-id> \
  --container {project_name}_metabase_{stage_name} \
  --interactive \
  --command "/bin/sh"

Cost Optimization: Pausing the Stack

When the Metabase instance is not actively used, you can significantly reduce AWS costs by destroying the ECS service and Application Load Balancer while preserving the database and its data:

cd iac
terraform destroy -target aws_ecs_service.main -target aws_lb.main --auto-approve

This command:

  • Destroys the ECS service (stopping the Metabase container)
  • Destroys the Application Load Balancer
  • Preserves the Aurora database, ECR repository, S3 bucket, and all other resources
  • Reduces costs by eliminating ECS Fargate compute charges and ALB hourly charges

To resume the service, simply run terraform apply again:

terraform apply

This will recreate the ECS service and load balancer, restoring full access to Metabase with all data intact.

Note: This cost optimization strategy is also available as a manual job in the GitLab CI pipeline under the destroy-service job.

VI. Infrastructure

Terraform Structure

The infrastructure is organized into logical Terraform files by resource type:

  • terraform.tf: Provider configuration, backend configuration, and required versions
  • locals.tf: Local variables including environment naming conventions
  • variables.tf: Input variables
  • data.tf: Data sources (VPC, subnets, existing resources, database snapshots)
  • outputs.tf: Output values (load balancer IP)

Core Resources

Resource                 Purpose
ecr.tf                   ECR repository for the Metabase Docker image
upload_image_to_ecr.tf   Null resource that triggers the image upload script
upload_image_to_ecr.sh   Bash script that pulls the Metabase image and pushes it to ECR
ecs_cluster.tf           ECS Fargate cluster
ecs_task_definition.tf   ECS task definition for the Metabase container
ecs_service.tf           ECS service with desired count and load balancer config
rds_aurora.tf            Aurora PostgreSQL Serverless v2 cluster and instance
load_balancer.tf         Application Load Balancer, listeners, and target group
s3.tf                    S3 bucket for ALB logs and Athena query results
athena_workgroup.tf      Athena workgroup for Metabase queries
cloudwatch_log_group.tf  CloudWatch log group for ECS logs
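
As an illustration, upload_image_to_ecr.tf plausibly wraps the Bash script in a null_resource with a local-exec provisioner; the trigger and script arguments shown here are assumptions:

resource "null_resource" "upload_image_to_ecr" {
  # Re-run the script whenever the pinned Metabase version changes
  triggers = {
    image_tag = local.metabase_docker_image_tag
  }

  provisioner "local-exec" {
    command = "${path.module}/upload_image_to_ecr.sh ${aws_ecr_repository.main.repository_url} ${local.metabase_docker_image_tag}"
  }
}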

IAM Roles and Policies

Resource                   Purpose
iam_role_ecs_execution.tf  Execution role for ECS tasks (pull images, write logs, access SSM)
iam_role_ecs_service.tf    Task role for the ECS service (application-level permissions)

Permissions include:

  • ECR image pull
  • CloudWatch Logs write
  • SSM Parameter Store read (for database password)
  • S3 read/write access to Metabase bucket
  • Glue catalog read access (default database and catalog)
  • Athena query execution

Security Groups

Resource                         Purpose
security_group_load_balancer.tf  ALB ingress from whitelisted CIDRs, unrestricted egress
security_group_ecs_service.tf    ECS ingress from public subnets, unrestricted egress
security_group_rds.tf            RDS ingress from public subnets, unrestricted egress
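
As an example of the pattern, the ALB security group presumably splits the comma-separated cidr_list_to_whitelist variable into a list of CIDR blocks (the exact resource layout is an assumption):

resource "aws_security_group" "load_balancer" {
  name   = "${local.environment_name}_load_balancer"
  vpc_id = data.aws_vpc.main.id

  # One ingress block per exposed port (80 and 3000)
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = split(",", var.cidr_list_to_whitelist)
  }

  ingress {
    from_port   = 3000
    to_port     = 3000
    protocol    = "tcp"
    cidr_blocks = split(",", var.cidr_list_to_whitelist)
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}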

Database Snapshot Management

The Aurora RDS cluster is configured with automatic snapshot management:

Snapshot Creation on Destruction

When the database cluster is destroyed (via terraform destroy), a final snapshot is automatically created with the naming format:

{project_name}-metabase-{stage_name}-YYYYMMDDhhmmss

Example: poc-metabase-prod-20240315143022

This is configured in rds_aurora.tf:

skip_final_snapshot       = false
final_snapshot_identifier = "${replace(local.environment_name, "_", "-")}-${formatdate("YYYYMMDDhhmmss", timestamp())}"
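
For context, here is a fuller (hypothetical) sketch of the cluster resource these lines belong to, combining the engine version and scaling bounds from section II; the credentials wiring is an assumption:

resource "aws_rds_cluster" "main" {
  cluster_identifier = replace(local.environment_name, "_", "-")
  engine             = "aurora-postgresql"
  engine_mode        = "provisioned" # Serverless v2 uses the provisioned engine mode
  engine_version     = "14.12"
  database_name      = "metabase"
  master_username    = "root"
  master_password    = aws_ssm_parameter.db_password.value # assumed resource name

  serverlessv2_scaling_configuration {
    min_capacity = 0.0
    max_capacity = 1.0
  }

  skip_final_snapshot       = false
  final_snapshot_identifier = "${replace(local.environment_name, "_", "-")}-${formatdate("YYYYMMDDhhmmss", timestamp())}"
}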

Snapshot Restoration on Creation

When creating a new database cluster, Terraform can restore from the most recent snapshot. This is controlled by:

  1. Data source in data.tf: Retrieves the latest snapshot for the cluster identifier:

    data "aws_db_cluster_snapshot" "latest" {
      most_recent           = true
      db_cluster_identifier = replace(local.environment_name, "_", "-")
    }
  2. Snapshot identifier in rds_aurora.tf: References the snapshot during cluster creation:

    snapshot_identifier = data.aws_db_cluster_snapshot.latest.id

First-Time Deployment Consideration

⚠️ Important: On the very first deployment of a new environment (when no snapshot exists yet), you must:

  1. Comment out the data "aws_db_cluster_snapshot" "latest" block in data.tf
  2. Comment out the snapshot_identifier line in rds_aurora.tf

After the first deployment and subsequent destruction, a snapshot will exist, and these lines can be uncommented for future deployments.

AWS Resource Naming Convention

Resources follow the organizational standard:

{project_name}_{domain_name}_{stage_name}_resource_name

  • project_name: Defined in terraform.tfvars (e.g., poc)
  • domain_name: Fixed as metabase in locals.tf
  • stage_name: Derived from the active Terraform workspace
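
Based on the locals table in section VII, the naming prefix is presumably assembled along these lines in locals.tf:

locals {
  domain_name      = "metabase"
  stage_name       = terraform.workspace
  environment_name = "${var.project_name}_${local.domain_name}_${local.stage_name}"
}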

Deployment Workflow

GitLab CI (Automated)

The .gitlab-ci.yml pipeline includes the following stages:

  1. init: Terraform initialization
  2. format: Terraform formatting validation
  3. security: Security scanning (inherited from shared templates)
  4. deploy: Terraform apply
  5. destroy: Manual terraform destroy (for cleanup)
  6. mirror_to_github: Mirror repository to GitHub

Additional manual jobs:

  • destroy-service: Destroys only the ECS service and load balancer to reduce costs while preserving data

Key pipeline variables:

  • PROJECT_NAME: poc
  • DOMAIN_NAME: metabase
  • STAGE_NAME: Derived from Git branch name ($CI_COMMIT_REF_SLUG)

Backend configuration is provided via CI/CD environment variables:

  • TERRAFORM_BACKEND_BUCKET
  • TERRAFORM_BACKEND_DYNAMODB

Local Execution

For local execution:

  1. Ensure AWS credentials are configured (via aws configure or environment variables)
  2. Initialize Terraform with backend configuration
  3. Select the appropriate workspace to set the environment (dev, prod, etc.)
  4. Run terraform apply

Important: Terraform uses the default AWS credentials configured locally. Verify that the credentials correspond to the intended AWS account before applying changes.

Destroying Infrastructure

To tear down the entire infrastructure:

cd iac
terraform destroy

A final database snapshot will be created automatically with a timestamp-based identifier, allowing you to restore the database state later.

To destroy only the ECS service and ALB (for cost savings while preserving data):

terraform destroy -target aws_ecs_service.main -target aws_lb.main --auto-approve

VII. Configuration

Terraform Variables

Variable                Type    Description                                                  Required  Default
project_name            string  Project identifier for resource naming and cost allocation  Yes       -
git_repository          string  Git repository URL (used for tagging resources)             Yes       -
cidr_list_to_whitelist  string  Comma-separated CIDR blocks allowed to access the ALB       Yes       -
role_to_assume_arn      string  IAM role ARN to assume for resource creation                No        ""

Terraform Local Variables (in locals.tf)

Variable                   Description                                   Value
domain_name                Fixed domain name for the project             metabase
stage_name                 Environment name (from Terraform workspace)   terraform.workspace
environment_name           Full resource-naming prefix                   {project_name}_{domain_name}_{stage_name}
metabase_docker_image_tag  Metabase Docker image version                 v0.54.x

Environment Configuration

The environment (stage_name) is determined differently based on execution context:

  • GitLab CI: Derived from the Git branch name via $CI_COMMIT_REF_SLUG
  • Local execution: Derived from the active Terraform workspace

To manage environments locally:

# List workspaces
terraform workspace list

# Create a new environment
terraform workspace new staging

# Switch environments
terraform workspace select prod

Metabase Container Environment Variables

The ECS task definition configures the following environment variables for Metabase:

Variable      Value                                        Description
AWS_REGION    eu-west-1                                    AWS region
MB_DB_TYPE    postgres                                     Database type
MB_DB_DBNAME  metabase                                     Database name
MB_DB_PORT    5432                                         PostgreSQL port
MB_DB_HOST    Aurora cluster endpoint                      Database host
MB_DB_USER    root                                         Database username
MB_DB_PASS    Retrieved from SSM Parameter Store (secure)  Database password

AWS Tagging

All resources are tagged with the following cost allocation tags:

  • Appli: {project_name}
  • Component: metabase
  • Env: {stage_name}
  • git_repository: {git_repository_url}

These tags enable cost tracking and FinOps reporting in AWS Cost Explorer.
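
One common way to apply such tags uniformly is the provider's default_tags block; whether the repository uses this or passes a tags map per resource is an assumption:

provider "aws" {
  region = "eu-west-1"

  default_tags {
    tags = {
      Appli          = var.project_name
      Component      = "metabase"
      Env            = local.stage_name
      git_repository = var.git_repository
    }
  }
}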

Athena Workgroup Configuration

The Athena workgroup is configured to:

  • Output query results to s3://{project_name}-metabase-{stage_name}/athena_results/
  • Encrypt results using SSE-S3
  • Publish CloudWatch metrics
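
A sketch of what athena_workgroup.tf plausibly contains, following the AWS provider's aws_athena_workgroup schema (the S3 bucket reference is an assumption):

resource "aws_athena_workgroup" "main" {
  name = local.environment_name

  configuration {
    publish_cloudwatch_metrics_enabled = true

    result_configuration {
      output_location = "s3://${aws_s3_bucket.main.bucket}/athena_results/"

      encryption_configuration {
        encryption_option = "SSE_S3"
      }
    }
  }
}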

VIII. Project Structure

A. Infrastructure (Terraform)

iac/
├── terraform.tf                       # Provider and backend configuration
├── variables.tf                       # Input variable definitions
├── locals.tf                          # Local variables and naming conventions
├── data.tf                            # Data sources (VPC, subnets, snapshots)
├── outputs.tf                         # Output values
├── ecr.tf                             # ECR repository for Metabase image
├── upload_image_to_ecr.tf             # Null resource to trigger image upload
├── upload_image_to_ecr.sh             # Bash script to pull/push Docker image
├── ecs_cluster.tf                     # ECS Fargate cluster
├── ecs_task_definition.tf             # ECS task definition
├── ecs_service.tf                     # ECS service configuration
├── rds_aurora.tf                      # Aurora PostgreSQL database with snapshot config
├── load_balancer.tf                   # Application Load Balancer
├── s3.tf                              # S3 bucket for logs and Athena results
├── athena_workgroup.tf                # Athena workgroup
├── cloudwatch_log_group.tf            # CloudWatch log group
├── iam_role_ecs_execution.tf          # IAM execution role for ECS tasks
├── iam_role_ecs_service.tf            # IAM task role for ECS service
├── security_group_ecs_service.tf      # Security group for ECS tasks
├── security_group_load_balancer.tf    # Security group for ALB
└── security_group_rds.tf              # Security group for RDS

B. CI/CD Configuration

.gitlab-ci.yml           # GitLab CI pipeline definition
.releaserc.json          # Semantic release configuration for versioning

The .gitlab-ci.yml references shared CI/CD templates from the GitLab project erwan.simon/devops-platform-ci-templates (v2.0.3), which are not available in GitHub mirrors.

IX. Limitations / Assumptions

Assumptions

  1. Existing Network Infrastructure: The deployment assumes an existing VPC with specific naming conventions and subnet tags. The VPC must be named {project_name}_network_platform_prod and contain subnets tagged with Tier: Public, Tier: Private, and Tier: Intra.

  2. NAT Gateway: A NAT Gateway must be deployed and available for outbound internet access from private subnets.

  3. Metabase Version: The Metabase version is hard-coded to v0.54.x in locals.tf. To upgrade, modify the metabase_docker_image_tag local value and re-run terraform apply.

  4. No HTTPS: The load balancer is configured for HTTP only (ports 80 and 3000). HTTPS/TLS termination is not configured.

  5. Database Snapshot Management:

    • A snapshot is automatically created when the database is destroyed
    • On first deployment (when no snapshot exists), the snapshot data source and restoration configuration must be commented out
    • Subsequent deployments will restore from the most recent snapshot
  6. AWS Region: The infrastructure is deployed in eu-west-1 (Ireland) as specified in the Terraform backend configuration.

  7. GitLab as Source of Truth: This repository is mirrored from GitLab to GitHub. GitLab is the authoritative source, and GitHub should be treated as read-only.

Limitations

  1. No Auto-Scaling: The ECS service is configured with a fixed desired count of 1 task. Auto-scaling is not configured.

  2. No Multi-AZ Redundancy: While Aurora Serverless v2 supports multi-AZ, the ECS service has only 1 task, creating a single point of failure.

  3. No Custom Domain: The deployment does not configure a custom domain name or Route53 DNS records. Users must access Metabase via the ALB's public IP address.

  4. IP Whitelisting Required: Access to Metabase is restricted to CIDR blocks specified in cidr_list_to_whitelist. Users outside these ranges cannot access the application.

  5. Manual Image Updates: Updating the Metabase version requires modifying the Terraform configuration and re-running terraform apply. Automatic updates are not configured.

  6. S3 Log Retention: ALB access logs are retained for 31 days and then automatically deleted.

  7. Serverless Aurora Scaling: The Aurora cluster can scale down to 0 ACUs, which may result in cold start delays during periods of inactivity.

  8. Snapshot Restoration Requires Manual Configuration: On first deployment of a new environment, the snapshot restoration configuration must be manually commented out to avoid Terraform errors.

  9. Shared CI/CD Templates Not Available: The GitLab CI pipeline uses shared templates from a private GitLab project (erwan.simon/devops-platform-ci-templates) that are not available in the GitHub mirror. GitHub Actions are not implemented.

  10. Local Terraform Execution: When running Terraform locally, it uses the default AWS credentials configured on the machine. There is no automated credential switching or environment validation.
