
Automate CloudFlare DNS management for workshop infrastructure #9

@blink-so

Problem

DNS changes for ai.coder.com and its proxy domains currently require a manual Slack request to #help-me-ops, after which the ops team edits CloudFlare by hand. This slows down workshops and incident response, and it makes ops team availability a potential single point of failure.

Context

Domains Managed in CloudFlare:

  • ai.coder.com + *.ai.coder.com → us-east-2 NLB (Control Plane)
  • oregon-proxy.ai.coder.com + *.oregon-proxy.ai.coder.com → us-west-2 NLB
  • emea-proxy.ai.coder.com + *.emea-proxy.ai.coder.com → eu-west-2 NLB

Current Process:

  1. Request DNS change in #help-me-ops Slack channel
  2. Wait for ops team response
  3. Ops team makes the change manually in the CloudFlare console
  4. Verify propagation

Pain Points:

  • Manual process doesn't scale during incidents
  • No self-service for infrastructure team
  • Dependency on ops team availability
  • No automated validation of DNS configuration

Requirements

Terraform/IaC Management

  • Migrate CloudFlare DNS records to Terraform
  • Use CloudFlare Terraform provider
  • Store state in shared backend (S3)
  • Document DNS change process via Infrastructure as Code
  • Enable self-service DNS changes via PR/approval workflow
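
A minimal sketch of the provider pin and shared S3 backend listed above. The bucket name, state key, and region are placeholders rather than values from this issue, and the `~> 4.0` constraint is an assumption chosen to match the `cloudflare_record` syntax used in the implementation notes.

```hcl
terraform {
  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0" # assumption: 4.x, matching the cloudflare_record examples in this issue
    }
  }

  # Shared remote state so every PR plans against the same view of DNS.
  backend "s3" {
    bucket  = "example-workshop-terraform-state"  # hypothetical bucket name
    key     = "cloudflare-dns/terraform.tfstate"  # hypothetical state key
    region  = "us-east-2"                         # assumption: same region as the control plane
    encrypt = true                                # encrypt the state object at rest
  }
}
```

State locking would also be worth enabling so that two applies cannot race; the exact backend setting depends on the Terraform version in use.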

Automated Validation

  • Add DNS validation to pre-workshop checklist
  • Implement automated tests:
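    # Verify the 3 base domains resolve (dig +short prints nothing when a name has no answer)
    dig +short ai.coder.com
    dig +short oregon-proxy.ai.coder.com
    dig +short emea-proxy.ai.coder.com
    # Verify the 3 wildcard subdomains resolve (6 names in total)
    dig +short test.ai.coder.com
    dig +short test.oregon-proxy.ai.coder.com
    dig +short test.emea-proxy.ai.coder.com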
  • Run validation as part of CI/CD pipeline
  • Alert on DNS misconfiguration
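
One way the resolution checks could live next to the Terraform configuration is a `check` block backed by the `hashicorp/dns` provider. This is a sketch under assumptions, not a decided design: it assumes Terraform 1.5 or later and covers only ai.coder.com and one wildcard name; the two proxy domains would follow the same pattern.

```hcl
terraform {
  required_version = ">= 1.5.0" # check blocks were introduced in Terraform 1.5

  required_providers {
    dns = {
      source = "hashicorp/dns"
    }
  }
}

# Base name: reports a problem if ai.coder.com returns no A records.
check "ai_coder_com_resolves" {
  data "dns_a_record_set" "ai" {
    host = "ai.coder.com"
  }

  assert {
    condition     = length(data.dns_a_record_set.ai.addrs) > 0
    error_message = "ai.coder.com did not resolve to any A records."
  }
}

# Wildcard: any label under ai.coder.com should resolve the same way.
check "ai_coder_com_wildcard_resolves" {
  data "dns_a_record_set" "wildcard" {
    host = "test.ai.coder.com"
  }

  assert {
    condition     = length(data.dns_a_record_set.wildcard.addrs) > 0
    error_message = "test.ai.coder.com did not resolve; the wildcard record may be missing."
  }
}
```

Failed checks surface as warnings during plan and apply rather than blocking the run, so a pipeline that must fail hard on DNS errors would still need a scripted check such as the `dig` commands above.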

Documentation

Self-Service Workflow

  • Grant infrastructure team CloudFlare API access (scoped to ai.coder.com zone)
  • Implement PR-based approval workflow:
    1. Create Terraform change in PR
    2. Automated validation/plan
    3. Team review
    4. Apply after approval
  • Set up notifications for DNS changes (Slack, email)

Monitoring & Alerting

Success Criteria

  • DNS changes can be made via Terraform without manual Slack requests
  • Infrastructure team has self-service access to CloudFlare DNS
  • DNS configuration validated automatically before and after workshops
  • DNS issues detected and alerted before user impact
  • Zero workshop delays due to DNS misconfigurations

Implementation Notes

CloudFlare Terraform Example:

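# Syntax as in Cloudflare provider 4.x and earlier. Later 4.x releases deprecate
# `value` in favor of `content`, and 5.x renames the resource to cloudflare_dns_record.
# `name` is relative to the zone: "ai" assumes var.cloudflare_zone_id is the coder.com zone.
resource "cloudflare_record" "ai_coder_com" {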
  zone_id = var.cloudflare_zone_id
  name    = "ai"
  value   = aws_lb.coder_nlb_us_east_2.dns_name
  type    = "CNAME"
  ttl     = 300
  proxied = false
}

resource "cloudflare_record" "ai_coder_com_wildcard" {
  zone_id = var.cloudflare_zone_id
  name    = "*.ai"
  value   = aws_lb.coder_nlb_us_east_2.dns_name
  type    = "CNAME"
  ttl     = 300
  proxied = false
}

resource "cloudflare_record" "oregon_proxy" {
  zone_id = var.cloudflare_zone_id
  name    = "oregon-proxy.ai"
  value   = aws_lb.coder_nlb_us_west_2.dns_name
  type    = "CNAME"
  ttl     = 300
  proxied = false
}

# ... additional records for London proxy and wildcards
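
As an alternative to one resource per record, the six records could be generated from a single map with `for_each`. This is a sketch, not the layout this issue prescribes: `var.nlb_dns_names` is a hypothetical input standing in for the `aws_lb` references above, and the record names assume the zone is coder.com.

```hcl
variable "cloudflare_zone_id" {
  type = string
}

# Hypothetical input: NLB DNS name keyed by AWS region.
variable "nlb_dns_names" {
  type = map(string)
}

locals {
  # Record name (relative to the zone) => region whose NLB it points at.
  dns_targets = {
    "ai"              = "us-east-2"
    "oregon-proxy.ai" = "us-west-2"
    "emea-proxy.ai"   = "eu-west-2"
  }

  # Each base name plus its wildcard, giving six records in total.
  dns_records = merge(
    local.dns_targets,
    { for name, region in local.dns_targets : "*.${name}" => region },
  )
}

resource "cloudflare_record" "workshop" {
  for_each = local.dns_records

  zone_id = var.cloudflare_zone_id
  name    = each.key
  value   = var.nlb_dns_names[each.value]
  type    = "CNAME"
  ttl     = 300
  proxied = false
}
```

Because the records already exist in CloudFlare, they would likely need to be imported before the first apply so Terraform does not try to recreate them; for `cloudflare_record` the import ID has the form `<zone_id>/<record_id>`.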

CloudFlare API Scoping:

  • Use API token (not Global API Key)
  • Scope to ai.coder.com zone only
  • Grant DNS edit permissions only
  • Rotate token periodically

Security Considerations

  • CloudFlare API token stored in a secrets manager (e.g. AWS Secrets Manager or HashiCorp Vault)
  • API token scoped to minimum required permissions
  • Audit log for all DNS changes
  • Require PR approval for production DNS changes
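
A sketch of how the provider could read the scoped token from AWS Secrets Manager instead of a checked-in variable. The secret name is hypothetical, and the AWS provider is assumed to be configured already for the account that holds the secret.

```hcl
# Assumes the AWS provider is already configured for the account holding the secret.
data "aws_secretsmanager_secret_version" "cloudflare_api_token" {
  secret_id = "workshop/cloudflare-dns-api-token" # hypothetical secret name
}

provider "cloudflare" {
  # Expected to be a zone-scoped API token with DNS edit permission only,
  # not the Global API Key.
  api_token = data.aws_secretsmanager_secret_version.cloudflare_api_token.secret_string
}
```

Values read through a data source are written to Terraform state, so this approach relies on the state bucket being encrypted and access-restricted. An alternative that keeps the token out of state is to have CI export it as the `CLOUDFLARE_API_TOKEN` environment variable, which the provider reads when `api_token` is not set.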

Future Domains

This infrastructure should support upcoming domains:

  • coderdemo.io - SE official demo environment
  • devcoder.io - CS / Engineering collaboration environment

Related

Sept 30 Workshop Postmortem
#2 (Image management and subdomain routing)
#4 (Pre-workshop validation checklist)
#6 (Monitoring and alerting)
Incident Runbook - Subdomain Routing Failures
Pre-Workshop Checklist - CloudFlare DNS Verification
