diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/README.md b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/README.md
new file mode 100644
index 0000000..6213e24
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/README.md
@@ -0,0 +1,160 @@
+# Databricks Workspace on AWS with **Back-end PrivateLink** + **CMK** (Terraform)
+
+This template provisions:
+- AWS VPC (2 subnets across AZs), security groups, route tables **OR use existing VPC/subnets**
+- VPC **Interface Endpoints** for the Databricks **Workspace (REST APIs)** and the **Secure Cluster Connectivity (SCC) relay** — back‑end PrivateLink
+- Optional VPC endpoints for S3 (Gateway), STS and Kinesis (Interface)
+- AWS KMS Customer Managed Key (CMK) **OR use existing CMK**
+- S3 root bucket for workspace storage
+- Cross‑account IAM role for Databricks to access your AWS account
+- Databricks **VPC endpoint registrations**, **Network configuration**, **Private Access Settings (PAS)**, **Customer‑managed key (CMK)**, and the **Workspace**
+
+> **Notes**
+> - Back‑end PrivateLink requires a **customer‑managed VPC** & **Secure Cluster Connectivity** (SCC). See Databricks docs.
+> - You must supply your region's **VPC endpoint service names** for the Databricks **workspace** and **SCC relay** (`var.pl_service_names`). See the table in Databricks docs.
+> - CMK requires the Enterprise tier and KMS key policy updates; the Databricks AWS account id is `414351767826` (commercial).
+
+## 🆕 Flexible Infrastructure Options
+
+This template supports **two deployment modes**:
+
+### Option 1: Create New Resources (Default)
+Terraform will create a new VPC, subnets, security groups, VPC endpoints, and CMK.
+
+### Option 2: Use Existing Resources
+Bring your own VPC, subnets, and/or CMK. Terraform will only create the necessary VPC endpoints and Databricks configurations.
+
+## Deployment Options
+
+This template can be deployed in two ways:
+
+1. **Local Deployment**: Run Terraform from your local machine (see Quick Start below)
+2. **GitHub Actions (CI/CD)**: Automated deployment via GitHub Actions - see [GITHUB_ACTIONS.md](GITHUB_ACTIONS.md) for the complete setup guide
+
+> 💡 **GitHub Actions Support**: The IAM role self-assuming configuration works seamlessly in GitHub Actions because runners come with the AWS CLI pre-installed and properly configured. See the [GitHub Actions guide](GITHUB_ACTIONS.md) for details.
+
+## Quick Start
+
+### Using NEW Resources (Default)
+
+1. Install Terraform >= 1.5 and configure AWS credentials (e.g., `AWS_PROFILE`, `AWS_REGION`).
+
+2. Configure Databricks authentication using a service principal:
+   ```bash
+   export DATABRICKS_CLIENT_ID="<service-principal-client-id>"
+   export DATABRICKS_CLIENT_SECRET="<service-principal-client-secret>"
+   ```
+
+3. Copy the example configuration:
+   ```bash
+   cp terraform.tfvars.example-new-resources terraform.tfvars
+   ```
+
+4. Edit `terraform.tfvars` and update:
+   - `project` - Your project name
+   - `region` - AWS region
+   - `vpc_cidr` and `private_subnet_cidrs` - Network configuration
+   - `databricks_account_id` - Your Databricks account ID
+   - `databricks_client_id` and `databricks_client_secret` - Service principal credentials
+   - `databricks_crossaccount_role_external_id` - From the Databricks console
+   - `pl_service_names` - PrivateLink service names for your region
+   - `root_bucket_name` - S3 bucket name for workspace storage
+
+5. Initialize & apply:
+   ```bash
+   terraform init
+   terraform plan
+   terraform apply
+   ```
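+Before `terraform apply`, you can sanity-check that the PrivateLink service names you configured actually exist in your region. A minimal check with the AWS CLI (the two service names below are the illustrative `us-east-1` values from `terraform.tfvars.example-new-resources`; substitute your region's values from the Databricks table):
+
+```bash
+aws ec2 describe-vpc-endpoint-services \
+  --region us-east-1 \
+  --service-names \
+    com.amazonaws.vpce.us-east-1.vpce-svc-09143d1e626de2f04 \
+    com.amazonaws.vpce.us-east-1.vpce-svc-00018a8c3ff62ffdf \
+  --query 'ServiceDetails[].ServiceName'
+```
+
+The command errors out if either service name is not available in the region.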
+### Using EXISTING Resources
+
+1. Follow steps 1-2 from above.
+
+2. Copy the existing resources example:
+   ```bash
+   cp terraform.tfvars.example-existing-resources terraform.tfvars
+   ```
+
+3. Edit `terraform.tfvars` and configure:
+   - Set `create_new_vpc = false`
+   - Set `create_new_cmk = false`
+   - Provide `existing_vpc_id` - Your VPC ID
+   - Provide `existing_subnet_ids` - List of subnet IDs (at least 2, in different AZs)
+   - Provide `existing_cmk_arn` - ARN of your KMS key
+   - Optionally provide `existing_security_group_id` (if not provided, one will be created)
+   - Update the other Databricks configuration values
+
+4. Initialize & apply:
+   ```bash
+   terraform init
+   terraform plan
+   terraform apply
+   ```
+
+### Important Notes for Existing Resources
+
+**VPC Requirements:**
+- DNS support and DNS hostnames must be enabled
+- Subnets must be in different Availability Zones
+- Subnets should have appropriate route tables configured
+
+**CMK Requirements:**
+- The CMK must have a key policy that allows:
+  - Your AWS account root to manage the key
+  - The Databricks control plane (414351767826) to use the key
+  - Your cross-account IAM role to create grants
+- See `modules/aws-cmk/main.tf` for the required policy structure
+- The CMK must be in the same region as your workspace
+
+**What Will Be Created (even with existing resources):**
+- VPC endpoints for the Databricks workspace and SCC relay (required for PrivateLink)
+- Security groups for the VPC endpoints
+- The Databricks workspace and related configurations
+
+After workspace creation reaches **RUNNING**, wait ~20 minutes before starting clusters (per Databricks guidance).
+
+## Unity Catalog
+
+Unity Catalog is automatically configured with:
+- ✅ Metastore with S3 storage
+- ✅ IAM role
+- ✅ Storage credential
+- ✅ External location (requires the AWS CLI on the machine running Terraform)
+
+### External Locations
+
+External locations require the Unity Catalog IAM role to be able to assume itself. AWS rejects a trust policy that references a role that does not exist yet (a circular dependency), so the template first creates the role and then adds the self-assume statement via a `local-exec` provisioner that calls `aws iam update-assume-role-policy` (see `modules/aws-unity-catalog/main.tf`). This is why the AWS CLI must be installed and authenticated wherever Terraform runs.
+
+If the automatic update cannot run in your environment, add the statement manually:
+
+1. After `terraform apply` completes, go to the AWS IAM Console
+2. Find the Unity Catalog role (name in outputs: `unity_catalog_role_name`)
+3. Edit the "Trust relationships"
+4. Add this statement to the trust policy, substituting the ARN from the `unity_catalog_role_arn` output (this mirrors the statement the provisioner applies):
+```json
+{
+  "Effect": "Allow",
+  "Principal": {
+    "AWS": "<unity-catalog-role-arn>"
+  },
+  "Action": "sts:AssumeRole"
+}
+```
+5. Run `terraform apply` again so the `databricks_external_location` resource can be created
+
+The role ARN is available in the Terraform outputs.
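+Once the apply finishes, the values referenced in this guide are exposed as Terraform outputs (see `outputs.tf`), for example:
+
+```bash
+terraform output workspace_url
+terraform output unity_catalog_role_name
+terraform output -json deployment_mode
+```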
+## Clean up
+```bash
+terraform destroy
+```
+
+## References
+- Back‑end PrivateLink steps & ports. Databricks docs.
+- Private Access Settings (PAS), VPC endpoint registrations & network config. Databricks docs.
+- CMK configuration & KMS policy. Databricks docs.
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/main.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/main.tf
new file mode 100644
index 0000000..f7bbce2
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/main.tf
@@ -0,0 +1,348 @@
+locals {
+  name = "${var.project}-${var.region}"
+
+  # Determine which networking resources to use
+  vpc_id            = var.create_new_vpc ? module.network[0].vpc_id : var.existing_vpc_id
+  subnet_ids        = var.create_new_vpc ? module.network[0].subnet_ids : var.existing_subnet_ids
+  vpce_workspace_id = var.create_new_vpc ? module.network[0].vpce_workspace_id : aws_vpc_endpoint.existing_vpc_workspace[0].id
+  vpce_scc_id       = var.create_new_vpc ? module.network[0].vpce_scc_id : aws_vpc_endpoint.existing_vpc_scc[0].id
+
+  # Pick the workspace security group: module output, user-supplied id, or the one created below
+  workspace_sg_id = (
+    var.create_new_vpc ?
+    module.network[0].workspace_sg_id :
+    (var.existing_security_group_id != "" ?
+      var.existing_security_group_id :
+      aws_security_group.existing_vpc_workspace[0].id
+    )
+  )
+
+  # Determine which CMK to use
+  kms_key_arn = var.create_new_cmk ? module.cmk[0].key_arn : var.existing_cmk_arn
+}
+
+# ==================== NETWORK MODULE (Optional) ====================
+module "network" {
+  count                  = var.create_new_vpc ? 1 : 0
+  source                 = "./modules/aws-network"
+  project                = var.project
+  region                 = var.region
+  vpc_cidr               = var.vpc_cidr
+  private_subnet_cidrs   = var.private_subnet_cidrs
+  pl_service_names       = var.pl_service_names
+  enable_extra_endpoints = var.enable_extra_endpoints
+}
+
+# ==================== RESOURCES FOR EXISTING VPC ====================
+# If using an existing VPC, we still need to create VPC endpoints and, optionally, a security group
+
+# Security group for existing VPC (only if not provided)
+resource "aws_security_group" "existing_vpc_workspace" {
+  count       = !var.create_new_vpc && var.existing_security_group_id == "" ? 1 : 0
+  name        = "${var.project}-workspace-sg"
+  description = "Databricks workspace SG"
+  vpc_id      = var.existing_vpc_id
+
+  # Databricks requires the workspace SG to allow all TCP and UDP traffic
+  # from itself (intra-cluster communication)
+  ingress {
+    from_port = 0
+    to_port   = 65535
+    protocol  = "tcp"
+    self      = true
+  }
+
+  ingress {
+    from_port = 0
+    to_port   = 65535
+    protocol  = "udp"
+    self      = true
+  }
+
+  egress {
+    from_port   = 0
+    to_port     = 0
+    protocol    = "-1"
+    cidr_blocks = ["0.0.0.0/0"]
+  }
+
+  tags = { Name = "${var.project}-workspace-sg" }
+}
+
+# Security group for VPC endpoints in existing VPC
+resource "aws_security_group" "existing_vpc_vpce" {
+  count       = !var.create_new_vpc ? 1 : 0
+  name        = "${var.project}-vpce-sg"
+  description = "Security group for VPC endpoints (PL back-end)"
+  vpc_id      = var.existing_vpc_id
+
+  ingress {
+    from_port = 443
+    to_port   = 443
+    protocol  = "tcp"
+    security_groups = [
+      var.existing_security_group_id != "" ?
+      var.existing_security_group_id :
+      aws_security_group.existing_vpc_workspace[0].id
+    ]
+  }
+
+  ingress {
+    from_port = 6666
+    to_port   = 6666
+    protocol  = "tcp"
+    security_groups = [
+      var.existing_security_group_id != "" ?
+      var.existing_security_group_id :
+      aws_security_group.existing_vpc_workspace[0].id
+    ]
+  }
+
+  egress {
+    from_port   = 0
+    to_port     = 0
+    protocol    = "-1"
+    cidr_blocks = ["0.0.0.0/0"]
+  }
+
+  tags = { Name = "${var.project}-vpce-sg" }
+
+  depends_on = [aws_security_group.existing_vpc_workspace]
+}
+
+# VPC Endpoint for Databricks Workspace REST APIs in existing VPC
+resource "aws_vpc_endpoint" "existing_vpc_workspace" {
+  count               = !var.create_new_vpc ? 1 : 0
+  vpc_id              = var.existing_vpc_id
+  service_name        = var.pl_service_names.workspace
+  vpc_endpoint_type   = "Interface"
+  subnet_ids          = [var.existing_subnet_ids[0]]
+  security_group_ids  = [aws_security_group.existing_vpc_vpce[0].id]
+  private_dns_enabled = true
+  tags                = { Name = "${var.project}-vpce-workspace" }
+}
+
+# VPC Endpoint for SCC Relay in existing VPC
+resource "aws_vpc_endpoint" "existing_vpc_scc" {
+  count               = !var.create_new_vpc ? 1 : 0
+  vpc_id              = var.existing_vpc_id
+  service_name        = var.pl_service_names.scc
+  vpc_endpoint_type   = "Interface"
+  subnet_ids          = [var.existing_subnet_ids[length(var.existing_subnet_ids) > 1 ?
1 : 0]] + security_group_ids = [aws_security_group.existing_vpc_vpce[0].id] + private_dns_enabled = true + tags = { Name = "${var.project}-vpce-scc" } +} + +# ==================== IAM MODULE ==================== +module "iam" { + source = "./modules/aws-iam" + project = var.project + account_id = var.databricks_account_id + external_id = var.databricks_crossaccount_role_external_id +} + +# ==================== CMK MODULE (Optional) ==================== +module "cmk" { + count = var.create_new_cmk ? 1 : 0 + source = "./modules/aws-cmk" + project_name = var.project + cross_account_role_arn = module.iam.cross_account_role_arn +} + +# ==================== STORAGE MODULE ==================== +module "storage" { + source = "./modules/aws-storage" + root_bucket_name = var.root_bucket_name + cross_account_role_arn = module.iam.cross_account_role_arn + kms_key_arn = local.kms_key_arn +} + +# ==================== UNITY CATALOG MODULE ==================== +module "unity_catalog" { + source = "./modules/aws-unity-catalog" + prefix = var.project + region = var.region + databricks_account_id = var.databricks_account_id + cross_account_role_arn = module.iam.cross_account_role_arn + kms_key_arn = local.kms_key_arn +} + +# ==================== DATABRICKS ACCOUNT-LEVEL RESOURCES ==================== +# Register KMS CMK as an encryption key configuration usable for both Managed Services & Storage +resource "databricks_mws_customer_managed_keys" "cmk" { + provider = databricks.mws + account_id = var.databricks_account_id + aws_key_info { + key_arn = local.kms_key_arn + } + use_cases = ["MANAGED_SERVICES", "STORAGE"] + + # Ensure KMS key policy is ready before Databricks validates + depends_on = [module.cmk] +} + +# Register the two VPC endpoints (workspace & SCC relay) with Databricks +resource "databricks_mws_vpc_endpoint" "workspace" { + provider = databricks.mws + account_id = var.databricks_account_id + vpc_endpoint_name = "${local.name}-workspace-vpce" + aws_vpc_endpoint_id = local.vpce_workspace_id + region = var.region +} + +resource "databricks_mws_vpc_endpoint" "scc" { + provider = databricks.mws + account_id = var.databricks_account_id + vpc_endpoint_name = "${local.name}-scc-vpce" + aws_vpc_endpoint_id = local.vpce_scc_id + region = var.region +} + +# Create the customer-managed VPC network configuration with back-end PrivateLink bindings +resource "databricks_mws_networks" "net" { + provider = databricks.mws + account_id = var.databricks_account_id + network_name = "${local.name}-net" + vpc_id = local.vpc_id + subnet_ids = local.subnet_ids + security_group_ids = [local.workspace_sg_id] + + # Back-end PrivateLink association (workspace REST + SCC relay) + vpc_endpoints { + rest_api = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id] + dataplane_relay = [databricks_mws_vpc_endpoint.scc.vpc_endpoint_id] + } +} + +# Private Access Settings (PAS) to enforce private connectivity +resource "databricks_mws_private_access_settings" "pas" { + provider = databricks.mws + private_access_settings_name = "${local.name}-pas" + region = var.region + public_access_enabled = true # Set to false after Unity Catalog setup +} + +# Wait for IAM propagation +resource "time_sleep" "wait_for_iam" { + create_duration = "60s" + + depends_on = [module.iam] +} + +# Credentials config (cross-account role Databricks will assume) +resource "databricks_mws_credentials" "creds" { + provider = databricks.mws + credentials_name = "${local.name}-creds" + role_arn = module.iam.cross_account_role_arn + + depends_on = 
[time_sleep.wait_for_iam] +} + +# Wait for bucket configuration to propagate +resource "time_sleep" "wait_for_bucket" { + create_duration = "60s" + + depends_on = [module.storage, databricks_mws_credentials.creds] +} + +# Storage config (root bucket) +resource "databricks_mws_storage_configurations" "storage" { + provider = databricks.mws + account_id = var.databricks_account_id + storage_configuration_name = "${local.name}-storage" + bucket_name = module.storage.root_bucket + + depends_on = [ + time_sleep.wait_for_bucket + ] +} + +# Wait before workspace creation to ensure all configurations are propagated +resource "time_sleep" "wait_before_workspace" { + create_duration = "30s" + + depends_on = [ + databricks_mws_credentials.creds, + databricks_mws_storage_configurations.storage, + databricks_mws_networks.net + ] +} + +# Finally, create the workspace +resource "databricks_mws_workspaces" "ws" { + provider = databricks.mws + account_id = var.databricks_account_id + workspace_name = "${local.name}-ws" + + aws_region = var.region + credentials_id = databricks_mws_credentials.creds.credentials_id + storage_configuration_id = databricks_mws_storage_configurations.storage.storage_configuration_id + network_id = databricks_mws_networks.net.network_id + + private_access_settings_id = databricks_mws_private_access_settings.pas.private_access_settings_id + + managed_services_customer_managed_key_id = databricks_mws_customer_managed_keys.cmk.customer_managed_key_id + storage_customer_managed_key_id = databricks_mws_customer_managed_keys.cmk.customer_managed_key_id + + depends_on = [time_sleep.wait_before_workspace] +} + +# ---------- Unity Catalog ---------- +# Wait for workspace to be ready +resource "time_sleep" "wait_for_workspace" { + create_duration = "30s" + + depends_on = [databricks_mws_workspaces.ws] +} + +# Create Unity Catalog metastore (account-level) +resource "databricks_metastore" "this" { + provider = databricks.mws + name = "${local.name}-metastore" + region = var.region + storage_root = "s3://${module.unity_catalog.metastore_bucket_name}/" + force_destroy = true + + depends_on = [time_sleep.wait_for_workspace] +} + +# Assign metastore to workspace (account-level) +resource "databricks_metastore_assignment" "this" { + provider = databricks.mws + metastore_id = databricks_metastore.this.id + workspace_id = databricks_mws_workspaces.ws.workspace_id +} + +# Wait for metastore assignment +resource "time_sleep" "wait_for_metastore" { + create_duration = "30s" + + depends_on = [databricks_metastore_assignment.this] +} + +# Create storage credential for Unity Catalog (workspace-level) +resource "databricks_storage_credential" "unity_catalog" { + provider = databricks.workspace + name = "${local.name}-unity-catalog-credential" + + aws_iam_role { + role_arn = module.unity_catalog.unity_catalog_role_arn + } + + depends_on = [time_sleep.wait_for_metastore] +} + +# Wait for IAM and S3 permissions to propagate +# This includes time for the IAM role's self-assuming capability to be ready +resource "time_sleep" "wait_for_storage_credential" { + create_duration = "60s" + + depends_on = [ + databricks_storage_credential.unity_catalog, + module.unity_catalog + ] + + # Ensure trust policy update is complete before external location + triggers = { + trust_policy_updated = module.unity_catalog.trust_policy_updated + } +} + +# Create external location for Unity Catalog +# The IAM role is updated with self-assuming capability before this runs +resource "databricks_external_location" "unity_catalog" { + 
provider        = databricks.workspace
+  name            = "${local.name}-unity-catalog-external"
+  url             = "s3://${module.unity_catalog.metastore_bucket_name}/"
+  credential_name = databricks_storage_credential.unity_catalog.id
+  force_destroy   = true
+
+  depends_on = [
+    time_sleep.wait_for_storage_credential
+  ]
+}
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/main.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/main.tf
new file mode 100644
index 0000000..9d2350b
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/main.tf
@@ -0,0 +1,95 @@
+# KMS CMK used for both Databricks Managed Services and Workspace Storage
+
+resource "aws_kms_key" "databricks_cmk" {
+  description             = "CMK for Databricks Managed Services & Workspace Storage"
+  enable_key_rotation     = true
+  deletion_window_in_days = 7
+}
+
+resource "aws_kms_alias" "databricks_cmk_alias" {
+  name          = "alias/databricks/${var.project_name}-cmk"
+  target_key_id = aws_kms_key.databricks_cmk.key_id
+}
+
+# Current AWS account id, used for the key administrator principal below
+data "aws_caller_identity" "current" {}
+
+data "aws_iam_policy_document" "databricks_cmk_policy" {
+  # Full admin for your account root (you can refine to a KMS admin group)
+  statement {
+    sid     = "AllowAccountAdministrators"
+    actions = ["kms:*"]
+    principals {
+      type        = "AWS"
+      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
+    }
+    resources = ["*"]
+  }
+
+  # Allow Databricks control plane (commercial) to describe and encrypt/decrypt for validation
+  statement {
+    sid = "AllowDatabricksControlPlaneDirectUse"
+    actions = [
+      "kms:Decrypt",
+      "kms:Encrypt",
+      "kms:GenerateDataKey*",
+      "kms:DescribeKey",
+      "kms:ReEncrypt*"
+    ]
+    principals {
+      type        = "AWS"
+      identifiers = ["arn:aws:iam::414351767826:root"]
+    }
+    resources = ["*"]
+  }
+
+  # Allow Databricks control plane to create grants & use the key for AWS resources
+  statement {
+    sid = "AllowDatabricksControlPlaneGrants"
+    actions = [
+      "kms:CreateGrant",
+      "kms:DescribeKey"
+    ]
+    principals {
+      type        = "AWS"
+      identifiers = ["arn:aws:iam::414351767826:root"]
+    }
+    resources = ["*"]
+    condition {
+      test     = "Bool"
+      variable = "kms:GrantIsForAWSResource"
+      values   = ["true"]
+    }
+  }
+
+  # Optional: let your Databricks cross-account role create grants too
+  statement {
+    sid = "AllowCrossAccountProvisioningRoleGrants"
+    actions = [
+      "kms:CreateGrant",
+      "kms:DescribeKey"
+    ]
+    principals {
+      type        = "AWS"
+      identifiers = [var.cross_account_role_arn]
+    }
+    resources = ["*"]
+    condition {
+      test     = "Bool"
+      variable = "kms:GrantIsForAWSResource"
+      values   = ["true"]
+    }
+  }
+}
+
+resource "aws_kms_key_policy" "databricks_cmk_policy" {
+  key_id = aws_kms_key.databricks_cmk.key_id
+  policy = data.aws_iam_policy_document.databricks_cmk_policy.json
+}
+
+# Wait for KMS key policy to propagate
+resource "time_sleep" "wait_for_kms_policy" {
+  create_duration = "30s"
+
+  depends_on = [aws_kms_key_policy.databricks_cmk_policy]
+}
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/outputs.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/outputs.tf
new file mode 100644
index 0000000..d299aba
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/outputs.tf
@@ -0,0 +1,12 @@
+output "key_arn" {
+  description = "ARN of the KMS key"
+  value       = aws_kms_key.databricks_cmk.arn
+  depends_on  = [time_sleep.wait_for_kms_policy]
+}
+
+output "key_id" {
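+  # Key id (not the ARN); useful where AWS APIs expect the KeyId form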
+ description = "ID of the KMS key" + value = aws_kms_key.databricks_cmk.key_id + depends_on = [time_sleep.wait_for_kms_policy] +} + diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/variables.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/variables.tf new file mode 100644 index 0000000..5a31cfe --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-cmk/variables.tf @@ -0,0 +1,10 @@ +variable "cross_account_role_arn" { + type = string + description = "ARN of the Databricks cross-account IAM role" +} + +variable "project_name" { + type = string + description = "Project name for unique resource naming" +} + diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-iam/main.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-iam/main.tf new file mode 100644 index 0000000..33e0467 --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-iam/main.tf @@ -0,0 +1,44 @@ +# Cross-account IAM role that Databricks assumes (commercial control plane account id 414351767826) +data "aws_caller_identity" "current" {} + +resource "aws_iam_role" "databricks" { + name = "${var.project}-databricks-cross-account" + + assume_role_policy = jsonencode({ + Version = "2012-10-17", + Statement = [{ + Effect = "Allow", + Principal = { AWS = "arn:aws:iam::414351767826:root" }, + Action = "sts:AssumeRole", + Condition = { + StringEquals = { "sts:ExternalId" = var.external_id } + } + }] + }) +} + +# Minimal policy for workspace provisioning; tailor to your org's security baseline +data "aws_iam_policy_document" "policy" { + statement { + actions = [ + "s3:*", + "ec2:*", + "iam:PassRole", + "iam:CreateServiceLinkedRole", + "kms:*", + "sts:AssumeRole" + ] + resources = ["*"] + } +} + +resource "aws_iam_role_policy" "databricks_inline" { + name = "${var.project}-databricks-provisioning" + role = aws_iam_role.databricks.id + policy = data.aws_iam_policy_document.policy.json +} + +output "cross_account_role_arn" { + value = aws_iam_role.databricks.arn + depends_on = [aws_iam_role_policy.databricks_inline] +} diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-iam/variables.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-iam/variables.tf new file mode 100644 index 0000000..a73fdbe --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-iam/variables.tf @@ -0,0 +1,3 @@ +variable "project" { type = string } +variable "account_id" { type = string } # Databricks Account ID (not AWS) +variable "external_id" { type = string } diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-network/main.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-network/main.tf new file mode 100644 index 0000000..22ee580 --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-network/main.tf @@ -0,0 +1,142 @@ +data "aws_availability_zones" "azs" { state = "available" } + +resource "aws_vpc" "this" { + cidr_block = var.vpc_cidr + enable_dns_support = true + enable_dns_hostnames = true + tags = { Name = "${var.project}-vpc" } +} + +resource "aws_internet_gateway" "igw" { + vpc_id = aws_vpc.this.id + tags = { Name = "${var.project}-igw" } +} + +resource "aws_subnet" "private" { + count = 2 + vpc_id = aws_vpc.this.id + cidr_block = var.private_subnet_cidrs[count.index] + availability_zone = 
data.aws_availability_zones.azs.names[count.index]
+  map_public_ip_on_launch = false
+  tags                    = { Name = "${var.project}-private-${count.index}" }
+}
+
+# Route tables for private subnets (allow local only; egress via endpoints/NAT if added)
+resource "aws_route_table" "private" {
+  vpc_id = aws_vpc.this.id
+  tags   = { Name = "${var.project}-rt-private" }
+}
+
+resource "aws_route_table_association" "a" {
+  count          = length(aws_subnet.private)
+  subnet_id      = aws_subnet.private[count.index].id
+  route_table_id = aws_route_table.private.id
+}
+
+# Security group for workspace nodes <-> endpoints
+resource "aws_security_group" "workspace" {
+  name        = "${var.project}-workspace-sg"
+  description = "Databricks workspace SG"
+  vpc_id      = aws_vpc.this.id
+
+  # Databricks requires the workspace SG to allow all TCP and UDP traffic
+  # from itself (intra-cluster communication)
+  ingress {
+    from_port = 0
+    to_port   = 65535
+    protocol  = "tcp"
+    self      = true
+  }
+
+  ingress {
+    from_port = 0
+    to_port   = 65535
+    protocol  = "udp"
+    self      = true
+  }
+
+  # Egress all (fine-tune if needed)
+  egress {
+    from_port   = 0
+    to_port     = 0
+    protocol    = "-1"
+    cidr_blocks = ["0.0.0.0/0"]
+  }
+
+  tags = { Name = "${var.project}-workspace-sg" }
+}
+
+# SG for VPC interface endpoints
+resource "aws_security_group" "vpce" {
+  name        = "${var.project}-vpce-sg"
+  description = "Security group for VPC endpoints (PL back-end)"
+  vpc_id      = aws_vpc.this.id
+
+  ingress {
+    from_port       = 443
+    to_port         = 443
+    protocol        = "tcp"
+    security_groups = [aws_security_group.workspace.id]
+  }
+
+  # SCC relay port 6666
+  ingress {
+    from_port       = 6666
+    to_port         = 6666
+    protocol        = "tcp"
+    security_groups = [aws_security_group.workspace.id]
+  }
+
+  egress {
+    from_port   = 0
+    to_port     = 0
+    protocol    = "-1"
+    cidr_blocks = ["0.0.0.0/0"]
+  }
+
+  tags = { Name = "${var.project}-vpce-sg" }
+}
+
+# VPC Interface Endpoint - Databricks Workspace (REST APIs) for back-end
+resource "aws_vpc_endpoint" "workspace" {
+  vpc_id              = aws_vpc.this.id
+  service_name        = var.pl_service_names.workspace
+  vpc_endpoint_type   = "Interface"
+  subnet_ids          = [aws_subnet.private[0].id]
+  security_group_ids  = [aws_security_group.vpce.id]
+  private_dns_enabled = true
+  tags                = { Name = "${var.project}-vpce-workspace" }
+}
+
+# VPC Interface Endpoint - Secure Cluster Connectivity Relay
+resource "aws_vpc_endpoint" "scc" {
+  vpc_id              = aws_vpc.this.id
+  service_name        = var.pl_service_names.scc
+  vpc_endpoint_type   = "Interface"
+  subnet_ids          = [aws_subnet.private[1].id]
+  security_group_ids  = [aws_security_group.vpce.id]
+  private_dns_enabled = true
+  tags                = { Name = "${var.project}-vpce-scc" }
+}
+
+# Extra endpoints recommended by Databricks
+resource "aws_vpc_endpoint" "sts" {
+  count               = var.enable_extra_endpoints ? 1 : 0
+  vpc_id              = aws_vpc.this.id
+  service_name        = "com.amazonaws.${var.region}.sts"
+  vpc_endpoint_type   = "Interface"
+  subnet_ids          = [for s in aws_subnet.private : s.id]
+  security_group_ids  = [aws_security_group.vpce.id]
+  private_dns_enabled = true
+  tags                = { Name = "${var.project}-vpce-sts" }
+}
+
+resource "aws_vpc_endpoint" "kinesis" {
+  count               = var.enable_extra_endpoints ? 1 : 0
+  vpc_id              = aws_vpc.this.id
+  service_name        = "com.amazonaws.${var.region}.kinesis-streams"
+  vpc_endpoint_type   = "Interface"
+  subnet_ids          = [for s in aws_subnet.private : s.id]
+  security_group_ids  = [aws_security_group.vpce.id]
+  private_dns_enabled = true
+  tags                = { Name = "${var.project}-vpce-kinesis" }
+}
+
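+# S3 uses a Gateway endpoint attached to the private route table,
+# so it needs no subnets, ENIs, or endpoint security group.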
+resource "aws_vpc_endpoint" "s3" {
+  count             = var.enable_extra_endpoints ? 1 : 0
+  vpc_id            = aws_vpc.this.id
+  service_name      = "com.amazonaws.${var.region}.s3"
+  vpc_endpoint_type = "Gateway"
+  route_table_ids   = [aws_route_table.private.id]
+  tags              = { Name = "${var.project}-vpce-s3" }
+}
+
+output "vpc_id" { value = aws_vpc.this.id }
+output "subnet_ids" { value = [for s in aws_subnet.private : s.id] }
+output "workspace_sg_id" { value = aws_security_group.workspace.id }
+output "vpce_workspace_id" { value = aws_vpc_endpoint.workspace.id }
+output "vpce_scc_id" { value = aws_vpc_endpoint.scc.id }
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-network/variables.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-network/variables.tf
new file mode 100644
index 0000000..675e2ad
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-network/variables.tf
@@ -0,0 +1,14 @@
+variable "project" { type = string }
+variable "region" { type = string }
+variable "vpc_cidr" { type = string }
+variable "private_subnet_cidrs" { type = list(string) }
+variable "pl_service_names" {
+  type = object({
+    workspace = string
+    scc       = string
+  })
+}
+variable "enable_extra_endpoints" {
+  type    = bool
+  default = false
+}
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-storage/main.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-storage/main.tf
new file mode 100644
index 0000000..6dda564
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-storage/main.tf
@@ -0,0 +1,72 @@
+resource "aws_s3_bucket" "root" {
+  bucket        = var.root_bucket_name
+  force_destroy = true
+}
+
+resource "aws_s3_bucket_ownership_controls" "root" {
+  bucket = aws_s3_bucket.root.id
+  rule {
+    object_ownership = "ObjectWriter"
+  }
+}
+
+resource "aws_s3_bucket_versioning" "v" {
+  bucket = aws_s3_bucket.root.id
+  versioning_configuration { status = "Enabled" }
+}
+
+resource "aws_s3_bucket_server_side_encryption_configuration" "sse" {
+  bucket = aws_s3_bucket.root.id
+  rule {
+    apply_server_side_encryption_by_default {
+      sse_algorithm     = "aws:kms"
+      kms_master_key_id = var.kms_key_arn
+    }
+    bucket_key_enabled = true
+  }
+}
+
+resource "aws_s3_bucket_public_access_block" "root" {
+  bucket                  = aws_s3_bucket.root.id
+  block_public_acls       = true
+  block_public_policy     = false
+  ignore_public_acls      = true
+  restrict_public_buckets = false
+}
+
+data "aws_caller_identity" "current" {}
+
+data "aws_iam_policy_document" "bucket_policy" {
+  statement {
+    sid    = "GrantDatabricksFullAccess"
+    effect = "Allow"
+    actions = [
+      "s3:GetObject",
+      "s3:GetObjectVersion",
+      "s3:PutObject",
+      "s3:DeleteObject",
+      "s3:ListBucket",
+      "s3:GetBucketLocation",
+      "s3:PutObjectAcl"
+    ]
+    resources = [
+      aws_s3_bucket.root.arn,
+      "${aws_s3_bucket.root.arn}/*"
+    ]
+    principals {
+      type = "AWS"
+      identifiers = [
+        var.cross_account_role_arn,
+        "arn:aws:iam::414351767826:root"
+      ]
+    }
+  }
+}
+
+resource "aws_s3_bucket_policy" "root" {
+  bucket = aws_s3_bucket.root.id
+  policy = data.aws_iam_policy_document.bucket_policy.json
+}
+
+output "root_bucket" { value = aws_s3_bucket.root.bucket }
+output "root_bucket_arn" { value = aws_s3_bucket.root.arn }
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-storage/variables.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-storage/variables.tf
new file mode 100644
index 0000000..db79a7e
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-storage/variables.tf
@@ 
-0,0 +1,11 @@ +variable "root_bucket_name" { type = string } + +variable "cross_account_role_arn" { + type = string + description = "ARN of the Databricks cross-account IAM role" +} + +variable "kms_key_arn" { + type = string + description = "ARN of the KMS key for S3 encryption" +} diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/main.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/main.tf new file mode 100644 index 0000000..dc9b3dd --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/main.tf @@ -0,0 +1,207 @@ +# Unity Catalog metastore bucket +resource "aws_s3_bucket" "metastore" { + bucket = "${var.prefix}-unity-catalog-${var.region}" + force_destroy = true +} + +resource "aws_s3_bucket_ownership_controls" "metastore" { + bucket = aws_s3_bucket.metastore.id + rule { + object_ownership = "BucketOwnerPreferred" + } +} + +resource "aws_s3_bucket_versioning" "metastore" { + bucket = aws_s3_bucket.metastore.id + versioning_configuration { + status = "Enabled" + } +} + +resource "aws_s3_bucket_server_side_encryption_configuration" "metastore" { + bucket = aws_s3_bucket.metastore.id + rule { + apply_server_side_encryption_by_default { + sse_algorithm = "aws:kms" + kms_master_key_id = var.kms_key_arn + } + bucket_key_enabled = true + } +} + +resource "aws_s3_bucket_public_access_block" "metastore" { + bucket = aws_s3_bucket.metastore.id + block_public_acls = true + block_public_policy = true + ignore_public_acls = true + restrict_public_buckets = true +} + +data "aws_caller_identity" "current" {} + +# IAM role for Unity Catalog to access the metastore bucket +# Trust policy: Databricks can assume this role +# Self-assuming is added after role creation to avoid circular dependency +resource "aws_iam_role" "unity_catalog" { + name = "${var.prefix}-unity-catalog-role" + + # Initial trust policy - only Databricks + assume_role_policy = jsonencode({ + Version = "2012-10-17" + Statement = [ + { + Effect = "Allow" + Principal = { + AWS = "arn:aws:iam::414351767826:root" + } + Action = "sts:AssumeRole" + Condition = { + StringEquals = { + "sts:ExternalId" = var.databricks_account_id + } + } + } + ] + }) + + # Lifecycle to ignore changes after self-assume is added + lifecycle { + ignore_changes = [assume_role_policy] + } +} + +# Wait for role to be created and propagated +resource "time_sleep" "wait_for_role" { + create_duration = "10s" + depends_on = [aws_iam_role.unity_catalog] +} + +# Update trust policy to add self-assuming capability +# This must happen AFTER the role exists +resource "null_resource" "add_self_assume_to_trust_policy" { + triggers = { + role_arn = aws_iam_role.unity_catalog.arn + } + + provisioner "local-exec" { + interpreter = ["/bin/bash", "-c"] + command = <<-EOT + # Create trust policy with both Databricks and self-assume + cat > /tmp/trust_policy_${var.prefix}.json <<'EOF' + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "AWS": "arn:aws:iam::414351767826:root" + }, + "Action": "sts:AssumeRole", + "Condition": { + "StringEquals": { + "sts:ExternalId": "${var.databricks_account_id}" + } + } + }, + { + "Effect": "Allow", + "Principal": { + "AWS": "${aws_iam_role.unity_catalog.arn}" + }, + "Action": "sts:AssumeRole" + } + ] + } + EOF + + # Update the trust policy + export AWS_PAGER="" + aws iam update-assume-role-policy \ + --role-name ${aws_iam_role.unity_catalog.name} \ + --policy-document 
file:///tmp/trust_policy_${var.prefix}.json
+
+      # Clean up
+      rm -f /tmp/trust_policy_${var.prefix}.json
+
+      echo "✅ Trust policy updated with self-assuming capability"
+    EOT
+  }
+
+  depends_on = [time_sleep.wait_for_role]
+}
+
+resource "aws_iam_role_policy" "unity_catalog" {
+  name = "${var.prefix}-unity-catalog-policy"
+  role = aws_iam_role.unity_catalog.id
+
+  policy = jsonencode({
+    Version = "2012-10-17"
+    Statement = [
+      {
+        Effect = "Allow"
+        Action = [
+          "s3:GetObject",
+          "s3:GetObjectVersion",
+          "s3:PutObject",
+          "s3:PutObjectAcl",
+          "s3:DeleteObject",
+          "s3:ListBucket",
+          "s3:GetBucketLocation"
+        ]
+        Resource = [
+          aws_s3_bucket.metastore.arn,
+          "${aws_s3_bucket.metastore.arn}/*"
+        ]
+      },
+      {
+        Effect = "Allow"
+        Action = [
+          "kms:Decrypt",
+          "kms:Encrypt",
+          "kms:GenerateDataKey"
+        ]
+        Resource = [
+          var.kms_key_arn
+        ]
+      },
+      {
+        Effect   = "Allow"
+        Action   = "sts:AssumeRole"
+        Resource = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${var.prefix}-unity-catalog-role"
+      }
+    ]
+  })
+}
+
+# Bucket policy for Unity Catalog
+data "aws_iam_policy_document" "metastore_bucket_policy" {
+  statement {
+    sid    = "GrantUnityCatalogAccess"
+    effect = "Allow"
+    actions = [
+      "s3:GetObject",
+      "s3:GetObjectVersion",
+      "s3:PutObject",
+      "s3:PutObjectAcl",
+      "s3:DeleteObject",
+      "s3:ListBucket",
+      "s3:GetBucketLocation"
+    ]
+    resources = [
+      aws_s3_bucket.metastore.arn,
+      "${aws_s3_bucket.metastore.arn}/*"
+    ]
+    principals {
+      type = "AWS"
+      identifiers = [
+        aws_iam_role.unity_catalog.arn
+      ]
+    }
+  }
+}
+
+resource "aws_s3_bucket_policy" "metastore" {
+  bucket = aws_s3_bucket.metastore.id
+  policy = data.aws_iam_policy_document.metastore_bucket_policy.json
+}
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/outputs.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/outputs.tf
new file mode 100644
index 0000000..02c148b
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/outputs.tf
@@ -0,0 +1,31 @@
+output "metastore_bucket_name" {
+  description = "Name of the Unity Catalog metastore bucket"
+  value       = aws_s3_bucket.metastore.bucket
+}
+
+output "metastore_bucket_arn" {
+  description = "ARN of the Unity Catalog metastore bucket"
+  value       = aws_s3_bucket.metastore.arn
+}
+
+output "unity_catalog_role_arn" {
+  description = "ARN of the Unity Catalog IAM role (with self-assuming enabled)"
+  value       = aws_iam_role.unity_catalog.arn
+}
+
+output "unity_catalog_role_id" {
+  description = "ID of the Unity Catalog IAM role"
+  value       = aws_iam_role.unity_catalog.id
+}
+
+output "unity_catalog_role_name" {
+  description = "Name of the Unity Catalog IAM role"
+  value       = aws_iam_role.unity_catalog.name
+}
+
+output "trust_policy_updated" {
+  description = "Indicates the trust policy has been updated with self-assuming"
+  value       = null_resource.add_self_assume_to_trust_policy.id
+  depends_on  = [null_resource.add_self_assume_to_trust_policy]
+}
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/variables.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/variables.tf
new file mode 100644
index 0000000..3c7ec44
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/modules/aws-unity-catalog/variables.tf
@@ -0,0 +1,25 @@
+variable "prefix" {
+  type        = string
+  description = "Prefix for resource names"
+}
+
+variable "region" {
+  type        = string
+  description = "AWS region"
+}
+
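+# Also used as the sts:ExternalId in this module's role trust policy;
+# Databricks must present it when assuming the Unity Catalog role.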
+variable "databricks_account_id" { + type = string + description = "Databricks account ID" +} + +variable "cross_account_role_arn" { + type = string + description = "ARN of the Databricks cross-account IAM role" +} + +variable "kms_key_arn" { + type = string + description = "ARN of the KMS key for S3 encryption" +} + diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/outputs.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/outputs.tf new file mode 100644 index 0000000..4c47893 --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/outputs.tf @@ -0,0 +1,95 @@ +# ==================== WORKSPACE OUTPUTS ==================== +output "workspace_url" { + value = databricks_mws_workspaces.ws.workspace_url + description = "Databricks workspace URL" +} + +output "workspace_id" { + value = databricks_mws_workspaces.ws.workspace_id + description = "Workspace ID" +} + +output "workspace_deployment_name" { + value = databricks_mws_workspaces.ws.deployment_name + description = "Workspace deployment name" +} + +# ==================== NETWORKING OUTPUTS ==================== +output "vpc_id" { + value = local.vpc_id + description = "VPC ID (created or existing)" +} + +output "subnet_ids" { + value = local.subnet_ids + description = "Subnet IDs (created or existing)" +} + +output "workspace_security_group_id" { + value = local.workspace_sg_id + description = "Workspace security group ID (created or existing)" +} + +output "vpc_endpoint_workspace_id" { + value = local.vpce_workspace_id + description = "VPC endpoint ID for Databricks workspace" +} + +output "vpc_endpoint_scc_id" { + value = local.vpce_scc_id + description = "VPC endpoint ID for SCC relay" +} + +# ==================== ENCRYPTION OUTPUTS ==================== +output "kms_key_arn" { + value = local.kms_key_arn + description = "KMS key ARN (created or existing)" + sensitive = false +} + +output "customer_managed_key_id" { + value = databricks_mws_customer_managed_keys.cmk.customer_managed_key_id + description = "Databricks customer managed key ID" +} + +# ==================== UNITY CATALOG OUTPUTS ==================== +output "metastore_id" { + value = databricks_metastore.this.id + description = "Unity Catalog metastore ID" +} + +output "metastore_bucket" { + value = module.unity_catalog.metastore_bucket_name + description = "Unity Catalog metastore S3 bucket name" +} + +output "unity_catalog_role_arn" { + value = module.unity_catalog.unity_catalog_role_arn + description = "Unity Catalog IAM role ARN" +} + +output "unity_catalog_role_name" { + value = module.unity_catalog.unity_catalog_role_name + description = "Unity Catalog IAM role name (for post-deploy self-assume setup)" +} + +# ==================== IAM OUTPUTS ==================== +output "cross_account_role_arn" { + value = module.iam.cross_account_role_arn + description = "Cross-account IAM role ARN" +} + +output "root_bucket_name" { + value = module.storage.root_bucket + description = "Workspace root S3 bucket name" +} + +# ==================== CONFIGURATION INFO ==================== +output "deployment_mode" { + value = { + vpc_created = var.create_new_vpc + cmk_created = var.create_new_cmk + } + description = "Shows whether resources were created (true) or existing resources were used (false)" +} + diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/providers.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/providers.tf new file mode 100644 index 0000000..7427ba3 --- /dev/null +++ 
b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/providers.tf @@ -0,0 +1,22 @@ +provider "aws" { + region = var.region +} + +# Account-level provider (used for mws_* resources) +# Auth via env: DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET, DATABRICKS_ACCOUNT_ID, DATABRICKS_HOST +# Or use explicit client_id/client_secret attributes +provider "databricks" { + alias = "mws" + account_id = var.databricks_account_id + host = var.databricks_account_host + client_id = var.databricks_client_id # Optional: comment out to use env vars + client_secret = var.databricks_client_secret # Optional: comment out to use env vars +} + +# Workspace-level provider for Unity Catalog resources +provider "databricks" { + alias = "workspace" + host = databricks_mws_workspaces.ws.workspace_url + client_id = var.databricks_client_id + client_secret = var.databricks_client_secret +} diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/terraform.tfvars.example-existing-resources b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/terraform.tfvars.example-existing-resources new file mode 100644 index 0000000..b6dff4c --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/terraform.tfvars.example-existing-resources @@ -0,0 +1,74 @@ +# ==================================================================== +# EXAMPLE: Using EXISTING VPC, Subnets, and CMK +# ==================================================================== +# Use this configuration when you have existing infrastructure + +project = "my-databricks-project" +region = "us-east-1" + +# ==================== NETWORKING - USE EXISTING ==================== +create_new_vpc = false + +# Leave these empty when using existing VPC +vpc_cidr = "" +private_subnet_cidrs = [] + +# Provide your existing VPC and subnet IDs +existing_vpc_id = "vpc-0123456789abcdef0" +existing_subnet_ids = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"] + +# Optional: provide existing security group ID +# If not provided, a new security group will be created in your existing VPC +existing_security_group_id = "" # or "sg-0123456789abcdef0" + +# ==================== CMK - USE EXISTING ==================== +create_new_cmk = false +existing_cmk_arn = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012" + +# ==================== DATABRICKS ACCOUNT ==================== +databricks_account_id = "your-account-id" +databricks_account_host = "https://accounts.cloud.databricks.com" +databricks_client_id = "your-client-id" +databricks_client_secret = "your-client-secret" + +root_bucket_name = "my-databricks-root-bucket" + +# Get this from Databricks Account Console -> Cloud resources -> Credentials +databricks_crossaccount_role_external_id = "your-external-id" + +# ==================== PRIVATELINK ==================== +# From Databricks docs: PrivateLink VPC endpoint services table +# https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html +pl_service_names = { + workspace = "com.amazonaws.vpce.us-east-1.vpce-svc-09143d1e626de2f04" + scc = "com.amazonaws.vpce.us-east-1.vpce-svc-00018a8c3ff62ffdf" +} + +enable_extra_endpoints = false + +# ==================================================================== +# IMPORTANT NOTES FOR EXISTING RESOURCES: +# ==================================================================== +# +# 1. 
EXISTING VPC REQUIREMENTS: +# - VPC must have DNS support and DNS hostnames enabled +# - Subnets must be in different Availability Zones +# - Subnets should have appropriate route tables configured +# +# 2. EXISTING CMK REQUIREMENTS: +# - CMK must have a key policy that allows: +# a) Your AWS account root to manage the key +# b) Databricks control plane (414351767826) to use the key +# c) Your cross-account IAM role to create grants +# - See modules/aws-cmk/main.tf for the required policy +# - The CMK must be in the same region as your workspace +# +# 3. SECURITY GROUP (if provided): +# - Must allow egress to 0.0.0.0/0 (all traffic) +# - Will be used for Databricks workspace compute +# +# 4. VPC ENDPOINTS: +# - VPC endpoints for Databricks workspace and SCC will still be created +# - These are required for PrivateLink connectivity +# - Security groups for VPC endpoints will be created automatically + diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/terraform.tfvars.example-new-resources b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/terraform.tfvars.example-new-resources new file mode 100644 index 0000000..b932a10 --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/terraform.tfvars.example-new-resources @@ -0,0 +1,42 @@ +# ==================================================================== +# EXAMPLE: Creating NEW VPC, Subnets, and CMK +# ==================================================================== +# Use this configuration when you want Terraform to create all resources + +project = "my-databricks-project" +region = "us-east-1" + +# ==================== NETWORKING - CREATE NEW ==================== +create_new_vpc = true +vpc_cidr = "10.20.0.0/16" +private_subnet_cidrs = ["10.20.1.0/24", "10.20.2.0/24"] + +# Leave these empty when creating new VPC +existing_vpc_id = "" +existing_subnet_ids = [] +existing_security_group_id = "" + +# ==================== CMK - CREATE NEW ==================== +create_new_cmk = true +existing_cmk_arn = "" + +# ==================== DATABRICKS ACCOUNT ==================== +databricks_account_id = "your-account-id" +databricks_account_host = "https://accounts.cloud.databricks.com" +databricks_client_id = "your-client-id" +databricks_client_secret = "your-client-secret" + +root_bucket_name = "my-databricks-root-bucket" + +# Get this from Databricks Account Console -> Cloud resources -> Credentials +databricks_crossaccount_role_external_id = "your-external-id" + +# ==================== PRIVATELINK ==================== +# From Databricks docs: PrivateLink VPC endpoint services table +# https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html +pl_service_names = { + workspace = "com.amazonaws.vpce.us-east-1.vpce-svc-09143d1e626de2f04" + scc = "com.amazonaws.vpce.us-east-1.vpce-svc-00018a8c3ff62ffdf" +} + +enable_extra_endpoints = false \ No newline at end of file diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/validation.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/validation.tf new file mode 100644 index 0000000..e3c03af --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/validation.tf @@ -0,0 +1,45 @@ +# Input validation to ensure proper configuration + +locals { + # Validate VPC configuration + vpc_validation = ( + var.create_new_vpc ? 
+ (var.vpc_cidr != "" && length(var.private_subnet_cidrs) >= 2) : + (var.existing_vpc_id != "" && length(var.existing_subnet_ids) >= 2) + ) + + # Validate CMK configuration + cmk_validation = ( + var.create_new_cmk ? + true : + var.existing_cmk_arn != "" + ) +} + +# Validation checks +resource "null_resource" "validation" { + lifecycle { + precondition { + condition = local.vpc_validation + error_message = <<-EOT + VPC Configuration Error: + - If create_new_vpc = true, you must provide vpc_cidr and at least 2 private_subnet_cidrs + - If create_new_vpc = false, you must provide existing_vpc_id and at least 2 existing_subnet_ids + EOT + } + + precondition { + condition = local.cmk_validation + error_message = <<-EOT + CMK Configuration Error: + - If create_new_cmk = false, you must provide existing_cmk_arn + EOT + } + + precondition { + condition = !var.create_new_vpc ? length(var.existing_subnet_ids) >= 2 : true + error_message = "When using existing VPC, you must provide at least 2 subnet IDs in different Availability Zones." + } + } +} + diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/variables.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/variables.tf new file mode 100644 index 0000000..fdef62d --- /dev/null +++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/variables.tf @@ -0,0 +1,91 @@ +variable "project" { type = string } +variable "region" { type = string } + +# ==================== NETWORKING OPTIONS ==================== +# Option 1: Create new VPC (set create_new_vpc = true) +variable "create_new_vpc" { + type = bool + default = true + description = "Set to true to create new VPC, false to use existing VPC" +} + +# Variables for NEW VPC creation +variable "vpc_cidr" { + type = string + default = "" + description = "CIDR block for new VPC (required if create_new_vpc = true)" +} + +variable "private_subnet_cidrs" { + type = list(string) + default = [] + description = "Two CIDR blocks in distinct AZs (required if create_new_vpc = true)" +} + +# Variables for EXISTING VPC +variable "existing_vpc_id" { + type = string + default = "" + description = "ID of existing VPC (required if create_new_vpc = false)" +} + +variable "existing_subnet_ids" { + type = list(string) + default = [] + description = "List of existing subnet IDs in different AZs (required if create_new_vpc = false)" +} + +variable "existing_security_group_id" { + type = string + default = "" + description = "ID of existing security group for Databricks workspace (optional if create_new_vpc = false)" +} + +# ==================== CMK OPTIONS ==================== +# Option 2: Create new CMK or use existing +variable "create_new_cmk" { + type = bool + default = true + description = "Set to true to create new CMK, false to use existing CMK" +} + +variable "existing_cmk_arn" { + type = string + default = "" + description = "ARN of existing KMS CMK (required if create_new_cmk = false)" +} + +# Databricks account +variable "databricks_account_id" { type = string } +variable "databricks_account_host" { + type = string + default = "https://accounts.cloud.databricks.com" +} +variable "databricks_client_id" { + type = string + sensitive = true +} +variable "databricks_client_secret" { + type = string + sensitive = true +} + +# Root bucket for workspace storage +variable "root_bucket_name" { type = string } + +# Cross-account role external ID (from Databricks Account Console -> Cloud resources -> Credentials) +variable "databricks_crossaccount_role_external_id" { type = string } + +# 
PrivateLink service names for your region (from the Databricks regional table)
+variable "pl_service_names" {
+  type = object({
+    workspace = string # com.amazonaws.vpce.<region>.vpce-svc-xxxxxxxxxxxxxxxxx
+    scc       = string # com.amazonaws.vpce.<region>.vpce-svc-xxxxxxxxxxxxxxxxx
+  })
+}
+
+# Optional: add STS/Kinesis endpoints & S3 Gateway
+variable "enable_extra_endpoints" {
+  type    = bool
+  default = false
+}
diff --git a/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/versions.tf b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/versions.tf
new file mode 100644
index 0000000..98d8775
--- /dev/null
+++ b/workspace-setup/terraform-examples/aws/aws-pl-back-cmk/versions.tf
@@ -0,0 +1,21 @@
+terraform {
+  required_version = ">= 1.5.0"
+  required_providers {
+    aws = {
+      source  = "hashicorp/aws"
+      version = "~> 6.0"
+    }
+    databricks = {
+      source  = "databricks/databricks"
+      version = ">= 1.30.0"
+    }
+    # Pinned explicitly because this configuration relies on time_sleep resources
+    time = {
+      source  = "hashicorp/time"
+      version = "~> 0.9"
+    }
+    local = {
+      source  = "hashicorp/local"
+      version = "~> 2.0"
+    }
+    null = {
+      source  = "hashicorp/null"
+      version = "~> 3.0"
+    }
+  }
+}