Version: 0.1.0
Provision a production-ready GKE cluster for DecisionBox using the included Terraform module. The module creates the following resources:
| Resource | Description |
|---|---|
| VPC | Dedicated network with subnets for nodes, pods, and services |
| Cloud NAT | Outbound internet access for private nodes |
| Firewall rules | Internal traffic + GCP health check ranges |
| GKE cluster | Private nodes, Dataplane V2, auto-upgrade, shielded nodes |
| Node pool | Auto-scaling with configurable machine type and disk |
| Service accounts | Node SA (logging/monitoring) + Workload Identity SA (API) |
| IAM bindings | Workload Identity, Secret Manager access, BigQuery access (optional) |
Prerequisites:
- Terraform 1.5+
- gcloud CLI authenticated with a project
- GCP project with billing enabled
- Sufficient IAM permissions (Project Owner or Editor)
The included setup wizard handles Terraform state, cluster provisioning, and Helm deployment in one flow:
cd terraform
./setup.sh # Full interactive setup
./setup.sh --dry-run # Generate config files only
./setup.sh --resume # Resume from the Helm deploy step
The wizard prompts for:
- Cloud provider (GCP)
- Secret namespace prefix
- Secret provider (GCP Secret Manager or MongoDB)
- GCP project ID, region, cluster name
- Terraform state bucket (created automatically if needed)
- Machine type and node scaling
- BigQuery IAM (optional)
- `SECRET_ENCRYPTION_KEY` (auto-generated or user-provided)
After provisioning, it automatically:
- Configures `kubectl` credentials
- Creates the Kubernetes namespace and secrets
- Deploys API and Dashboard via Helm
- Waits for ingress and verifies health checks
- Displays the dashboard URL
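To provision manually instead of using the wizard, first create a versioned GCS bucket for Terraform state:
PROJECT_ID=$(gcloud config get-value project)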
gsutil mb -p $PROJECT_ID gs://$PROJECT_ID-terraform-state
gsutil versioning set on gs://$PROJECT_ID-terraform-state
Create `terraform/gcp/prod/terraform.tfvars`:
project_id = "my-gcp-project"
region = "us-central1"
cluster_name = "decisionbox-prod"
# Networking
create_vpc = true
subnet_cidr = "10.0.0.0/20"
pods_cidr = "10.4.0.0/14"
services_cidr = "10.8.0.0/20"
# Node pool
machine_type = "e2-standard-2"
min_node_count = 1
max_node_count = 2
disk_size_gb = 50
# Workload Identity
k8s_namespace = "decisionbox"
k8s_service_account = "decisionbox-api"
# Optional: GCP Secret Manager
enable_gcp_secrets = true
secret_namespace = "decisionbox"
# Optional: BigQuery read access
enable_bigquery_iam = true
Initialize Terraform against the state bucket, then plan and apply:
cd terraform/gcp/prod
terraform init \
-backend-config="bucket=$PROJECT_ID-terraform-state" \
-backend-config="prefix=prod"
terraform plan -out=tfplan
terraform apply tfplan
Configure `kubectl` credentials for the new cluster:
gcloud container clusters get-credentials decisionbox-prod \
--region us-central1 \
--project $PROJECT_ID
Follow the Kubernetes Deployment guide to deploy the API and Dashboard.
When using GCP Secret Manager with Workload Identity, annotate the service account:
# values-prod.yaml
serviceAccountAnnotations:
  iam.gke.io/gcp-service-account: "decisionbox-prod-api@my-gcp-project.iam.gserviceaccount.com"
The Terraform code is organized as follows:
terraform/gcp/
├── prod/
│ ├── versions.tf # Provider versions (Google 5.0-7.0)
│ ├── variables.tf # Environment-level variables
│ ├── main.tf # Module instantiation
│ └── outputs.tf # Cluster outputs
└── modules/decisionbox/
├── apis.tf # GCP API enablement
├── networking.tf # VPC, subnets, NAT, firewalls
├── gke.tf # GKE cluster + node pool
├── iam.tf # Service accounts + Workload Identity
├── secrets.tf # Secret Manager IAM (conditional)
├── bigquery.tf # BigQuery IAM (conditional)
├── variables.tf # 40+ input variables
└── outputs.tf # Cluster outputs
All variables are defined in `terraform/gcp/modules/decisionbox/variables.tf`.
Required:
| Variable | Type | Description |
|---|---|---|
| `project_id` | string | GCP project ID |
Cluster:
| Variable | Type | Default | Description |
|---|---|---|---|
| `region` | string | `us-central1` | GCP region |
| `cluster_name` | string | `decisionbox-prod` | GKE cluster name |
| `create_cluster` | bool | `true` | Create the GKE cluster (`false` to use an existing one) |
| `deletion_protection` | bool | `true` | Prevent accidental cluster deletion |
| `release_channel` | string | `REGULAR` | GKE release channel |
| `datapath_provider` | string | `ADVANCED_DATAPATH` | Datapath provider (`ADVANCED_DATAPATH` is Dataplane V2, which enforces network policy) |
| `enable_network_policy` | bool | `true` | Enable network policy enforcement |
| `network_policy_provider` | string | `CALICO` | Network policy provider (used only when `datapath_provider` is not `ADVANCED_DATAPATH`) |
| `enable_binary_authorization` | bool | `false` | Enable Binary Authorization for container images |
| `logging_components` | list(string) | `["SYSTEM_COMPONENTS", "WORKLOADS"]` | GKE logging components |
| `monitoring_components` | list(string) | `["SYSTEM_COMPONENTS"]` | GKE monitoring components |
Networking:
| Variable | Type | Default | Description |
|---|---|---|---|
| `create_vpc` | bool | `true` | Create a VPC (`false` to use an existing one) |
| `existing_vpc_id` | string | `""` | Existing VPC self-link (when `create_vpc = false`) |
| `existing_subnet_id` | string | `""` | Existing subnet self-link (when `create_vpc = false`) |
| `subnet_cidr` | string | `10.0.0.0/20` | Node subnet CIDR |
| `pods_cidr` | string | `10.4.0.0/14` | Pod IP range |
| `pods_range_name` | string | `pods` | Secondary range name for pods |
| `services_cidr` | string | `10.8.0.0/20` | Service IP range |
| `services_range_name` | string | `services` | Secondary range name for services |
| `master_cidr` | string | `172.16.0.0/28` | Control plane CIDR |
| `enable_private_nodes` | bool | `true` | Nodes have no public IPs |
| `enable_private_endpoint` | bool | `false` | Restrict the control plane endpoint to the private network |
| `master_authorized_networks` | list(object) | `[{cidr_block="0.0.0.0/0", display_name="all"}]` | CIDRs allowed to reach the control plane API |
| `enable_flow_logs` | bool | `true` | Enable VPC flow logs |
| `flow_log_interval` | string | `INTERVAL_10_MIN` | Flow log aggregation interval |
| `flow_log_sampling` | number | `0.5` | Flow log sampling rate (0.0-1.0) |
| `flow_log_metadata` | string | `INCLUDE_ALL_METADATA` | Flow log metadata inclusion |
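The node subnet, pod range, service range, and control-plane range must not overlap each other. A minimal Python sketch that checks the default ranges pairwise with the standard-library `ipaddress` module; `find_overlaps` and `RANGES` are illustrative names, not part of the module, so substitute the values from your own `terraform.tfvars`.

```python
import ipaddress
from itertools import combinations

# Default ranges from the networking variables table (replace with your own).
RANGES = {
    "subnet_cidr": "10.0.0.0/20",
    "pods_cidr": "10.4.0.0/14",
    "services_cidr": "10.8.0.0/20",
    "master_cidr": "172.16.0.0/28",
}


def find_overlaps(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR ranges that overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


if __name__ == "__main__":
    overlaps = find_overlaps(RANGES)
    print(overlaps or "no overlapping ranges")
```

For example, changing `pods_cidr` to `10.0.0.0/14` makes it contain `subnet_cidr`, and the function reports `[("subnet_cidr", "pods_cidr")]`. The sketch only compares these four ranges; it does not know about other subnets or peered networks in the VPC.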
Cloud NAT:
| Variable | Type | Default | Description |
|---|---|---|---|
| `nat_ip_allocate_option` | string | `AUTO_ONLY` | NAT IP allocation |
| `nat_source_subnetwork_ip_ranges` | string | `ALL_SUBNETWORKS_ALL_IP_RANGES` | NAT source ranges |
| `enable_nat_logging` | bool | `true` | Cloud NAT logging |
| `nat_log_filter` | string | `ERRORS_ONLY` | NAT log filter |
Firewall:
| Variable | Type | Default | Description |
|---|---|---|---|
| `internal_tcp_ports` | list(string) | `["0-65535"]` | Internal TCP ports allowed |
| `internal_udp_ports` | list(string) | `["0-65535"]` | Internal UDP ports allowed |
| `health_check_ports` | list(string) | `["80","443","3000","8080","10256"]` | Health check ports |
| `health_check_source_ranges` | list(string) | `["35.191.0.0/16","130.211.0.0/22"]` | GCP health check IP ranges |
Node pool:
| Variable | Type | Default | Description |
|---|---|---|---|
| `machine_type` | string | `e2-standard-2` | GCE machine type |
| `disk_size_gb` | number | `50` | Boot disk size (GB) |
| `disk_type` | string | `pd-standard` | Boot disk type |
| `image_type` | string | `COS_CONTAINERD` | Node image |
| `min_node_count` | number | `1` | Minimum nodes per zone |
| `max_node_count` | number | `2` | Maximum nodes per zone |
| `enable_secure_boot` | bool | `true` | Shielded VM secure boot |
| `enable_integrity_monitoring` | bool | `true` | Shielded VM integrity monitoring |
| `enable_auto_repair` | bool | `true` | Auto-repair unhealthy nodes |
| `enable_auto_upgrade` | bool | `true` | Auto-upgrade node versions |
| `disable_legacy_metadata_endpoints` | string | `"true"` | Disable the legacy metadata API |
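Because the autoscaling limits are per zone, the cluster-wide node count of a regional cluster is the per-zone value multiplied by the number of zones the pool spans. A sketch of that arithmetic, assuming three zones (GKE's usual default for regional node pools); `total_node_bounds` is a hypothetical helper, and the zone count in your project may differ.

```python
def total_node_bounds(min_per_zone: int, max_per_zone: int, zones: int = 3) -> tuple[int, int]:
    """Cluster-wide node bounds for a node pool whose autoscaling limits are per zone."""
    if not 0 <= min_per_zone <= max_per_zone:
        raise ValueError("expected 0 <= min_per_zone <= max_per_zone")
    if zones < 1:
        raise ValueError("a node pool spans at least one zone")
    return min_per_zone * zones, max_per_zone * zones


# Defaults from the node pool table: 1-2 nodes per zone.
print(total_node_bounds(1, 2))  # (3, 6), assuming three zones
```

With the defaults, budget for three `e2-standard-2` nodes at minimum and six at peak, not one and two.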
Workload Identity and IAM:
| Variable | Type | Default | Description |
|---|---|---|---|
| `k8s_namespace` | string | `decisionbox` | Kubernetes namespace for Workload Identity |
| `k8s_service_account` | string | `decisionbox-api` | Kubernetes service account name (API) |
| `k8s_agent_service_account` | string | `decisionbox-agent` | Kubernetes service account name (Agent, read-only) |
| `enable_gcp_secrets` | bool | `false` | Create Secret Manager IAM bindings |
| `secret_namespace` | string | `decisionbox` | Secret name prefix for IAM conditions |
| `enable_bigquery_iam` | bool | `false` | Grant BigQuery read access to the agent SA |
Labels:
| Variable | Type | Default | Description |
|---|---|---|---|
| `labels` | map(string) | `{}` | Labels applied to all resources |
Outputs:
| Output | Sensitive | Description |
|---|---|---|
| `cluster_name` | No | GKE cluster name |
| `cluster_endpoint` | Yes | Kubernetes API endpoint |
| `cluster_ca_certificate` | Yes | CA certificate for `kubectl` |
| `vpc_name` | No | VPC network name |
| `workload_identity_sa_email` | No | GCP service account for API Workload Identity |
| `agent_workload_identity_sa_email` | No | GCP service account for Agent Workload Identity (read-only) |
| `gcp_secrets_iam_enabled` | No | Whether Secret Manager IAM was configured |
| `bigquery_iam_enabled` | No | Whether BigQuery IAM was configured |
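`terraform output -json` prints every output as an object with `sensitive`, `type`, and `value` keys, and it shows sensitive values such as `cluster_endpoint` in plain text. A minimal sketch that turns that JSON into the `serviceAccountAnnotations` Helm value; `helm_annotations` is a hypothetical helper and `SAMPLE` holds placeholder values, not real output.

```python
import json

# Placeholder shaped like `terraform output -json` (abbreviated, made-up values).
SAMPLE = """
{
  "cluster_name": {"sensitive": false, "type": "string", "value": "decisionbox-prod"},
  "workload_identity_sa_email": {
    "sensitive": false,
    "type": "string",
    "value": "decisionbox-prod-api@my-gcp-project.iam.gserviceaccount.com"
  }
}
"""


def helm_annotations(outputs_json: str) -> dict[str, dict[str, str]]:
    """Build the Helm serviceAccountAnnotations block from Terraform outputs."""
    outputs = json.loads(outputs_json)
    sa_email = outputs["workload_identity_sa_email"]["value"]
    return {"serviceAccountAnnotations": {"iam.gke.io/gcp-service-account": sa_email}}


if __name__ == "__main__":
    print(helm_annotations(SAMPLE))
```

In practice you would pass the text printed by `terraform output -json` in `terraform/gcp/prod` instead of `SAMPLE`, then copy the result into `values-prod.yaml`.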
The module creates a GCP service account and binds it to a Kubernetes service account via Workload Identity. This allows the API pod to authenticate to GCP services (Secret Manager, BigQuery) without storing credentials.
K8s ServiceAccount: decisionbox/decisionbox-api
↕ Workload Identity binding
GCP ServiceAccount: decisionbox-prod-api@project.iam.gserviceaccount.com
↓ IAM roles
GCP Secret Manager (namespace-scoped)
BigQuery (data viewer + job user)
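The binding above connects two identifiers: the GCP service account email, and the Workload Identity principal `serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]`, which receives `roles/iam.workloadIdentityUser` on that service account. A sketch that derives both, assuming the module names the API service account `<cluster_name>-api` (inferred from the `decisionbox-prod-api` example, not confirmed from `iam.tf`). It also applies GCP's 6-30 character limit for service account IDs, which a long `cluster_name` can exceed.

```python
import re

# GCP service account IDs: 6-30 characters, lowercase letters, digits and
# hyphens, starting with a letter and not ending with a hyphen.
SA_ID_PATTERN = re.compile(r"[a-z][-a-z0-9]*[a-z0-9]")


def workload_identity_names(
    project_id: str,
    cluster_name: str,
    namespace: str = "decisionbox",
    ksa_name: str = "decisionbox-api",
) -> dict[str, str]:
    """Derive the GSA email and Workload Identity principal for the API pod."""
    # Assumption: the API GSA is named "<cluster_name>-api", as in the
    # decisionbox-prod-api example; check iam.tf for the real naming rule.
    account_id = f"{cluster_name}-api"
    if not (6 <= len(account_id) <= 30 and SA_ID_PATTERN.fullmatch(account_id)):
        raise ValueError(f"invalid service account ID: {account_id!r}")
    return {
        "gsa_email": f"{account_id}@{project_id}.iam.gserviceaccount.com",
        # Principal granted roles/iam.workloadIdentityUser on the GSA.
        "member": f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]",
    }


if __name__ == "__main__":
    print(workload_identity_names("my-gcp-project", "decisionbox-prod"))
```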
The Helm chart must annotate the K8s service account:
serviceAccountAnnotations:
  iam.gke.io/gcp-service-account: "decisionbox-prod-api@my-project.iam.gserviceaccount.com"
When `enable_gcp_secrets = true`, the module creates IAM bindings with conditions that restrict the API to secrets whose names start with the configured `secret_namespace` prefix:
- Allowed: `decisionbox-project123-llm-api-key`
- Blocked: `other-app-database-password`
This ensures multi-tenant isolation when multiple applications share a GCP project.
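One way to express the namespace restriction is a CEL condition on the secret's resource name. A sketch, assuming the module uses a `startsWith` prefix check with a trailing hyphen (the actual expression in `secrets.tf` may differ). Secret Manager resource names in IAM conditions typically use the project number rather than the project ID; `123456789012` below is a placeholder.

```python
# Assumption: the module's IAM condition is a CEL prefix check on the secret's
# full resource name. The exact expression lives in secrets.tf and may differ.


def condition_expression(project_number: str, namespace: str = "decisionbox") -> str:
    """Build a CEL expression that limits access to secrets named '<namespace>-*'."""
    return f'resource.name.startsWith("projects/{project_number}/secrets/{namespace}-")'


def secret_allowed(secret_id: str, project_number: str, namespace: str = "decisionbox") -> bool:
    """Evaluate the same prefix rule locally for a given secret ID."""
    resource_name = f"projects/{project_number}/secrets/{secret_id}"
    return resource_name.startswith(f"projects/{project_number}/secrets/{namespace}-")


if __name__ == "__main__":
    print(condition_expression("123456789012"))
    for secret in ("decisionbox-project123-llm-api-key", "other-app-database-password"):
        print(secret, secret_allowed(secret, "123456789012"))
```

The trailing hyphen matters: without it, a secret such as `decisionboxfoo-key` would also match the prefix.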
To deploy into an existing network:
create_vpc = false
existing_vpc_id = "projects/my-project/global/networks/my-vpc"
existing_subnet_id = "projects/my-project/regions/us-central1/subnetworks/my-subnet"
The subnet must have secondary IP ranges named `pods` and `services` (the defaults of `pods_range_name` and `services_range_name`).
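Before pointing the module at an existing subnet, you can confirm the secondary ranges exist. `gcloud compute networks subnets describe my-subnet --region us-central1 --format=json` returns the subnet, including a `secondaryIpRanges` list of `rangeName`/`ipCidrRange` entries. A minimal sketch that reports any missing range names; `missing_secondary_ranges` is a hypothetical helper and `SAMPLE_SUBNET` is an abbreviated, made-up response.

```python
import json

# Abbreviated, made-up example of `gcloud compute networks subnets describe ... --format=json`.
SAMPLE_SUBNET = """
{
  "name": "my-subnet",
  "ipCidrRange": "10.0.0.0/20",
  "secondaryIpRanges": [
    {"rangeName": "pods", "ipCidrRange": "10.4.0.0/14"},
    {"rangeName": "services", "ipCidrRange": "10.8.0.0/20"}
  ]
}
"""


def missing_secondary_ranges(
    subnet_json: str, required: tuple[str, ...] = ("pods", "services")
) -> list[str]:
    """Return the required secondary range names that the subnet lacks."""
    subnet = json.loads(subnet_json)
    present = {r["rangeName"] for r in subnet.get("secondaryIpRanges", [])}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    print(missing_secondary_ranges(SAMPLE_SUBNET))  # []
```

If you changed `pods_range_name` or `services_range_name`, pass those names as `required`.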
To tear down the deployment:
# Remove Helm releases first
helm uninstall decisionbox-dashboard -n decisionbox
helm uninstall decisionbox-api -n decisionbox
# Disable deletion protection
terraform apply -var="deletion_protection=false"
# Destroy infrastructure
terraform destroy
Next steps:
- Kubernetes Deployment: Deploy with Helm after Terraform
- Helm Values Reference: All chart configuration options
- Production Considerations: Scaling, monitoring, backups