Can not terraform destroy when http_load_balancing=true due to fw rules dependency (private-cluster) resourceInUseByAnotherResource #2411

@brokedba

Description

TL;DR

CRITICAL BUG: Breaks fundamental IaC principles

Warning

"Untracked GKE-managed resources blocking destruction of tracked resources" resourceInUseByAnotherResource

This private-cluster-update-variant module violates the fundamental promise of Infrastructure as Code: terraform destroy fails because GKE creates untracked forwarding rules for the HTTP load balancer, and those rules block subnet deletion.

3 DAYS of debugging and manual intervention required for what should be a simple destroy operation. It has been very frustrating to say the least.

Affected: All users using http_load_balancing = true (the default).

Looking at the module structure, this affects the modules/private-cluster-update-variant/ cluster resource and the related networking components in modules/beta-private-cluster-update-variant/:

resource "google_container_cluster" "primary" {
  provider          = google
  name              = var.name
  description       = var.description
  project           = var.project_id
  resource_labels   = var.cluster_resource_labels
  location          = local.location
  node_locations    = local.node_locations
  cluster_ipv4_cidr = var.cluster_ipv4_cidr
  # ...
}

data "google_compute_subnetwork" "gke_subnetwork" {
  provider = google
  count    = var.add_cluster_firewall_rules ? 1 : 0
  name     = var.subnetwork
  region   = local.region
  project  = local.network_project_id
}

Note: This is similar to this issue https://discuss.hashicorp.com/t/gcp-delete-automatic-created-resources-not-by-terraform/28749

Expected behavior

Classic scenario

When I run terraform destroy, ALL resources should be cleaned up automatically without any manual intervention. That's the entire point of IaC - declarative, reproducible, hands-off infrastructure management.

Observed behavior

BROKEN DESTROY PROCESS

  1. terraform destroy runs
  2. GKE cluster gets destroyed
  3. VPC module SUBNET DELETION FAILS with dependency errors
  4. Manual detective work required to identify phantom forwarding rules created by GKE
  5. Manual gcloud commands needed to clean up orphaned resources
  6. Multiple destroy attempts required
  7. Complete failure of IaC principles
terraform destroy
...
module.vpc["vpc"].module.subnets.google_compute_subnetwork.subnetwork["us-east1/vllm-subnet"]: Destroying... [id=projects/**/regions/us-east1/subnetworks/vllm-subnet]  
╷  
│ Error: 
Error when reading or editing Subnetwork: googleapi: Error 400: 
The subnetwork resource 'projects/**/regions/us-east1/subnetworks/vllm-subnet' is already being used by 
'projects/**/regions/us-east1/forwardingRules/a910a2abd46d247119f8a241b7234957', resourceInUseByAnotherResource  
│  

Root Cause:

When http_load_balancing = true, GKE automatically creates forwarding rules that reference the subnet. These resources are NOT tracked by Terraform, creating invisible dependencies that break the destruction process.

project → vpc → subnet → gke → [INVISIBLE FORWARDING RULES] → subnet deletion fails
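The phantom rules can be confirmed from outside Terraform. A quick check like the following surfaces them (project, region, and subnet name are placeholders from my setup; substitute your own):

```
gcloud compute forwarding-rules list \
  --project="$PROJECT_ID" \
  --regions="us-east1" \
  --filter="subnetwork:vllm-subnet" \
  --format="table(name, loadBalancingScheme, target)"
```

Any rule listed here that is absent from `terraform state list` is one of the invisible dependencies blocking the subnet.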

Terraform Configuration

Using modules/private-cluster-update-variant with standard VPC setup:

module "gke" {  
  source = "./modules/private-cluster-update-variant"  
    
  project_id = local.target_project_id  
  name       = var.cluster_name  
  region     = var.region  
  network    = local.vpc_name  
  subnetwork = local.subnet_name  
    
  http_load_balancing = true  # DEFAULT - causes the issue  
  deletion_protection = false
# ... other config  
}  

VPC module configuration:
module "vpc" {
  source = "./modules/google-network"
  # version  = "~> 9.0"  
  for_each = var.create_vpc ? { "vpc" = {} } : {}

  # Required parameters  
  project_id   = local.target_project_id
  network_name = var.vpc_name
  routing_mode = "REGIONAL"

  # Subnets configuration (GCP uses single subnet + secondary ranges)  
  subnets = [
    {
      subnet_name           = var.subnetwork_name
      subnet_ip             = var.subnetwork_cidr
      subnet_region         = var.region
      subnet_private_access = "true"
      subnet_flow_logs      = "true"
      description           = "GKE cluster subnet"
    }
  ]

  # Secondary IP ranges for GKE pods and services (singular naming)  
  secondary_ranges = {
    (var.subnetwork_name) = [
      {
        range_name    = var.pod_range_name
        ip_cidr_range = var.pod_cidr
      },
      {
        range_name    = var.service_range_name
        ip_cidr_range = var.service_cidr
      }
    ]
  }

  # Optional parameters  
  delete_default_internet_gateway_routes = false

  depends_on = [
    module.vllm_gke_project,
    google_project_service.existing_project_services,
  ]
}

Terraform Version

Terraform: 1.3+
Google Provider: >= 6.42.0, < 7
Module Version: v38.0.0

Terraform Provider Versions

terraform {
  required_version = ">= 1.3, < 2.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.27.0, < 7"
    }
    google-beta = {  
      source  = "hashicorp/google-beta"  
      version = ">= 4.64, < 7"  
    }  
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.10"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.15"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.19.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 2.1"
    }
    local = {
      source  = "hashicorp/local"
      version = ">= 2.5"
    }
  }
}

# Configure the Google provider  
provider "google" {
  project = var.project_id # i.e. TF_VAR_project_id
  region  = var.region
}

provider "google-beta" {  
  project = var.project_id  
  region  = var.region  
}
# Get access token for authentication
data "google_client_config" "default" {}

provider "random" {}
...

Additional information

Failed Workaround Attempts

1. Import-based Resource Management (doesn't work):

# Attempted to import the GKE-created forwarding rules
resource "google_compute_forwarding_rule" "managed_lb_rules" {
  for_each = { for rule in local.blocking_rules : rule.name => rule }
  # ... required arguments omitted for brevity
  lifecycle { ignore_changes = all }
}
  
import {  
  for_each = { for rule in local.blocking_rules : rule.name => rule }  
  to = google_compute_forwarding_rule.managed_lb_rules[each.key]  
  id = "projects/${local.target_project_id}/regions/${var.region}/forwardingRules/${each.key}"  
}

2. Destroy-time Cleanup (doesn't work):

# Data source that runs after GKE is created
data "google_compute_forwarding_rules" "all_rules" {
  project = local.target_project_id
  region  = var.region
  
#   depends_on = [
#     module.gke
#   ]
}

locals {
  blocking_rules = [
    for rule in data.google_compute_forwarding_rules.all_rules.rules :
    rule if can(regex("/${var.subnetwork_name}$", rule.subnetwork))
  ]
  blocking_map = { for r in local.blocking_rules : r.name => r }
}

resource "null_resource" "cleanup_forwarding_rule" {
  for_each = { for r in local.blocking_rules : r.name => r }

  triggers = {
    project_id = local.target_project_id
    region     = var.region
    rule_name  = each.key
  }

  provisioner "local-exec" {
    when    = destroy
    command = "gcloud compute forwarding-rules delete '${self.triggers.rule_name}' --region='${self.triggers.region}' --project='${self.triggers.project_id}' --quiet"
    on_failure = continue
  }

  depends_on = [module.gke]
  
}

The gcloud command works perfectly outside Terraform but doesn't solve the dependency ordering within Terraform's destroy process.

gcloud compute addresses delete fw-id  --region=us-east1 --project=$project_id
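The only pattern that has worked reliably for me is an out-of-band wrapper script that runs the cleanup between destroying the cluster and destroying the VPC. A rough sketch (the target address, project, region, and subnet filter are assumptions from my setup, not something the module provides):

```
#!/usr/bin/env bash
set -euo pipefail

PROJECT_ID="my-project"   # placeholder: substitute your project
REGION="us-east1"
SUBNET="vllm-subnet"

# 1. Destroy the cluster first so GKE releases its load balancers.
terraform destroy -target='module.gke' -auto-approve

# 2. Delete any forwarding rules GKE left behind on the subnet.
for rule in $(gcloud compute forwarding-rules list \
    --project="$PROJECT_ID" --regions="$REGION" \
    --filter="subnetwork:$SUBNET" --format="value(name)"); do
  gcloud compute forwarding-rules delete "$rule" \
    --project="$PROJECT_ID" --region="$REGION" --quiet
done

# 3. Now the subnet (and everything else) can be destroyed cleanly.
terraform destroy -auto-approve
```

This works, but it is exactly the kind of imperative glue that IaC is supposed to make unnecessary.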

Impact

This bug completely breaks the Infrastructure as Code paradigm. Users are forced into manual intervention, defeating the entire purpose of declarative infrastructure management. No production environment can accept this implementation.

❌ Hidden Circular Dependency Created by GKE:

subnet ← [invisible forwarding rules] ← gke cluster ← subnet
This proves the module creates dependencies outside Terraform's knowledge that break even perfect configurations.

Needed Fix

  • Ensure proper destruction ordering
  • Provide a declarative solution that doesn't require manual intervention
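One possible shape for such a fix (a sketch only, not a tested implementation): the module could insert a cleanup resource between the subnet and the cluster in the dependency graph, so Terraform destroys it after the cluster but before the subnet. All names below are illustrative:

```
# Hypothetical cleanup hook. Because the cluster depends on it and it depends
# on the subnet inputs, destroy ordering becomes:
#   cluster -> this hook (deletes leftover rules) -> subnet
resource "null_resource" "lb_cleanup" {
  triggers = {
    project = var.project_id
    region  = var.region
    subnet  = var.subnetwork
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      for rule in $(gcloud compute forwarding-rules list \
          --project=${self.triggers.project} --regions=${self.triggers.region} \
          --filter="subnetwork:${self.triggers.subnet}" --format="value(name)"); do
        gcloud compute forwarding-rules delete "$rule" \
          --project=${self.triggers.project} --region=${self.triggers.region} --quiet
      done
    EOT
  }
}

# The cluster resource would then declare:
#   depends_on = [null_resource.lb_cleanup]
```

Unlike my failed attempt #2 above, this does not need a data source evaluated at destroy time: the rules are enumerated by the shell command when the provisioner actually runs, so the for_each-on-computed-values problem never arises.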

Thank You
