srikxcipher commented on Dec 22, 2025

Env Destroy Link -> ✅

https://1155708878.control-plane-saas-cp-saas-dev.console.facets.cloud/capc/stack/azure-project-type-1155708878/releases/cluster/68c7e87ab24e7c36bf2c1e99/dialog/release-details/6949898bf2a09b6a794578c1

Problem Statement

The KubeBlocks operator and CRD modules were experiencing critical failures during environment destruction, causing:

  • Finalizer processing errors
  • Resource retention after Helm uninstall
  • Server errors preventing clean deletion

Error Logs

Issue 1: Finalizer and Custom Resource Deletion Errors

╷
│ Error: Error deleting resource parametersdefinitions.parameters.kubeblocks.io:
│ the server is currently unable to handle the request
│
│ Error: Error deleting resource clusters.apps.kubeblocks.io:
│ the server is currently unable to handle the request
│
│ Error: Error waiting for deletion.
│ Error when waiting for resource "instancesets.workloads.kubeblocks.io" to be deleted:
│ the server is currently unable to handle the request
╵

Issue 2: Resource Retention After Helm Uninstall

╷
│ Warning: Helm uninstall returned an information message
│
│ These resources were kept due to the resource policy:
│ [ConfigMap] postgresql12-configuration-1.0.1
│ [ComponentDefinition] postgresql-12-1.0.1
│ [ParamConfigRenderer] postgresql12-pcr-1.0.1
│ [ParametersDefinition] postgresql12-pd-1.0.1
│ (+ 50+ more resources)
╵

Stuck CRDs (cleanup finalizers still attached)

kubectl get crd -l app.kubernetes.io/name=kubeblocks \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'
  
  
componentdefinitions.apps.kubeblocks.io	["customresourcecleanup.apiextensions.k8s.io"]
paramconfigrenderers.parameters.kubeblocks.io	["customresourcecleanup.apiextensions.k8s.io"]
parametersdefinitions.parameters.kubeblocks.io	["customresourcecleanup.apiextensions.k8s.io"]
shardingdefinitions.apps.kubeblocks.io	["customresourcecleanup.apiextensions.k8s.io"]

Add-on CRs

kubectl get componentdefinitions.apps.kubeblocks.io --all-namespaces
kubectl get paramconfigrenderers.parameters.kubeblocks.io --all-namespaces
kubectl get parametersdefinitions.parameters.kubeblocks.io --all-namespaces
kubectl get shardingdefinitions.apps.kubeblocks.io --all-namespaces

NAME                        SERVICE            SERVICE-VERSION   STATUS      AGE
kafka-broker-1.0.1          kafka              3.3.2             Available   4d23h
kafka-combine-1.0.1         kafka              3.3.2             Available   4d23h
kafka-controller-1.0.1      kafka-controller   3.3.2             Available   4d23h
kafka-exporter-1.0.1        kafka-exporter     1.6.0             Available   4d23h
kafka27-broker-1.0.1        kafka              2.7.0             Available   4d23h
minio-1.0.1                 minio              2024.6.29         Available   4d23h
mongo-config-server-1.0.1   mongodb            8.0.8             Available   4d23h
mongo-mongos-1.0.1          mongodb            8.0.8             Available   4d23h
mongo-shard-1.0.1           mongodb            8.0.8             Available   4d23h
mongodb-1.0.1               mongodb                              Available   4d23h
mysql-5.7-1.0.1             mysql              5.7.44            Available   4d23h
mysql-8.0-1.0.1             mysql              8.0.33            Available   4d23h
mysql-8.4-1.0.1             mysql              8.4.2             Available   4d23h
mysql-mgr-8.0-1.0.1         mysql              8.0.33            Available   4d23h
mysql-mgr-8.4-1.0.1         mysql              8.4.2             Available   4d23h
mysql-orc-5.7-1.0.1         mysql              5.7.44            Available   4d23h
mysql-orc-8.0-1.0.1         mysql              8.0.33            Available   4d23h
postgresql-12-1.0.1         postgresql         12.22.0           Available   4d23h
postgresql-14-1.0.1         postgresql         14.18.0           Available   4d23h
postgresql-15-1.0.1         postgresql         15.13.0           Available   4d23h
postgresql-16-1.0.1         postgresql         16.9.0            Available   4d23h
postgresql-17-1.0.1         postgresql         17.5.0            Available   4d23h
proxysql-mysql-1.0.1        proxysql           2.4.4             Available   4d23h
redis-5-1.0.1               redis              5.0.12            Available   4d23h
redis-6-1.0.1               redis              6.2.17            Available   4d23h
redis-7-1.0.1               redis              7.2.10            Available   4d23h
redis-8-1.0.1               redis              8.2.1             Available   4d23h
redis-cluster-5-1.0.1       redis-cluster      5.0.12            Available   4d23h
redis-cluster-6-1.0.1       redis-cluster      6.2.17            Available   4d23h
redis-cluster-7-1.0.1       redis-cluster      7.2.10            Available   4d23h
redis-cluster-8-1.0.1       redis-cluster      8.2.1             Available   4d23h
redis-sentinel-5-1.0.1      redis-sentinel     5.0.12            Available   4d23h
redis-sentinel-6-1.0.1      redis-sentinel     6.2.17            Available   4d23h
redis-sentinel-7-1.0.1      redis-sentinel     7.2.10            Available   4d23h
redis-sentinel-8-1.0.1      redis-sentinel     8.2.1             Available   4d23h
redis-twemproxy-0.5-1.0.1   redis-twemproxy    0.5.0             Available   4d23h
NAME                       COMPD                   PHASE       AGE
mysql-5.7-orc-pcr-1.0.1    mysql-orc-5.7-1.0.1     Available   4d23h
mysql-5.7-pcr-1.0.1        mysql-5.7-1.0.1         Available   4d23h
mysql-8.0-mgr-pcr-1.0.1    mysql-mgr-8.0-1.0.1     Available   4d23h
mysql-8.0-orc-pcr-1.0.1    mysql-orc-8.0-1.0.1     Available   4d23h
mysql-8.0-pcr-1.0.1        mysql-8.0-1.0.1         Available   4d23h
mysql-8.4-mgr-pcr-1.0.1    mysql-mgr-8.4-1.0.1     Available   4d23h
mysql-8.4-pcr-1.0.1        mysql-8.4-1.0.1         Available   4d23h
postgresql12-pcr-1.0.1     postgresql-12-1.0.1     Available   4d23h
postgresql14-pcr-1.0.1     postgresql-14-1.0.1     Available   4d23h
postgresql15-pcr-1.0.1     postgresql-15-1.0.1     Available   4d23h
postgresql16-pcr-1.0.1     postgresql-16-1.0.1     Available   4d23h
postgresql17-pcr-1.0.1     postgresql-17-1.0.1     Available   4d23h
redis-cluster5-pcr-1.0.1   redis-cluster-5-1.0.1   Available   4d23h
redis-cluster6-pcr-1.0.1   redis-cluster-6-1.0.1   Available   4d23h
redis-cluster7-pcr-1.0.1   redis-cluster-7-1.0.1   Available   4d23h
redis-cluster8-pcr-1.0.1   redis-cluster-8-1.0.1   Available   4d23h
redis5-pcr-1.0.1           redis-5-1.0.1           Available   4d23h
redis6-pcr-1.0.1           redis-6-1.0.1           Available   4d23h
redis7-pcr-1.0.1           redis-7-1.0.1           Available   4d23h
redis8-pcr-1.0.1           redis-8-1.0.1           Available   4d23h
NAME                    FILE              PHASE       AGE
postgresql12-pd-1.0.1   postgresql.conf   Available   4d23h
postgresql14-pd-1.0.1   postgresql.conf   Available   4d23h
postgresql15-pd-1.0.1   postgresql.conf   Available   4d23h
postgresql16-pd-1.0.1   postgresql.conf   Available   4d23h
postgresql17-pd-1.0.1   postgresql.conf   Available   4d23h
NAME                TEMPLATE            STATUS      AGE
mongo-shard-1.0.1   mongo-shard-1.0.1   Available   4d23h

Manual patch (for testing only): force-remove the cleanup finalizers from the stuck CRDs

kubectl get crd -l app.kubernetes.io/name=kubeblocks \
  -o jsonpath='{range .items[?(@.metadata.finalizers)]}{.metadata.name}{"\n"}{end}' | \
xargs -I {} kubectl patch crd {} --type merge -p '{"metadata":{"finalizers":[]}}'
customresourcedefinition.apiextensions.k8s.io/componentdefinitions.apps.kubeblocks.io patched
customresourcedefinition.apiextensions.k8s.io/paramconfigrenderers.parameters.kubeblocks.io patched
customresourcedefinition.apiextensions.k8s.io/parametersdefinitions.parameters.kubeblocks.io patched
customresourcedefinition.apiextensions.k8s.io/shardingdefinitions.apps.kubeblocks.io patched

kubectl get crd | grep kubeblocks

# No resources found

Destroy Release Error Log ->

https://1155708878.control-plane-saas-cp-saas-dev.console.facets.cloud/capc/stack/azure-project-type-1155708878/releases/cluster/68c7e87ab24e7c36bf2c1e99/dialog/release-details/694951680c18a362176b27cf/terminal-dialog/terraform-logs?backUrl=%2Fstack%2Fazure-project-type-1155708878%2Freleases%2Fcluster%2F68c7e87ab24e7c36bf2c1e99

Root Cause Analysis

1. Resource Retention Policy

Database addon Helm charts carried helm.sh/resource-policy: keep annotations on their resources, which prevented those resources from being deleted during Helm uninstall. Because the custom resources still existed, CRD deletion was blocked in turn.

Reference: KubeBlocks Addon Documentation

2. Deletion Order Race Condition

During destruction, the operator (which handles webhooks and finalizers) was destroyed before custom resources finished cleaning up:

Destruction Flow (BROKEN):
1. Database addons destroyed
2. KubeBlocks operator destroyed ← Webhooks/controllers gone!
3. Custom resources try to delete ← Fail! No operator to process finalizers
4. CRDs try to delete ← Fail! Custom resources still exist

Reference: GitHub Issue #1528

Solution Overview

Three-Pronged Approach

  1. Disable Resource Retention - Set extra.keepResource = false on database addons
  2. Add Destruction Delays - Use time_sleep with destroy_duration to ensure proper cleanup timing
  3. Suppress Verbose Output - Use ignore_changes on manifest attributes

Detailed Changes

Change 1: Disable Resource Retention in Database Addons

File: modules/common/kubeblocks-operator/standard/1.0/main.tf

Before:

resource "helm_release" "database_addons" {
  for_each = local.enabled_addons

  name       = "kb-addon-${each.value.chart_name}"
  repository = each.value.repo
  chart      = each.value.chart_name

  atomic          = true
  cleanup_on_fail = true

  depends_on = [
    helm_release.kubeblocks,
    time_sleep.wait_for_kubeblocks
  ]
}

After:

resource "helm_release" "database_addons" {
  for_each = local.enabled_addons

  name       = "kb-addon-${each.value.chart_name}"
  repository = each.value.repo
  chart      = each.value.chart_name

  atomic          = true
  cleanup_on_fail = true

  # CRITICAL: Disable resource retention policy to allow clean deletion
  # This removes 'helm.sh/resource-policy: keep' annotation from ComponentDefinitions, ConfigMaps, etc.
  # Without this, resources are kept after Helm uninstall, blocking CRD deletion
  values = [
    yamlencode({
      extra = {
        keepResource = false
      }
    })
  ]

  depends_on = [
    helm_release.kubeblocks,
    time_sleep.wait_for_kubeblocks
  ]
}

Impact: ✅ ComponentDefinitions, ConfigMaps, and other addon resources are now deleted during Helm uninstall


Change 2: Add Destruction Delay in Operator Module

File: modules/common/kubeblocks-operator/standard/1.0/main.tf

Added:

# Time sleep resource to ensure proper cleanup during destroy
# This gives custom resources time to be deleted before operator is removed
resource "time_sleep" "wait_for_cleanup" {
  # Sleep BEFORE destroying the operator to allow custom resources to clean up
  destroy_duration = "120s"

  depends_on = [
    helm_release.database_addons
  ]
}

Impact: ✅ Creates a 120-second buffer during destruction for custom resources to clean up while the operator is still running


Change 3: Add Destruction Delay in CRD Module

File: modules/common/kubeblocks-crd/standard/1.0/main.tf

Added:

# Time sleep resource to ensure proper cleanup during destroy
# This gives extra time for any remaining custom resources to be deleted before CRDs are removed
resource "time_sleep" "wait_for_cleanup" {
  destroy_duration = "120s"
}

Impact: ✅ Creates an additional 120-second buffer before CRD deletion


Key changes to the kubernetes_manifest resource in the CRD module (combined in the sketch after this list):

  • ✅ Added metadata.finalizers to computed_fields (don't try to manage finalizers)
  • ✅ Added metadata.uid to computed_fields (Kubernetes-managed field)
  • ✅ Added ignore_changes = [manifest, object] to prevent massive diffs in plan output
  • ✅ Changed for_each to use CRD name as key (stable identifier)
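
A hedged sketch of how these settings fit together on the kubernetes_manifest resource (local.crds and the name-based for_each come from Change 4 below; metadata.labels and metadata.annotations are the provider's default computed fields, retained here as an assumption, and other arguments in the real module may differ):

resource "kubernetes_manifest" "kubeblocks_crds" {
  for_each = local.crds            # keyed by CRD metadata.name (see Change 4)
  manifest = sensitive(each.value)

  # Let the API server own these fields instead of treating them as drift
  computed_fields = [
    "metadata.labels",
    "metadata.annotations",
    "metadata.finalizers",
    "metadata.uid",
  ]

  lifecycle {
    # Suppress huge CRD diffs in plan output
    ignore_changes = [
      manifest,
      object,
    ]
  }
}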

Change 4: Improve CRD Keying Strategy

File: modules/common/kubeblocks-crd/standard/1.0/main.tf

Before:

locals {
  crds_yaml = data.http.kubeblocks_crds.response_body
  crd_documents = [for doc in split("\n---\n", local.crds_yaml) : trimspace(doc) if trimspace(doc) != ""]
  crds_count = length(local.crd_documents)
}

resource "kubernetes_manifest" "kubeblocks_crds" {
  for_each = { for idx, doc in local.crd_documents : idx => doc }
  manifest = sensitive(yamldecode(each.value))
}

After:

locals {
  crds_yaml = data.http.kubeblocks_crds.response_body

  crd_documents = [for doc in split("\n---\n", local.crds_yaml) : yamldecode(doc) if trimspace(doc) != ""]

  # Key by CRD metadata.name (stable & unique)
  crds = {
    for crd in local.crd_documents :
    crd.metadata.name => crd
  }

  crds_count = length(local.crds)
}

resource "kubernetes_manifest" "kubeblocks_crds" {
  for_each = local.crds
  manifest = sensitive(each.value)
}

Impact:

  • ✅ Uses CRD name as resource key instead of index (more stable)
  • ✅ Pre-decodes YAML in locals for cleaner resource definition

How The Solution Works

Corrected Destruction Flow

┌─────────────────────────────────────────────────────────────────┐
│                    OPERATOR MODULE DESTRUCTION                   │
├─────────────────────────────────────────────────────────────────┤
│ 1. time_sleep.wait_for_cleanup starts destroying                │
│    → Sleeps for 120 seconds                                     │
│    → Operator still running, processing finalizers              │
│                                                                  │
│ 2. Database addons destroyed                                    │
│    → extra.keepResource = false ensures clean removal           │
│    → ComponentDefinitions, ConfigMaps deleted                   │
│                                                                  │
│ 3. KubeBlocks operator destroyed                                │
│    → Webhooks and controllers removed                           │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                      CRD MODULE DESTRUCTION                      │
├─────────────────────────────────────────────────────────────────┤
│ 4. time_sleep.wait_for_cleanup starts destroying                │
│    → Sleeps for 120 seconds                                     │
│    → Additional buffer for stragglers                           │
│                                                                  │
│ 5. CRDs destroyed                                                │
│    → All custom resources already gone                          │
│    → Clean deletion without errors                              │
└─────────────────────────────────────────────────────────────────┘

Total Cleanup Time: 240 seconds (4 minutes)

Key Mechanisms

1. destroy_duration Timing

A time_sleep resource with destroy_duration pauses for the given duration while it is itself being destroyed, and everything it depends on must wait for that destroy to finish:

resource "time_sleep" "wait_for_cleanup" {
  destroy_duration = "120s"
  depends_on = [helm_release.database_addons]
}

During destruction:

  1. Terraform begins destroying time_sleep
  2. time_sleep sleeps for 120 seconds
  3. Once the sleep completes, time_sleep is removed
  4. The resources time_sleep depends on (the addon releases and, through them, the operator) can then be destroyed, as sketched below
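
A minimal sketch of that chain, assuming the resource names from the operator module (the repository value is illustrative, and time_sleep.wait_for_kubeblocks plus the other real arguments are omitted):

# Create order: kubeblocks -> database_addons -> wait_for_cleanup
# Destroy order is the reverse: wait_for_cleanup sleeps 120s first, then the
# addons are uninstalled, and only then the operator itself.
resource "helm_release" "kubeblocks" {
  name       = "kubeblocks"
  repository = "https://apecloud.github.io/helm-charts"  # illustrative
  chart      = "kubeblocks"
}

resource "helm_release" "database_addons" {
  for_each   = local.enabled_addons
  name       = "kb-addon-${each.value.chart_name}"
  repository = each.value.repo
  chart      = each.value.chart_name

  depends_on = [helm_release.kubeblocks]
}

resource "time_sleep" "wait_for_cleanup" {
  # The operator and addons are still installed for this entire window
  destroy_duration = "120s"

  depends_on = [helm_release.database_addons]
}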

2. Resource Retention Control

The extra.keepResource = false parameter tells KubeBlocks addon charts to remove the helm.sh/resource-policy: keep annotation:

values = [
  yamlencode({
    extra = {
      keepResource = false
    }
  })
]

Without this: Helm uninstall keeps resources → CRDs can't delete → errors
With this: Helm uninstall removes resources → CRDs can delete cleanly → success

3. Plan Output Suppression

The ignore_changes lifecycle rule prevents Terraform from showing massive diffs:

lifecycle {
  ignore_changes = [
    manifest,  # Don't show manifest changes
    object     # Don't show object attribute changes
  ]
}

Impact:

  • First apply: Resources created (some output expected)
  • Subsequent applies: No manifest diffs shown (the CRD manifests are not expected to change in place)
  • Browser remains responsive during plan/apply

Testing Results

Before Fix

❌ Error: Error waiting for deletion
❌ Server is currently unable to handle the request
❌ Resources kept due to resource policy (50+ resources)
❌ Required manual cleanup or multiple destroy attempts
❌ Browser hangs on plan output

After Fix

✅ Clean destruction on first attempt
✅ All resources deleted properly
✅ No finalizer errors
✅ No resource retention warnings
✅ Browser remains responsive
✅ Total destruction time: ~4-5 minutes (expected with 240s of delays)

Sample Successful Destruction Log

module.kubeblocks-operator.time_sleep.wait_for_cleanup: Destroying...
module.kubeblocks-operator.time_sleep.wait_for_cleanup: Still destroying... [2m0s elapsed]
module.kubeblocks-operator.time_sleep.wait_for_cleanup: Destruction complete after 2m0s

module.kubeblocks-operator.helm_release.database_addons["mongodb"]: Destroying...
module.kubeblocks-operator.helm_release.database_addons["mongodb"]: Destruction complete after 37s

module.kubeblocks-operator.helm_release.kubeblocks: Destroying...
module.kubeblocks-operator.helm_release.kubeblocks: Destruction complete after 42s

module.kubeblocks-crd.time_sleep.wait_for_cleanup: Destroying...
module.kubeblocks-crd.time_sleep.wait_for_cleanup: Destruction complete after 2m0s

module.kubeblocks-crd.kubernetes_manifest.kubeblocks_crds: Destroying...
module.kubeblocks-crd.kubernetes_manifest.kubeblocks_crds: Destruction complete after 6s

✅ Destroy complete! Resources: 45 destroyed.

Breaking Changes

⚠️ Destruction Time Impact

  • Destruction now takes an additional 240 seconds (4 minutes) due to cleanup delays
  • This is intentional and necessary for reliable cleanup
  • Users should expect longer destroy times

References

  1. KubeBlocks Addon Documentation - Resource retention control
  2. GitHub Issue #1528 - Verbose output problem
  3. Terraform time_sleep Resource - Destruction delays

Files Changed

  • modules/common/kubeblocks-operator/standard/1.0/main.tf
  • modules/common/kubeblocks-crd/standard/1.0/main.tf

Checklist

  • Root cause identified and documented
  • Solution implemented and tested
  • Error logs captured
  • Code changes reviewed
  • Documentation updated
  • Destruction tested successfully
  • No circular dependencies
  • Browser responsiveness verified

srikxcipher force-pushed the kubeblocks-destroy-fix branch from ecbc201 to fe58cf8 on Dec 22, 2025