Fix hostname parsing and add tests

WalBeh · WalBeh · commit 79cd722f9c91 · 2025-10-14T09:55:21.000+02:00
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -5,6 +5,8 @@ Changelog
 Unreleased
 ----------
 
+* Fix hostname parsing and add tests in dc_util.
+
 2.53.0 (2025-09-25)
 -------------------
 
diff --git a/utils/dc_util/README.md b/utils/dc_util/README.md
@@ -1,16 +1,24 @@
 # Rolling restart with `alter cluster decommission`
 
 While working on a cloud issue a small tool was created
-to not only _terminate_ a POD by`kubelet` sending a SIGTERM, but by having the ability
+to not only _terminate_ a POD by `kubelet` sending a SIGTERM, but by having the ability
 to use a preStop Hook and issue a `alter cluster decommission` for that node.
 
+The CrateDB documentation explains the rolling restart process here: https://cratedb.com/docs/guide/admin/upgrade/rolling.html
+
+Please note that, because a preStop Hook is used, the first step described in the
+documentation is omitted, as we would not be able to reliably detect that the shutdown was
+initiated by dc_util. Therefore the _NEW_PRIMARIES_ would not be
+reset!
+
 # What does the tool do?
 
-First the decommission settings are configured for the cluster. We assume that
-we always want to _force_ decommission - in terms of: If cratedb would come to the
-decision that the decommission failed, it would roll it back. In context of terminating
-the POD/process in kubernetes, the shutdown cannot be canceled - therefore _force_ is set
-on cratedb side.
+First the decommission settings are configured for the cluster. By default, _force_
+decommission is enabled: without it, CrateDB would roll back a decommission that it
+considers failed. When Kubernetes terminates the POD/process, the shutdown cannot be
+canceled, so _force_ is typically set on the CrateDB side. This can be overridden via
+the `dc-util-graceful-stop` StatefulSet label; without the label, _force_ remains
+true.
 
 Before doing that, the STS is checked for the number of replicas configured. This is done
 to figure out whether a FULL stop of all PODS in the cratedb Cluster is _scheduled_. In
@@ -81,7 +89,77 @@ are used for testing purpose:
 | Paramter              | setting |
 | --------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `--crate-node-prefix` | allows to customize the cratedb node names in the statefulset in case it is not the default `data-hot`. This is not to be confused with the _hostname_!                                                                                                         |
-| `--timemout`          | crateDBs default timeout is 7200s - this needs to be correlated to `TerminationGracePeriod`                                                                                                                                                                     |
+| `--timeout`           | CrateDB's default timeout is 7200s - this is automatically adjusted based on `terminationGracePeriodSeconds` (see below)                                                                                                                                       |
 | `--pid`               | For testing locally only                                                                                                                                                                                                                                        |
 | `--hostname`          | Is used to derive the name of the kubernetes statefulset, the _replica number_ of the pod is _stripped_ from it, which returns the sts name. eg. `crate-data-hot-eadf76b5-c634-4f0f-abcc-7442d01cb7dd-0 -> crate-data-hot-eadf76b5-c634-4f0f-abcc-7442d01cb7dd` |
-| `--min-availability`  | Either `PRIMARIES`or `FULL`. Please refer to the crateDB documentation.                                                                                                                                                                                         |
+| `--min-availability`  | Either `PRIMARIES`, `FULL`, or `NONE`. Can be overridden by StatefulSet labels (see below). Please refer to the crateDB documentation.                                                                                                                        |
+
+# Timeout Logic
+
+The tool automatically determines the appropriate decommission timeout based on the StatefulSet's `terminationGracePeriodSeconds`:
+
+- **Default case (30s or nil)**: Uses `--timeout` flag value (30s is too small for CrateDB decommissioning)
+- **Custom terminationGracePeriodSeconds**: Uses `terminationGracePeriodSeconds - 120s` (reserves 120s for shutdown)
+- **Minimum safety**: Always enforces minimum 360s timeout regardless of calculated value
+- **Logging**: Reports when using derived timeout instead of flag timeout
+
+## Real-world scenarios:
+- **Standard deployment** (30s default): Uses `--timeout` flag (e.g., 7200s)
+- **Long-running workload** (1800s): Uses 1680s for decommission, keeps 120s for shutdown
+- **Short custom period** (300s): Uses 360s minimum (logs the adjustment)
+- **Very long period** (3600s): Uses 3480s for decommission
+
+# StatefulSet Label Configuration
+
+The tool can read configuration from StatefulSet labels, overriding CLI parameters:
+
+## Labels:
+- **`dc-util-min-availability`**: Sets min-availability (values: `NONE`, `PRIMARIES`, `FULL`)
+- **`dc-util-graceful-stop`**: Controls graceful stop force setting (values: `true`, `false`)
+
+## Example StatefulSet with labels:
+```yaml
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: crate-data-hot
+  labels:
+    dc-util-min-availability: "PRIMARIES"
+    dc-util-graceful-stop: "false"
+spec:
+  # ... rest of StatefulSet spec
+```
+
+## Behavior:
+- **No labels**: Uses CLI parameter values (`--min-availability`, default force=true)
+- **Valid labels**: Uses label values, logs the override
+- **Invalid labels**: Uses CLI defaults, logs the invalid value
+- **Label precedence**: StatefulSet labels override CLI parameters
+
+## Sample Logs
+Please note that you will not be able to see the command's log output! It is run in the background by k8s and is not logged
+to STDOUT where you would expect it.
+
+```
+bash-5.2# ./dc_util-linux-amd64 -min-availability PRIMARIES -timeout 120s
+Decommissioner: 2025/10/09 17:16:46 Using in-cluster configuration
+Decommissioner: 2025/10/09 17:16:46 Parsing hostname: crate-data-hot-d84c10e6-d8fb-4d10-bf60-f9f2ea919a73-2
+Decommissioner: 2025/10/09 17:16:46 Extracted CrateDB node name: data-hot-2
+Decommissioner: 2025/10/09 17:16:46 StatefulSet has 3 replicas configured
+Decommissioner: 2025/10/09 17:16:46 Using min-availability from StatefulSet label: NONE
+Decommissioner: 2025/10/09 17:16:46 Using graceful stop force from StatefulSet label: false
+Decommissioner: 2025/10/09 17:16:46 Using timeout derived from terminationGracePeriodSeconds: 780s (terminationGracePeriodSeconds=900s, buffer=120s) instead of flag value: 120s
+Decommissioner: 2025/10/09 17:16:46 StatefulSet terminationGracePeriodSeconds: 900s
+Decommissioner: 2025/10/09 17:16:48 Decommissioning node data-hot-2 with graceful_stop.timeout of 780s, min_availability=NONE, force=false
+Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"set global transient \"cluster.graceful_stop.timeout\" = '780s';"}
+Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":24.105846}
+Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"set global transient \"cluster.graceful_stop.force\" = false;"}
+Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":18.95872}
+Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"set global transient \"cluster.graceful_stop.min_availability\"='NONE';"}
+Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":13.927663}
+Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"alter cluster decommission 'data-hot-2'"}
+Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":3.827284}
+Decommissioner: 2025/10/09 17:16:48 Decommission command sent successfully
+Decommissioner: 2025/10/09 17:16:48 Process 1 is still running (check count: 0)
+
+```
diff --git a/utils/dc_util/dc_util.go b/utils/dc_util/dc_util.go
@@ -22,13 +22,148 @@ import (
 )
 
 const (
-	defaultCrateNodePrefix = "data-hot"
-	defaultPID             = 1
-	defaultProto           = "https"
-	defaultMinAvailability = "FULL"
-	defaultTimeout         = "7200s"
+	defaultCrateNodePrefix               = "data-hot"
+	defaultPID                           = 1
+	defaultProto                         = "https"
+	defaultMinAvailability               = "FULL"
+	defaultTimeout                       = "7200s"
+	defaultTerminationGracePeriodSeconds = 30  // Kubernetes default
+	gracePeriodBuffer                    = 120 // seconds to subtract from terminationGracePeriodSeconds
+	minimumTimeout                       = 360 // minimum effective timeout in seconds
+
+	// StatefulSet label keys (using dashes instead of underscores)
+	labelMinAvailability = "dc-util-min-availability"
+	labelGracefulStop    = "dc-util-graceful-stop"
 )
 
+type customError struct{ msg string }
+
+func (e *customError) Error() string { return e.msg }
+
+// ErrMalformedHostname is returned when hostname cannot be parsed
+var ErrMalformedHostname = &customError{"malformed hostname"}
+
+// splitHostname splits hostname by "-"
+func splitHostname(hostname string) []string {
+	return strings.Split(hostname, "-")
+}
+
+// makeDecommissionStmt creates the decommission statement
+func makeDecommissionStmt(nodeName string) string {
+	return fmt.Sprintf("alter cluster decommission '%s'", nodeName)
+}
+
+// extractNodeName extracts the CrateDB node name from hostname
+func extractNodeName(hostname, crateNodePrefix, defaultPrefix string) (string, error) {
+	parts := splitHostname(hostname)
+
+	// If custom prefix provided (not default), use it
+	if crateNodePrefix != defaultPrefix && crateNodePrefix != "" {
+		if len(parts) > 0 {
+			podNumber := parts[len(parts)-1]
+			return crateNodePrefix + "-" + podNumber, nil
+		}
+		return "", ErrMalformedHostname
+	}
+
+	// Extract from hostname if using default prefix
+	// Expected format: crate-<prefix-parts>-<uuid-parts>-<pod-number>
+	// We want: <prefix-parts>-<pod-number>
+	if len(parts) >= 4 && parts[0] == "crate" {
+		podNumber := parts[len(parts)-1]
+
+		// Look for the node prefix pattern after "crate"
+		// Use the provided crateNodePrefix (which equals defaultPrefix in this case)
+		prefixParts := strings.Split(crateNodePrefix, "-")
+
+		// Check if the hostname contains the expected prefix parts after "crate"
+		if len(parts) >= len(prefixParts)+2 { // crate + prefix parts + pod number (minimum)
+			// Extract the prefix parts that match our expected pattern
+			prefixMatches := true
+			for i, expectedPart := range prefixParts {
+				if parts[1+i] != expectedPart {
+					prefixMatches = false
+					break
+				}
+			}
+
+			if prefixMatches {
+				return crateNodePrefix + "-" + podNumber, nil
+			}
+		}
+	}
+
+	return "", ErrMalformedHostname
+}
+
+// calculateEffectiveTimeout determines the timeout to use based on terminationGracePeriodSeconds
+func calculateEffectiveTimeout(flagTimeout string, terminationGracePeriodSeconds *int64) (string, error) {
+	// Parse the flag timeout value
+	flagTimeoutDuration, err := time.ParseDuration(flagTimeout)
+	if err != nil {
+		return "", fmt.Errorf("invalid timeout format: %w", err)
+	}
+	flagTimeoutSeconds := int(flagTimeoutDuration.Seconds())
+
+	// If terminationGracePeriodSeconds is not set or is the default value (30s), use flag timeout
+	// The default 30s is too small for CrateDB decommissioning, so we rely on the flag timeout
+	if terminationGracePeriodSeconds == nil || *terminationGracePeriodSeconds == defaultTerminationGracePeriodSeconds {
+		return flagTimeout, nil
+	}
+
+	// Calculate effective timeout: terminationGracePeriodSeconds - buffer
+	effectiveTimeoutSeconds := int(*terminationGracePeriodSeconds) - gracePeriodBuffer
+
+	// Ensure minimum timeout
+	if effectiveTimeoutSeconds < minimumTimeout {
+		effectiveTimeoutSeconds = minimumTimeout
+		log.Printf("Calculated timeout (%ds) is below minimum, using %ds instead",
+			int(*terminationGracePeriodSeconds)-gracePeriodBuffer, minimumTimeout)
+	}
+
+	// Log when using different timeout than flag
+	if effectiveTimeoutSeconds != flagTimeoutSeconds {
+		log.Printf("Using timeout derived from terminationGracePeriodSeconds: %ds (terminationGracePeriodSeconds=%ds, buffer=%ds) instead of flag value: %ds",
+			effectiveTimeoutSeconds, *terminationGracePeriodSeconds, gracePeriodBuffer, flagTimeoutSeconds)
+	}
+
+	return fmt.Sprintf("%ds", effectiveTimeoutSeconds), nil
+}
+
+// getMinAvailabilityFromLabels reads min-availability from StatefulSet labels or returns default
+func getMinAvailabilityFromLabels(labels map[string]string, defaultValue string) string {
+	if value, exists := labels[labelMinAvailability]; exists {
+		// Validate the value
+		switch value {
+		case "NONE", "FULL", "PRIMARIES":
+			log.Printf("Using min-availability from StatefulSet label: %s", value)
+			return value
+		default:
+			log.Printf("Invalid min-availability value in StatefulSet label '%s': %s, using default: %s",
+				labelMinAvailability, value, defaultValue)
+		}
+	}
+	return defaultValue
+}
+
+// getGracefulStopForceFromLabels reads graceful stop force setting from StatefulSet labels
+func getGracefulStopForceFromLabels(labels map[string]string) bool {
+	if value, exists := labels[labelGracefulStop]; exists {
+		switch value {
+		case "TRUE", "true", "True":
+			log.Printf("Using graceful stop force from StatefulSet label: true")
+			return true
+		case "FALSE", "false", "False":
+			log.Printf("Using graceful stop force from StatefulSet label: false")
+			return false
+		default:
+			log.Printf("Invalid graceful stop value in StatefulSet label '%s': %s, using default: true",
+				labelGracefulStop, value)
+		}
+	}
+	return true // default behavior
+}
+
 func sendSQLStatement(proto, stmt string) error {
 	payload := map[string]string{"stmt": stmt}
 	payloadBytes, err := json.Marshal(payload)
@@ -159,6 +294,14 @@ func run() error {
 	}
 	statefulSetName := strings.Join(hostnameParts[:len(hostnameParts)-1], "-")
 
+	// Determine crateNodeName using extracted logic
+	log.Printf("Parsing hostname: %s", hostname)
+	actualCrateNodeName, err := extractNodeName(hostname, crateNodePrefix, defaultCrateNodePrefix)
+	if err != nil {
+		return fmt.Errorf("failed to extract node name from hostname %s: %w", hostname, err)
+	}
+	log.Printf("Extracted CrateDB node name: %s", actualCrateNodeName)
+
 	ctx := context.Background()
 	statefulSet, err := clientset.AppsV1().StatefulSets(namespace).Get(ctx, statefulSetName, metav1.GetOptions{})
 	if err != nil {
@@ -168,19 +311,34 @@ func run() error {
 
 	log.Printf("StatefulSet has %d replicas configured", replicas)
 
+	// Get configuration from StatefulSet labels
+	effectiveMinAvailability := getMinAvailabilityFromLabels(statefulSet.Labels, minAvailability)
+	gracefulStopForce := getGracefulStopForceFromLabels(statefulSet.Labels)
+
+	// Calculate effective timeout based on terminationGracePeriodSeconds
+	effectiveTimeout, err := calculateEffectiveTimeout(decommissionTimeout, statefulSet.Spec.Template.Spec.TerminationGracePeriodSeconds)
+	if err != nil {
+		return fmt.Errorf("failed to calculate effective timeout: %w", err)
+	}
+
+	if statefulSet.Spec.Template.Spec.TerminationGracePeriodSeconds != nil {
+		log.Printf("StatefulSet terminationGracePeriodSeconds: %ds", *statefulSet.Spec.Template.Spec.TerminationGracePeriodSeconds)
+	} else {
+		log.Printf("StatefulSet terminationGracePeriodSeconds: not set (using Kubernetes default)")
+	}
+
 	time.Sleep(2 * time.Second) // Sleep to catch up with the replica settings
 
 	if replicas > 0 {
-		podNumber := hostnameParts[len(hostnameParts)-1]
-
 		// Send the SQL statements to decommission the node
-		log.Printf("Decommissioning node %s with graceful_stop.timeout of %s", podNumber, decommissionTimeout)
+		log.Printf("Decommissioning node %s with graceful_stop.timeout of %s, min_availability=%s, force=%t",
+			actualCrateNodeName, effectiveTimeout, effectiveMinAvailability, gracefulStopForce)
 
 		statements := []string{
-			fmt.Sprintf(`set global transient "cluster.graceful_stop.timeout" = '%s';`, decommissionTimeout),
-			`set global transient "cluster.graceful_stop.force" = True;`,
-			fmt.Sprintf(`set global transient "cluster.graceful_stop.min_availability"='%s';`, minAvailability),
-			fmt.Sprintf(`alter cluster decommission '%s-%s'`, crateNodePrefix, podNumber),
+			fmt.Sprintf(`set global transient "cluster.graceful_stop.timeout" = '%s';`, effectiveTimeout),
+			fmt.Sprintf(`set global transient "cluster.graceful_stop.force" = %t;`, gracefulStopForce),
+			fmt.Sprintf(`set global transient "cluster.graceful_stop.min_availability"='%s';`, effectiveMinAvailability),
+			makeDecommissionStmt(actualCrateNodeName),
 		}
 
 		for _, stmt := range statements {
diff --git a/utils/dc_util/decommission_test.go b/utils/dc_util/decommission_test.go