Commit 79cd722

Fix hostname parsing and add tests
1 parent 60165ac commit 79cd722

4 files changed: +880 -20 lines changed

CHANGES.rst

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@ Changelog
55
Unreleased
66
----------
77

8+
* Fix hostname parsing and add tests in dc_util.
9+
810
2.53.0 (2025-09-25)
911
-------------------
1012

utils/dc_util/README.md

Lines changed: 86 additions & 8 deletions
@@ -1,16 +1,24 @@
11
# Rolling restart with `alter cluster decommission`
22

33
While working on a cloud issue a small tool was created
4-
to not only _terminate_ a POD by`kubelet` sending a SIGTERM, but by having the ability
4+
to not only _terminate_ a POD by `kubelet` sending a SIGTERM, but by having the ability
55
to use a preStop Hook and issue a `alter cluster decommission` for that node.
66

7+
The CrateDB documentation explains the rolling restart process here: https://cratedb.com/docs/guide/admin/upgrade/rolling.html
8+
9+
Please note that due to the nature of using a preStop Hook, the first step described in the
10+
documentation is omitted, as we would not be able to reliably detect that the shutdown was
11+
initiated by dc_util. Therefore the _NEW_PRIMARIES_ would not be
12+
reset!
13+
714
# What does the tool do?
815

9-
First the decommission settings are configured for the cluster. We assume that
10-
we always want to _force_ decommission - in terms of: If cratedb would come to the
11-
decision that the decommission failed, it would roll it back. In context of terminating
12-
the POD/process in kubernetes, the shutdown cannot be canceled - therefore _force_ is set
13-
on cratedb side.
16+
First the decommission settings are configured for the cluster. By default, _force_
17+
decommission is enabled - in terms of: If cratedb would come to the decision that the
18+
decommission failed, it would roll it back. In context of terminating the POD/process
19+
in kubernetes, the shutdown cannot be canceled - therefore _force_ is typically set on
20+
cratedb side. However, this can now be controlled via the `dc-util-graceful-stop`
21+
StatefulSet label; if the label is missing or invalid, _force_ stays `true`.
1422

1523
Before doing that, the STS is checked for the number of replicas configured. This is done
1624
to figure out whether a FULL stop of all PODS in the cratedb Cluster is _scheduled_. In
@@ -81,7 +89,77 @@ are used for testing purpose:
8189
| Parameter | Setting |
8290
| --------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
8391
| `--crate-node-prefix` | allows to customize the cratedb node names in the statefulset in case it is not the default `data-hot`. This is not to be confused with the _hostname_! |
84-
| `--timemout` | crateDBs default timeout is 7200s - this needs to be correlated to `TerminationGracePeriod` |
92+
| `--timeout` | CrateDB's default timeout is 7200s - this is automatically adjusted based on `terminationGracePeriodSeconds` (see below) |
8593
| `--pid` | For testing locally only |
8694
| `--hostname` | Is used to derive the name of the kubernetes statefulset, the _replica number_ of the pod is _stripped_ from it, which returns the sts name. eg. `crate-data-hot-eadf76b5-c634-4f0f-abcc-7442d01cb7dd-0 -> crate-data-hot-eadf76b5-c634-4f0f-abcc-7442d01cb7dd` |
87-
| `--min-availability` | Either `PRIMARIES`or `FULL`. Please refer to the crateDB documentation. |
95+
| `--min-availability` | Either `PRIMARIES`, `FULL`, or `NONE`. Can be overridden by StatefulSet labels (see below). Please refer to the crateDB documentation. |
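The `--hostname` handling described in the table can be illustrated with a minimal Go sketch. The helper names `statefulSetName` and `crateNodeName` are hypothetical and chosen for illustration only; the sketch assumes the hostname ends in `-<replica number>`, as in the example in the table.

```go
package main

import (
	"fmt"
	"strings"
)

// statefulSetName strips the trailing "-<replica number>" from a pod hostname.
// Hypothetical helper: if the hostname contains no dash it is returned unchanged.
func statefulSetName(hostname string) string {
	i := strings.LastIndex(hostname, "-")
	if i < 0 {
		return hostname
	}
	return hostname[:i]
}

// crateNodeName joins the node prefix with the replica number taken from the
// end of the hostname. Hypothetical helper: no validation is performed here.
func crateNodeName(hostname, prefix string) string {
	replica := hostname[strings.LastIndex(hostname, "-")+1:]
	return prefix + "-" + replica
}

func main() {
	host := "crate-data-hot-eadf76b5-c634-4f0f-abcc-7442d01cb7dd-0"
	fmt.Println(statefulSetName(host))           // crate-data-hot-eadf76b5-c634-4f0f-abcc-7442d01cb7dd
	fmt.Println(crateNodeName(host, "data-hot")) // data-hot-0
}
```

The StatefulSet name is used to look up the replica count and labels, while the node name is what ends up in the `alter cluster decommission` statement.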
96+
97+
# Timeout Logic
98+
99+
The tool automatically determines the appropriate decommission timeout based on the StatefulSet's `terminationGracePeriodSeconds`:
100+
101+
- **Default case (30s or nil)**: Uses `--timeout` flag value (30s is too small for CrateDB decommissioning)
102+
- **Custom terminationGracePeriodSeconds**: Uses `terminationGracePeriodSeconds - 120s` (reserves 120s for shutdown)
103+
- **Minimum safety**: Always enforces minimum 360s timeout regardless of calculated value
104+
- **Logging**: Reports when using derived timeout instead of flag timeout
105+
106+
## Real-world scenarios:
107+
- **Standard deployment** (30s default): Uses `--timeout` flag (e.g., 7200s)
108+
- **Long-running workload** (1800s): Uses 1680s for decommission, keeps 120s for shutdown
109+
- **Short custom period** (300s): Uses 360s minimum (logs the adjustment)
110+
- **Very long period** (3600s): Uses 3480s for decommission
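
The rules and scenarios above can be reproduced with a short Go sketch. The function and constant names are illustrative and not necessarily those used in `dc_util.go`; the values 30, 120 and 360 are the ones stated in this README, and logging is omitted.

```go
package main

import "fmt"

const (
	defaultGracePeriod = 30  // Kubernetes default terminationGracePeriodSeconds
	shutdownBuffer     = 120 // seconds reserved for the actual shutdown
	minimumTimeout     = 360 // lower bound for the decommission timeout
)

// effectiveTimeoutSeconds applies the rules listed above. A nil grace period
// models a StatefulSet that does not set terminationGracePeriodSeconds.
func effectiveTimeoutSeconds(flagSeconds int64, gracePeriod *int64) int64 {
	if gracePeriod == nil || *gracePeriod == defaultGracePeriod {
		return flagSeconds // fall back to the --timeout flag
	}
	derived := *gracePeriod - shutdownBuffer
	if derived < minimumTimeout {
		return minimumTimeout
	}
	return derived
}

func main() {
	for _, gp := range []int64{30, 300, 1800, 3600} {
		gp := gp
		fmt.Printf("grace=%ds -> timeout=%ds\n", gp, effectiveTimeoutSeconds(7200, &gp))
	}
	// grace=30s -> timeout=7200s
	// grace=300s -> timeout=360s
	// grace=1800s -> timeout=1680s
	// grace=3600s -> timeout=3480s
}
```

Note that in the 300s case the enforced 360s minimum is longer than the grace period itself, so Kubernetes may kill the pod before the CrateDB timeout elapses.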
111+
112+
# StatefulSet Label Configuration
113+
114+
The tool can read configuration from StatefulSet labels, overriding CLI parameters:
115+
116+
## Labels:
117+
- **`dc-util-min-availability`**: Sets min-availability (values: `NONE`, `PRIMARIES`, `FULL`)
118+
- **`dc-util-graceful-stop`**: Controls graceful stop force setting (values: `true`, `false`)
119+
120+
## Example StatefulSet with labels:
121+
```yaml
122+
apiVersion: apps/v1
123+
kind: StatefulSet
124+
metadata:
125+
name: crate-data-hot
126+
labels:
127+
dc-util-min-availability: "PRIMARIES"
128+
dc-util-graceful-stop: "false"
129+
spec:
130+
# ... rest of StatefulSet spec
131+
```
132+
133+
## Behavior:
134+
- **No labels**: Uses CLI parameter values (`--min-availability`, default force=true)
135+
- **Valid labels**: Uses label values, logs the override
136+
- **Invalid labels**: Uses CLI defaults, logs the invalid value
137+
- **Label precedence**: StatefulSet labels override CLI parameters
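
A minimal Go sketch of the precedence rules above, assuming labels arrive as a plain `map[string]string`. The function names `resolveMinAvailability` and `resolveForce` are hypothetical, and the log messages mentioned above are left out for brevity.

```go
package main

import "fmt"

const (
	labelMinAvailability = "dc-util-min-availability"
	labelGracefulStop    = "dc-util-graceful-stop"
)

// resolveMinAvailability returns the label value if it is one of the accepted
// values; a missing or invalid label falls back to the CLI value.
func resolveMinAvailability(labels map[string]string, cliValue string) string {
	switch v := labels[labelMinAvailability]; v {
	case "NONE", "PRIMARIES", "FULL":
		return v
	default:
		return cliValue
	}
}

// resolveForce returns false only for an explicit "false" label; a missing,
// "true", or invalid label keeps the default of true.
func resolveForce(labels map[string]string) bool {
	switch labels[labelGracefulStop] {
	case "FALSE", "false", "False":
		return false
	default:
		return true
	}
}

func main() {
	labels := map[string]string{
		labelMinAvailability: "PRIMARIES",
		labelGracefulStop:    "false",
	}
	fmt.Println(resolveMinAvailability(labels, "FULL"), resolveForce(labels)) // PRIMARIES false

	invalid := map[string]string{labelMinAvailability: "bogus"}
	fmt.Println(resolveMinAvailability(invalid, "FULL"), resolveForce(nil)) // FULL true
}
```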
138+
139+
## Sample Logs
140+
Please note that you will not be able to see the command's log output! It is run in the background by k8s and is not logged
141+
to STDOUT, where you would expect it.
142+
143+
```
144+
bash-5.2# ./dc_util-linux-amd64 -min-availability PRIMARIES -timeout 120s
145+
Decommissioner: 2025/10/09 17:16:46 Using in-cluster configuration
146+
Decommissioner: 2025/10/09 17:16:46 Parsing hostname: crate-data-hot-d84c10e6-d8fb-4d10-bf60-f9f2ea919a73-2
147+
Decommissioner: 2025/10/09 17:16:46 Extracted CrateDB node name: data-hot-2
148+
Decommissioner: 2025/10/09 17:16:46 StatefulSet has 3 replicas configured
149+
Decommissioner: 2025/10/09 17:16:46 Using min-availability from StatefulSet label: NONE
150+
Decommissioner: 2025/10/09 17:16:46 Using graceful stop force from StatefulSet label: false
151+
Decommissioner: 2025/10/09 17:16:46 Using timeout derived from terminationGracePeriodSeconds: 780s (terminationGracePeriodSeconds=900s, buffer=120s) instead of flag value: 120s
152+
Decommissioner: 2025/10/09 17:16:46 StatefulSet terminationGracePeriodSeconds: 900s
153+
Decommissioner: 2025/10/09 17:16:48 Decommissioning node data-hot-2 with graceful_stop.timeout of 780s, min_availability=NONE, force=false
154+
Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"set global transient \"cluster.graceful_stop.timeout\" = '780s';"}
155+
Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":24.105846}
156+
Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"set global transient \"cluster.graceful_stop.force\" = false;"}
157+
Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":18.95872}
158+
Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"set global transient \"cluster.graceful_stop.min_availability\"='NONE';"}
159+
Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":13.927663}
160+
Decommissioner: 2025/10/09 17:16:48 Payload: {"stmt":"alter cluster decommission 'data-hot-2'"}
161+
Decommissioner: 2025/10/09 17:16:48 Response from server: {"cols":[],"rows":[[]],"rowcount":1,"duration":3.827284}
162+
Decommissioner: 2025/10/09 17:16:48 Decommission command sent successfully
163+
Decommissioner: 2025/10/09 17:16:48 Process 1 is still running (check count: 0)
164+
165+
```
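
The `Payload:` lines in the sample log can be reproduced with a small Go sketch that formats the four statements and marshals each one as `{"stmt": ...}`. The helper name `decommissionStatements` is hypothetical, and the sketch only prints the payloads; it does not send anything to a CrateDB endpoint.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decommissionStatements builds the SQL statements in the order seen in the
// sample log: timeout, force, min_availability, then the decommission itself.
func decommissionStatements(node, timeout, minAvailability string, force bool) []string {
	return []string{
		fmt.Sprintf(`set global transient "cluster.graceful_stop.timeout" = '%s';`, timeout),
		fmt.Sprintf(`set global transient "cluster.graceful_stop.force" = %t;`, force),
		fmt.Sprintf(`set global transient "cluster.graceful_stop.min_availability"='%s';`, minAvailability),
		fmt.Sprintf(`alter cluster decommission '%s'`, node),
	}
}

func main() {
	for _, stmt := range decommissionStatements("data-hot-2", "780s", "NONE", false) {
		// json.Marshal escapes the double quotes around the setting names.
		payload, err := json.Marshal(map[string]string{"stmt": stmt})
		if err != nil {
			panic(err)
		}
		fmt.Println(string(payload))
	}
}
```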

utils/dc_util/dc_util.go

Lines changed: 170 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -22,13 +22,148 @@ import (
2222
)
2323

2424
const (
25-
defaultCrateNodePrefix = "data-hot"
26-
defaultPID = 1
27-
defaultProto = "https"
28-
defaultMinAvailability = "FULL"
29-
defaultTimeout = "7200s"
25+
defaultCrateNodePrefix = "data-hot"
26+
defaultPID = 1
27+
defaultProto = "https"
28+
defaultMinAvailability = "FULL"
29+
defaultTimeout = "7200s"
30+
defaultTerminationGracePeriodSeconds = 30 // Kubernetes default
31+
gracePeriodBuffer = 120 // seconds to subtract from terminationGracePeriodSeconds
32+
minimumTimeout = 360 // minimum effective timeout in seconds
33+
34+
// StatefulSet label keys (using dashes instead of underscores)
35+
labelMinAvailability = "dc-util-min-availability"
36+
labelGracefulStop = "dc-util-graceful-stop"
3037
)
3138

39+
type customError struct{ msg string }
40+
41+
func (e *customError) Error() string { return e.msg }
42+
43+
// ErrMalformedHostname is returned when hostname cannot be parsed
44+
var ErrMalformedHostname = &customError{"malformed hostname"}
45+
46+
// splitHostname splits hostname by "-"
47+
func splitHostname(hostname string) []string {
48+
return strings.Split(hostname, "-")
49+
}
50+
51+
// makeDecommissionStmt creates the decommission statement
52+
func makeDecommissionStmt(nodeName string) string {
53+
return fmt.Sprintf("alter cluster decommission '%s'", nodeName)
54+
}
55+
56+
// extractNodeName extracts the CrateDB node name from hostname
57+
func extractNodeName(hostname, crateNodePrefix, defaultPrefix string) (string, error) {
58+
parts := splitHostname(hostname)
59+
60+
// If custom prefix provided (not default), use it
61+
if crateNodePrefix != defaultPrefix && crateNodePrefix != "" {
62+
if len(parts) > 0 {
63+
podNumber := parts[len(parts)-1]
64+
return crateNodePrefix + "-" + podNumber, nil
65+
}
66+
return "", ErrMalformedHostname
67+
}
68+
69+
// Extract from hostname if using default prefix
70+
// Expected format: crate-<prefix-parts>-<uuid-parts>-<pod-number>
71+
// We want: <prefix-parts>-<pod-number>
72+
if len(parts) >= 4 && parts[0] == "crate" {
73+
podNumber := parts[len(parts)-1]
74+
75+
// Look for the node prefix pattern after "crate"
76+
// Use the provided crateNodePrefix (which equals defaultPrefix in this case)
77+
prefixParts := strings.Split(crateNodePrefix, "-")
78+
79+
// Check if the hostname contains the expected prefix parts after "crate"
80+
if len(parts) >= len(prefixParts)+2 { // crate + prefix parts + pod number (minimum)
81+
// Extract the prefix parts that match our expected pattern
82+
prefixMatches := true
83+
for i, expectedPart := range prefixParts {
84+
if parts[1+i] != expectedPart {
85+
prefixMatches = false
86+
break
87+
}
88+
}
89+
90+
if prefixMatches {
91+
return crateNodePrefix + "-" + podNumber, nil
92+
}
93+
}
94+
}
95+
96+
return "", ErrMalformedHostname
97+
}
98+
99+
// calculateEffectiveTimeout determines the timeout to use based on terminationGracePeriodSeconds
100+
func calculateEffectiveTimeout(flagTimeout string, terminationGracePeriodSeconds *int64) (string, error) {
101+
// Parse the flag timeout value
102+
flagTimeoutDuration, err := time.ParseDuration(flagTimeout)
103+
if err != nil {
104+
return "", fmt.Errorf("invalid timeout format: %w", err)
105+
}
106+
flagTimeoutSeconds := int(flagTimeoutDuration.Seconds())
107+
108+
// If terminationGracePeriodSeconds is not set or is the default value (30s), use flag timeout
109+
// The default 30s is too small for CrateDB decommissioning, so we rely on the flag timeout
110+
if terminationGracePeriodSeconds == nil || *terminationGracePeriodSeconds == defaultTerminationGracePeriodSeconds {
111+
return flagTimeout, nil
112+
}
113+
114+
// Calculate effective timeout: terminationGracePeriodSeconds - buffer
115+
effectiveTimeoutSeconds := int(*terminationGracePeriodSeconds) - gracePeriodBuffer
116+
117+
// Ensure minimum timeout
118+
if effectiveTimeoutSeconds < minimumTimeout {
119+
effectiveTimeoutSeconds = minimumTimeout
120+
log.Printf("Calculated timeout (%ds) is below minimum, using %ds instead",
121+
int(*terminationGracePeriodSeconds)-gracePeriodBuffer, minimumTimeout)
122+
}
123+
124+
// Log when using different timeout than flag
125+
if effectiveTimeoutSeconds != flagTimeoutSeconds {
126+
log.Printf("Using timeout derived from terminationGracePeriodSeconds: %ds (terminationGracePeriodSeconds=%ds, buffer=%ds) instead of flag value: %ds",
127+
effectiveTimeoutSeconds, *terminationGracePeriodSeconds, gracePeriodBuffer, flagTimeoutSeconds)
128+
}
129+
130+
return fmt.Sprintf("%ds", effectiveTimeoutSeconds), nil
131+
}
132+
133+
// getMinAvailabilityFromLabels reads min-availability from StatefulSet labels or returns default
134+
func getMinAvailabilityFromLabels(labels map[string]string, defaultValue string) string {
135+
if value, exists := labels[labelMinAvailability]; exists {
136+
// Validate the value
137+
switch value {
138+
case "NONE", "FULL", "PRIMARIES":
139+
log.Printf("Using min-availability from StatefulSet label: %s", value)
140+
return value
141+
default:
142+
log.Printf("Invalid min-availability value in StatefulSet label '%s': %s, using default: %s",
143+
labelMinAvailability, value, defaultValue)
144+
}
145+
}
146+
return defaultValue
147+
}
148+
149+
// getGracefulStopForceFromLabels reads graceful stop force setting from StatefulSet labels
150+
func getGracefulStopForceFromLabels(labels map[string]string) bool {
151+
if value, exists := labels[labelGracefulStop]; exists {
152+
switch value {
153+
case "TRUE", "true", "True":
154+
log.Printf("Using graceful stop force from StatefulSet label: true")
155+
return true
156+
case "FALSE", "false", "False":
157+
log.Printf("Using graceful stop force from StatefulSet label: false")
158+
return false
159+
default:
160+
log.Printf("Invalid graceful stop value in StatefulSet label '%s': %s, using default: true",
161+
labelGracefulStop, value)
162+
}
163+
}
164+
return true // default behavior
165+
}
166+
32167
func sendSQLStatement(proto, stmt string) error {
33168
payload := map[string]string{"stmt": stmt}
34169
payloadBytes, err := json.Marshal(payload)
@@ -159,6 +294,14 @@ func run() error {
159294
}
160295
statefulSetName := strings.Join(hostnameParts[:len(hostnameParts)-1], "-")
161296

297+
// Determine crateNodeName using extracted logic
298+
log.Printf("Parsing hostname: %s", hostname)
299+
actualCrateNodeName, err := extractNodeName(hostname, crateNodePrefix, defaultCrateNodePrefix)
300+
if err != nil {
301+
return fmt.Errorf("failed to extract node name from hostname %s: %w", hostname, err)
302+
}
303+
log.Printf("Extracted CrateDB node name: %s", actualCrateNodeName)
304+
162305
ctx := context.Background()
163306
statefulSet, err := clientset.AppsV1().StatefulSets(namespace).Get(ctx, statefulSetName, metav1.GetOptions{})
164307
if err != nil {
@@ -168,19 +311,34 @@ func run() error {
168311

169312
log.Printf("StatefulSet has %d replicas configured", replicas)
170313

314+
// Get configuration from StatefulSet labels
315+
effectiveMinAvailability := getMinAvailabilityFromLabels(statefulSet.Labels, minAvailability)
316+
gracefulStopForce := getGracefulStopForceFromLabels(statefulSet.Labels)
317+
318+
// Calculate effective timeout based on terminationGracePeriodSeconds
319+
effectiveTimeout, err := calculateEffectiveTimeout(decommissionTimeout, statefulSet.Spec.Template.Spec.TerminationGracePeriodSeconds)
320+
if err != nil {
321+
return fmt.Errorf("failed to calculate effective timeout: %w", err)
322+
}
323+
324+
if statefulSet.Spec.Template.Spec.TerminationGracePeriodSeconds != nil {
325+
log.Printf("StatefulSet terminationGracePeriodSeconds: %ds", *statefulSet.Spec.Template.Spec.TerminationGracePeriodSeconds)
326+
} else {
327+
log.Printf("StatefulSet terminationGracePeriodSeconds: not set (using Kubernetes default)")
328+
}
329+
171330
time.Sleep(2 * time.Second) // Sleep to catch up with the replica settings
172331

173332
if replicas > 0 {
174-
podNumber := hostnameParts[len(hostnameParts)-1]
175-
176333
// Send the SQL statements to decommission the node
177-
log.Printf("Decommissioning node %s with graceful_stop.timeout of %s", podNumber, decommissionTimeout)
334+
log.Printf("Decommissioning node %s with graceful_stop.timeout of %s, min_availability=%s, force=%t",
335+
actualCrateNodeName, effectiveTimeout, effectiveMinAvailability, gracefulStopForce)
178336

179337
statements := []string{
180-
fmt.Sprintf(`set global transient "cluster.graceful_stop.timeout" = '%s';`, decommissionTimeout),
181-
`set global transient "cluster.graceful_stop.force" = True;`,
182-
fmt.Sprintf(`set global transient "cluster.graceful_stop.min_availability"='%s';`, minAvailability),
183-
fmt.Sprintf(`alter cluster decommission '%s-%s'`, crateNodePrefix, podNumber),
338+
fmt.Sprintf(`set global transient "cluster.graceful_stop.timeout" = '%s';`, effectiveTimeout),
339+
fmt.Sprintf(`set global transient "cluster.graceful_stop.force" = %t;`, gracefulStopForce),
340+
fmt.Sprintf(`set global transient "cluster.graceful_stop.min_availability"='%s';`, effectiveMinAvailability),
341+
makeDecommissionStmt(actualCrateNodeName),
184342
}
185343

186344
for _, stmt := range statements {
