Skip to content

Commit 8660a94

Browse files
authored
chore: Backport Changes from KubeFleet 07/25/2025 (#1161)
2 parents bf18fd8 + 8d5d54b commit 8660a94

File tree

67 files changed

+5038
-2463
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

67 files changed

+5038
-2463
lines changed
Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# CRP Delete Policy Implementation
2+
3+
## Problem Analysis
4+
5+
The issue requests adding API options to allow customers to choose whether to delete placed resources when a CRP (ClusterResourcePlacement) is deleted.
6+
7+
### Current Deletion Behavior
8+
1. When a CRP is deleted, it has two finalizers:
9+
- `ClusterResourcePlacementCleanupFinalizer` - handled by CRP controller to delete snapshots
10+
- `SchedulerCleanupFinalizer` - handled by scheduler to delete bindings
11+
12+
2. The deletion flow:
13+
- CRP controller removes snapshots (ClusterSchedulingPolicySnapshot, ClusterResourceSnapshot)
14+
- Scheduler removes bindings (ClusterResourceBinding) which triggers cleanup of placed resources
15+
- Currently there's no option for users to control whether placed resources are deleted
16+
17+
### References
18+
- Kubernetes DeleteOptions has `PropagationPolicy` with values: `Orphan`, `Background`, `Foreground`
19+
- AKS API has `DeletePolicy` pattern (couldn't fetch exact details but following similar pattern)
20+
21+
## Implementation Plan
22+
23+
### Phase 1: API Design
24+
- [x] Add `DeleteStrategy` struct to `RolloutStrategy` in beta API
25+
- [x] Define `PropagationPolicy` field with enum values: `Delete` (default), `Abandon`
26+
- [x] Update API documentation and validation
27+
- [x] Move DeleteStrategy inside RolloutStrategy after ApplyStrategy for consistency
28+
29+
### Phase 2: Implementation Details
30+
TODO: @Arvindthiru to fill out the details for controller logic implementation.
31+
32+
### Phase 3: Testing
33+
- [ ] Add unit tests for new deletion policy options
34+
- [ ] Add integration tests to verify behavior
35+
- [ ] Test both `Delete` and `Abandon` scenarios
36+
37+
### Phase 4: Documentation & Examples
38+
- [ ] Update CRD documentation
39+
- [ ] Add example configurations
40+
- [ ] Update any user-facing documentation
41+
42+
## Success Criteria
43+
- [x] CRP API has `deleteStrategy` field with `Delete`/`Abandon` options inside RolloutStrategy
44+
- [x] Default behavior (`Delete`) preserves current functionality
45+
- [ ] `Abandon` policy leaves placed resources intact when CRP is deleted
46+
- [ ] All tests pass including new deletion policy tests
47+
- [x] Changes are minimal and backwards compatible
48+
49+
## Current API Structure
50+
51+
The DeleteStrategy is now part of RolloutStrategy:
52+
53+
```go
54+
type RolloutStrategy struct {
55+
// ... other fields ...
56+
ApplyStrategy *ApplyStrategy `json:"applyStrategy,omitempty"`
57+
DeleteStrategy *DeleteStrategy `json:"deleteStrategy,omitempty"`
58+
}
59+
60+
type DeleteStrategy struct {
61+
PropagationPolicy DeletePropagationPolicy `json:"propagationPolicy,omitempty"`
62+
}
63+
64+
type DeletePropagationPolicy string
65+
66+
const (
67+
DeletePropagationPolicyDelete DeletePropagationPolicy = "Delete" // default
68+
DeletePropagationPolicyAbandon DeletePropagationPolicy = "Abandon"
69+
)
70+
```
71+
72+
## Behavior Summary
73+
74+
- **Default (`Delete`)**: When CRP is deleted, all placed resources are removed from member clusters (current behavior)
75+
- **Abandon**: When CRP is deleted, placed resources remain on member clusters but are no longer managed by Fleet
76+
77+
This provides customers with the flexibility to choose between complete cleanup or leaving resources in place when deleting a CRP.

.github/copilot-instructions.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@ The main idea is that we are creating a multi-cluster application management sol
1111
- Follow the [Uber Go Style Guide](https://github.com/uber-go/guide/blob/master/style.md) if possible.
1212
- Favor using the standard library over third-party libraries.
1313
- Run "make reviewable" before submitting a pull request to ensure the code is formatted correctly and all dependencies are up to date.
14+
- The title of a PR must use one of the following prefixes: "[WIP] ", "feat: ", "test: ", "fix: ", "docs: ", "style: ", "interface: ", "util: ", "chore: ", "ci: ", "perf: ", "refactor: ", "revert: ". Please pick one that matches the PR content the most.
1415

1516
## Terminology
1617
- **Fleet**: A conceptual term referring to a collection of clusters.
@@ -216,4 +217,4 @@ This practice creates a trail of decision points that document our thought proce
216217

217218
**User**: This looks good.
218219

219-
**Agent**: I've updated the breadcrumb with the latest understanding.
220+
**Agent**: I've updated the breadcrumb with the latest understanding.

.github/workflows/ci.yml

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -89,12 +89,16 @@ jobs:
8989
strategy:
9090
fail-fast: false
9191
matrix:
92-
customized-settings: [default, custom]
92+
customized-settings: [default, joinleave, custom]
9393
include:
9494
- customized-settings: default
9595
# to shorten the test duration, set the resource snapshot creation interval to 0
9696
resource-snapshot-creation-minimum-interval: 0m
9797
resource-changes-collection-duration: 0m
98+
- customized-settings: joinleave
99+
# to shorten the test duration, set the resource snapshot creation interval to 0
100+
resource-snapshot-creation-minimum-interval: 0m
101+
resource-changes-collection-duration: 0m
98102
- customized-settings: custom
99103
resource-snapshot-creation-minimum-interval: 30s
100104
resource-changes-collection-duration: 15s
@@ -132,7 +136,9 @@ jobs:
132136
- name: Run e2e tests
133137
run: |
134138
if [ "${{ matrix.customized-settings }}" = "default" ]; then
135-
make e2e-tests
139+
make e2e-tests LABEL_FILTER="!custom && !joinleave"
140+
elif [ "${{ matrix.customized-settings }}" = "joinleave" ]; then
141+
make e2e-tests LABEL_FILTER="!custom && joinleave"
136142
else
137143
make e2e-tests-custom
138144
fi

.github/workflows/codespell.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ jobs:
1212
runs-on: ubuntu-latest
1313
steps:
1414
- name: Harden Runner
15-
uses: step-security/harden-runner@6c439dc8bdf85cadbbce9ed30d1c7b959517bc49 # v2.12.2
15+
uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
1616
with:
1717
egress-policy: audit
1818

CLAUDE.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -60,6 +60,8 @@ make setup-clusters
6060

6161
# Run E2E tests with custom cluster count
6262
make setup-clusters MEMBER_CLUSTER_COUNT=5
63+
64+
# Run parallel E2E tests (default - excludes custom tests)
6365
make e2e-tests
6466

6567
# Clean up test clusters

Makefile

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -215,9 +215,12 @@ install-helm: load-hub-docker-image load-member-docker-image install-member-agen
215215
.PHONY: e2e-tests-v1alpha1
216216
e2e-tests-v1alpha1: create-kind-cluster run-e2e-v1alpha1
217217

218+
# E2E test label filter (can be overridden)
219+
LABEL_FILTER ?= !custom
220+
218221
.PHONY: e2e-tests
219222
e2e-tests: setup-clusters
220-
cd ./test/e2e && ginkgo --timeout=70m --label-filter="!custom" -v -p .
223+
cd ./test/e2e && ginkgo --timeout=70m --label-filter="$(LABEL_FILTER)" -v -p .
221224

222225
e2e-tests-custom: setup-clusters
223226
cd ./test/e2e && ginkgo --label-filter="custom" -v -p .

apis/placement/v1alpha1/common.go

Lines changed: 0 additions & 29 deletions
This file was deleted.

apis/placement/v1alpha1/overridesnapshot_types.go

Lines changed: 0 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -18,22 +18,6 @@ package v1alpha1
1818

1919
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
2020

21-
const (
22-
23-
// OverrideIndexLabel is the label that indicate the policy snapshot index of a cluster policy.
24-
OverrideIndexLabel = fleetPrefix + "override-index"
25-
26-
// OverrideSnapshotNameFmt is clusterResourceOverrideSnapshot name format: {CROName}-{OverrideSnapshotIndex}.
27-
OverrideSnapshotNameFmt = "%s-%d"
28-
29-
// OverrideTrackingLabel is the label that points to the cluster resource override that creates a resource snapshot.
30-
OverrideTrackingLabel = fleetPrefix + "parent-resource-override"
31-
32-
// OverrideFinalizer is a finalizer added by the override controllers to all override, to make sure
33-
// that the override controller can react to override deletions if necessary.
34-
OverrideFinalizer = fleetPrefix + "override-cleanup"
35-
)
36-
3721
// +genclient
3822
// +genclient:nonNamespaced
3923
// +kubebuilder:object:root=true

apis/placement/v1beta1/clusterresourceplacement_types.go

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -488,6 +488,10 @@ type RolloutStrategy struct {
488488
// ApplyStrategy describes when and how to apply the selected resources to the target cluster.
489489
// +kubebuilder:validation:Optional
490490
ApplyStrategy *ApplyStrategy `json:"applyStrategy,omitempty"`
491+
492+
// DeleteStrategy configures the deletion behavior when the ClusterResourcePlacement is deleted.
493+
// +kubebuilder:validation:Optional
494+
DeleteStrategy *DeleteStrategy `json:"deleteStrategy,omitempty"`
491495
}
492496

493497
// ApplyStrategy describes when and how to apply the selected resource to the target cluster.
@@ -1319,6 +1323,38 @@ const (
13191323
PickFixedPlacementType PlacementType = "PickFixed"
13201324
)
13211325

1326+
// DeleteStrategy configures the deletion behavior when a placement is deleted.
1327+
type DeleteStrategy struct {
1328+
// PropagationPolicy controls whether to delete placed resources when placement is deleted.
1329+
//
1330+
// Available options:
1331+
//
1332+
// * Delete: all placed resources on member clusters will be deleted when
1333+
// the placement is deleted. This is the default behavior.
1334+
//
1335+
// * Abandon: all placed resources on member clusters will be left intact (abandoned)
1336+
// when the placement is deleted.
1337+
//
1338+
// +kubebuilder:validation:Enum=Abandon;Delete
1339+
// +kubebuilder:default=Delete
1340+
// +kubebuilder:validation:Optional
1341+
PropagationPolicy DeletePropagationPolicy `json:"propagationPolicy,omitempty"`
1342+
}
1343+
1344+
// DeletePropagationPolicy identifies the propagation policy when a placement is deleted.
1345+
// +enum
1346+
type DeletePropagationPolicy string
1347+
1348+
const (
1349+
// DeletePropagationPolicyAbandon instructs Fleet to leave (abandon) all placed resources on member
1350+
// clusters when the placement is deleted.
1351+
DeletePropagationPolicyAbandon DeletePropagationPolicy = "Abandon"
1352+
1353+
// DeletePropagationPolicyDelete instructs Fleet to delete all placed resources on member clusters
1354+
// when the placement is deleted. This is the default behavior.
1355+
DeletePropagationPolicyDelete DeletePropagationPolicy = "Delete"
1356+
)
1357+
13221358
// ClusterResourcePlacementList contains a list of ClusterResourcePlacement.
13231359
// +kubebuilder:resource:scope="Cluster"
13241360
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

apis/placement/v1beta1/commons.go

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -148,6 +148,32 @@ const (
148148
ApprovalTaskNameFmt = "%s-%s"
149149
)
150150

151+
var (
152+
// ClusterResourceOverrideKind is the kind of the ClusterResourceOverride.
153+
ClusterResourceOverrideKind = "ClusterResourceOverride"
154+
155+
// ClusterResourceOverrideSnapshotKind is the kind of the ClusterResourceOverrideSnapshot.
156+
ClusterResourceOverrideSnapshotKind = "ClusterResourceOverrideSnapshot"
157+
158+
// ResourceOverrideKind is the kind of the ResourceOverride.
159+
ResourceOverrideKind = "ResourceOverride"
160+
161+
// ResourceOverrideSnapshotKind is the kind of the ResourceOverrideSnapshot.
162+
ResourceOverrideSnapshotKind = "ResourceOverrideSnapshot"
163+
164+
// OverrideClusterNameVariable is the reserved variable in the override value that will be replaced by the actual cluster name.
165+
OverrideClusterNameVariable = "${MEMBER-CLUSTER-NAME}"
166+
167+
// OverrideClusterLabelKeyVariablePrefix is a reserved variable in the override expression.
168+
// We use this variable to find the associated the key following the prefix.
169+
// The key name ends with a "}" character (but not include it).
170+
// The key name must be a valid Kubernetes label name and case-sensitive.
171+
// The content of the string containing this variable will be replaced by the actual label value on the member cluster.
172+
// For example, if the string is "${MEMBER-CLUSTER-LABEL-KEY-kube-fleet.io/region}" then the key name is "kube-fleet.io/region".
173+
// If there is a label "kube-fleet.io/region": "us-west-1" on the member cluster, this string will be replaced by "us-west-1".
174+
OverrideClusterLabelKeyVariablePrefix = "${MEMBER-CLUSTER-LABEL-KEY-"
175+
)
176+
151177
// NamespacedName comprises a resource name, with a mandatory namespace.
152178
type NamespacedName struct {
153179
// Name is the name of the namespaced scope resource.

0 commit comments

Comments
 (0)