[platform] Add flux-plunger controller to fix HelmRelease errors #1843
Conversation
📝 Walkthrough
Adds a new FluxPlunger controller binary and Helm chart with platform packaging. The controller reconciles Flux HelmRelease resources to remediate "has no deployed releases" errors by suspending, deleting the latest Helm release secret, annotating processed version, and unsuspending; includes manager entrypoint, RBAC, manifests, and image build.
Sequence Diagram(s)
sequenceDiagram
participant K8s as Kubernetes API
participant Ctrl as FluxPlunger Controller
participant Secrets as Helm Release Secrets
participant HR as HelmRelease Resource
K8s->>Ctrl: HelmRelease event (error present or suspended)
Ctrl->>K8s: Get HelmRelease
alt HelmRelease is suspended
Ctrl->>Secrets: List release secrets for HelmRelease
Ctrl->>Ctrl: Determine latest secret version
Ctrl->>HR: Read `last-processed-version` annotation
alt annotation == latest version
Ctrl->>K8s: Patch HelmRelease to unsuspend
end
else HelmRelease shows "has no deployed releases"
Ctrl->>Secrets: List release secrets for HelmRelease
Ctrl->>Ctrl: Identify latest secret/version
Ctrl->>K8s: Patch HelmRelease to suspend
Ctrl->>Secrets: Delete latest secret
Ctrl->>K8s: Patch HelmRelease to update `last-processed-version` annotation
Ctrl->>K8s: Patch HelmRelease to unsuspend
end
K8s->>Ctrl: Reconciliation finished
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Summary of Changes
Hello @kvaps, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new flux-plunger controller.
Code Review
This pull request introduces a new flux-plunger controller to automatically handle a specific HelmRelease error state. The implementation is well-structured, but I've found a critical logical flaw in the reconciliation loop that could lead to a deadlock, preventing the controller from functioning correctly under certain failure conditions. I've also provided several suggestions to improve robustness, efficiency, and adherence to Kubernetes best practices, such as enabling leader election, adding health probes to the deployment, and optimizing some of the Go functions. Addressing these points will make the new controller more resilient and maintainable.
// If not previous to processed, skip all actions
logger.Info("HelmRelease is suspended by external process, skipping", "latest", latestVersion)
return ctrl.Result{}, nil
There is a logical flaw here that can lead to a deadlock. If the controller suspends a HelmRelease but then fails on a subsequent step (e.g., deleting the secret), the next reconciliation will find a suspended HelmRelease. This code block will then incorrectly assume it was suspended by an external process and skip it, preventing the error from ever being fixed. To fix this, you should remove these lines and allow the reconciliation to fall through to the main logic block, which can handle this case correctly.
ports:
- name: metrics
  containerPort: 8080
- name: health
  containerPort: 8081
The controller exposes health check endpoints (/healthz and /readyz), but the deployment manifest is missing liveness and readiness probes. Without these probes, Kubernetes cannot determine the health of the pod or restart it if it becomes unresponsive. Please add livenessProbe and readinessProbe to the container spec.
ports:
- name: metrics
  containerPort: 8080
- name: health
  containerPort: 8081
livenessProbe:
  httpGet:
    path: /healthz
    port: health
readinessProbe:
  httpGet:
    path: /readyz
    port: health

flag.BoolVar(&enableLeaderElection, "leader-elect", false,
	"Enable leader election for controller manager. "+
		"Enabling this will ensure there is only one active controller manager.")
The default for leader election is false. For a controller, it's safer to enable leader election by default to prevent multiple instances from acting on the same resources simultaneously if the deployment is ever scaled beyond one replica. Please consider changing the default to true.
-flag.BoolVar(&enableLeaderElection, "leader-elect", false,
-	"Enable leader election for controller manager. "+
-		"Enabling this will ensure there is only one active controller manager.")
+flag.BoolVar(&enableLeaderElection, "leader-elect", true,
+	"Enable leader election for controller manager. "+
+		"Enabling this will ensure there is only one active controller manager.")
filtered := []corev1.Secret{}
for _, secret := range secretList.Items {
	if secret.Type == "helm.sh/release.v1" {
		filtered = append(filtered, secret)
	}
}
This loop can be made more efficient. The for...range on a slice of structs creates a copy of each corev1.Secret on every iteration. Also, appending to a slice without pre-allocating capacity can lead to multiple reallocations. A more efficient pattern is to pre-allocate the slice and iterate by index to avoid copying.
-filtered := []corev1.Secret{}
-for _, secret := range secretList.Items {
-	if secret.Type == "helm.sh/release.v1" {
-		filtered = append(filtered, secret)
-	}
-}
+filtered := make([]corev1.Secret, 0, len(secretList.Items))
+for i := range secretList.Items {
+	if secretList.Items[i].Type == "helm.sh/release.v1" {
+		filtered = append(filtered, secretList.Items[i])
+	}
+}
func getLatestSecret(secrets []corev1.Secret) corev1.Secret {
	if len(secrets) == 1 {
		return secrets[0]
	}

	sort.Slice(secrets, func(i, j int) bool {
		vi := extractVersionNumber(secrets[i].Name)
		vj := extractVersionNumber(secrets[j].Name)
		return vi > vj
	})

	return secrets[0]
}
This function modifies the input secrets slice by sorting it in place, which can be an unexpected side effect for callers. Additionally, sorting is an O(n log n) operation. You can achieve the same result more efficiently with a single O(n) pass to find the secret with the highest version, without modifying the original slice.
func getLatestSecret(secrets []corev1.Secret) corev1.Secret {
	if len(secrets) == 0 {
		return corev1.Secret{}
	}
	var latest corev1.Secret
	latestVersion := -1
	for _, secret := range secrets {
		v := extractVersionNumber(secret.Name)
		if v > latestVersion {
			latest = secret
			latestVersion = v
		}
	}
	return latest
}

	}
	latestHR.Annotations[annotationLastProcessedVersion] = strconv.Itoa(version)

	return r.Patch(ctx, latestHR, patch)
For consistency with the other Patch calls in this file (suspendHelmRelease and unsuspendHelmRelease), you should specify a client.FieldOwner. This is important for tracking changes and managing conflicts, especially when using server-side apply.
return r.Patch(ctx, latestHR, patch, client.FieldOwner(fieldManager))

args:
{{- if .Values.debug }}
- --zap-log-level=debug
{{- else }}
- --zap-log-level=info
{{- end }}
- --metrics-bind-address=:8080
- --metrics-secure=false
The controller supports leader election, which is a crucial feature for controllers to prevent race conditions when running more than one replica. It's best practice to enable it. Please add the --leader-elect=true flag to the container arguments.
args:
{{- if .Values.debug }}
- --zap-log-level=debug
{{- else }}
- --zap-log-level=info
{{- end }}
- --leader-elect=true
- --metrics-bind-address=:8080
- --metrics-secure=false
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @cmd/flux-plunger/main.go:
- Around line 96-114: The WebhookServer is unused and causes startup failures;
remove the creation and wiring of it: delete the webhook.NewServer(...) call
that assigns webhookServer and remove WebhookServer: webhookServer from the
ctrl.Options passed into ctrl.NewManager (symbols: webhook.NewServer,
webhookServer, ctrl.NewManager, ctrl.Options, WebhookServer) so the manager no
longer tries to start a webhook server requiring TLS certs.
In @internal/controller/fluxplunger/flux_plunger.go:
- Around line 86-91: Controller can orphan a HelmRelease suspended state if it
crashes after Secret deletion but before updating the processed annotation,
causing logger.Info("HelmRelease is suspended by external process, skipping",
"latest", latestVersion) to always skip reconciliation; fix by either updating
the HelmRelease annotation with latestVersion+1 before deleting the Secret
(marking intent so crash recovery can resume) or add/maintain a dedicated
annotation like "flux-plunger-suspended" that this controller sets and clears
around the delete operation so the reconcile path can distinguish
controller-owned suspensions and recover safely (apply the chosen change in the
reconcile flow around the Secret deletion and annotation update logic, ensuring
atomic ordering and clearing on success/failure).
In @packages/core/platform/sources/flux-plunger.yaml:
- Around line 4-22: The manifest installs the flux-plunger component into the
wrong namespace; update the install.namespace value for the component named
"flux-plunger" in the cozystack.flux-plunger source (metadata.name:
cozystack.flux-plunger) from "cozy-flux" to "cozy-fluxcd" so it matches the
paas-full.yaml bundle and the Makefile NAMESPACE export.
🧹 Nitpick comments (9)
cmd/flux-plunger/main.go (2)
110-128: If leader election is enabled, consider setting `LeaderElectionNamespace` explicitly. Without it, leader election often defaults to `kube-system` (and then fails RBAC in restricted clusters). If you want leases in the pod namespace, wire it explicitly (e.g., from `POD_NAMESPACE`).
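A minimal sketch of that wiring, assuming the namespace is injected into the pod as POD_NAMESPACE via the downward API; the package name and lease ID below are made up for illustration:

package plungeropts

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

// ManagerOptions pins the leader-election lease to the pod's own namespace
// instead of relying on controller-runtime's default lookup.
func ManagerOptions() ctrl.Options {
	return ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "flux-plunger.cozystack.io", // hypothetical lease name
		LeaderElectionNamespace: os.Getenv("POD_NAMESPACE"),  // injected via the downward API
	}
}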
104-108: `metrics-secure` currently means "TLS", not "authn/authz". Consider renaming or adding filters. If you intend "secure" to mean more than TLS, controller-runtime supports authn/authz filter providers for metrics; otherwise the flag can be misleading.
packages/system/flux-plunger/values.yaml (1)
1-1: Good: image is digest-pinned; consider replacing `:latest` with an immutable tag for clarity. Digest pinning is the important part here. Using a non-`latest` tag (e.g., git SHA) makes human review/auditing easier.
packages/system/flux-plunger/Makefile (1)
7-16: Add validation to prevent conflicting `--push` and `--load` settings. The defaults in `common-envs.mk` are safe (PUSH=1, LOAD=0), but users can override via command line (e.g., `make image LOAD=1`), which would cause docker buildx to fail since `--push` and `--load` are mutually exclusive. Add a guard to prevent this invalid combination.
packages/system/flux-plunger/templates/deployment.yaml (1)
21-36: Add resource limits and health probes for production readiness. The container definition is missing:
- Resource requests and limits - Could lead to scheduling issues or resource contention
- Liveness and readiness probes - Even though port 8081 (health) is exposed, no probes are configured
🔧 Proposed enhancements
  - name: flux-plunger
    image: "{{ .Values.image }}"
    args:
    {{- if .Values.debug }}
    - --zap-log-level=debug
    {{- else }}
    - --zap-log-level=info
    {{- end }}
    - --metrics-bind-address=:8080
    - --metrics-secure=false
    ports:
    - name: metrics
      containerPort: 8080
    - name: health
      containerPort: 8081
+   livenessProbe:
+     httpGet:
+       path: /healthz
+       port: health
+     initialDelaySeconds: 15
+     periodSeconds: 20
+   readinessProbe:
+     httpGet:
+       path: /readyz
+       port: health
+     initialDelaySeconds: 5
+     periodSeconds: 10
+   resources:
+     requests:
+       cpu: 100m
+       memory: 128Mi
+     limits:
+       cpu: 500m
+       memory: 256Mi
Note: The YAMLlint syntax error on line 26 is a false positive—Helm's conditional templating syntax is valid.
internal/controller/fluxplunger/flux_plunger.go (4)
27-35: Struct follows controller-runtime patterns. The RBAC markers correctly declare the required permissions. Minor note: the `Scheme` field is declared but not referenced in this controller—consider removing it if not needed for future functionality.
220-228: Single-purpose helper `extractVersionFromSecretName`. This function is only called by `extractVersionNumber` (line 233). Consider making it unexported or inlining it into `extractVersionNumber` since it's not used elsewhere.
244-249: `suspendHelmRelease` doesn't re-fetch before patching. Unlike `unsuspendHelmRelease` and `updateProcessedVersionAnnotation`, this function patches the original `hr` object without re-fetching. If the HelmRelease was modified after the initial `Get`, the optimistic lock could fail. Consider re-fetching for consistency with other patch functions.
♻️ Suggested improvement
 func (r *FluxPlunger) suspendHelmRelease(ctx context.Context, hr *helmv2.HelmRelease) error {
+	// Re-fetch the HelmRelease to get the latest state
+	key := types.NamespacedName{Namespace: hr.Namespace, Name: hr.Name}
+	latestHR := &helmv2.HelmRelease{}
+	if err := r.Get(ctx, key, latestHR); err != nil {
+		return fmt.Errorf("failed to get latest HelmRelease: %w", err)
+	}
+
-	patch := client.MergeFromWithOptions(hr.DeepCopy(), client.MergeFromWithOptimisticLock{})
-	hr.Spec.Suspend = true
+	patch := client.MergeFromWithOptions(latestHR.DeepCopy(), client.MergeFromWithOptimisticLock{})
+	latestHR.Spec.Suspend = true
-	return r.Patch(ctx, hr, patch, client.FieldOwner(fieldManager))
+	return r.Patch(ctx, latestHR, patch, client.FieldOwner(fieldManager))
 }
275-282: Inconsistent `FieldOwner` usage. `updateProcessedVersionAnnotation` doesn't use `client.FieldOwner(fieldManager)` unlike `suspendHelmRelease` and `unsuspendHelmRelease`. For consistent field ownership tracking, consider adding it.
- return r.Patch(ctx, latestHR, patch)
+ return r.Patch(ctx, latestHR, patch, client.FieldOwner(fieldManager))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- cmd/flux-plunger/main.go
- internal/controller/fluxplunger/flux_plunger.go
- packages/core/platform/bundles/paas-full.yaml
- packages/core/platform/sources/flux-plunger.yaml
- packages/system/flux-plunger/Chart.yaml
- packages/system/flux-plunger/Makefile
- packages/system/flux-plunger/images/flux-plunger/Dockerfile
- packages/system/flux-plunger/templates/deployment.yaml
- packages/system/flux-plunger/templates/rbac.yaml
- packages/system/flux-plunger/templates/service.yaml
- packages/system/flux-plunger/templates/serviceaccount.yaml
- packages/system/flux-plunger/values.yaml
🧰 Additional context used
📓 Path-based instructions (6)
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
Files:
packages/system/flux-plunger/Makefile
packages/system/flux-plunger/templates/serviceaccount.yaml
packages/system/flux-plunger/templates/deployment.yaml
packages/system/flux-plunger/templates/service.yaml
packages/system/flux-plunger/images/flux-plunger/Dockerfile
packages/system/flux-plunger/Chart.yaml
packages/system/flux-plunger/templates/rbac.yaml
packages/system/flux-plunger/values.yaml
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
packages/system/flux-plunger/templates/serviceaccount.yaml
packages/system/flux-plunger/templates/deployment.yaml
packages/system/flux-plunger/templates/service.yaml
packages/system/flux-plunger/Chart.yaml
packages/system/flux-plunger/templates/rbac.yaml
packages/core/platform/bundles/paas-full.yaml
packages/system/flux-plunger/values.yaml
packages/core/platform/sources/flux-plunger.yaml
cmd/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Files:
cmd/flux-plunger/main.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
cmd/flux-plunger/main.go
internal/controller/fluxplunger/flux_plunger.go
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/controller/fluxplunger/flux_plunger.go
packages/core/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
Files:
packages/core/platform/bundles/paas-full.yaml
packages/core/platform/sources/flux-plunger.yaml
🧠 Learnings (6)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to cmd/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Applied to files:
cmd/flux-plunger/main.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/system/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
Applied to files:
packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to **/*.yaml : Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Applied to files:
packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/core/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
Applied to files:
packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/apps/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
Applied to files:
packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T15:24:00.888Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/goldpinger.yaml:20-23
Timestamp: 2025-12-25T15:24:00.888Z
Learning: In Goldpinger YAML configurations (e.g., packages/core/platform/sources/goldpinger.yaml), when hostNetwork is enabled, set privileged: true to ensure required host-level capabilities for cross-node network monitoring. Apply this rule to similar Goldpinger YAML files under the same directory. Confirm privileges are limited to scenarios that truly require host access and are scoped to this component.
Applied to files:
packages/core/platform/sources/flux-plunger.yaml
🧬 Code graph analysis (1)
internal/controller/fluxplunger/flux_plunger.go (2)
pkg/apiserver/apiserver.go (1)
Scheme (53-53)
internal/controller/dashboard/manager.go (1)
Manager(45-48)
🪛 checkmake (0.2.2)
packages/system/flux-plunger/Makefile
[warning] 7-7: Target body for "image" exceeds allowed length of 5 (12).
(maxbodylength)
[warning] 1-1: Missing required phony target "all"
(minphony)
[warning] 1-1: Missing required phony target "clean"
(minphony)
[warning] 1-1: Missing required phony target "test"
(minphony)
🪛 YAMLlint (1.37.1)
packages/system/flux-plunger/templates/deployment.yaml
[error] 26-26: syntax error: could not find expected ':'
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (8)
packages/core/platform/bundles/paas-full.yaml (1)
61-65: LGTM! Release entry follows established patterns. The flux-plunger release configuration is consistent with other plunger components in the bundle (e.g., kubeovn-plunger) and follows the established pattern for release definitions.
packages/system/flux-plunger/templates/serviceaccount.yaml (1)
1-8: LGTM! ServiceAccount definition is correct. The ServiceAccount is properly configured with appropriate labels and explicitly enables token mounting, which is required for the controller to interact with the Kubernetes API.
packages/system/flux-plunger/Chart.yaml (1)
1-6: LGTM! Chart metadata is appropriate for a custom component. The Chart.yaml follows standard Helm conventions. The umbrella pattern guideline applies to vendored upstream charts, but since flux-plunger is a custom controller developed in this repository, a simple application chart is appropriate.
packages/system/flux-plunger/templates/rbac.yaml (1)
17-24: Runtime filtering is already properly implemented. The controller's `listHelmReleaseSecrets()` function correctly scopes access to the reconciled namespace, filters secrets by labels (`name` and `owner: helm`), and by type (`helm.sh/release.v1`). Secrets are only deleted after multiple validity checks during specific error conditions.
The broad RBAC rule is a necessary trade-off in Kubernetes—RBAC rules cannot express label selectors, so controllers use broad rules paired with proper runtime filtering. This implementation is sound and does not pose an unmitigated security risk.
Likely an incorrect or invalid review comment.
packages/system/flux-plunger/templates/service.yaml (1)
1-18: LGTM! The Service template correctly exposes the metrics (8080) and health (8081) endpoints with matching labels and selectors for the flux-plunger deployment.
internal/controller/fluxplunger/flux_plunger.go (3)
1-25: LGTM! The imports and constants are appropriate. Using `flux-client-side-apply` as the field manager aligns with the PR objective of working alongside FluxCD's field ownership.
135-164: Remediation flow logic is sound. The sequence of suspend → delete → annotate → unsuspend correctly implements the recovery mechanism. The annotation check at lines 119-133 properly prevents re-processing by comparing `latestVersion+1 == processedVersion`.
Note: The atomicity concern was addressed in the previous comment regarding crash recovery.
286-300: Predicate filter is appropriate but has an edge case. The predicate correctly limits reconciliation to HelmReleases with the target error. However, this relies on the assumption that suspended HelmReleases retain their error status condition (which is true since Flux won't reconcile suspended releases).
This design choice means the controller won't react to HelmReleases that are suspended without the error condition—which is fine for externally-suspended releases but could be problematic in the crash-recovery scenario flagged earlier.
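For reference, a sketch of a predicate that also admits controller-owned suspensions; it reuses the PR's hasNoDeployedReleasesError helper and annotationLastProcessedVersion constant and assumes the usual controller-runtime predicate/client imports and the Flux helmv2 types; the exact annotation check is an assumption, not the merged code:

// Sketch only: admit HelmReleases that either show the target error or are
// suspended while carrying this controller's annotation (crash recovery).
pred := predicate.NewPredicateFuncs(func(obj client.Object) bool {
	hr, ok := obj.(*helmv2.HelmRelease)
	if !ok {
		return false
	}
	if hasNoDeployedReleasesError(hr) {
		return true
	}
	_, owned := hr.Annotations[annotationLastProcessedVersion]
	return hr.Spec.Suspend && owned
})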
Add flux-plunger controller to automatically fix HelmRelease resources with "has no deployed releases" error. The controller watches HelmRelease resources and performs the following:
- Detects HelmRelease with "has no deployed releases" error
- Suspends the HelmRelease with flux-client-side-apply field manager
- Deletes the latest Helm release secret
- Updates annotation with processed version to prevent recursive deletion
- Unsuspends the HelmRelease to allow Flux to retry
Special handling for suspended HelmRelease:
- If suspend=true and latest+1==processed: removes suspend
- Otherwise: skips processing as suspended by external process
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
Force-pushed from 22ab852 to fc2b687
releaseName: flux-plunger
chart: cozy-flux-plunger
namespace: cozy-fluxcd
dependsOn: [cilium,kubeovn]
-dependsOn: [cilium,kubeovn]
+dependsOn: [cilium,kubeovn,multus]
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In @cmd/flux-plunger/main.go:
- Around line 53-104: The enable-http2 flag description incorrectly references
webhook servers; update the flag.BoolVar call that sets enableHTTP2 (the flag
registered with flag.BoolVar(&enableHTTP2, "enable-http2", ...)) to use the
corrected description string "If set, HTTP/2 will be enabled for the metrics
server" and ensure any related comments or documentation near the
metricsServerOptions configuration reflect that only the metrics server TLS
options are affected.
In @internal/controller/fluxplunger/flux_plunger.go:
- Around line 290-306: The event filter in SetupWithManager currently only
returns true when hasNoDeployedReleasesError(hr), which can miss suspended
releases after a crash; update the predicate used in WithEventFilter(pred) so it
also returns true when hr.Spec.Suspend == true AND the controller's
ownership/in-progress annotation is present (create a helper like
hasPlungerAnnotation(hr) to check the annotation), or alternatively remove
WithEventFilter(pred) and perform the same checks early in Reconcile
(fast-return when irrelevant) so suspended-but-owned HelmReleases are still
enqueued for recovery; modify SetupWithManager/ predicate and ensure
hasNoDeployedReleasesError and the new annotation check are the deciding
conditions.
- Around line 44-89: When suspending a HelmRelease we must mark ownership so
retries can resume work; update the suspend path (the code that calls
suspendHelmRelease or the suspendHelmRelease implementation) to add an
annotation like "flux-plunger/suspended-by=flux-plunger" (or "in-progress") and
persist it atomically with the spec.suspend change, and ensure any subsequent
steps record the last-processed version annotation
(annotationLastProcessedVersion). In the suspended-branch (the block that calls
r.listHelmReleaseSecrets and currently returns early when hr.Spec.Suspend)
detect that annotation: if "flux-plunger/suspended-by" (or "in-progress") is
present and owned by us, continue the remediation sequence (delete Helm release
secret, update annotationLastProcessedVersion, then call r.unsuspendHelmRelease)
instead of skipping; only treat external suspends (no owned annotation) as skip.
Ensure unsuspendHelmRelease only clears suspend when the suspended-by annotation
matches and remove that annotation when done.
- Around line 140-152: The secret deletion should be idempotent: when calling
r.Delete(ctx, &latestSecret) handle a NotFound error as success instead of
returning an error; change the deletion error handling in the block that logs
"Deleting latest Helm release secret" to check
k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or use
client.IgnoreNotFound) and only return on non-NotFound errors, optionally
logging a debug/info that the secret was already absent, then proceed to call
r.updateProcessedVersionAnnotation(ctx, hr, latestVersion) as before.
- Around line 20-24: The current use of MergeFrom (client.MergeFrom) with
fieldManager (variable fieldManager) means client.FieldOwner() is ignored and
managedFields is not updated, so relying on annotationLastProcessedVersion +
version heuristics is unsafe; fix by either (A) adding an explicit annotation
flag like flux-plunger.cozystack.io/suspended-by-controller:"true" when
suspending and checking/removing it when unsuspending, or (B) change the
suspend/unsuspend calls to use Server-Side Apply via client.Apply with the
fieldManager to ensure ownership is recorded in managedFields (and remove the
unreliable version-heuristic logic); pick one approach and update the code paths
that currently call MergeFrom to use the chosen method and checks.
🧹 Nitpick comments (3)
packages/system/flux-plunger/templates/serviceaccount.yaml (1)
1-8: Consider omitting `automountServiceAccountToken: true` unless you're intentionally overriding a default. It's redundant in most clusters (default is already true), but keeping it is fine if you're guarding against namespace/SA defaults being changed.
packages/system/flux-plunger/templates/deployment.yaml (1)
37-55: Probes look aligned with the `:8081` health endpoints; consider making resources configurable via values.
internal/controller/fluxplunger/flux_plunger.go (1)
203-239: Avoid treating parse failures as version 0 (can select the wrong “latest” secret).
`extractVersionNumber()` returning 0 on parse error means an unexpected secret name can skew sorting/selection. Prefer returning `(int, error)` and skipping invalid entries (with a log), or treat invalid as lowest and log loudly.
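A sketch of the (int, error) variant, assuming Helm's usual sh.helm.release.v1.<release>.v<N> secret naming; the function name matches the PR, while the signature change is the reviewer's suggestion:

package fluxplunger

import (
	"fmt"
	"strconv"
	"strings"
)

// extractVersionNumber parses the trailing ".v<N>" of a Helm release secret
// name and reports malformed names instead of silently mapping them to 0.
func extractVersionNumber(secretName string) (int, error) {
	idx := strings.LastIndex(secretName, ".v")
	if idx == -1 || idx+2 >= len(secretName) {
		return 0, fmt.Errorf("secret %q has no version suffix", secretName)
	}
	n, err := strconv.Atoi(secretName[idx+2:])
	if err != nil {
		return 0, fmt.Errorf("secret %q has a non-numeric version: %w", secretName, err)
	}
	return n, nil
}

Callers such as getLatestSecret can then skip (and log) entries that return an error instead of treating them as version 0.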
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- cmd/flux-plunger/main.go
- internal/controller/fluxplunger/flux_plunger.go
- packages/core/platform/bundles/paas-full.yaml
- packages/core/platform/sources/flux-plunger.yaml
- packages/system/flux-plunger/Chart.yaml
- packages/system/flux-plunger/Makefile
- packages/system/flux-plunger/images/flux-plunger/Dockerfile
- packages/system/flux-plunger/templates/deployment.yaml
- packages/system/flux-plunger/templates/rbac.yaml
- packages/system/flux-plunger/templates/service.yaml
- packages/system/flux-plunger/templates/serviceaccount.yaml
- packages/system/flux-plunger/values.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
- packages/system/flux-plunger/Chart.yaml
- packages/core/platform/sources/flux-plunger.yaml
- packages/system/flux-plunger/values.yaml
- packages/system/flux-plunger/templates/service.yaml
- packages/system/flux-plunger/images/flux-plunger/Dockerfile
- packages/core/platform/bundles/paas-full.yaml
- packages/system/flux-plunger/templates/rbac.yaml
🧰 Additional context used
📓 Path-based instructions (5)
cmd/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Files:
cmd/flux-plunger/main.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
cmd/flux-plunger/main.go
internal/controller/fluxplunger/flux_plunger.go
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
Files:
packages/system/flux-plunger/templates/serviceaccount.yaml
packages/system/flux-plunger/Makefile
packages/system/flux-plunger/templates/deployment.yaml
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
packages/system/flux-plunger/templates/serviceaccount.yaml
packages/system/flux-plunger/templates/deployment.yaml
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/controller/fluxplunger/flux_plunger.go
🧠 Learnings (3)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to cmd/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Applied to files:
cmd/flux-plunger/main.go
📚 Learning: 2025-06-26T04:29:24.830Z
Learnt from: NickVolynkin
Repo: cozystack/cozystack PR: 1117
File: packages/apps/mysql/Makefile:8-8
Timestamp: 2025-06-26T04:29:24.830Z
Learning: The cozystack project uses yq v4+ on their CI runner, so yq v4 syntax (-o json --indent 4) is compatible and version checks are not needed.
Applied to files:
packages/system/flux-plunger/Makefile
📚 Learning: 2025-07-14T16:23:12.803Z
Learnt from: NickVolynkin
Repo: cozystack/cozystack PR: 1196
File: packages/apps/http-cache/Makefile:24-27
Timestamp: 2025-07-14T16:23:12.803Z
Learning: In the cozystack repository, the `readme-generator` tool removes enum contents from values.schema.json files during its operation. Therefore, when using readme-generator in Makefiles, any enum values need to be injected back into the schema using yq commands after readme-generator has run, not before.
Applied to files:
packages/system/flux-plunger/Makefile
🧬 Code graph analysis (1)
internal/controller/fluxplunger/flux_plunger.go (1)
internal/controller/dashboard/manager.go (1)
Manager(45-48)
🪛 checkmake (0.2.2)
packages/system/flux-plunger/Makefile
[warning] 7-7: Target body for "image" exceeds allowed length of 5 (12).
(maxbodylength)
[warning] 1-1: Missing required phony target "all"
(minphony)
[warning] 1-1: Missing required phony target "clean"
(minphony)
[warning] 1-1: Missing required phony target "test"
(minphony)
🪛 YAMLlint (1.37.1)
packages/system/flux-plunger/templates/deployment.yaml
[error] 26-26: syntax error: could not find expected ':'
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (5)
packages/system/flux-plunger/Makefile (3)
1-5: LGTM! The environment variable exports and include statements follow the expected pattern for system packages in this repository.
17-19: LGTM! The digest extraction and values.yaml update workflow is well-structured. The yq v4+ syntax is correct per project standards, and cleanup of the temporary metadata file is good practice.
7-16: No changes required; Dockerfile path and images directory are correct. The Dockerfile path `images/flux-plunger/Dockerfile` correctly resolves to the existing file at `packages/system/flux-plunger/images/flux-plunger/Dockerfile`. The `images/` directory exists in the repository, so the metadata file can be written without issue. This pattern is consistent across all system packages in the codebase.
Likely an incorrect or invalid review comment.
cmd/flux-plunger/main.go (1)
105-150: Manager wiring + health/ready checks look correct.
internal/controller/fluxplunger/flux_plunger.go (1)
177-201: The label selectors for Helm release secrets are correct. The labels `name: <releaseName>` and `owner: helm` match how Flux/Helm's helm-controller labels Helm release secrets, and they are sufficient to identify only that HelmRelease's `helm.sh/release.v1` secrets. The additional type filter ensures only Helm v3 release secrets are returned. No changes needed.
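For context, listing those secrets with the controller-runtime client looks roughly like this; it is a sketch that assumes the release name equals the HelmRelease name, which the controller's actual helper may derive differently:

// Sketch: fetch candidate Helm v3 release secrets for one HelmRelease.
var secretList corev1.SecretList
if err := r.List(ctx, &secretList,
	client.InNamespace(hr.Namespace),
	client.MatchingLabels{"name": hr.Name, "owner": "helm"},
); err != nil {
	return nil, err
}
filtered := make([]corev1.Secret, 0, len(secretList.Items))
for i := range secretList.Items {
	if secretList.Items[i].Type == "helm.sh/release.v1" {
		filtered = append(filtered, secretList.Items[i])
	}
}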
// Delete the latest secret
logger.Info("Deleting latest Helm release secret", "secret", latestSecret.Name)
if err := r.Delete(ctx, &latestSecret); err != nil {
	logger.Error(err, "Failed to delete Helm release secret")
	return ctrl.Result{}, err
}

// Update annotation with processed version
logger.Info("Updating annotation with processed version", "version", latestVersion)
if err := r.updateProcessedVersionAnnotation(ctx, hr, latestVersion); err != nil {
	logger.Error(err, "Failed to update annotation")
	return ctrl.Result{}, err
}
Make secret deletion idempotent (ignore NotFound).
Proposed fix
-if err := r.Delete(ctx, &latestSecret); err != nil {
+if err := r.Delete(ctx, &latestSecret); err != nil {
+	if client.IgnoreNotFound(err) == nil {
+		logger.Info("Helm release secret already deleted", "secret", latestSecret.Name)
+	} else {
		logger.Error(err, "Failed to delete Helm release secret")
		return ctrl.Result{}, err
+	}
 }

🤖 Prompt for AI Agents
In @internal/controller/fluxplunger/flux_plunger.go around lines 140 - 152, The
secret deletion should be idempotent: when calling r.Delete(ctx, &latestSecret)
handle a NotFound error as success instead of returning an error; change the
deletion error handling in the block that logs "Deleting latest Helm release
secret" to check k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or use
client.IgnoreNotFound) and only return on non-NotFound errors, optionally
logging a debug/info that the secret was already absent, then proceed to call
r.updateProcessedVersionAnnotation(ctx, hr, latestVersion) as before.
args:
{{- if .Values.debug }}
- --zap-log-level=debug
{{- else }}
- --zap-log-level=info
{{- end }}
- --metrics-bind-address=:8080
- --metrics-secure=false
ports:
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find CI configuration files
find . -type f \( -name ".github" -o -name ".gitlab-ci.yml" -o -name ".circleci" \) -o -path "*/.github/workflows/*" 2>/dev/null | head -20
🏁 Script executed:
#!/bin/bash
# List files to identify CI setup
ls -la .github/workflows/ 2>/dev/null || echo "No GitHub Actions found"
cat .gitlab-ci.yml 2>/dev/null | head -50 || echo "No GitLab CI found"
🏁 Script executed:
#!/bin/bash
# Find the flux-plunger deployment.yaml file
find . -path "*/flux-plunger/templates/deployment.yaml" -type f
🏁 Script executed:
#!/bin/bash
# Search for yamllint in the repository (config files, workflows, etc.)
rg -i "yamllint" --type yaml --type json --type shell -A 5
🏁 Script executed:
#!/bin/bash
# Check pre-commit configuration for yamllint
cat .github/workflows/pre-commit.yml
🏁 Script executed:
#!/bin/bash
# Read the deployment.yaml file
cat packages/system/flux-plunger/templates/deployment.yaml
🏁 Script executed:
#!/bin/bash
# Search for if/else control flow patterns in helm templates
rg "{{-\s*if" packages/system --type yaml -B 2 -A 2 | head -100
🏁 Script executed:
#!/bin/bash
# Search for ternary usage in helm templates
rg "ternary" packages/system --type yaml -B 1 -A 1
🏁 Script executed:
#!/bin/bash
# Find and read .pre-commit-config.yaml
find . -name ".pre-commit-config.yaml" -type f
🏁 Script executed:
#!/bin/bash
# Check the structure of flux-plunger
ls -la packages/system/flux-plunger/
🏁 Script executed:
#!/bin/bash
# Check if flux-plunger is a custom chart or has a charts/ directory
find packages/system/flux-plunger -type f | head -20
🏁 Script executed:
#!/bin/bash
# Search for yamllint configuration in the repo
rg "yamllint" --type yaml -B 2 -A 5
🏁 Script executed:
#!/bin/bash
# Read the pre-commit configuration
cat .pre-commit-config.yaml
🏁 Script executed:
#!/bin/bash
# Search for helm or yaml linting in pre-commit hooks and CI
rg "helm|lint" --type yaml -B 1 -A 2 .github/workflows/ | head -100
🏁 Script executed:
#!/bin/bash
# Check the Makefile in flux-plunger
cat packages/system/flux-plunger/Makefile
🏁 Script executed:
#!/bin/bash
# Check values.yaml
cat packages/system/flux-plunger/values.yaml
🏁 Script executed:
#!/bin/bash
# Check if there's any helm-lint or validation tool in the repo
find . -name "Makefile" -o -name "*.sh" -o -name "tox.ini" | xargs rg -l "helm|lint|validate" 2>/dev/null | head -10
🏁 Script executed:
#!/bin/bash
# Search for 'debug' field in flux-plunger related files
rg "debug" packages/system/flux-plunger/ -B 2 -A 2
🏁 Script executed:
#!/bin/bash
# Search more broadly for values schema or documentation
find packages/system/flux-plunger -type f -name "*.md" -o -name "*.txt" -o -name "schema.json"
🏁 Script executed:
#!/bin/bash
# Check if there's a values schema or documentation
cat packages/system/flux-plunger/Chart.yaml
🏁 Script executed:
#!/bin/bash
# Test if the if/else pattern actually causes YAML parsing issues
# Create a test file with the original pattern
cat > /tmp/test-original.yaml << 'EOF'
args:
{{- if .Values.debug }}
- --zap-log-level=debug
{{- else }}
- --zap-log-level=info
{{- end }}
- --metrics-bind-address=:8080
EOF
# Try to parse it with a YAML parser (before helm template rendering)
python3 << 'PYEOF'
import yaml
try:
with open('/tmp/test-original.yaml', 'r') as f:
content = f.read()
# Try to parse the raw template (without rendering)
yaml.safe_load(content)
print("YAML parsing: SUCCESS (no error)")
except yaml.YAMLError as e:
print(f"YAML parsing ERROR: {e}")
except Exception as e:
print(f"ERROR: {e}")
PYEOF
🏁 Script executed:
#!/bin/bash
# Test if the proposed ternary fix is valid YAML
cat > /tmp/test-proposed.yaml << 'EOF'
args:
- '--zap-log-level={{ ternary "debug" "info" .Values.debug }}'
- --metrics-bind-address=:8080
- --metrics-secure=false
EOF
python3 << 'PYEOF'
import yaml
try:
with open('/tmp/test-proposed.yaml', 'r') as f:
content = f.read()
result = yaml.safe_load(content)
print("Proposed fix YAML parsing: SUCCESS")
print(f"Parsed structure: {result}")
except yaml.YAMLError as e:
print(f"Proposed fix YAML parsing ERROR: {e}")
except Exception as e:
print(f"ERROR: {e}")
PYEOF
🏁 Script executed:
#!/bin/bash
# Check if similar debug/ternary patterns exist in other custom charts
rg "debug" packages/system --type yaml | grep -v "charts/" | head -20
🏁 Script executed:
#!/bin/bash
# Look for examples of ternary in args sections
rg "args:" packages/system -A 5 --type yaml | grep -B 2 -A 3 "ternary"
Fix template YAML syntax: if/else in args sequence is invalid without Helm rendering.
The current pattern is syntactically invalid YAML when unrendered (breaks parsers at line 26-27). Replace with a ternary expression to keep the template valid:
Proposed fix
args:
- {{- if .Values.debug }}
- - --zap-log-level=debug
- {{- else }}
- - --zap-log-level=info
- {{- end }}
+ - '--zap-log-level={{ ternary "debug" "info" .Values.debug }}'
- --metrics-bind-address=:8080
- --metrics-secure=false

Also add the missing field to values.yaml:
-image: "ghcr.io/cozystack/cozystack/flux-plunger:latest@..."
+image: "ghcr.io/cozystack/cozystack/flux-plunger:latest@..."
+debug: false
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
args:
{{- if .Values.debug }}
- --zap-log-level=debug
{{- else }}
- --zap-log-level=info
{{- end }}
- --metrics-bind-address=:8080
- --metrics-secure=false
ports:

args:
- '--zap-log-level={{ ternary "debug" "info" .Values.debug }}'
- --metrics-bind-address=:8080
- --metrics-secure=false
ports:
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 26-26: syntax error: could not find expected ':'
(syntax)
- Remove unused webhook server configuration
- Add health probes (liveness and readiness) to deployment
- Add resource requests and limits to deployment
- Fix optimistic lock conflicts by checking current state before patching
- Fix potential deadlock in crash recovery by updating predicate filter
- Fix namespace in PackageSource (cozy-fluxcd)
The predicate now watches both:
1. HelmReleases with "has no deployed releases" error
2. Suspended HelmReleases with flux-plunger annotation (crash recovery)
This prevents deadlock when controller crashes after suspend but before unsuspend, and FluxCD removes error from status.
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @internal/controller/fluxplunger/flux_plunger.go:
- Around line 44-89: When HR is suspended and the last-processed annotation is
missing or cannot be parsed, implement a recovery/fallback so the controller
doesn't permanently skip processing: in the suspend-handling block that calls
listHelmReleaseSecrets and computes latestVersion (use hr.Spec.Suspend,
listHelmReleaseSecrets, getLatestSecret, extractVersionNumber,
annotationLastProcessedVersion), treat a missing/invalid annotation as a
potential crash scenario and attempt recovery by unsuspending when appropriate
(at minimum: if annotation is missing/invalid AND len(secrets)==0 then call
unsuspendHelmRelease; otherwise requeue for immediate reconcile when parse fails
so we can re-check state instead of silently skipping). Ensure errors from
unsuspendHelmRelease are handled the same way as before.
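A sketch of the suspended-branch fallback that prompt describes, reusing the PR's helper names (listHelmReleaseSecrets, unsuspendHelmRelease, annotationLastProcessedVersion); the requeue interval and the exact conditions are assumptions, not the implemented behavior:

if hr.Spec.Suspend {
	secrets, err := r.listHelmReleaseSecrets(ctx, hr)
	if err != nil {
		return ctrl.Result{}, err
	}
	raw, hasAnno := hr.Annotations[annotationLastProcessedVersion]
	processed, parseErr := strconv.Atoi(raw)
	switch {
	case (!hasAnno || parseErr != nil) && len(secrets) == 0:
		// Likely our own crash after deleting the secret: just unsuspend.
		return ctrl.Result{}, r.unsuspendHelmRelease(ctx, hr)
	case !hasAnno || parseErr != nil:
		// Ambiguous state: re-check soon instead of skipping forever.
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	default:
		// Fall through to the existing latest+1 == processed comparison.
		_ = processed
	}
}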
🧹 Nitpick comments (1)
internal/controller/fluxplunger/flux_plunger.go (1)
203-239: Consider adding a defensive nil/empty check in `getLatestSecret`. While callers currently check `len(secrets) > 0` before calling, adding a defensive check would make the function safer for future use and prevent potential panics.
Suggested improvement
 // getLatestSecret returns the secret with the highest version number
 func getLatestSecret(secrets []corev1.Secret) corev1.Secret {
+	if len(secrets) == 0 {
+		return corev1.Secret{}
+	}
 	if len(secrets) == 1 {
 		return secrets[0]
 	}
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cmd/flux-plunger/main.go
internal/controller/fluxplunger/flux_plunger.go
🧰 Additional context used
📓 Path-based instructions (3)
cmd/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Files:
cmd/flux-plunger/main.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
cmd/flux-plunger/main.go
internal/controller/fluxplunger/flux_plunger.go
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/controller/fluxplunger/flux_plunger.go
🧠 Learnings (1)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to cmd/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory
Applied to files:
cmd/flux-plunger/main.go
🧬 Code graph analysis (2)
cmd/flux-plunger/main.go (3)
internal/controller/dashboard/manager.go (1)
NewManager (51-54)
pkg/apiserver/apiserver.go (1)
Scheme (53-53)
internal/controller/fluxplunger/flux_plunger.go (1)
FluxPlunger(27-29)
internal/controller/fluxplunger/flux_plunger.go (1)
internal/controller/dashboard/manager.go (1)
Manager(45-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (6)
cmd/flux-plunger/main.go (2)
1-51: LGTM - Standard kubebuilder scaffold setup. The license header, imports, and scheme registration follow controller-runtime patterns correctly. The scheme properly registers both core Kubernetes types and Flux HelmRelease types. Based on coding guidelines for cmd/**/*.go.
53-150: LGTM - Main function follows controller-runtime best practices. The manager configuration is well-structured with:
- Proper flag parsing for operational flexibility
- HTTP/2 disabled by default with documented security rationale (CVE references)
- Appropriate leader election ID namespacing
- Standard health/readiness probe setup
- Proper error handling with informative log messages
Based on coding guidelines for controller-runtime patterns.
internal/controller/fluxplunger/flux_plunger.go (4)
20-29: LGTM - Constants and struct definition. The constants are well-defined with clear purposes, and the
`FluxPlunger` struct follows the standard controller-runtime pattern of embedding `client.Client`.
91-163: LGTM - Error handling path with proper safeguards. The reconciliation logic correctly:
- Checks for the specific error condition before processing
- Prevents reprocessing with the
`latestVersion+1 == processedVersion` check
- Re-fetches the HelmRelease before each mutation to avoid stale writes
- Uses optimistic locking to handle concurrent modifications
- Returns errors to let the workqueue retry on failures
241-298: LGTM - Mutation functions are well-implemented. The suspend/unsuspend and annotation update functions correctly:
- Re-fetch the resource to get the latest state (addressing optimistic lock conflicts)
- Use
`MergeFromWithOptimisticLock` for conflict detection
- Check for idempotency (already suspended/unsuspended) to avoid unnecessary patches
- Use a consistent field manager for server-side apply semantics
- Use a consistent field manager for server-side apply semantics
300-331: LGTM - Controller setup with appropriate predicate filtering. The predicate correctly filters for:
- HelmReleases with the specific error condition (primary use case)
- Suspended HelmReleases with the annotation (crash recovery)
This aligns with the PR objective of handling crash recovery and avoids unnecessary reconciliation of unrelated HelmReleases.
…flicts
When FluxCD and flux-plunger both update the same HelmRelease, optimistic lock conflicts occur. Previously, returning these errors caused controller-runtime to immediately requeue, creating an infinite loop while the object was being modified by Flux. Now we log conflicts as Info and return success, allowing controller-runtime to naturally wait for the next update event. This eliminates the reconcile storm and allows both controllers to coexist peacefully.
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @internal/controller/fluxplunger/flux_plunger.go:
- Around line 20-24: The code passes client.FieldOwner(fieldManager) into
MergeFromWithOptions (used when creating merge patches via MergeFromWithOptions)
which is ignored for JSON Merge Patch; remove the FieldOwner() option from all
MergeFromWithOptions calls (the ones near where MergeFromWithOptions is invoked)
if you don't need ownership tracking, or change the patch strategy to
server-side apply by switching those Patch calls to use client.Apply with
appropriate ApplyOptions and keep a meaningful fieldManager value; also update
the fieldManager constant (currently "flux-client-side-apply") to a
non-misleading name if you remove SSA usage.
🧹 Nitpick comments (3)
internal/controller/fluxplunger/flux_plunger.go (3)
34-165: Consider refactoring the long Reconcile method. The Reconcile method is 131 lines long and handles multiple distinct flows (suspended path, main remediation path). For improved readability and testability, consider extracting logical blocks into helper methods:
- handleSuspendedHelmRelease(ctx, hr) (ctrl.Result, error)
- remediateHelmRelease(ctx, hr) (ctrl.Result, error)
This would make the main Reconcile flow easier to follow and each path easier to unit test.
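A rough shape of that split; the helper names come from the suggestion above and the bodies are intentionally elided, so this is a sketch rather than the PR's implementation:

func (r *FluxPlunger) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	hr := &helmv2.HelmRelease{}
	if err := r.Get(ctx, req.NamespacedName, hr); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if hr.Spec.Suspend {
		// Suspended / crash-recovery path.
		return r.handleSuspendedHelmRelease(ctx, hr)
	}
	// "has no deployed releases" remediation path.
	return r.remediateHelmRelease(ctx, hr)
}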
230-241: Add defensive handling for malformed secret names.
`extractVersionNumber` returns 0 when parsing fails (line 238). If multiple secrets have malformed names (or legitimately use v0), they'll all be treated as version 0, making the "latest" selection in `getLatestSecret` arbitrary. While this is an edge case, consider:
- Logging a warning when version parsing fails to aid debugging
- Optionally, skipping secrets with unparseable versions rather than treating them as v0
💡 Suggested improvement
 func extractVersionNumber(secretName string) int {
 	version := extractVersionFromSecretName(secretName)
 	// Remove 'v' prefix if present
 	version = strings.TrimPrefix(version, "v")
 	num, err := strconv.Atoi(version)
 	if err != nil {
+		// Log malformed version for visibility
+		// Note: can't access logger here without refactoring
 		return 0
 	}
 	return num
 }
Alternatively, return an error or sentinel value (-1) to allow callers to filter out malformed entries.
243-300: Mutation helpers follow correct patterns. The re-fetch-before-patch pattern (lines 246-250, 266-270, 286-290) and optimistic locking are appropriate for concurrent environments. The early-return checks (lines 253-255, 273-275) avoid unnecessary API calls.
Optional: The re-fetch pattern is duplicated across three functions. Consider extracting it into a helper like `getLatestHelmRelease(ctx, hr) (*helmv2.HelmRelease, error)` to reduce duplication.
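A sketch of such a helper, assuming the helmv2, types, and fmt imports already used in this file:

// getLatestHelmRelease re-fetches the HelmRelease so callers can build their
// patch against the current resourceVersion.
func (r *FluxPlunger) getLatestHelmRelease(ctx context.Context, hr *helmv2.HelmRelease) (*helmv2.HelmRelease, error) {
	latest := &helmv2.HelmRelease{}
	key := types.NamespacedName{Namespace: hr.Namespace, Name: hr.Name}
	if err := r.Get(ctx, key, latest); err != nil {
		return nil, fmt.Errorf("failed to get latest HelmRelease %s: %w", key, err)
	}
	return latest, nil
}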
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/controller/fluxplunger/flux_plunger.go
🧰 Additional context used
📓 Path-based instructions (2)
internal/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Follow controller-runtime patterns and kubebuilder style for Go code in internal directory
Files:
internal/controller/fluxplunger/flux_plunger.go
**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)
Files:
internal/controller/fluxplunger/flux_plunger.go
🧬 Code graph analysis (1)
internal/controller/fluxplunger/flux_plunger.go (1)
internal/controller/dashboard/manager.go (1)
Manager(45-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (4)
internal/controller/fluxplunger/flux_plunger.go (4)
31-32: LGTM! RBAC permissions are appropriately scoped for the controller's operations.
135-140: Optimistic lock conflict handling is appropriate. The pattern of logging conflicts as Info and returning nil (no error) at lines 136-139, 152-153, and 159-160 is correct given the PR objective to avoid requeue storms during concurrent updates by Flux and flux-plunger. The controller will be re-triggered by watch events when the object stabilizes.
167-203: LGTM! The helper functions correctly identify Helm release errors and list Helm release secrets using standard Helm 3 conventions (labels name/owner and type helm.sh/release.v1).
302-333: Controller setup follows correct patterns. The predicate logic correctly filters for relevant HelmRelease events (error condition or suspended with annotation for crash recovery). The setup uses standard controller-runtime patterns.
Note: The crash recovery predicate (lines 319-322) assumes the annotation exists, which ties into the crash recovery issue flagged earlier.
const (
	annotationLastProcessedVersion = "flux-plunger.cozystack.io/last-processed-version"
	errorMessageNoDeployedReleases = "has no deployed releases"
	fieldManager                   = "flux-client-side-apply"
)
🧩 Analysis chain
🌐 Web query:
controller-runtime FieldOwner MergeFromWithOptions merge patch behavior
💡 Result:
Short answer: MergeFromWithOptions builds a JSON Merge Patch (client-side merge per RFC 7396) — not Server‑Side Apply — so it merges at the JSON/object level (maps merged, lists replaced entirely) and does not participate in Kubernetes “field ownership” the way SSA does. controller‑runtime’s MergeFromWithOptions supports options such as optimistic locking (adds resourceVersion checking) via MergeFromWithOptimisticLock. References:
- controller‑runtime Patch/MergeFromWithOptions and MergeFromWithOptimisticLock docs. [1]
- Kubernetes docs: differences between JSON merge patch (lists replaced) and strategic/SSA behavior. [2]
- JSON Merge Patch (RFC 7396) specification. [3]
- Explanation of SSA field ownership (why SSA removes fields you stop declaring). [4]
If you want field ownership semantics (managed fields / FieldManager), use Server‑Side Apply (Apply patch type) rather than MergeFrom/MergePatch.
🏁 Script executed:
cd internal/controller/fluxplunger && sed -n '260p;280p;299p' flux_plunger.go
Repository: cozystack/cozystack
Length of output: 275
Remove FieldOwner option or switch to Server-Side Apply.
The client.FieldOwner(fieldManager) option at lines 260, 280, and 299 has no effect with merge patches. According to controller-runtime and Kubernetes specifications, FieldOwner is only meaningful for Server-Side Apply (SSA), not for JSON Merge Patch operations. Using MergeFromWithOptions with FieldOwner() passes an ignored parameter.
Choose one:
- Remove the FieldOwner() option from merge patch calls if field ownership tracking is not required, OR
- Switch to Server-Side Apply (Patch with client.Apply) if you need field ownership semantics
Also reconsider the fieldManager constant value—setting it to "flux-client-side-apply" (Flux's field manager) is misleading if not actually used by Kubernetes field ownership tracking.
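For reference, a sketch of both options applied to the suspend patch (imports from controller-runtime, apimachinery's unstructured package, and the fluxcd helm-controller API are assumed; this is illustrative, not the PR's implementation):

```go
// Option 1: JSON merge patch with optimistic locking; FieldOwner is dropped
// because it has no effect on merge patches.
func suspendWithMergePatch(ctx context.Context, c client.Client, hr *helmv2.HelmRelease) error {
	base := hr.DeepCopy()
	hr.Spec.Suspend = true
	return c.Patch(ctx, hr, client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{}))
}

// Option 2: Server-Side Apply, where FieldOwner is meaningful. The applied object
// declares only the fields this controller wants to own ("flux-plunger" is an
// example owner name, not the PR's constant).
func suspendWithSSA(ctx context.Context, c client.Client, hr *helmv2.HelmRelease) error {
	u := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": helmv2.GroupVersion.String(),
		"kind":       "HelmRelease",
		"metadata": map[string]interface{}{
			"name":      hr.Name,
			"namespace": hr.Namespace,
		},
		"spec": map[string]interface{}{
			"suspend": true,
		},
	}}
	return c.Patch(ctx, u, client.Apply, client.FieldOwner("flux-plunger"), client.ForceOwnership)
}
```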
🤖 Prompt for AI Agents
In @internal/controller/fluxplunger/flux_plunger.go around lines 20-24: the code passes client.FieldOwner(fieldManager) into MergeFromWithOptions, but that option is ignored for JSON Merge Patch. Either remove the FieldOwner() option from all MergeFromWithOptions calls if ownership tracking is not needed, or switch those Patch calls to server-side apply (client.Apply with appropriate options) and keep a meaningful fieldManager value. If SSA is not used, also rename the fieldManager constant (currently "flux-client-side-apply") so it is not misleading.
	// Check if HelmRelease is suspended
	if hr.Spec.Suspend {
		logger.Info("HelmRelease is suspended, checking if we need to unsuspend")

		// Get the list of Helm release secrets
		secrets, err := r.listHelmReleaseSecrets(ctx, hr.Namespace, hr.Name)
		if err != nil {
			logger.Error(err, "Failed to list Helm release secrets")
			return ctrl.Result{}, err
		}

		// If no secrets, treat latest version as 0
		latestVersion := 0
		if len(secrets) > 0 {
			latestSecret := getLatestSecret(secrets)
			latestVersion = extractVersionNumber(latestSecret.Name)
		} else {
			logger.Info("No Helm release secrets found while suspended, treating as version 0")
		}

		// Check if version is previous to just processed (latestVersion+1 == processedVersion)
		// This is the ONLY condition when we unsuspend
		shouldUnsuspend := false
		if hr.Annotations != nil {
			if processedVersionStr, exists := hr.Annotations[annotationLastProcessedVersion]; exists {
				processedVersion, err := strconv.Atoi(processedVersionStr)
				if err == nil && latestVersion+1 == processedVersion {
					shouldUnsuspend = true
				}
			}
		}

		if shouldUnsuspend {
			// Unsuspend the HelmRelease
			logger.Info("Secret was already deleted in previous run, removing suspend", "latest", latestVersion, "processed", latestVersion+1)
			if err := r.unsuspendHelmRelease(ctx, hr); err != nil {
				logger.Info("Could not unsuspend HelmRelease, will retry on next reconcile", "error", err.Error())
				return ctrl.Result{}, nil
			}
			return ctrl.Result{}, nil
		}

		// If not previous to processed, skip all actions
		logger.Info("HelmRelease is suspended by external process, skipping", "latest", latestVersion)
		return ctrl.Result{}, nil
	}
Critical crash recovery deadlock - HelmRelease can get permanently stuck.
The crash recovery logic has a critical flaw that can leave a HelmRelease permanently suspended:
Deadlock Scenario:
1. Controller suspends HelmRelease (line 135)
2. Controller deletes latest secret (line 144)
3. Controller crashes before updating annotation (line 151)
4. On restart:
   - HelmRelease is suspended: true
   - Latest version is now N-1 (deleted secret was version N)
   - Annotation is missing (crashed before setting it)
   - Line 70 check: latestVersion+1 == processedVersion fails (no annotation), so shouldUnsuspend = false
   - Controller skips all actions (line 88)
5. Result: HelmRelease remains permanently suspended
The predicate (lines 319-322) only matches suspended HelmReleases that have the annotation, so the crashed case won't even trigger reconciliation unless the error persists.
Recommended fix:
Handle the suspended-without-annotation case by checking if we're the ones who suspended it (using field manager metadata) OR add logic to detect partial completion and resume the operation.
🔧 Suggested approach
Add a check in the suspended block to handle the crash scenario:
 if hr.Spec.Suspend {
 	logger.Info("HelmRelease is suspended, checking if we need to unsuspend")
+	// Check if we have the annotation - if not, this might be a crash recovery
+	hasAnnotation := hr.Annotations != nil && hr.Annotations[annotationLastProcessedVersion] != ""
+
+	// If suspended without annotation but still has error, we crashed mid-process
+	// Continue with normal flow to complete the operation
+	if !hasAnnotation && hasNoDeployedReleasesError(hr) {
+		logger.Info("Suspended without annotation but has error - completing crashed operation")
+		// Fall through to main flow instead of returning
+	} else {
 	// Get the list of Helm release secrets
 	secrets, err := r.listHelmReleaseSecrets(ctx, hr.Namespace, hr.Name)
 	// ... rest of existing logic
+	}
 }

Committable suggestion skipped: line range outside the PR's diff.
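If that fall-through is adopted, the event filter also needs to admit the suspended-without-annotation case. A sketch of a broadened predicate, reusing the existing hasNoDeployedReleasesError helper (the wiring shown is illustrative, not the PR's actual setup):

```go
// Admit HelmReleases that either show the target error or are suspended, so a
// crash between suspend and annotate still triggers a reconcile.
hrPredicate := predicate.NewPredicateFuncs(func(obj client.Object) bool {
	hr, ok := obj.(*helmv2.HelmRelease)
	if !ok {
		return false
	}
	return hasNoDeployedReleasesError(hr) || hr.Spec.Suspend
})

// Illustrative wiring:
// ctrl.NewControllerManagedBy(mgr).
//	For(&helmv2.HelmRelease{}, builder.WithPredicates(hrPredicate)).
//	Complete(r)
```

Releases suspended by users would also be admitted, but the existing suspended branch already exits early with "suspended by external process, skipping", so the extra events stay cheap.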
This PR prepares the release `v0.40.0`.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @packages/system/linstor/values.yaml:
- Line 4: Update the linstor image tagging to match the piraeusServer versioning
strategy: replace the linstorCSI tag that currently uses "latest" with a
consistent pinned tag (e.g., the same
"1.32.3@sha256:c6fbe1825514a3059088175968a651d7e10f477590023a33f1a9e6584fd98e04"
or another explicit sha256-tagged version). Locate the linstorCSI key in the
YAML and set its image tag to the chosen pinned value, or better yet refactor
both piraeusServer and linstorCSI to use a shared parameter (e.g.,
values.image.tag) so both images use the same versioning source.
🧹 Nitpick comments (2)
packages/system/dashboard/templates/configmap.yaml (1)
1-3: Avoid hardcoding the displayed version in the template (reduce churn/drift). Bumping {{- $tenantText := "v0.40.0" }} works, but it forces template edits for every release and can drift from the chart/app version. Consider moving this default into values.yaml (still overrideable via _cluster.branding.tenantText) or deriving it from chart metadata.
Possible refactor (chart-driven default)
-{{- $tenantText := "v0.40.0" }}
+{{- $tenantText := (.Values._cluster.branding.defaultTenantText | default (printf "v%s" .Chart.AppVersion)) }}
packages/system/kamaji/values.yaml (1)
16-16: Reduce drift risk: avoid duplicating the digest between image.tag and --migrate-image. The image digest appears hardcoded in both places. While the Makefile has a build rule to keep them in sync, a more robust approach is to construct --migrate-image from the chart values in the template layer. Consider defining a helper value (e.g., migrateImage) or using template logic to build --migrate-image from image.repository and image.tag, eliminating the manual synchronization requirement for future image updates.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
- packages/apps/http-cache/images/nginx-cache.tag
- packages/apps/kubernetes/images/cluster-autoscaler.tag
- packages/apps/kubernetes/images/kubevirt-cloud-provider.tag
- packages/apps/kubernetes/images/kubevirt-csi-driver.tag
- packages/apps/kubernetes/images/ubuntu-container-disk.tag
- packages/apps/mysql/images/mariadb-backup.tag
- packages/core/installer/values.yaml
- packages/core/platform/values.yaml
- packages/core/testing/values.yaml
- packages/extra/bootbox/images/matchbox.tag
- packages/extra/monitoring/images/grafana.tag
- packages/extra/seaweedfs/images/objectstorage-sidecar.tag
- packages/system/bucket/images/s3manager.tag
- packages/system/cilium/values.yaml
- packages/system/cozystack-api/values.yaml
- packages/system/cozystack-controller/values.yaml
- packages/system/dashboard/templates/configmap.yaml
- packages/system/dashboard/values.yaml
- packages/system/kamaji/values.yaml
- packages/system/kubeovn-plunger/values.yaml
- packages/system/kubeovn-webhook/values.yaml
- packages/system/kubevirt-csi-node/values.yaml
- packages/system/lineage-controller-webhook/values.yaml
- packages/system/linstor/values.yaml
- packages/system/metallb/values.yaml
- packages/system/objectstorage-controller/values.yaml
- packages/system/seaweedfs/values.yaml
✅ Files skipped from review due to trivial changes (5)
- packages/system/cozystack-api/values.yaml
- packages/system/kubeovn-webhook/values.yaml
- packages/apps/http-cache/images/nginx-cache.tag
- packages/system/metallb/values.yaml
- packages/core/installer/values.yaml
🧰 Additional context used
📓 Path-based instructions (4)
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory
Files:
- packages/system/kubeovn-plunger/values.yaml
- packages/system/dashboard/values.yaml
- packages/system/cilium/values.yaml
- packages/system/kubevirt-csi-node/values.yaml
- packages/system/linstor/values.yaml
- packages/system/seaweedfs/values.yaml
- packages/system/kamaji/values.yaml
- packages/system/dashboard/templates/configmap.yaml
- packages/system/lineage-controller-webhook/values.yaml
- packages/system/cozystack-controller/values.yaml
- packages/system/bucket/images/s3manager.tag
- packages/system/objectstorage-controller/values.yaml
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
- packages/system/kubeovn-plunger/values.yaml
- packages/system/dashboard/values.yaml
- packages/system/cilium/values.yaml
- packages/core/platform/values.yaml
- packages/system/kubevirt-csi-node/values.yaml
- packages/system/linstor/values.yaml
- packages/core/testing/values.yaml
- packages/system/seaweedfs/values.yaml
- packages/system/kamaji/values.yaml
- packages/system/dashboard/templates/configmap.yaml
- packages/system/lineage-controller-webhook/values.yaml
- packages/system/cozystack-controller/values.yaml
- packages/system/objectstorage-controller/values.yaml
packages/apps/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory
Files:
- packages/apps/kubernetes/images/kubevirt-csi-driver.tag
- packages/apps/kubernetes/images/ubuntu-container-disk.tag
- packages/apps/kubernetes/images/kubevirt-cloud-provider.tag
- packages/apps/kubernetes/images/cluster-autoscaler.tag
- packages/apps/mysql/images/mariadb-backup.tag
packages/core/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory
Files:
- packages/core/platform/values.yaml
- packages/core/testing/values.yaml
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/core/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (21)
packages/apps/kubernetes/images/cluster-autoscaler.tag (1)
1-1: Digest bump: please verify provenance/compatibility of the new pinned image.
Since this is a digest-only change, can you confirm sha256:372ad087… exists in GHCR for ghcr.io/cozystack/cozystack/cluster-autoscaler:0.0.0, and (if you enforce it) that its signature/attestation checks still pass?
packages/apps/mysql/images/mariadb-backup.tag (1)
1-1: Verify that this change belongs in this PR. This file updates the mariadb-backup image digest, but the PR objectives focus exclusively on adding a flux-plunger controller to remediate HelmRelease errors. There's no apparent connection between the flux-plunger functionality and the MySQL mariadb-backup image.
Please confirm:
- Is this change intentional and related to the flux-plunger work?
- Should this be in a separate PR focused on MySQL image updates?
- What changes are included in the new image digest that justify bundling it with this PR?
packages/core/testing/values.yaml (1)
2-2: Good practice using both tag and digest, but digest verification requires manual confirmation outside this environment. Using both a version tag and SHA256 digest provides immutable image references for reproducibility and security. The digest sha256:78a556a059411cac89249c2e22596223128a1c74fbfcac4787fd74e4a66f66a6 should be manually verified to match the v0.40.0 tag, as registry verification tools are not available in the review environment. You can verify this by pulling the image locally and comparing the digest:
docker pull ghcr.io/cozystack/cozystack/e2e-sandbox:v0.40.0
docker inspect ghcr.io/cozystack/cozystack/e2e-sandbox:v0.40.0 | grep -i digest
packages/system/cozystack-controller/values.yaml (1)
2-2: Version bump is consistent and digest is properly pinned. Image tag and cozystackVersion both updated to v0.40.0 with no inconsistencies. The image uses a pinned SHA256 digest, which is a security best practice for reproducible deployments.
packages/system/objectstorage-controller/values.yaml (1)
3-3: Image reference format is correct; ensure v0.40.0 version is appropriate for this release. The image update uses proper digest pinning for reproducibility. Verify that this minor version bump (0.38.2 → 0.40.0) is tested and intentional for the release scope.
packages/system/kubeovn-plunger/values.yaml (1)
3-3: This version bump is not unrelated scope creep; it is part of a coordinated platform-wide release to v0.40.0. The v0.40.0 release spans multiple cozystack components (kubeovn-plunger, kubeovn-webhook, objectstorage-controller, lineage-controller-webhook, kamaji, dashboard, cozystack-api, cozystack-controller, and others), indicating this is a systematic platform release rather than an isolated, unintentional change bundled into the flux-plunger PR.
Likely an incorrect or invalid review comment.
packages/system/cilium/values.yaml (1)
16-19: The image tag and digest have been verified and correctly match the published GHCR manifest. The digest sha256:c14a0bcb1a1531c72725b3a4c40aefa2bcd5c129810bf58ea8e37d3fcff2a326 corresponds exactly to the tag 1.18.5 for ghcr.io/cozystack/cozystack/cilium. This change follows the correct umbrella pattern for the system package, with the wrapper chart's values.yaml properly configuring the vendored cilium dependency. No issues found.
packages/system/kamaji/values.yaml (1)
6-6: No issues found. The chart templates construct image references using only repository:tag format (confirmed in controller.yaml), and there is no separate digest field that would cause conflicts. The digest embedded in the tag field produces valid image references (e.g., repo:v0.40.0@sha256:...), which container runtimes correctly handle.
packages/system/dashboard/values.yaml (2)
2-6: Good: images are pinned by digest (reproducible deployments). Keeping :v0.40.0@sha256:... prevents accidental upgrades and makes rollbacks deterministic.
1-6: Templates correctly render image references verbatim; tag↔digest correspondence is guaranteed by the build process. Verification confirms the templates (web.yaml lines 54, 174; gatekeeper.yaml line 82) use .Values.<component>.image directly without tag concatenation. The Makefile build targets (image-openapi-ui, image-openapi-ui-k8s-bff, image-token-proxy) extract the digest from container build metadata immediately after building and write the full reference (repo:tag@sha256:digest) directly to values.yaml, ensuring tag and digest always match. Minor note: gatekeeper.yaml line 82 omits the | quote filter present in web.yaml's image fields, though both render validly.
packages/extra/bootbox/images/matchbox.tag (1)
1-1: This image reference is part of the coordinated v0.40.0 release—not an unrelated bundled change. The matchbox, objectstorage-sidecar, and other components are intentionally versioned together in this release. The SHA256 digest is properly formatted.
Likely an incorrect or invalid review comment.
packages/apps/kubernetes/images/kubevirt-cloud-provider.tag (1)
1-1: Clarify whether the kubevirt-cloud-provider image digest bump is in scope for this PR. The kubevirt-cloud-provider change appears unrelated to flux-plunger (located in a separate service/directory hierarchy). If it's unrelated, move it to a separate PR to simplify review. If intentional and coordinated, confirm it's produced by your build/provenance process and passes image scanning gates.
packages/extra/monitoring/images/grafana.tag (1)
1-1: Image reference format is correct and properly pinned. The digest is automatically generated during the build process (Makefile lines 19-20) via docker buildx build and extracted from the build metadata, ensuring it matches the exact built image. No manual verification is needed.
packages/apps/kubernetes/images/ubuntu-container-disk.tag (1)
1-1: No action needed—this is a standard image configuration update. The ubuntu-container-disk.tag file is a legitimate part of the Helm chart structure used in packages/apps/kubernetes/templates/cluster.yaml. Digest-only updates to container images are normal operational maintenance and pose no risk, since digest pinning ensures immutability. The change follows the standard Helm umbrella pattern and requires no verification.
packages/core/platform/values.yaml (1)
2-2: Confirm the cozystack-assets v0.40.0 image and digest are current. As this change is part of the v0.40.0 release, the image tag and SHA256 digest should be verified through standard release and deployment processes to ensure they are correct and the image is available in the registry.
packages/system/kubevirt-csi-node/values.yaml (1)
3-3: Image digest update is consistent but appears unrelated to PR objectives. The csiDriver.image digest update matches the corresponding change in packages/apps/kubernetes/images/kubevirt-csi-driver.tag, which is good for consistency. However, this kubevirt-csi-driver update appears unrelated to the flux-plunger controller additions described in the PR objectives. Please clarify whether this change should be in this PR or moved to a separate PR focused on kubevirt-csi component updates.
packages/apps/kubernetes/images/kubevirt-csi-driver.tag (1)
1-1: Clarify the scope and rationale for this image update. This change updates the kubevirt-csi-driver image digest in the Kubernetes Helm chart. While the change itself is structurally sound and follows the project's Helm patterns correctly, please clarify:
- Is this image update intentionally bundled with other changes in this PR, or should it be moved to a separate change?
- What is the reason for updating this specific digest (security patch, feature update, etc.)?
packages/extra/seaweedfs/images/objectstorage-sidecar.tag (1)
1-1: This change is intentional and part of the v0.40.0 release. The objectstorage-sidecar image update from v0.38.2 to v0.40.0 is expected as part of the broader v0.40.0 release PR (#1840). SeaweedFS is an integrated component of the platform, and bundling its updates with other component updates is standard practice in release PRs. The image version and digest are consistent with the release being prepared.
Likely an incorrect or invalid review comment.
packages/system/seaweedfs/values.yaml (1)
180-180: Sidecar image update is part of the v0.40.0 release. This change is correctly part of the v0.40.0 release commit and follows proper Helm umbrella pattern conventions (values.yaml at the umbrella level, not in vendored charts/). The image reference matches the tag file exactly, ensuring consistency across configuration files.
Since this is part of the coordinated release (commit 1685242), verify that:
- The objectstorage-sidecar v0.40.0 upgrade has been tested and is stable
- Any relevant changelog or release notes for this component have been reviewed
packages/system/bucket/images/s3manager.tag (1)
1-1: Digest pin is correct and current. The tag v0.5.0 resolves to the pinned digest sha256:ecb140d026ed72660306953a7eec140d7ac81e79544d5bbf1aba5f62aa5f8b69 in GHCR.
packages/system/lineage-controller-webhook/values.yaml (1)
2-2: This is a new file in v0.40.0, not a version bump from an earlier release. The values.yaml file for lineage-controller-webhook is being created as part of the v0.40.0 release, not updated from v0.38.2. The image digest is correct per the release commit. No breaking changes are documented for this component in v0.40.0.
Likely an incorrect or invalid review comment.
  repository: ghcr.io/cozystack/cozystack/piraeus-server
- tag: latest@sha256:417532baa2801288147cd9ac9ae260751c1a7754f0b829725d09b72a770c111a
+ tag: 1.32.3@sha256:c6fbe1825514a3059088175968a651d7e10f477590023a33f1a9e6584fd98e04
🧩 Analysis chain
🏁 Script executed:
# First, explore the file structure to understand the context
cd packages/system/linstor 2>/dev/null && ls -la || echo "Directory not found, searching for the file"
find . -name "values.yaml" -path "*/linstor/*" -type f 2>/dev/null | head -5Repository: cozystack/cozystack
Length of output: 661
🏁 Script executed:
# Read the linstor values.yaml file to see current state
fd "values.yaml" . --path "*linstor*" --type f | head -1 | xargs cat -n || cat packages/system/linstor/values.yaml 2>/dev/null || echo "File not found"Repository: cozystack/cozystack
Length of output: 278
🏁 Script executed:
# Check the file structure and verify line numbers
if [ -f packages/system/linstor/values.yaml ]; then
echo "=== Lines 1-20 of packages/system/linstor/values.yaml ==="
head -20 packages/system/linstor/values.yaml | cat -n
else
echo "File not found at expected location"
fi
Repository: cozystack/cozystack
Length of output: 636
🏁 Script executed:
# Search for all references to image tags in the linstor values file
if [ -f packages/system/linstor/values.yaml ]; then
echo "=== Searching for tag references ==="
grep -n "tag:" packages/system/linstor/values.yaml | head -10
fi
Repository: cozystack/cozystack
Length of output: 280
🏁 Script executed:
# Check git history for this file to understand the context
if git rev-parse --git-dir > /dev/null 2>&1; then
echo "=== Recent commits affecting packages/system/linstor/values.yaml ==="
git log --oneline -20 packages/system/linstor/values.yaml 2>/dev/null || echo "No git history available"
fi
Repository: cozystack/cozystack
Length of output: 164
🏁 Script executed:
# Search for flux-plunger references in the repository
echo "=== Searching for flux-plunger references ==="
rg "flux-plunger" --type yaml --type md . 2>/dev/null | head -20 || echo "No flux-plunger references found"Repository: cozystack/cozystack
Length of output: 1861
🏁 Script executed:
# Check the broader context - are there other linstor version changes?
echo "=== Checking for linstor-related changes in git diff ==="
git diff HEAD~5 HEAD -- packages/system/linstor/ 2>/dev/null | head -100 || echo "Cannot access git diff"
Repository: cozystack/cozystack
Length of output: 119
🏁 Script executed:
# Check if there's a PR or documentation about this change
find . -name "*.md" -o -name "*.txt" 2>/dev/null | xargs grep -l "linstor\|piraeus" 2>/dev/null | head -5Repository: cozystack/cozystack
Length of output: 214
🏁 Script executed:
# Check if image exists and is accessible
echo "=== Verifying piraeus-server image ==="
# Try to get metadata about the image
rg "piraeus-server:1.32.3" . --type yaml | head -5
# Check if there are other version tags in the codebase
echo ""
echo "=== Checking for other piraeus-server versions ==="
rg "piraeus-server:" . --type yaml | head -10Repository: cozystack/cozystack
Length of output: 281
🏁 Script executed:
# Look for commit messages or changelog entries about this release
echo "=== Checking changelog for v0.40.0 ==="
if [ -f ./docs/changelogs/v0.40.0.md ]; then
head -50 ./docs/changelogs/v0.40.0.md
else
# Try to find the latest changelog
ls -lah ./docs/changelogs/ | tail -10
fi
Repository: cozystack/cozystack
Length of output: 701
🏁 Script executed:
# Check the actual release or PR context by looking at recent changes
echo "=== Checking what else changed in the linstor system package ==="
git show --stat 1685242e -- packages/system/linstor/ 2>/dev/null || git log -1 --stat 1685242e 2>/dev/null || echo "Cannot access detailed commit info"
Repository: cozystack/cozystack
Length of output: 2103
Address inconsistency in image tagging strategy.
Version pinning the piraeusServer image to 1.32.3 improves reproducibility. However, there's an inconsistency worth addressing: piraeusServer now uses a versioned tag while linstorCSI (line 14) still uses latest. Consider aligning the tagging strategy across both images for consistency in how the linstor deployment is versioned.
🤖 Prompt for AI Agents
In @packages/system/linstor/values.yaml at line 4, Update the linstor image
tagging to match the piraeusServer versioning strategy: replace the linstorCSI
tag that currently uses "latest" with a consistent pinned tag (e.g., the same
"1.32.3@sha256:c6fbe1825514a3059088175968a651d7e10f477590023a33f1a9e6584fd98e04"
or another explicit sha256-tagged version). Locate the linstorCSI key in the
YAML and set its image tag to the chosen pinned value, or better yet refactor
both piraeusServer and linstorCSI to use a shared parameter (e.g.,
values.image.tag) so both images use the same versioning source.
What this PR does
This PR introduces a new flux-plunger controller that automatically fixes HelmRelease resources experiencing "has no deployed releases" errors by cleaning up corrupted Helm release secrets.
Problem
When Helm releases fail with "has no deployed releases" error, FluxCD cannot recover automatically. This typically happens when:
Solution
The flux-plunger controller suspends the affected HelmRelease, deletes the latest Helm release secret, records the processed version in an annotation, and then unsuspends the release (patches are applied with the flux-client-side-apply field manager).
Key features:
- Crash recovery: when latest+1 == processed, the controller skips remediation (already handled)
Components
- internal/controller/fluxplunger/flux_plunger.go
- cmd/flux-plunger/main.go
- packages/system/flux-plunger/
- paas-full.yaml bundle
Release note
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.