
Conversation

@kvaps
Member

@kvaps kvaps commented Jan 9, 2026

What this PR does

This PR introduces a new flux-plunger controller that automatically fixes HelmRelease resources experiencing "has no deployed releases" errors by cleaning up corrupted Helm release secrets.

Problem

When a Helm release fails with a "has no deployed releases" error, FluxCD cannot recover automatically. This typically happens when:

  • Helm release secrets are corrupted
  • Failed upgrades leave the release in an inconsistent state
  • Manual intervention is required to delete the problematic secret

Solution

The flux-plunger controller:

  1. Watches HelmRelease resources for "has no deployed releases" errors
  2. Suspends the HelmRelease (with flux-client-side-apply field manager)
  3. Identifies and deletes the latest (problematic) Helm release secret
  4. Tracks processed versions via annotation to prevent recursive deletion
  5. Unsuspends the HelmRelease to allow FluxCD to retry

Key features:

  • Only deletes one secret at a time (prevents cascading deletions)
  • Uses version tracking: if latest+1 == processed, the release is skipped as already handled (see the sketch after this list)
  • Handles suspended HelmReleases: only unsuspends if it was suspended by flux-plunger itself
  • Gracefully handles invalid annotations by treating them as unprocessed
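The version-tracking rule reads naturally as a small helper. The following is a minimal sketch, not the PR's code; the annotation key and function name are assumptions:

package fluxplunger

import "strconv"

// Illustrative key; the PR defines its own annotation constant.
const annotationLastProcessedVersion = "flux-plunger.cozystack.io/last-processed-version"

// alreadyProcessed reports whether the newest remaining Helm release secret
// shows that remediation already ran: after deleting secret version N the
// newest remaining one is N-1, so latest+1 == processed. A missing or
// unparsable annotation is treated as "not processed yet".
func alreadyProcessed(annotations map[string]string, latestVersion int) bool {
	raw, ok := annotations[annotationLastProcessedVersion]
	if !ok {
		return false
	}
	processed, err := strconv.Atoi(raw)
	if err != nil {
		return false
	}
	return latestVersion+1 == processed
}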

Components

  • Controller: internal/controller/fluxplunger/flux_plunger.go
  • Main: cmd/flux-plunger/main.go
  • Helm Chart: packages/system/flux-plunger/
  • Platform Integration: Added to paas-full.yaml bundle

Release note

[platform] Add flux-plunger controller to automatically fix HelmRelease resources with "has no deployed releases" error by cleaning up corrupted Helm secrets

Summary by CodeRabbit

  • New Features

    • Added FluxPlunger controller to automatically resolve "has no deployed releases" HelmRelease errors and restore release state.
    • Packaged FluxPlunger with Helm chart, image, ServiceAccount, RBAC, Service, Deployment (health/readiness probes, resource limits) and metrics endpoint; integrated into platform install sequence.
  • Chores

    • Bumped multiple component images and pinned updated digests across platform and apps (installer, assets, tests, and various system/app images).


@kvaps kvaps requested review from lllamnyp and nbykov0 as code owners January 9, 2026 17:05
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Jan 9, 2026
@coderabbitai
Contributor

coderabbitai bot commented Jan 9, 2026

📝 Walkthrough

Walkthrough

Adds a new FluxPlunger controller binary and Helm chart with platform packaging. The controller reconciles Flux HelmRelease resources to remediate "has no deployed releases" errors by suspending, deleting the latest Helm release secret, annotating processed version, and unsuspending; includes manager entrypoint, RBAC, manifests, and image build.

Changes

  • Entrypoint (cmd/flux-plunger/main.go): New controller-runtime main: registers core & Helm v2 schemes, configures the metrics server (TLS option and HTTP/2 toggle), health/readiness probes, leader election flags, and starts the manager with signal handling.
  • Controller (internal/controller/fluxplunger/flux_plunger.go): New FluxPlunger reconciler and SetupWithManager: watches HelmRelease events, lists Helm release secrets, parses/selects the latest version, suspends the HelmRelease, deletes the latest secret, updates the last-processed-version annotation, and unsuspends; includes crash-recovery logic, predicates, and helper functions.
  • Helm chart & image build (packages/system/flux-plunger/Chart.yaml, .../Makefile, .../values.yaml, .../images/flux-plunger/Dockerfile): New chart metadata, a Makefile with an image target that writes the image digest into values, values pointing to the GHCR image digest, and a multi-stage Dockerfile producing a static binary.
  • Kubernetes templates (packages/system/flux-plunger/templates/deployment.yaml, .../rbac.yaml, .../service.yaml, .../serviceaccount.yaml): New Deployment (liveness/readiness probes, resources), ClusterRole/ClusterRoleBinding for HelmRelease and Secret operations, a Service exposing metrics/health, and a ServiceAccount.
  • Platform integration (packages/core/platform/bundles/paas-full.yaml, packages/core/platform/sources/flux-plunger.yaml): Adds a flux-plunger release entry to the paas-full bundle (inserted before multus) and a PackageSource OCIRepository manifest referencing cozystack-packages with component/install metadata.
  • Bundle/chart images & versions (many packages/.../*.tag and values.yaml files): Multiple image digest and version bumps across platform, system, and app packages (various tag files and values.yaml updates).
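For readers unfamiliar with the entrypoint layout described in the Entrypoint item above, a manager setup along those lines looks roughly like this. It is a sketch only: flag defaults, the leader-election ID, and the Flux API import path are assumptions, and the TLS/HTTP/2 toggles mentioned in the walkthrough are omitted for brevity.

package main

import (
	"flag"
	"os"

	helmv2 "github.com/fluxcd/helm-controller/api/v2"
	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/healthz"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
	metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
)

var scheme = runtime.NewScheme()

func init() {
	// Register core types and the Flux HelmRelease v2 API.
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
	utilruntime.Must(helmv2.AddToScheme(scheme))
}

func main() {
	var metricsAddr, probeAddr string
	var enableLeaderElection bool
	flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "Address for the metrics endpoint.")
	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "Address for health/readiness probes.")
	flag.BoolVar(&enableLeaderElection, "leader-elect", false, "Enable leader election.")
	opts := zap.Options{}
	opts.BindFlags(flag.CommandLine)
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:                 scheme,
		Metrics:                metricsserver.Options{BindAddress: metricsAddr},
		HealthProbeBindAddress: probeAddr,
		LeaderElection:         enableLeaderElection,
		LeaderElectionID:       "flux-plunger.cozystack.io", // illustrative ID
	})
	if err != nil {
		ctrl.Log.Error(err, "unable to start manager")
		os.Exit(1)
	}

	// Reconciler registration is elided here; the PR wires it up in SetupWithManager.

	_ = mgr.AddHealthzCheck("healthz", healthz.Ping)
	_ = mgr.AddReadyzCheck("readyz", healthz.Ping)

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		ctrl.Log.Error(err, "problem running manager")
		os.Exit(1)
	}
}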

Sequence Diagram(s)

sequenceDiagram
    participant K8s as Kubernetes API
    participant Ctrl as FluxPlunger Controller
    participant Secrets as Helm Release Secrets
    participant HR as HelmRelease Resource

    K8s->>Ctrl: HelmRelease event (error present or suspended)
    Ctrl->>K8s: Get HelmRelease
    alt HelmRelease is suspended
        Ctrl->>Secrets: List release secrets for HelmRelease
        Ctrl->>Ctrl: Determine latest secret version
        Ctrl->>HR: Read `last-processed-version` annotation
        alt annotation == latest version
            Ctrl->>K8s: Patch HelmRelease to unsuspend
        end
    else HelmRelease shows "has no deployed releases"
        Ctrl->>Secrets: List release secrets for HelmRelease
        Ctrl->>Ctrl: Identify latest secret/version
        Ctrl->>K8s: Patch HelmRelease to suspend
        Ctrl->>Secrets: Delete latest secret
        Ctrl->>K8s: Patch HelmRelease to update `last-processed-version` annotation
        Ctrl->>K8s: Patch HelmRelease to unsuspend
    end
    K8s->>Ctrl: Reconciliation finished

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hop through secrets, tiny and spry,
I yank the stuck ones so your releases fly.
Suspend, mark, then give them a shove—
A plunger's jig in code I love.
Hooray for tidy charts — nibble a carrot! 🥕

🚥 Pre-merge checks: 2 passed, 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title directly and clearly describes the main change: adding a flux-plunger controller to fix HelmRelease errors, which aligns with the primary objective of this PR.




@gemini-code-assist
Contributor

Summary of Changes

Hello @kvaps, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new flux-plunger controller to enhance the robustness of HelmRelease deployments managed by FluxCD. It provides an automated mechanism to recover from common 'has no deployed releases' errors, which typically occur due to corrupted Helm release secrets. By intelligently suspending, cleaning up, and then unsuspending affected HelmReleases, the controller ensures that deployments can self-heal without manual intervention, improving overall system reliability.

Highlights

  • New Flux-Plunger Controller: Introduces a new flux-plunger controller designed to automatically address specific HelmRelease errors within FluxCD environments.
  • Automated HelmRelease Error Remediation: The controller targets and fixes 'has no deployed releases' errors in HelmRelease resources by identifying and cleaning up corrupted Helm release secrets.
  • Controlled Remediation Logic: The solution involves suspending the problematic HelmRelease, deleting only the latest corrupted secret, tracking processed versions via annotations to prevent redundant actions, and then unsuspending the HelmRelease to allow FluxCD to retry.
  • Platform Integration: The flux-plunger controller is integrated into the platform's paas-full.yaml bundle and defined with its own PackageSource and Helm chart.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new flux-plunger controller to automatically handle a specific HelmRelease error state. The implementation is well-structured, but I've found a critical logical flaw in the reconciliation loop that could lead to a deadlock, preventing the controller from functioning correctly under certain failure conditions. I've also provided several suggestions to improve robustness, efficiency, and adherence to Kubernetes best practices, such as enabling leader election, adding health probes to the deployment, and optimizing some of the Go functions. Addressing these points will make the new controller more resilient and maintainable.

// If not previous to processed, skip all actions
logger.Info("HelmRelease is suspended by external process, skipping", "latest", latestVersion)
return ctrl.Result{}, nil
Contributor


critical

There is a logical flaw here that can lead to a deadlock. If the controller suspends a HelmRelease but then fails on a subsequent step (e.g., deleting the secret), the next reconciliation will find a suspended HelmRelease. This code block will then incorrectly assume it was suspended by an external process and skip it, preventing the error from ever being fixed. To fix this, you should remove these lines and allow the reconciliation to fall through to the main logic block, which can handle this case correctly.

Comment on lines +32 to +36
        ports:
        - name: metrics
          containerPort: 8080
        - name: health
          containerPort: 8081
Contributor


high

The controller exposes health check endpoints (/healthz and /readyz), but the deployment manifest is missing liveness and readiness probes. Without these probes, Kubernetes cannot determine the health of the pod or restart it if it becomes unresponsive. Please add livenessProbe and readinessProbe to the container spec.

        ports:
        - name: metrics
          containerPort: 8080
        - name: health
          containerPort: 8081
        livenessProbe:
          httpGet:
            path: /healthz
            port: health
        readinessProbe:
          httpGet:
            path: /readyz
            port: health

Comment on lines +65 to +66
	flag.BoolVar(&enableLeaderElection, "leader-elect", false,
		"Enable leader election for controller manager. "+
			"Enabling this will ensure there is only one active controller manager.")
Contributor


medium

The default for leader election is false. For a controller, it's safer to enable leader election by default to prevent multiple instances from acting on the same resources simultaneously if the deployment is ever scaled beyond one replica. Please consider changing the default to true.

Suggested change

-	flag.BoolVar(&enableLeaderElection, "leader-elect", false,
+	flag.BoolVar(&enableLeaderElection, "leader-elect", true,
 		"Enable leader election for controller manager. "+
 			"Enabling this will ensure there is only one active controller manager.")

Comment on lines +195 to +198
	filtered := []corev1.Secret{}
	for _, secret := range secretList.Items {
		if secret.Type == "helm.sh/release.v1" {
			filtered = append(filtered, secret)
		}
	}
Contributor


medium

This loop can be made more efficient. The for...range on a slice of structs creates a copy of each corev1.Secret on every iteration. Also, appending to a slice without pre-allocating capacity can lead to multiple reallocations. A more efficient pattern is to pre-allocate the slice and iterate by index to avoid copying.

Suggested change

-	filtered := []corev1.Secret{}
-	for _, secret := range secretList.Items {
-		if secret.Type == "helm.sh/release.v1" {
-			filtered = append(filtered, secret)
-		}
-	}
+	filtered := make([]corev1.Secret, 0, len(secretList.Items))
+	for i := range secretList.Items {
+		if secretList.Items[i].Type == "helm.sh/release.v1" {
+			filtered = append(filtered, secretList.Items[i])
+		}
+	}

Comment on lines +206 to +216
func getLatestSecret(secrets []corev1.Secret) corev1.Secret {
	if len(secrets) == 1 {
		return secrets[0]
	}

	sort.Slice(secrets, func(i, j int) bool {
		vi := extractVersionNumber(secrets[i].Name)
		vj := extractVersionNumber(secrets[j].Name)
		return vi > vj
	})

	return secrets[0]
}
Contributor


medium

This function modifies the input secrets slice by sorting it in place, which can be an unexpected side effect for callers. Additionally, sorting is an O(n log n) operation. You can achieve the same result more efficiently with a single O(n) pass to find the secret with the highest version, without modifying the original slice.

func getLatestSecret(secrets []corev1.Secret) corev1.Secret {
	if len(secrets) == 0 {
		return corev1.Secret{}
	}

	var latest corev1.Secret
	latestVersion := -1

	for _, secret := range secrets {
		v := extractVersionNumber(secret.Name)
		if v > latestVersion {
			latest = secret
			latestVersion = v
		}
	}
	return latest
}

	}
	latestHR.Annotations[annotationLastProcessedVersion] = strconv.Itoa(version)

	return r.Patch(ctx, latestHR, patch)
Contributor


medium

For consistency with the other Patch calls in this file (suspendHelmRelease and unsuspendHelmRelease), you should specify a client.FieldOwner. This is important for tracking changes and managing conflicts, especially when using server-side apply.

return r.Patch(ctx, latestHR, patch, client.FieldOwner(fieldManager))

Comment on lines +24 to +31
        args:
        {{- if .Values.debug }}
        - --zap-log-level=debug
        {{- else }}
        - --zap-log-level=info
        {{- end }}
        - --metrics-bind-address=:8080
        - --metrics-secure=false
Contributor


medium

The controller supports leader election, which is a crucial feature for controllers to prevent race conditions when running more than one replica. It's best practice to enable it. Please add the --leader-elect=true flag to the container arguments.

        args:
        {{- if .Values.debug }}
        - --zap-log-level=debug
        {{- else }}
        - --zap-log-level=info
        {{- end }}
        - --leader-elect=true
        - --metrics-bind-address=:8080
        - --metrics-secure=false

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @cmd/flux-plunger/main.go:
- Around line 96-114: The WebhookServer is unused and causes startup failures;
remove the creation and wiring of it: delete the webhook.NewServer(...) call
that assigns webhookServer and remove WebhookServer: webhookServer from the
ctrl.Options passed into ctrl.NewManager (symbols: webhook.NewServer,
webhookServer, ctrl.NewManager, ctrl.Options, WebhookServer) so the manager no
longer tries to start a webhook server requiring TLS certs.

In @internal/controller/fluxplunger/flux_plunger.go:
- Around line 86-91: Controller can orphan a HelmRelease suspended state if it
crashes after Secret deletion but before updating the processed annotation,
causing logger.Info("HelmRelease is suspended by external process, skipping",
"latest", latestVersion) to always skip reconciliation; fix by either updating
the HelmRelease annotation with latestVersion+1 before deleting the Secret
(marking intent so crash recovery can resume) or add/maintain a dedicated
annotation like "flux-plunger-suspended" that this controller sets and clears
around the delete operation so the reconcile path can distinguish
controller-owned suspensions and recover safely (apply the chosen change in the
reconcile flow around the Secret deletion and annotation update logic, ensuring
atomic ordering and clearing on success/failure).

In @packages/core/platform/sources/flux-plunger.yaml:
- Around line 4-22: The manifest installs the flux-plunger component into the
wrong namespace; update the install.namespace value for the component named
"flux-plunger" in the cozystack.flux-plunger source (metadata.name:
cozystack.flux-plunger) from "cozy-flux" to "cozy-fluxcd" so it matches the
paas-full.yaml bundle and the Makefile NAMESPACE export.
🧹 Nitpick comments (9)
cmd/flux-plunger/main.go (2)

110-128: If leader election is enabled, consider setting LeaderElectionNamespace explicitly.

Without it, leader election often defaults to kube-system (and then fails RBAC in restricted clusters). If you want leases in the pod namespace, wire it explicitly (e.g., from POD_NAMESPACE).
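A minimal sketch of that wiring, assuming the pod namespace is injected via the Downward API as POD_NAMESPACE; the option names below are controller-runtime's, while the leader-election ID is illustrative:

package main

import (
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
)

// managerOptions puts the leader-election lease in the pod's own namespace
// instead of the default, avoiding RBAC surprises in restricted clusters.
func managerOptions(scheme *runtime.Scheme) ctrl.Options {
	return ctrl.Options{
		Scheme:                  scheme,
		LeaderElection:          true,
		LeaderElectionID:        "flux-plunger.cozystack.io", // illustrative
		LeaderElectionNamespace: os.Getenv("POD_NAMESPACE"),
	}
}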


104-108: metrics-secure currently means “TLS”, not “authn/authz”. Consider renaming or adding filters.

If you intend “secure” to mean more than TLS, controller-runtime supports authn/authz filter providers for metrics; otherwise the flag can be misleading.

packages/system/flux-plunger/values.yaml (1)

1-1: Good: image is digest-pinned; consider replacing :latest with an immutable tag for clarity.

Digest pinning is the important part here. Using a non-latest tag (e.g., git SHA) makes human review/auditing easier.

packages/system/flux-plunger/Makefile (1)

7-16: Add validation to prevent conflicting --push and --load settings.

The defaults in common-envs.mk are safe (PUSH=1, LOAD=0), but users can override via command line (e.g., make image LOAD=1), which would cause docker buildx to fail since --push and --load are mutually exclusive. Add a guard to prevent this invalid combination.

packages/system/flux-plunger/templates/deployment.yaml (1)

21-36: Add resource limits and health probes for production readiness.

The container definition is missing:

  1. Resource requests and limits - Could lead to scheduling issues or resource contention
  2. Liveness and readiness probes - Even though port 8081 (health) is exposed, no probes are configured
🔧 Proposed enhancements
       - name: flux-plunger
         image: "{{ .Values.image }}"
         args:
         {{- if .Values.debug }}
         - --zap-log-level=debug
         {{- else }}
         - --zap-log-level=info
         {{- end }}
         - --metrics-bind-address=:8080
         - --metrics-secure=false
         ports:
         - name: metrics
           containerPort: 8080
         - name: health
           containerPort: 8081
+        livenessProbe:
+          httpGet:
+            path: /healthz
+            port: health
+          initialDelaySeconds: 15
+          periodSeconds: 20
+        readinessProbe:
+          httpGet:
+            path: /readyz
+            port: health
+          initialDelaySeconds: 5
+          periodSeconds: 10
+        resources:
+          requests:
+            cpu: 100m
+            memory: 128Mi
+          limits:
+            cpu: 500m
+            memory: 256Mi

Note: The YAMLlint syntax error on line 26 is a false positive—Helm's conditional templating syntax is valid.

internal/controller/fluxplunger/flux_plunger.go (4)

27-35: Struct follows controller-runtime patterns.

The RBAC markers correctly declare the required permissions. Minor note: the Scheme field is declared but not referenced in this controller—consider removing it if not needed for future functionality.


220-228: Unused function extractVersionFromSecretName.

This function is only called by extractVersionNumber (line 233). Consider making it unexported or inlining it into extractVersionNumber since it's not used elsewhere.


244-249: suspendHelmRelease doesn't re-fetch before patching.

Unlike unsuspendHelmRelease and updateProcessedVersionAnnotation, this function patches the original hr object without re-fetching. If the HelmRelease was modified after the initial Get, the optimistic lock could fail. Consider re-fetching for consistency with other patch functions.

♻️ Suggested improvement
 func (r *FluxPlunger) suspendHelmRelease(ctx context.Context, hr *helmv2.HelmRelease) error {
+	// Re-fetch the HelmRelease to get the latest state
+	key := types.NamespacedName{Namespace: hr.Namespace, Name: hr.Name}
+	latestHR := &helmv2.HelmRelease{}
+	if err := r.Get(ctx, key, latestHR); err != nil {
+		return fmt.Errorf("failed to get latest HelmRelease: %w", err)
+	}
+
-	patch := client.MergeFromWithOptions(hr.DeepCopy(), client.MergeFromWithOptimisticLock{})
-	hr.Spec.Suspend = true
+	patch := client.MergeFromWithOptions(latestHR.DeepCopy(), client.MergeFromWithOptimisticLock{})
+	latestHR.Spec.Suspend = true
 
-	return r.Patch(ctx, hr, patch, client.FieldOwner(fieldManager))
+	return r.Patch(ctx, latestHR, patch, client.FieldOwner(fieldManager))
 }

275-282: Inconsistent FieldOwner usage.

updateProcessedVersionAnnotation doesn't use client.FieldOwner(fieldManager) unlike suspendHelmRelease and unsuspendHelmRelease. For consistent field ownership tracking, consider adding it.

-	return r.Patch(ctx, latestHR, patch)
+	return r.Patch(ctx, latestHR, patch, client.FieldOwner(fieldManager))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b45fbb and 22ab852.

📒 Files selected for processing (12)
  • cmd/flux-plunger/main.go
  • internal/controller/fluxplunger/flux_plunger.go
  • packages/core/platform/bundles/paas-full.yaml
  • packages/core/platform/sources/flux-plunger.yaml
  • packages/system/flux-plunger/Chart.yaml
  • packages/system/flux-plunger/Makefile
  • packages/system/flux-plunger/images/flux-plunger/Dockerfile
  • packages/system/flux-plunger/templates/deployment.yaml
  • packages/system/flux-plunger/templates/rbac.yaml
  • packages/system/flux-plunger/templates/service.yaml
  • packages/system/flux-plunger/templates/serviceaccount.yaml
  • packages/system/flux-plunger/values.yaml
🧰 Additional context used
📓 Path-based instructions (6)
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/flux-plunger/Makefile
  • packages/system/flux-plunger/templates/serviceaccount.yaml
  • packages/system/flux-plunger/templates/deployment.yaml
  • packages/system/flux-plunger/templates/service.yaml
  • packages/system/flux-plunger/images/flux-plunger/Dockerfile
  • packages/system/flux-plunger/Chart.yaml
  • packages/system/flux-plunger/templates/rbac.yaml
  • packages/system/flux-plunger/values.yaml
**/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Files:

  • packages/system/flux-plunger/templates/serviceaccount.yaml
  • packages/system/flux-plunger/templates/deployment.yaml
  • packages/system/flux-plunger/templates/service.yaml
  • packages/system/flux-plunger/Chart.yaml
  • packages/system/flux-plunger/templates/rbac.yaml
  • packages/core/platform/bundles/paas-full.yaml
  • packages/system/flux-plunger/values.yaml
  • packages/core/platform/sources/flux-plunger.yaml
cmd/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Files:

  • cmd/flux-plunger/main.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • cmd/flux-plunger/main.go
  • internal/controller/fluxplunger/flux_plunger.go
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/controller/fluxplunger/flux_plunger.go
packages/core/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/core/platform/bundles/paas-full.yaml
  • packages/core/platform/sources/flux-plunger.yaml
🧠 Learnings (6)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to cmd/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Applied to files:

  • cmd/flux-plunger/main.go
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/system/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory

Applied to files:

  • packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to **/*.yaml : Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Applied to files:

  • packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/core/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory

Applied to files:

  • packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/apps/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory

Applied to files:

  • packages/system/flux-plunger/Chart.yaml
📚 Learning: 2025-12-25T15:24:00.888Z
Learnt from: kvaps
Repo: cozystack/cozystack PR: 1760
File: packages/core/platform/sources/goldpinger.yaml:20-23
Timestamp: 2025-12-25T15:24:00.888Z
Learning: In Goldpinger YAML configurations (e.g., packages/core/platform/sources/goldpinger.yaml), when hostNetwork is enabled, set privileged: true to ensure required host-level capabilities for cross-node network monitoring. Apply this rule to similar Goldpinger YAML files under the same directory. Confirm privileges are limited to scenarios that truly require host access and are scoped to this component.

Applied to files:

  • packages/core/platform/sources/flux-plunger.yaml
🧬 Code graph analysis (1)
internal/controller/fluxplunger/flux_plunger.go (2)
pkg/apiserver/apiserver.go (1)
  • Scheme (53-53)
internal/controller/dashboard/manager.go (1)
  • Manager (45-48)
🪛 checkmake (0.2.2)
packages/system/flux-plunger/Makefile

[warning] 7-7: Target body for "image" exceeds allowed length of 5 (12).

(maxbodylength)


[warning] 1-1: Missing required phony target "all"

(minphony)


[warning] 1-1: Missing required phony target "clean"

(minphony)


[warning] 1-1: Missing required phony target "test"

(minphony)

🪛 YAMLlint (1.37.1)
packages/system/flux-plunger/templates/deployment.yaml

[error] 26-26: syntax error: could not find expected ':'

(syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (8)
packages/core/platform/bundles/paas-full.yaml (1)

61-65: LGTM! Release entry follows established patterns.

The flux-plunger release configuration is consistent with other plunger components in the bundle (e.g., kubeovn-plunger) and follows the established pattern for release definitions.

packages/system/flux-plunger/templates/serviceaccount.yaml (1)

1-8: LGTM! ServiceAccount definition is correct.

The ServiceAccount is properly configured with appropriate labels and explicitly enables token mounting, which is required for the controller to interact with the Kubernetes API.

packages/system/flux-plunger/Chart.yaml (1)

1-6: LGTM! Chart metadata is appropriate for a custom component.

The Chart.yaml follows standard Helm conventions. The umbrella pattern guideline applies to vendored upstream charts, but since flux-plunger is a custom controller developed in this repository, a simple application chart is appropriate.

packages/system/flux-plunger/templates/rbac.yaml (1)

17-24: Runtime filtering is already properly implemented.

The controller's listHelmReleaseSecrets() function correctly scopes access to the reconciled namespace, filters secrets by labels (name and owner: helm), and by type (helm.sh/release.v1). Secrets are only deleted after multiple validity checks during specific error conditions.

The broad RBAC rule is a necessary trade-off in Kubernetes—RBAC rules cannot express label selectors, so controllers use broad rules paired with proper runtime filtering. This implementation is sound and does not pose an unmitigated security risk.

Likely an incorrect or invalid review comment.
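For illustration, the runtime scoping described here can be sketched as follows. This is not the PR's listHelmReleaseSecrets(); it assumes the Helm release name equals the HelmRelease name (spec.releaseName can override that in Flux) and uses an illustrative function name:

package fluxplunger

import (
	"context"

	helmv2 "github.com/fluxcd/helm-controller/api/v2"
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listReleaseSecrets returns only the Helm v3 storage secrets for one release,
// scoped to the reconciled namespace and filtered by the standard Helm labels.
func listReleaseSecrets(ctx context.Context, c client.Client, hr *helmv2.HelmRelease) ([]corev1.Secret, error) {
	var list corev1.SecretList
	if err := c.List(ctx, &list,
		client.InNamespace(hr.Namespace),
		client.MatchingLabels{"name": hr.Name, "owner": "helm"},
	); err != nil {
		return nil, err
	}
	out := make([]corev1.Secret, 0, len(list.Items))
	for i := range list.Items {
		// Helm v3 stores release history in secrets of this type.
		if list.Items[i].Type == "helm.sh/release.v1" {
			out = append(out, list.Items[i])
		}
	}
	return out, nil
}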

packages/system/flux-plunger/templates/service.yaml (1)

1-18: LGTM!

The Service template correctly exposes the metrics (8080) and health (8081) endpoints with matching labels and selectors for the flux-plunger deployment.

internal/controller/fluxplunger/flux_plunger.go (3)

1-25: LGTM!

The imports and constants are appropriate. Using flux-client-side-apply as the field manager aligns with the PR objective of working alongside FluxCD's field ownership.


135-164: Remediation flow logic is sound.

The sequence of suspend → delete → annotate → unsuspend correctly implements the recovery mechanism. The annotation check at lines 119-133 properly prevents re-processing by comparing latestVersion+1 == processedVersion.

Note: The atomicity concern was addressed in the previous comment regarding crash recovery.


286-300: Predicate filter is appropriate but has an edge case.

The predicate correctly limits reconciliation to HelmReleases with the target error. However, this relies on the assumption that suspended HelmReleases retain their error status condition (which is true since Flux won't reconcile suspended releases).

This design choice means the controller won't react to HelmReleases that are suspended without the error condition—which is fine for externally-suspended releases but could be problematic in the crash-recovery scenario flagged earlier.

Add flux-plunger controller to automatically fix HelmRelease resources
with "has no deployed releases" error.

The controller watches HelmRelease resources and performs the following:
- Detects HelmRelease with "has no deployed releases" error
- Suspends the HelmRelease with flux-client-side-apply field manager
- Deletes the latest Helm release secret
- Updates annotation with processed version to prevent recursive deletion
- Unsuspends the HelmRelease to allow Flux to retry

Special handling for suspended HelmRelease:
- If suspend=true and latest+1==processed: removes suspend
- Otherwise: skips processing as suspended by external process

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
@kvaps kvaps force-pushed the feat/flux-plunger branch from 22ab852 to fc2b687 on January 9, 2026 at 18:10
releaseName: flux-plunger
chart: cozy-flux-plunger
namespace: cozy-fluxcd
dependsOn: [cilium,kubeovn]
Collaborator


Suggested change

-dependsOn: [cilium,kubeovn]
+dependsOn: [cilium,kubeovn,multus]

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In @cmd/flux-plunger/main.go:
- Around line 53-104: The enable-http2 flag description incorrectly references
webhook servers; update the flag.BoolVar call that sets enableHTTP2 (the flag
registered with flag.BoolVar(&enableHTTP2, "enable-http2", ...)) to use the
corrected description string "If set, HTTP/2 will be enabled for the metrics
server" and ensure any related comments or documentation near the
metricsServerOptions configuration reflect that only the metrics server TLS
options are affected.

In @internal/controller/fluxplunger/flux_plunger.go:
- Around line 290-306: The event filter in SetupWithManager currently only
returns true when hasNoDeployedReleasesError(hr), which can miss suspended
releases after a crash; update the predicate used in WithEventFilter(pred) so it
also returns true when hr.Spec.Suspend == true AND the controller's
ownership/in-progress annotation is present (create a helper like
hasPlungerAnnotation(hr) to check the annotation), or alternatively remove
WithEventFilter(pred) and perform the same checks early in Reconcile
(fast-return when irrelevant) so suspended-but-owned HelmReleases are still
enqueued for recovery; modify SetupWithManager/ predicate and ensure
hasNoDeployedReleasesError and the new annotation check are the deciding
conditions.
- Around line 44-89: When suspending a HelmRelease we must mark ownership so
retries can resume work; update the suspend path (the code that calls
suspendHelmRelease or the suspendHelmRelease implementation) to add an
annotation like "flux-plunger/suspended-by=flux-plunger" (or "in-progress") and
persist it atomically with the spec.suspend change, and ensure any subsequent
steps record the last-processed version annotation
(annotationLastProcessedVersion). In the suspended-branch (the block that calls
r.listHelmReleaseSecrets and currently returns early when hr.Spec.Suspend)
detect that annotation: if "flux-plunger/suspended-by" (or "in-progress") is
present and owned by us, continue the remediation sequence (delete Helm release
secret, update annotationLastProcessedVersion, then call r.unsuspendHelmRelease)
instead of skipping; only treat external suspends (no owned annotation) as skip.
Ensure unsuspendHelmRelease only clears suspend when the suspended-by annotation
matches and remove that annotation when done.
- Around line 140-152: The secret deletion should be idempotent: when calling
r.Delete(ctx, &latestSecret) handle a NotFound error as success instead of
returning an error; change the deletion error handling in the block that logs
"Deleting latest Helm release secret" to check
k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or use
client.IgnoreNotFound) and only return on non-NotFound errors, optionally
logging a debug/info that the secret was already absent, then proceed to call
r.updateProcessedVersionAnnotation(ctx, hr, latestVersion) as before.
- Around line 20-24: The current use of MergeFrom (client.MergeFrom) with
fieldManager (variable fieldManager) means client.FieldOwner() is ignored and
managedFields is not updated, so relying on annotationLastProcessedVersion +
version heuristics is unsafe; fix by either (A) adding an explicit annotation
flag like flux-plunger.cozystack.io/suspended-by-controller:"true" when
suspending and checking/removing it when unsuspending, or (B) change the
suspend/unsuspend calls to use Server-Side Apply via client.Apply with the
fieldManager to ensure ownership is recorded in managedFields (and remove the
unreliable version-heuristic logic); pick one approach and update the code paths
that currently call MergeFrom to use the chosen method and checks.
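A sketch of the ownership-marking approach suggested above, with an illustrative annotation key rather than whatever constant the PR ends up using:

package fluxplunger

import (
	"context"

	helmv2 "github.com/fluxcd/helm-controller/api/v2"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// suspendedByAnnotation is an illustrative key, not the PR's exact constant.
const suspendedByAnnotation = "flux-plunger.cozystack.io/suspended-by"

// suspendWithOwnership suspends the HelmRelease and records, in the same patch,
// that this controller owns the suspension, so a later reconcile can resume the
// remediation instead of treating it as an external suspend.
func suspendWithOwnership(ctx context.Context, c client.Client, hr *helmv2.HelmRelease) error {
	patch := client.MergeFromWithOptions(hr.DeepCopy(), client.MergeFromWithOptimisticLock{})
	hr.Spec.Suspend = true
	if hr.Annotations == nil {
		hr.Annotations = map[string]string{}
	}
	hr.Annotations[suspendedByAnnotation] = "flux-plunger"
	return c.Patch(ctx, hr, patch, client.FieldOwner("flux-plunger"))
}

// ownsSuspension tells the reconcile loop whether it may safely continue the
// suspend, delete, annotate, unsuspend sequence on this HelmRelease.
func ownsSuspension(hr *helmv2.HelmRelease) bool {
	return hr.Spec.Suspend && hr.Annotations[suspendedByAnnotation] == "flux-plunger"
}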
🧹 Nitpick comments (3)
packages/system/flux-plunger/templates/serviceaccount.yaml (1)

1-8: Consider omitting automountServiceAccountToken: true unless you’re intentionally overriding a default.

It’s redundant in most clusters (default is already true), but keeping it is fine if you’re guarding against namespace/SA defaults being changed.

packages/system/flux-plunger/templates/deployment.yaml (1)

37-55: Probes look aligned with :8081 health endpoints; consider making resources configurable via values.

internal/controller/fluxplunger/flux_plunger.go (1)

203-239: Avoid treating parse failures as version 0 (can select the wrong “latest” secret).

extractVersionNumber() returning 0 on parse error means an unexpected secret name can skew sorting/selection. Prefer returning (int, error) and skipping invalid entries (with a log), or treat invalid as lowest and log loudly.
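A sketch of the error-returning variant, assuming the standard Helm storage naming scheme sh.helm.release.v1.<release>.v<N>; the function name is illustrative:

package fluxplunger

import (
	"fmt"
	"strconv"
	"strings"
)

// parseReleaseVersion extracts N from a Helm storage secret name of the form
// "sh.helm.release.v1.<release>.v<N>" and returns an error instead of silently
// defaulting to 0.
func parseReleaseVersion(secretName string) (int, error) {
	idx := strings.LastIndex(secretName, ".v")
	if idx < 0 || idx+2 >= len(secretName) {
		return 0, fmt.Errorf("secret %q does not look like a Helm release secret", secretName)
	}
	v, err := strconv.Atoi(secretName[idx+2:])
	if err != nil {
		return 0, fmt.Errorf("secret %q has a non-numeric version suffix: %w", secretName, err)
	}
	return v, nil
}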

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22ab852 and fc2b687.

📒 Files selected for processing (12)
  • cmd/flux-plunger/main.go
  • internal/controller/fluxplunger/flux_plunger.go
  • packages/core/platform/bundles/paas-full.yaml
  • packages/core/platform/sources/flux-plunger.yaml
  • packages/system/flux-plunger/Chart.yaml
  • packages/system/flux-plunger/Makefile
  • packages/system/flux-plunger/images/flux-plunger/Dockerfile
  • packages/system/flux-plunger/templates/deployment.yaml
  • packages/system/flux-plunger/templates/rbac.yaml
  • packages/system/flux-plunger/templates/service.yaml
  • packages/system/flux-plunger/templates/serviceaccount.yaml
  • packages/system/flux-plunger/values.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/system/flux-plunger/Chart.yaml
  • packages/core/platform/sources/flux-plunger.yaml
  • packages/system/flux-plunger/values.yaml
  • packages/system/flux-plunger/templates/service.yaml
  • packages/system/flux-plunger/images/flux-plunger/Dockerfile
  • packages/core/platform/bundles/paas-full.yaml
  • packages/system/flux-plunger/templates/rbac.yaml
🧰 Additional context used
📓 Path-based instructions (5)
cmd/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Files:

  • cmd/flux-plunger/main.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • cmd/flux-plunger/main.go
  • internal/controller/fluxplunger/flux_plunger.go
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/flux-plunger/templates/serviceaccount.yaml
  • packages/system/flux-plunger/Makefile
  • packages/system/flux-plunger/templates/deployment.yaml
**/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Files:

  • packages/system/flux-plunger/templates/serviceaccount.yaml
  • packages/system/flux-plunger/templates/deployment.yaml
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/controller/fluxplunger/flux_plunger.go
🧠 Learnings (3)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to cmd/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Applied to files:

  • cmd/flux-plunger/main.go
📚 Learning: 2025-06-26T04:29:24.830Z
Learnt from: NickVolynkin
Repo: cozystack/cozystack PR: 1117
File: packages/apps/mysql/Makefile:8-8
Timestamp: 2025-06-26T04:29:24.830Z
Learning: The cozystack project uses yq v4+ on their CI runner, so yq v4 syntax (-o json --indent 4) is compatible and version checks are not needed.

Applied to files:

  • packages/system/flux-plunger/Makefile
📚 Learning: 2025-07-14T16:23:12.803Z
Learnt from: NickVolynkin
Repo: cozystack/cozystack PR: 1196
File: packages/apps/http-cache/Makefile:24-27
Timestamp: 2025-07-14T16:23:12.803Z
Learning: In the cozystack repository, the `readme-generator` tool removes enum contents from values.schema.json files during its operation. Therefore, when using readme-generator in Makefiles, any enum values need to be injected back into the schema using yq commands after readme-generator has run, not before.

Applied to files:

  • packages/system/flux-plunger/Makefile
🧬 Code graph analysis (1)
internal/controller/fluxplunger/flux_plunger.go (1)
internal/controller/dashboard/manager.go (1)
  • Manager (45-48)
🪛 checkmake (0.2.2)
packages/system/flux-plunger/Makefile

[warning] 7-7: Target body for "image" exceeds allowed length of 5 (12).

(maxbodylength)


[warning] 1-1: Missing required phony target "all"

(minphony)


[warning] 1-1: Missing required phony target "clean"

(minphony)


[warning] 1-1: Missing required phony target "test"

(minphony)

🪛 YAMLlint (1.37.1)
packages/system/flux-plunger/templates/deployment.yaml

[error] 26-26: syntax error: could not find expected ':'

(syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (5)
packages/system/flux-plunger/Makefile (3)

1-5: LGTM!

The environment variable exports and include statements follow the expected pattern for system packages in this repository.


17-19: LGTM!

The digest extraction and values.yaml update workflow is well-structured. The yq v4+ syntax is correct per project standards, and cleanup of the temporary metadata file is good practice.


7-16: No changes required; Dockerfile path and images directory are correct.

The Dockerfile path images/flux-plunger/Dockerfile correctly resolves to the existing file at packages/system/flux-plunger/images/flux-plunger/Dockerfile. The images/ directory exists in the repository, so the metadata file can be written without issue. This pattern is consistent across all system packages in the codebase.

Likely an incorrect or invalid review comment.

cmd/flux-plunger/main.go (1)

105-150: Manager wiring + health/ready checks look correct.

internal/controller/fluxplunger/flux_plunger.go (1)

177-201: The label selectors for Helm release secrets are correct.

The labels name: <releaseName> and owner: helm match how Flux/Helm's helm-controller labels Helm release secrets, and they are sufficient to identify only that HelmRelease's helm.sh/release.v1 secrets. The additional type filter ensures only Helm v3 release secrets are returned. No changes needed.

Comment on lines 140 to 152
	// Delete the latest secret
	logger.Info("Deleting latest Helm release secret", "secret", latestSecret.Name)
	if err := r.Delete(ctx, &latestSecret); err != nil {
		logger.Error(err, "Failed to delete Helm release secret")
		return ctrl.Result{}, err
	}

	// Update annotation with processed version
	logger.Info("Updating annotation with processed version", "version", latestVersion)
	if err := r.updateProcessedVersionAnnotation(ctx, hr, latestVersion); err != nil {
		logger.Error(err, "Failed to update annotation")
		return ctrl.Result{}, err
	}
Contributor


⚠️ Potential issue | 🟡 Minor

Make secret deletion idempotent (ignore NotFound).

Proposed fix
-	if err := r.Delete(ctx, &latestSecret); err != nil {
+	if err := r.Delete(ctx, &latestSecret); err != nil {
+		if client.IgnoreNotFound(err) == nil {
+			logger.Info("Helm release secret already deleted", "secret", latestSecret.Name)
+		} else {
 			logger.Error(err, "Failed to delete Helm release secret")
 			return ctrl.Result{}, err
+		}
 	}
🤖 Prompt for AI Agents
In @internal/controller/fluxplunger/flux_plunger.go around lines 140 - 152, The
secret deletion should be idempotent: when calling r.Delete(ctx, &latestSecret)
handle a NotFound error as success instead of returning an error; change the
deletion error handling in the block that logs "Deleting latest Helm release
secret" to check k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or use
client.IgnoreNotFound) and only return on non-NotFound errors, optionally
logging a debug/info that the secret was already absent, then proceed to call
r.updateProcessedVersionAnnotation(ctx, hr, latestVersion) as before.

Comment on lines +24 to +32
        args:
        {{- if .Values.debug }}
        - --zap-log-level=debug
        {{- else }}
        - --zap-log-level=info
        {{- end }}
        - --metrics-bind-address=:8080
        - --metrics-secure=false
        ports:
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find CI configuration files
find . -type f \( -name ".github" -o -name ".gitlab-ci.yml" -o -name ".circleci" \) -o -path "*/.github/workflows/*" 2>/dev/null | head -20

Repository: cozystack/cozystack

Length of output: 362


🏁 Script executed:

#!/bin/bash
# List files to identify CI setup
ls -la .github/workflows/ 2>/dev/null || echo "No GitHub Actions found"
cat .gitlab-ci.yml 2>/dev/null | head -50 || echo "No GitLab CI found"

Repository: cozystack/cozystack

Length of output: 716


🏁 Script executed:

#!/bin/bash
# Find the flux-plunger deployment.yaml file
find . -path "*/flux-plunger/templates/deployment.yaml" -type f

Repository: cozystack/cozystack

Length of output: 119


🏁 Script executed:

#!/bin/bash
# Search for yamllint in the repository (config files, workflows, etc.)
rg -i "yamllint" --type yaml --type json --type shell -A 5

Repository: cozystack/cozystack

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Check pre-commit configuration for yamllint
cat .github/workflows/pre-commit.yml

Repository: cozystack/cozystack

Length of output: 1452


🏁 Script executed:

#!/bin/bash
# Read the deployment.yaml file
cat packages/system/flux-plunger/templates/deployment.yaml

Repository: cozystack/cozystack

Length of output: 1455


🏁 Script executed:

#!/bin/bash
# Search for if/else control flow patterns in helm templates
rg "{{-\s*if" packages/system --type yaml -B 2 -A 2 | head -100

Repository: cozystack/cozystack

Length of output: 146


🏁 Script executed:

#!/bin/bash
# Search for ternary usage in helm templates
rg "ternary" packages/system --type yaml -B 1 -A 1

Repository: cozystack/cozystack

Length of output: 26904


🏁 Script executed:

#!/bin/bash
# Find and read .pre-commit-config.yaml
find . -name ".pre-commit-config.yaml" -type f

Repository: cozystack/cozystack

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Check the structure of flux-plunger
ls -la packages/system/flux-plunger/

Repository: cozystack/cozystack

Length of output: 482


🏁 Script executed:

#!/bin/bash
# Check if flux-plunger is a custom chart or has a charts/ directory
find packages/system/flux-plunger -type f | head -20

Repository: cozystack/cozystack

Length of output: 456


🏁 Script executed:

#!/bin/bash
# Search for yamllint configuration in the repo
rg "yamllint" --type yaml -B 2 -A 5

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Read the pre-commit configuration
cat .pre-commit-config.yaml

Repository: cozystack/cozystack

Length of output: 556


🏁 Script executed:

#!/bin/bash
# Search for helm or yaml linting in pre-commit hooks and CI
rg "helm|lint" --type yaml -B 1 -A 2 .github/workflows/ | head -100

Repository: cozystack/cozystack

Length of output: 242


🏁 Script executed:

#!/bin/bash
# Check the Makefile in flux-plunger
cat packages/system/flux-plunger/Makefile

Repository: cozystack/cozystack

Length of output: 815


🏁 Script executed:

#!/bin/bash
# Check values.yaml
cat packages/system/flux-plunger/values.yaml

Repository: cozystack/cozystack

Length of output: 191


🏁 Script executed:

#!/bin/bash
# Check if there's any helm-lint or validation tool in the repo
find . -name "Makefile" -o -name "*.sh" -o -name "tox.ini" | xargs rg -l "helm|lint|validate" 2>/dev/null | head -10

Repository: cozystack/cozystack

Length of output: 481


🏁 Script executed:

#!/bin/bash
# Search for 'debug' field in flux-plunger related files
rg "debug" packages/system/flux-plunger/ -B 2 -A 2

Repository: cozystack/cozystack

Length of output: 558


🏁 Script executed:

#!/bin/bash
# Search more broadly for values schema or documentation
find packages/system/flux-plunger -type f -name "*.md" -o -name "*.txt" -o -name "schema.json"

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check if there's a values schema or documentation
cat packages/system/flux-plunger/Chart.yaml

Repository: cozystack/cozystack

Length of output: 348


🏁 Script executed:

#!/bin/bash
# Test if the if/else pattern actually causes YAML parsing issues
# Create a test file with the original pattern
cat > /tmp/test-original.yaml << 'EOF'
args:
{{- if .Values.debug }}
- --zap-log-level=debug
{{- else }}
- --zap-log-level=info
{{- end }}
- --metrics-bind-address=:8080
EOF

# Try to parse it with a YAML parser (before helm template rendering)
python3 << 'PYEOF'
import yaml
try:
    with open('/tmp/test-original.yaml', 'r') as f:
        content = f.read()
    # Try to parse the raw template (without rendering)
    yaml.safe_load(content)
    print("YAML parsing: SUCCESS (no error)")
except yaml.YAMLError as e:
    print(f"YAML parsing ERROR: {e}")
except Exception as e:
    print(f"ERROR: {e}")
PYEOF

Repository: cozystack/cozystack

Length of output: 292


🏁 Script executed:

#!/bin/bash
# Test if the proposed ternary fix is valid YAML
cat > /tmp/test-proposed.yaml << 'EOF'
args:
- '--zap-log-level={{ ternary "debug" "info" .Values.debug }}'
- --metrics-bind-address=:8080
- --metrics-secure=false
EOF

python3 << 'PYEOF'
import yaml
try:
    with open('/tmp/test-proposed.yaml', 'r') as f:
        content = f.read()
    result = yaml.safe_load(content)
    print("Proposed fix YAML parsing: SUCCESS")
    print(f"Parsed structure: {result}")
except yaml.YAMLError as e:
    print(f"Proposed fix YAML parsing ERROR: {e}")
except Exception as e:
    print(f"ERROR: {e}")
PYEOF

Repository: cozystack/cozystack

Length of output: 246


🏁 Script executed:

#!/bin/bash
# Check if similar debug/ternary patterns exist in other custom charts
rg "debug" packages/system --type yaml | grep -v "charts/" | head -20

Repository: cozystack/cozystack

Length of output: 2566


🏁 Script executed:

#!/bin/bash
# Look for examples of ternary in args sections
rg "args:" packages/system -A 5 --type yaml | grep -B 2 -A 3 "ternary"

Repository: cozystack/cozystack

Length of output: 45


Fix template YAML syntax: if/else in args sequence is invalid without Helm rendering.

The current pattern is syntactically invalid YAML when unrendered (breaks parsers at line 26-27). Replace with a ternary expression to keep the template valid:

Proposed fix
         args:
-        {{- if .Values.debug }}
-        - --zap-log-level=debug
-        {{- else }}
-        - --zap-log-level=info
-        {{- end }}
+        - '--zap-log-level={{ ternary "debug" "info" .Values.debug }}'
         - --metrics-bind-address=:8080
         - --metrics-secure=false

Also add the missing field to values.yaml:

-image: "ghcr.io/cozystack/cozystack/flux-plunger:latest@..."
+image: "ghcr.io/cozystack/cozystack/flux-plunger:latest@..."
+debug: false
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

         args:
-        {{- if .Values.debug }}
-        - --zap-log-level=debug
-        {{- else }}
-        - --zap-log-level=info
-        {{- end }}
+        - '--zap-log-level={{ ternary "debug" "info" .Values.debug }}'
         - --metrics-bind-address=:8080
         - --metrics-secure=false
         ports:
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 26-26: syntax error: could not find expected ':'

(syntax)

- Remove unused webhook server configuration
- Add health probes (liveness and readiness) to deployment
- Add resource requests and limits to deployment
- Fix optimistic lock conflicts by checking current state before patching
- Fix potential deadlock in crash recovery by updating predicate filter
- Fix namespace in PackageSource (cozy-fluxcd)

The predicate now watches both:
1. HelmReleases with "has no deployed releases" error
2. Suspended HelmReleases with flux-plunger annotation (crash recovery)

This prevents deadlock when controller crashes after suspend but before
unsuspend, and FluxCD removes error from status.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
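The two-way predicate described in this commit message can be sketched roughly as follows. Helper names, the annotation key, and matching on the condition message are assumptions chosen to match the earlier sketches in this thread, not the PR's actual identifiers:

package fluxplunger

import (
	"strings"

	helmv2 "github.com/fluxcd/helm-controller/api/v2"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// hasNoDeployedReleasesError mirrors the target error check; matching on the
// condition message is a simplification of whatever the PR actually inspects.
func hasNoDeployedReleasesError(hr *helmv2.HelmRelease) bool {
	for _, c := range hr.Status.Conditions {
		if strings.Contains(c.Message, "has no deployed releases") {
			return true
		}
	}
	return false
}

// suspendedByPlunger checks the crash-recovery marker; the annotation key is illustrative.
func suspendedByPlunger(hr *helmv2.HelmRelease) bool {
	return hr.Spec.Suspend && hr.Annotations["flux-plunger.cozystack.io/suspended-by"] == "flux-plunger"
}

// eventFilter admits HelmReleases with the target error and those this
// controller previously suspended, so a crash mid-remediation is not orphaned.
func eventFilter() predicate.Predicate {
	return predicate.NewPredicateFuncs(func(obj client.Object) bool {
		hr, ok := obj.(*helmv2.HelmRelease)
		if !ok {
			return false
		}
		return hasNoDeployedReleasesError(hr) || suspendedByPlunger(hr)
	})
}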
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @internal/controller/fluxplunger/flux_plunger.go:
- Around line 44-89: When HR is suspended and the last-processed annotation is
missing or cannot be parsed, implement a recovery/fallback so the controller
doesn't permanently skip processing: in the suspend-handling block that calls
listHelmReleaseSecrets and computes latestVersion (use hr.Spec.Suspend,
listHelmReleaseSecrets, getLatestSecret, extractVersionNumber,
annotationLastProcessedVersion), treat a missing/invalid annotation as a
potential crash scenario and attempt recovery by unsuspending when appropriate
(at minimum: if annotation is missing/invalid AND len(secrets)==0 then call
unsuspendHelmRelease; otherwise requeue for immediate reconcile when parse fails
so we can re-check state instead of silently skipping). Ensure errors from
unsuspendHelmRelease are handled the same way as before.
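
A minimal sketch of that fallback inside the suspended branch; hr, r, ctx, secrets, latestVersion and the constant and helper names are assumed from the reconciler under review, so treat this as illustrative rather than a drop-in patch:

```go
// Crash recovery: the release is suspended but the annotation is missing or unreadable.
raw, hasAnno := hr.Annotations[annotationLastProcessedVersion]
processed, parseErr := strconv.Atoi(raw)
if !hasAnno || parseErr != nil {
	if len(secrets) == 0 {
		// Nothing left to delete: the suspend almost certainly came from an
		// interrupted flux-plunger run, so unsuspend instead of skipping forever.
		if err := r.unsuspendHelmRelease(ctx, hr); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{}, nil
	}
	// Annotation unreadable but secrets remain: requeue to re-check the state
	// instead of silently skipping.
	return ctrl.Result{Requeue: true}, nil
}
shouldUnsuspend := latestVersion+1 == processed
```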
🧹 Nitpick comments (1)
internal/controller/fluxplunger/flux_plunger.go (1)

203-239: Consider adding defensive nil/empty check in getLatestSecret.

While callers currently check len(secrets) > 0 before calling, adding a defensive check would make the function safer for future use and prevent potential panics.

Suggested improvement
 // getLatestSecret returns the secret with the highest version number
 func getLatestSecret(secrets []corev1.Secret) corev1.Secret {
+	if len(secrets) == 0 {
+		return corev1.Secret{}
+	}
 	if len(secrets) == 1 {
 		return secrets[0]
 	}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc2b687 and 179c381.

📒 Files selected for processing (2)
  • cmd/flux-plunger/main.go
  • internal/controller/fluxplunger/flux_plunger.go
🧰 Additional context used
📓 Path-based instructions (3)
cmd/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Files:

  • cmd/flux-plunger/main.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • cmd/flux-plunger/main.go
  • internal/controller/fluxplunger/flux_plunger.go
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/controller/fluxplunger/flux_plunger.go
🧠 Learnings (1)
📚 Learning: 2025-12-25T09:45:26.524Z
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to cmd/**/*.go : Follow controller-runtime patterns and kubebuilder style for Go code in cmd directory

Applied to files:

  • cmd/flux-plunger/main.go
🧬 Code graph analysis (2)
cmd/flux-plunger/main.go (3)
internal/controller/dashboard/manager.go (1)
  • NewManager (51-54)
pkg/apiserver/apiserver.go (1)
  • Scheme (53-53)
internal/controller/fluxplunger/flux_plunger.go (1)
  • FluxPlunger (27-29)
internal/controller/fluxplunger/flux_plunger.go (1)
internal/controller/dashboard/manager.go (1)
  • Manager (45-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (6)
cmd/flux-plunger/main.go (2)

1-51: LGTM - Standard kubebuilder scaffold setup.

The license header, imports, and scheme registration follow controller-runtime patterns correctly. The scheme properly registers both core Kubernetes types and Flux HelmRelease types. Based on coding guidelines for cmd/**/*.go.


53-150: LGTM - Main function follows controller-runtime best practices.

The manager configuration is well-structured with:

  • Proper flag parsing for operational flexibility
  • HTTP/2 disabled by default with documented security rationale (CVE references)
  • Appropriate leader election ID namespacing
  • Standard health/readiness probe setup
  • Proper error handling with informative log messages

Based on coding guidelines for controller-runtime patterns.

internal/controller/fluxplunger/flux_plunger.go (4)

20-29: LGTM - Constants and struct definition.

The constants are well-defined with clear purposes, and the FluxPlunger struct follows the standard controller-runtime pattern of embedding client.Client.


91-163: LGTM - Error handling path with proper safeguards.

The reconciliation logic correctly:

  • Checks for the specific error condition before processing
  • Prevents reprocessing with the latestVersion+1 == processedVersion check
  • Re-fetches the HelmRelease before each mutation to avoid stale writes
  • Uses optimistic locking to handle concurrent modifications
  • Returns errors to let the workqueue retry on failures

241-298: LGTM - Mutation functions are well-implemented.

The suspend/unsuspend and annotation update functions correctly:

  • Re-fetch the resource to get the latest state (addressing optimistic lock conflicts)
  • Use MergeFromWithOptimisticLock for conflict detection
  • Check for idempotency (already suspended/unsuspended) to avoid unnecessary patches
  • Use a consistent field manager for server-side apply semantics

300-331: LGTM - Controller setup with appropriate predicate filtering.

The predicate correctly filters for:

  1. HelmReleases with the specific error condition (primary use case)
  2. Suspended HelmReleases with the annotation (crash recovery)

This aligns with the PR objective of handling crash recovery and avoids unnecessary reconciliation of unrelated HelmReleases.

…flicts

When FluxCD and flux-plunger both update the same HelmRelease,
optimistic lock conflicts occur. Previously, returning these errors
caused controller-runtime to immediately requeue, creating an
infinite loop while the object was being modified by Flux.

Now we log conflicts as Info and return success, allowing
controller-runtime to naturally wait for the next update event.
This eliminates the reconcile storm and allows both controllers
to coexist peacefully.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @internal/controller/fluxplunger/flux_plunger.go:
- Around lines 20-24: The code passes client.FieldOwner(fieldManager) into MergeFromWithOptions when building merge patches, but that option is ignored for JSON Merge Patch. Either drop the FieldOwner() option from the MergeFromWithOptions calls if field-ownership tracking is not needed, or switch those Patch calls to server-side apply (client.Apply with appropriate options) and keep a meaningful fieldManager value. Also rename the fieldManager constant (currently "flux-client-side-apply") to something non-misleading if SSA is not used.
🧹 Nitpick comments (3)
internal/controller/fluxplunger/flux_plunger.go (3)

34-165: Consider refactoring the long Reconcile method.

The Reconcile method is 131 lines long and handles multiple distinct flows (suspended path, main remediation path). For improved readability and testability, consider extracting logical blocks into helper methods:

  • handleSuspendedHelmRelease(ctx, hr) (ctrl.Result, error)
  • remediateHelmRelease(ctx, hr) (ctrl.Result, error)

This would make the main Reconcile flow easier to follow and each path easier to unit test.
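
A sketch of that split, assuming the two helper methods exist under the suggested names (they are not defined in the PR) and that FluxPlunger embeds client.Client as described in the walkthrough:

```go
func (r *FluxPlunger) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var hr helmv2.HelmRelease
	if err := r.Get(ctx, req.NamespacedName, &hr); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if hr.Spec.Suspend {
		// Unsuspend after a completed cleanup, or leave externally suspended releases alone.
		return r.handleSuspendedHelmRelease(ctx, &hr)
	}
	// Main path: detect "has no deployed releases" and remove the broken release secret.
	return r.remediateHelmRelease(ctx, &hr)
}
```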


230-241: Add defensive handling for malformed secret names.

extractVersionNumber returns 0 when parsing fails (line 238). If multiple secrets have malformed names (or legitimately use v0), they'll all be treated as version 0, making the "latest" selection in getLatestSecret arbitrary.

While this is an edge case, consider:

  1. Logging a warning when version parsing fails to aid debugging
  2. Optionally, skipping secrets with unparseable versions rather than treating them as v0
💡 Suggested improvement
 func extractVersionNumber(secretName string) int {
     version := extractVersionFromSecretName(secretName)
     // Remove 'v' prefix if present
     version = strings.TrimPrefix(version, "v")
     num, err := strconv.Atoi(version)
     if err != nil {
+        // Log malformed version for visibility
+        // Note: can't access logger here without refactoring
         return 0
     }
     return num
 }

Alternatively, return an error or sentinel value (-1) to allow callers to filter out malformed entries.
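
One possible shape for this, as a self-contained sketch: return -1 for unparseable names so callers can log and skip them. The function name and exact parsing are illustrative, not taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseReleaseVersion extracts N from Helm 3 release secret names of the form
// "sh.helm.release.v1.<release>.v<N>". It returns -1 when the name is malformed,
// so callers can log and skip the secret instead of treating it as version 0.
func parseReleaseVersion(secretName string) int {
	idx := strings.LastIndex(secretName, ".v")
	if idx < 0 {
		return -1
	}
	n, err := strconv.Atoi(secretName[idx+2:])
	if err != nil || n < 0 {
		return -1
	}
	return n
}

func main() {
	for _, name := range []string{
		"sh.helm.release.v1.mychart.v7",
		"sh.helm.release.v1.mychart.broken",
	} {
		fmt.Printf("%s -> %d\n", name, parseReleaseVersion(name))
	}
}
```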


243-300: Mutation helpers follow correct patterns.

The re-fetch-before-patch pattern (lines 246-250, 266-270, 286-290) and optimistic locking are appropriate for concurrent environments. The early-return checks (lines 253-255, 273-275) avoid unnecessary API calls.

Optional: The re-fetch pattern is duplicated across three functions. Consider extracting it into a helper like getLatestHelmRelease(ctx, hr) (*helmv2.HelmRelease, error) to reduce duplication.
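
A condensed sketch of that shared pattern, using the helm-controller v2 types and the controller-runtime client; it mirrors the approach described above rather than reproducing the PR's functions:

```go
// setSuspend re-fetches the HelmRelease and patches spec.suspend with optimistic locking,
// so a concurrent write by Flux surfaces as a Conflict error instead of a silent overwrite.
func setSuspend(ctx context.Context, c client.Client, key client.ObjectKey, suspend bool) error {
	var hr helmv2.HelmRelease
	if err := c.Get(ctx, key, &hr); err != nil { // always work from the latest state
		return err
	}
	if hr.Spec.Suspend == suspend { // idempotent: skip the patch entirely
		return nil
	}
	base := hr.DeepCopy()
	hr.Spec.Suspend = suspend
	patch := client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{})
	return c.Patch(ctx, &hr, patch)
}
```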

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 179c381 and 98ed9ac.

📒 Files selected for processing (1)
  • internal/controller/fluxplunger/flux_plunger.go
🧰 Additional context used
📓 Path-based instructions (2)
internal/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Follow controller-runtime patterns and kubebuilder style for Go code in internal directory

Files:

  • internal/controller/fluxplunger/flux_plunger.go
**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT manually edit vendor/ directory or auto-generated files (zz_generated.*.go)

Files:

  • internal/controller/fluxplunger/flux_plunger.go
🧬 Code graph analysis (1)
internal/controller/fluxplunger/flux_plunger.go (1)
internal/controller/dashboard/manager.go (1)
  • Manager (45-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (4)
internal/controller/fluxplunger/flux_plunger.go (4)

31-32: LGTM!

RBAC permissions are appropriately scoped for the controller's operations.


135-140: Optimistic lock conflict handling is appropriate.

The pattern of logging conflicts as Info and returning nil (no error) at lines 136-139, 152-153, and 159-160 is correct given the PR objective to avoid requeue storms during concurrent updates by Flux and flux-plunger. The controller will be re-triggered by watch events when the object stabilizes.
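
In code terms, the approved pattern looks roughly like the fragment below; apierrors is k8s.io/apimachinery/pkg/api/errors, and hr, r and logger come from the reconciler, so this is a sketch rather than the PR's literal code:

```go
if err := r.suspendHelmRelease(ctx, hr); err != nil {
	if apierrors.IsConflict(err) {
		// Flux modified the object concurrently; the resulting watch event re-triggers us.
		logger.Info("Conflict while suspending HelmRelease, waiting for next event")
		return ctrl.Result{}, nil
	}
	return ctrl.Result{}, err // other errors go back to the workqueue for retry
}
```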


167-203: LGTM!

The helper functions correctly identify Helm release errors and list Helm release secrets using standard Helm 3 conventions (labels name/owner and type helm.sh/release.v1).


302-333: Controller setup follows correct patterns.

The predicate logic correctly filters for relevant HelmRelease events (error condition or suspended with annotation for crash recovery). The setup uses standard controller-runtime patterns.

Note: The crash recovery predicate (lines 319-322) assumes the annotation exists, which ties into the crash recovery issue flagged earlier.

Comment on lines +20 to +24
const (
	annotationLastProcessedVersion = "flux-plunger.cozystack.io/last-processed-version"
	errorMessageNoDeployedReleases = "has no deployed releases"
	fieldManager                   = "flux-client-side-apply"
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

controller-runtime FieldOwner MergeFromWithOptions merge patch behavior

💡 Result:

Short answer: MergeFromWithOptions builds a JSON Merge Patch (client-side merge per RFC 7396) — not Server‑Side Apply — so it merges at the JSON/object level (maps merged, lists replaced entirely) and does not participate in Kubernetes “field ownership” the way SSA does. controller‑runtime’s MergeFromWithOptions supports options such as optimistic locking (adds resourceVersion checking) via MergeFromWithOptimisticLock. References:

  • controller‑runtime Patch/MergeFromWithOptions and MergeFromWithOptimisticLock docs. [1]
  • Kubernetes docs: differences between JSON merge patch (lists replaced) and strategic/SSA behavior. [2]
  • JSON Merge Patch (RFC 7396) specification. [3]
  • Explanation of SSA field ownership (why SSA removes fields you stop declaring). [4]

If you want field ownership semantics (managed fields / FieldManager), use Server‑Side Apply (Apply patch type) rather than MergeFrom/MergePatch.


🏁 Script executed:

cd internal/controller/fluxplunger && sed -n '260p;280p;299p' flux_plunger.go

Repository: cozystack/cozystack

Length of output: 275


Remove FieldOwner option or switch to Server-Side Apply.

The client.FieldOwner(fieldManager) option at lines 260, 280, and 299 has no effect with merge patches. According to controller-runtime and Kubernetes specifications, FieldOwner is only meaningful for Server-Side Apply (SSA), not for JSON Merge Patch operations. Using MergeFromWithOptions with FieldOwner() passes an ignored parameter.

Choose one:

  1. Remove the FieldOwner() option from merge patch calls if field ownership tracking is not required, OR
  2. Switch to Server-Side Apply (Patch with client.Apply) if you need field ownership semantics

Also reconsider the fieldManager constant value—setting it to "flux-client-side-apply" (Flux's field manager) is misleading if not actually used by Kubernetes field ownership tracking.

🤖 Prompt for AI Agents
In @internal/controller/fluxplunger/flux_plunger.go around lines 20-24: the code passes client.FieldOwner(fieldManager) into MergeFromWithOptions when building merge patches, but that option is ignored for JSON Merge Patch. Either drop the FieldOwner() option from the MergeFromWithOptions calls if field-ownership tracking is not needed, or switch those Patch calls to server-side apply (client.Apply with appropriate options) and keep a meaningful fieldManager value. Also rename the fieldManager constant (currently "flux-client-side-apply") to something non-misleading if SSA is not used.

Comment on lines +44 to +89
	// Check if HelmRelease is suspended
	if hr.Spec.Suspend {
		logger.Info("HelmRelease is suspended, checking if we need to unsuspend")

		// Get the list of Helm release secrets
		secrets, err := r.listHelmReleaseSecrets(ctx, hr.Namespace, hr.Name)
		if err != nil {
			logger.Error(err, "Failed to list Helm release secrets")
			return ctrl.Result{}, err
		}

		// If no secrets, treat latest version as 0
		latestVersion := 0
		if len(secrets) > 0 {
			latestSecret := getLatestSecret(secrets)
			latestVersion = extractVersionNumber(latestSecret.Name)
		} else {
			logger.Info("No Helm release secrets found while suspended, treating as version 0")
		}

		// Check if version is previous to just processed (latestVersion+1 == processedVersion)
		// This is the ONLY condition when we unsuspend
		shouldUnsuspend := false
		if hr.Annotations != nil {
			if processedVersionStr, exists := hr.Annotations[annotationLastProcessedVersion]; exists {
				processedVersion, err := strconv.Atoi(processedVersionStr)
				if err == nil && latestVersion+1 == processedVersion {
					shouldUnsuspend = true
				}
			}
		}

		if shouldUnsuspend {
			// Unsuspend the HelmRelease
			logger.Info("Secret was already deleted in previous run, removing suspend", "latest", latestVersion, "processed", latestVersion+1)
			if err := r.unsuspendHelmRelease(ctx, hr); err != nil {
				logger.Info("Could not unsuspend HelmRelease, will retry on next reconcile", "error", err.Error())
				return ctrl.Result{}, nil
			}
			return ctrl.Result{}, nil
		}

		// If not previous to processed, skip all actions
		logger.Info("HelmRelease is suspended by external process, skipping", "latest", latestVersion)
		return ctrl.Result{}, nil
	}

⚠️ Potential issue | 🔴 Critical

Critical crash recovery deadlock - HelmRelease can get permanently stuck.

The crash recovery logic has a critical flaw that can leave a HelmRelease permanently suspended:

Deadlock Scenario:

  1. Controller suspends HelmRelease (line 135)
  2. Controller deletes latest secret (line 144)
  3. Controller crashes before updating annotation (line 151)
  4. On restart:
    • HelmRelease is suspended: true
    • Latest version is now N-1 (deleted secret was version N)
    • Annotation is missing (crashed before setting it)
    • Line 70 check: latestVersion+1 == processedVersion fails (no annotation)
    • shouldUnsuspend = false
    • Controller skips all actions (line 88)
  5. Result: HelmRelease remains permanently suspended

The predicate (lines 319-322) only matches suspended HelmReleases that have the annotation, so the crashed case won't even trigger reconciliation unless the error persists.

Recommended fix:

Handle the suspended-without-annotation case by checking if we're the ones who suspended it (using field manager metadata) OR add logic to detect partial completion and resume the operation.

🔧 Suggested approach

Add a check in the suspended block to handle the crash scenario:

 if hr.Spec.Suspend {
     logger.Info("HelmRelease is suspended, checking if we need to unsuspend")
     
+    // Check if we have the annotation - if not, this might be a crash recovery
+    hasAnnotation := hr.Annotations != nil && hr.Annotations[annotationLastProcessedVersion] != ""
+    
+    // If suspended without annotation but still has error, we crashed mid-process
+    // Continue with normal flow to complete the operation
+    if !hasAnnotation && hasNoDeployedReleasesError(hr) {
+        logger.Info("Suspended without annotation but has error - completing crashed operation")
+        // Fall through to main flow instead of returning
+    } else {
         // Get the list of Helm release secrets
         secrets, err := r.listHelmReleaseSecrets(ctx, hr.Namespace, hr.Name)
         // ... rest of existing logic
+    }
 }

Committable suggestion skipped: line range outside the PR's diff.

This PR prepares the release `v0.40.0`.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @packages/system/linstor/values.yaml:
- Line 4: Pin the linstorCSI image the same way as piraeusServer: replace its current "latest" tag with an explicit version plus sha256 digest for the CSI image (mirroring the <version>@sha256:<digest> form used for piraeusServer; the CSI image has its own digest, so don't reuse the piraeus-server one). Locate the linstorCSI key in the YAML and set its tag to the chosen pinned value, or refactor piraeusServer and linstorCSI to share a versioning parameter (e.g. values.image.tag) so both are bumped from a single source.
🧹 Nitpick comments (2)
packages/system/dashboard/templates/configmap.yaml (1)

1-3: Avoid hardcoding the displayed version in the template (reduce churn/drift).

Bumping {{- $tenantText := "v0.40.0" }} works, but it forces template edits for every release and can drift from the chart/app version. Consider moving this default into values.yaml (still overrideable via _cluster.branding.tenantText) or deriving it from chart metadata.

Possible refactor (chart-driven default)
-{{- $tenantText := "v0.40.0" }}
+{{- $tenantText := (.Values._cluster.branding.defaultTenantText | default (printf "v%s" .Chart.AppVersion)) }}
packages/system/kamaji/values.yaml (1)

16-16: Reduce drift risk: avoid duplicating the digest between image.tag and --migrate-image.

The image digest appears hardcoded in both places. While the Makefile has a build rule to keep them in sync, a more robust approach is to construct --migrate-image from the chart values in the template layer. Consider defining a helper value (e.g., migrateImage) or using template logic to build --migrate-image from image.repository and image.tag, eliminating the manual synchronization requirement for future image updates.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98ed9ac and 1685242.

📒 Files selected for processing (27)
  • packages/apps/http-cache/images/nginx-cache.tag
  • packages/apps/kubernetes/images/cluster-autoscaler.tag
  • packages/apps/kubernetes/images/kubevirt-cloud-provider.tag
  • packages/apps/kubernetes/images/kubevirt-csi-driver.tag
  • packages/apps/kubernetes/images/ubuntu-container-disk.tag
  • packages/apps/mysql/images/mariadb-backup.tag
  • packages/core/installer/values.yaml
  • packages/core/platform/values.yaml
  • packages/core/testing/values.yaml
  • packages/extra/bootbox/images/matchbox.tag
  • packages/extra/monitoring/images/grafana.tag
  • packages/extra/seaweedfs/images/objectstorage-sidecar.tag
  • packages/system/bucket/images/s3manager.tag
  • packages/system/cilium/values.yaml
  • packages/system/cozystack-api/values.yaml
  • packages/system/cozystack-controller/values.yaml
  • packages/system/dashboard/templates/configmap.yaml
  • packages/system/dashboard/values.yaml
  • packages/system/kamaji/values.yaml
  • packages/system/kubeovn-plunger/values.yaml
  • packages/system/kubeovn-webhook/values.yaml
  • packages/system/kubevirt-csi-node/values.yaml
  • packages/system/lineage-controller-webhook/values.yaml
  • packages/system/linstor/values.yaml
  • packages/system/metallb/values.yaml
  • packages/system/objectstorage-controller/values.yaml
  • packages/system/seaweedfs/values.yaml
✅ Files skipped from review due to trivial changes (5)
  • packages/system/cozystack-api/values.yaml
  • packages/system/kubeovn-webhook/values.yaml
  • packages/apps/http-cache/images/nginx-cache.tag
  • packages/system/metallb/values.yaml
  • packages/core/installer/values.yaml
🧰 Additional context used
📓 Path-based instructions (4)
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/kubeovn-plunger/values.yaml
  • packages/system/dashboard/values.yaml
  • packages/system/cilium/values.yaml
  • packages/system/kubevirt-csi-node/values.yaml
  • packages/system/linstor/values.yaml
  • packages/system/seaweedfs/values.yaml
  • packages/system/kamaji/values.yaml
  • packages/system/dashboard/templates/configmap.yaml
  • packages/system/lineage-controller-webhook/values.yaml
  • packages/system/cozystack-controller/values.yaml
  • packages/system/bucket/images/s3manager.tag
  • packages/system/objectstorage-controller/values.yaml
**/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms

Files:

  • packages/system/kubeovn-plunger/values.yaml
  • packages/system/dashboard/values.yaml
  • packages/system/cilium/values.yaml
  • packages/core/platform/values.yaml
  • packages/system/kubevirt-csi-node/values.yaml
  • packages/system/linstor/values.yaml
  • packages/core/testing/values.yaml
  • packages/system/seaweedfs/values.yaml
  • packages/system/kamaji/values.yaml
  • packages/system/dashboard/templates/configmap.yaml
  • packages/system/lineage-controller-webhook/values.yaml
  • packages/system/cozystack-controller/values.yaml
  • packages/system/objectstorage-controller/values.yaml
packages/apps/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/apps/kubernetes/images/kubevirt-csi-driver.tag
  • packages/apps/kubernetes/images/ubuntu-container-disk.tag
  • packages/apps/kubernetes/images/kubevirt-cloud-provider.tag
  • packages/apps/kubernetes/images/cluster-autoscaler.tag
  • packages/apps/mysql/images/mariadb-backup.tag
packages/core/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/core/platform/values.yaml
  • packages/core/testing/values.yaml
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: cozystack/cozystack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-25T09:45:26.524Z
Learning: Applies to packages/core/** : Use Helm Chart umbrella pattern with vendored upstream charts in `charts/` directory
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (21)
packages/apps/kubernetes/images/cluster-autoscaler.tag (1)

1-1: Digest bump: please verify provenance/compatibility of the new pinned image.
Since this is a digest-only change, can you confirm sha256:372ad087… exists in GHCR for ghcr.io/cozystack/cozystack/cluster-autoscaler:0.0.0, and (if you enforce it) that its signature/attestation checks still pass?

packages/apps/mysql/images/mariadb-backup.tag (1)

1-1: Verify that this change belongs in this PR.

This file updates the mariadb-backup image digest, but the PR objectives focus exclusively on adding a flux-plunger controller to remediate HelmRelease errors. There's no apparent connection between the flux-plunger functionality and the MySQL mariadb-backup image.

Please confirm:

  1. Is this change intentional and related to the flux-plunger work?
  2. Should this be in a separate PR focused on MySQL image updates?
  3. What changes are included in the new image digest that justify bundling it with this PR?


packages/core/testing/values.yaml (1)

2-2: Good practice using both tag and digest, but digest verification requires manual confirmation outside this environment.

Using both a version tag and SHA256 digest provides immutable image references for reproducibility and security. The digest sha256:78a556a059411cac89249c2e22596223128a1c74fbfcac4787fd74e4a66f66a6 should be manually verified to match the v0.40.0 tag, as registry verification tools are not available in the review environment. You can verify this by pulling the image locally and comparing the digest:

docker pull ghcr.io/cozystack/cozystack/e2e-sandbox:v0.40.0
docker inspect ghcr.io/cozystack/cozystack/e2e-sandbox:v0.40.0 | grep -i digest
packages/system/cozystack-controller/values.yaml (1)

2-2: Version bump is consistent and digest is properly pinned.

Image tag and cozystackVersion both updated to v0.40.0 with no inconsistencies. The image uses a pinned SHA256 digest, which is a security best practice for reproducible deployments.

packages/system/objectstorage-controller/values.yaml (1)

3-3: Image reference format is correct; ensure v0.40.0 version is appropriate for this release.

The image update uses proper digest pinning for reproducibility. Verify that this minor version bump (0.38.2 → 0.40.0) is tested and intentional for the release scope.

packages/system/kubeovn-plunger/values.yaml (1)

3-3: This version bump is not unrelated scope creep; it is part of a coordinated platform-wide release to v0.40.0.

The v0.40.0 release spans multiple cozystack components (kubeovn-plunger, kubeovn-webhook, objectstorage-controller, lineage-controller-webhook, kamaji, dashboard, cozystack-api, cozystack-controller, and others), indicating this is a systematic platform release rather than an isolated, unintentional change bundled into the flux-plunger PR.

Likely an incorrect or invalid review comment.

packages/system/cilium/values.yaml (1)

16-19: The image tag and digest have been verified and correctly match the published GHCR manifest. The digest sha256:c14a0bcb1a1531c72725b3a4c40aefa2bcd5c129810bf58ea8e37d3fcff2a326 corresponds exactly to the tag 1.18.5 for ghcr.io/cozystack/cozystack/cilium. This change follows the correct umbrella pattern for the system package, with the wrapper chart's values.yaml properly configuring the vendored cilium dependency. No issues found.

packages/system/kamaji/values.yaml (1)

6-6: No issues found. The chart templates construct image references using only repository:tag format (confirmed in controller.yaml), and there is no separate digest field that would cause conflicts. The digest embedded in the tag field produces valid image references (e.g., repo:v0.40.0@sha256:...), which container runtimes correctly handle.

packages/system/dashboard/values.yaml (2)

2-6: Good: images are pinned by digest (reproducible deployments).
Keeping :v0.40.0@sha256:... prevents accidental upgrades and makes rollbacks deterministic.


1-6: Templates correctly render image references verbatim; tag↔digest correspondence is guaranteed by the build process.

Verification confirms the templates (web.yaml lines 54, 174; gatekeeper.yaml line 82) use .Values.<component>.image directly without tag concatenation. The Makefile build targets (image-openapi-ui, image-openapi-ui-k8s-bff, image-token-proxy) extract the digest from container build metadata immediately after building and write the full reference (repo:tag@sha256:digest) directly to values.yaml, ensuring tag and digest always match.

Minor note: gatekeeper.yaml line 82 omits the | quote filter present in web.yaml's image fields, though both render validly.

packages/extra/bootbox/images/matchbox.tag (1)

1-1: This image reference is part of the coordinated v0.40.0 release—not an unrelated bundled change. The matchbox, objectstorage-sidecar, and other components are intentionally versioned together in this release. The SHA256 digest is properly formatted.

Likely an incorrect or invalid review comment.

packages/apps/kubernetes/images/kubevirt-cloud-provider.tag (1)

1-1: Clarify whether the kubevirt-cloud-provider image digest bump is in scope for this PR.

The kubevirt-cloud-provider change appears unrelated to flux-plunger (located in separate service/directory hierarchy). If it's unrelated, move to a separate PR to simplify review. If intentional and coordinated, confirm it's produced by your build/provenance process and passes image scanning gates.

packages/extra/monitoring/images/grafana.tag (1)

1-1: Image reference format is correct and properly pinned.

The digest is automatically generated during the build process (Makefile line 19-20) via docker buildx build and extracted from the build metadata, ensuring it matches the exact built image. No manual verification is needed.

packages/apps/kubernetes/images/ubuntu-container-disk.tag (1)

1-1: No action needed—this is a standard image configuration update.

The ubuntu-container-disk.tag file is a legitimate part of the Helm chart structure used in packages/apps/kubernetes/templates/cluster.yaml. Digest-only updates to container images are normal operational maintenance and pose no risk due to digest pinning ensuring immutability. The change follows the standard Helm umbrella pattern and requires no verification.

packages/core/platform/values.yaml (1)

2-2: Confirm the cozystack-assets v0.40.0 image and digest are current.

As this change is part of the v0.40.0 release, the image tag and SHA256 digest should be verified through standard release and deployment processes to ensure they are correct and the image is available in the registry.

packages/system/kubevirt-csi-node/values.yaml (1)

3-3: Image digest update is consistent but appears unrelated to PR objectives.

The csiDriver.image digest update matches the corresponding change in packages/apps/kubernetes/images/kubevirt-csi-driver.tag, which is good for consistency. However, this kubevirt-csi-driver update appears unrelated to the flux-plunger controller additions described in the PR objectives.

Please clarify whether this change should be in this PR or moved to a separate PR focused on kubevirt-csi component updates.

packages/apps/kubernetes/images/kubevirt-csi-driver.tag (1)

1-1: Clarify the scope and rationale for this image update.

This change updates the kubevirt-csi-driver image digest in the Kubernetes Helm chart. While the change itself is structurally sound and follows the project's Helm patterns correctly, please clarify:

  • Is this image update intentionally bundled with other changes in this PR, or should it be moved to a separate change?
  • What is the reason for updating this specific digest (security patch, feature update, etc.)?
packages/extra/seaweedfs/images/objectstorage-sidecar.tag (1)

1-1: This change is intentional and part of the v0.40.0 release.

The objectstorage-sidecar image update from v0.38.2 to v0.40.0 is expected as part of the broader v0.40.0 release PR (#1840). SeaweedFS is an integrated component of the platform, and bundling its updates with other component updates is standard practice in release PRs. The image version and digest are consistent with the release being prepared.

Likely an incorrect or invalid review comment.

packages/system/seaweedfs/values.yaml (1)

180-180: Sidecar image update is part of v0.40.0 release.

This change is correctly part of the v0.40.0 release commit and follows proper Helm umbrella pattern conventions (values.yaml at umbrella level, not in vendored charts/). The image reference matches the tag file exactly, ensuring consistency across configuration files.

Since this is part of the coordinated release (commit 1685242), verify that:

  1. The objectstorage-sidecar v0.40.0 upgrade has been tested and is stable
  2. Any relevant changelog or release notes for this component have been reviewed
packages/system/bucket/images/s3manager.tag (1)

1-1: Digest pin is correct and current. The tag v0.5.0 resolves to the pinned digest sha256:ecb140d026ed72660306953a7eec140d7ac81e79544d5bbf1aba5f62aa5f8b69 in GHCR.

packages/system/lineage-controller-webhook/values.yaml (1)

2-2: This is a new file in v0.40.0, not a version bump from an earlier release.

The values.yaml file for lineage-controller-webhook is being created as part of the v0.40.0 release, not updated from v0.38.2. The image digest is correct per the release commit. No breaking changes are documented for this component in v0.40.0.

Likely an incorrect or invalid review comment.

  repository: ghcr.io/cozystack/cozystack/piraeus-server
- tag: latest@sha256:417532baa2801288147cd9ac9ae260751c1a7754f0b829725d09b72a770c111a
+ tag: 1.32.3@sha256:c6fbe1825514a3059088175968a651d7e10f477590023a33f1a9e6584fd98e04

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, explore the file structure to understand the context
cd packages/system/linstor 2>/dev/null && ls -la || echo "Directory not found, searching for the file"
find . -name "values.yaml" -path "*/linstor/*" -type f 2>/dev/null | head -5

Repository: cozystack/cozystack

Length of output: 661


🏁 Script executed:

# Read the linstor values.yaml file to see current state
fd "values.yaml" . --path "*linstor*" --type f | head -1 | xargs cat -n || cat packages/system/linstor/values.yaml 2>/dev/null || echo "File not found"

Repository: cozystack/cozystack

Length of output: 278


🏁 Script executed:

# Check the file structure and verify line numbers
if [ -f packages/system/linstor/values.yaml ]; then
  echo "=== Lines 1-20 of packages/system/linstor/values.yaml ==="
  head -20 packages/system/linstor/values.yaml | cat -n
else
  echo "File not found at expected location"
fi

Repository: cozystack/cozystack

Length of output: 636


🏁 Script executed:

# Search for all references to image tags in the linstor values file
if [ -f packages/system/linstor/values.yaml ]; then
  echo "=== Searching for tag references ==="
  grep -n "tag:" packages/system/linstor/values.yaml | head -10
fi

Repository: cozystack/cozystack

Length of output: 280


🏁 Script executed:

# Check git history for this file to understand the context
if git rev-parse --git-dir > /dev/null 2>&1; then
  echo "=== Recent commits affecting packages/system/linstor/values.yaml ==="
  git log --oneline -20 packages/system/linstor/values.yaml 2>/dev/null || echo "No git history available"
fi

Repository: cozystack/cozystack

Length of output: 164


🏁 Script executed:

# Search for flux-plunger references in the repository
echo "=== Searching for flux-plunger references ==="
rg "flux-plunger" --type yaml --type md . 2>/dev/null | head -20 || echo "No flux-plunger references found"

Repository: cozystack/cozystack

Length of output: 1861


🏁 Script executed:

# Check the broader context - are there other linstor version changes?
echo "=== Checking for linstor-related changes in git diff ==="
git diff HEAD~5 HEAD -- packages/system/linstor/ 2>/dev/null | head -100 || echo "Cannot access git diff"

Repository: cozystack/cozystack

Length of output: 119


🏁 Script executed:

# Check if there's a PR or documentation about this change
find . -name "*.md" -o -name "*.txt" 2>/dev/null | xargs grep -l "linstor\|piraeus" 2>/dev/null | head -5

Repository: cozystack/cozystack

Length of output: 214


🏁 Script executed:

# Check if image exists and is accessible
echo "=== Verifying piraeus-server image ==="
# Try to get metadata about the image
rg "piraeus-server:1.32.3" . --type yaml | head -5

# Check if there are other version tags in the codebase
echo ""
echo "=== Checking for other piraeus-server versions ==="
rg "piraeus-server:" . --type yaml | head -10

Repository: cozystack/cozystack

Length of output: 281


🏁 Script executed:

# Look for commit messages or changelog entries about this release
echo "=== Checking changelog for v0.40.0 ==="
if [ -f ./docs/changelogs/v0.40.0.md ]; then
  head -50 ./docs/changelogs/v0.40.0.md
else
  # Try to find the latest changelog
  ls -lah ./docs/changelogs/ | tail -10
fi

Repository: cozystack/cozystack

Length of output: 701


🏁 Script executed:

# Check the actual release or PR context by looking at recent changes
echo "=== Checking what else changed in the linstor system package ==="
git show --stat 1685242e -- packages/system/linstor/ 2>/dev/null || git log -1 --stat 1685242e 2>/dev/null || echo "Cannot access detailed commit info"

Repository: cozystack/cozystack

Length of output: 2103


Address inconsistency in image tagging strategy.

Version pinning the piraeusServer image to 1.32.3 improves reproducibility. However, there's an inconsistency worth addressing: piraeusServer now uses a versioned tag while linstorCSI (line 14) still uses latest. Consider aligning the tagging strategy across both images for consistency in how the linstor deployment is versioned.

🤖 Prompt for AI Agents
In @packages/system/linstor/values.yaml at line 4: pin the linstorCSI image the same way as piraeusServer. Replace its current "latest" tag with an explicit version plus sha256 digest for the CSI image (the <version>@sha256:<digest> form used for piraeusServer; the CSI image has its own digest, so don't reuse the piraeus-server one). Locate the linstorCSI key in the YAML and set its tag to the chosen pinned value, or refactor piraeusServer and linstorCSI to share a versioning parameter (e.g. values.image.tag) so both are bumped from a single source.
