235 changes: 229 additions & 6 deletions pkg/controllers/staticresourceinstaller/controller.go
@@ -20,11 +20,14 @@ import (
"io/fs"
"path/filepath"

admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
v1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
k8serrors "k8s.io/apimachinery/pkg/util/errors"
"k8s.io/client-go/kubernetes"
"k8s.io/utils/clock"
"k8s.io/utils/ptr"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/builder"
"sigs.k8s.io/controller-runtime/pkg/client"
@@ -39,6 +42,7 @@ import (
"github.com/openshift/cluster-capi-operator/pkg/util"
"github.com/openshift/library-go/pkg/operator/events"
"github.com/openshift/library-go/pkg/operator/resource/resourceapply"
"github.com/openshift/library-go/pkg/operator/resource/resourceread"
)

// Assets is an interface that can be used to read assets from a filesystem.
@@ -48,8 +52,10 @@ type Assets interface {
}

type staticResourceInstallerController struct {
assetNames []string // The names of the assets to install.
kubeClient kubernetes.Interface
assetNames []string // The names of the assets to install.
client client.Client
kubeClient kubernetes.Interface
initialClusterOperatorsBootstrapped bool
Comment on lines 54 to +58

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Repository: openshift/cluster-capi-operator

🏁 Scripts executed:

# First, check the Go version
cat go.mod | head -20

# Read the relevant sections of the controller file
wc -l pkg/controllers/staticresourceinstaller/controller.go

# Check the struct definition and field declaration
sed -n '50,70p' pkg/controllers/staticresourceinstaller/controller.go

# Check the predicate callback at line 97
sed -n '90,110p' pkg/controllers/staticresourceinstaller/controller.go

# Check the reconcile method and initialClusterOperatorsBootstrapped access
sed -n '330,360p' pkg/controllers/staticresourceinstaller/controller.go

# Check the error text at line 260
sed -n '255,265p' pkg/controllers/staticresourceinstaller/controller.go

# Look for the clusterOperatorsBootstrapped method implementation
sed -n '332,355p' pkg/controllers/staticresourceinstaller/controller.go
Protect initialClusterOperatorsBootstrapped with synchronization to prevent data race.

Line 97 reads this flag from predicate callbacks that run in the watch event loop, while lines 333 and 352 read and write it during reconciliation in the worker pool. These two execution contexts access the shared boolean field without synchronization, which is a data race under the Go memory model. Use atomic.Bool to synchronize access.

Additionally, the error message at line 260 incorrectly refers to "validating webhook configuration" when the function mutates a MutatingWebhookConfiguration. Update the error text to say "mutating webhook configuration".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` around lines 54 - 58,
Replace the plain bool field initialClusterOperatorsBootstrapped on
staticResourceInstallerController with sync/atomic's atomic.Bool and update all
accesses: use initialClusterOperatorsBootstrapped.Load() where the predicate
callbacks and reconciliation read the flag and
initialClusterOperatorsBootstrapped.Store(true/false) where reconciliation
writes it (ensure you add the "sync/atomic" import). Also fix the error text
that currently mentions "validating webhook configuration" when handling a
MutatingWebhookConfiguration—change the message to "mutating webhook
configuration" so it accurately reflects the MutatingWebhookConfiguration being
mutated.
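The fix the reviewer proposes can be sketched as follows. This is a minimal, self-contained illustration of atomic.Bool replacing a plain bool; the type and method names are stand-ins for the controller's real ones, not code from the PR:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// controller mirrors the shape of staticResourceInstallerController with the
// plain bool replaced by atomic.Bool, as the review suggests.
type controller struct {
	initialClusterOperatorsBootstrapped atomic.Bool
}

// predicateRead stands in for the UpdateFunc predicate running in the watch
// event loop: keep reconciling updates until bootstrap has been observed.
func (c *controller) predicateRead() bool {
	return !c.initialClusterOperatorsBootstrapped.Load()
}

// reconcileWrite stands in for the reconciler marking bootstrap complete.
func (c *controller) reconcileWrite() {
	c.initialClusterOperatorsBootstrapped.Store(true)
}

func main() {
	c := &controller{}

	// Concurrent reads and writes are now race-free: Load/Store are atomic.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.predicateRead()
			c.reconcileWrite()
		}()
	}
	wg.Wait()

	fmt.Println(c.initialClusterOperatorsBootstrapped.Load()) // prints "true"
}
```

Run under `go run -race` this sketch is clean, whereas the same pattern with a plain bool would be flagged.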


assets Assets
resourceCache resourceapply.ResourceCache
@@ -65,6 +71,8 @@ func NewStaticResourceInstallerController(assets Assets) *staticResourceInstalle

// SetupWithManager sets up the static resource installer controller with the given manager.
func (c *staticResourceInstallerController) SetupWithManager(ctx context.Context, mgr ctrl.Manager) error {
c.client = mgr.GetClient()

// The assets are an embedded filesystem and won't change over time.
assets, err := c.assets.ReadAssets()
if err != nil {
@@ -82,11 +90,11 @@ func (c *staticResourceInstallerController) SetupWithManager(ctx context.Context

build := ctrl.NewControllerManagedBy(mgr).
Named("static-resource-installer").
// We only want to reconcile an initial time when the cluster operator is created
// in the cache, later reconciles will happen based on watches for individual assets.
// We only want to reconcile updates until we have observed that the cluster operators are all bootstrapped.
// This allows us to inject FailurePolicy: Ignore for webhooks during cluster bootstrap.
For(&configv1.ClusterOperator{}, builder.WithPredicates(predicate.Funcs{
CreateFunc: func(e event.CreateEvent) bool { return e.Object.GetName() == controllers.ClusterOperatorName },
UpdateFunc: func(e event.UpdateEvent) bool { return false },
UpdateFunc: func(e event.UpdateEvent) bool { return !c.initialClusterOperatorsBootstrapped },
DeleteFunc: func(e event.DeleteEvent) bool { return false },
GenericFunc: func(e event.GenericEvent) bool { return false },
}))
@@ -124,7 +132,7 @@ func (c *staticResourceInstallerController) Reconcile(ctx context.Context, req c
Name: "cluster-api",
}, clock.RealClock{}),
c.resourceCache,
c.assets.Asset,
c.mutateAsset(ctx),
c.assetNames...,
)

@@ -162,3 +170,218 @@ func objectNamePredicate(name string) predicate.Predicate {
return obj.GetName() == name
})
}

func (c *staticResourceInstallerController) mutateAsset(ctx context.Context) func(string) ([]byte, error) {
return func(name string) ([]byte, error) {
raw, err := c.assets.Asset(name)
if err != nil {
return nil, fmt.Errorf("failed to read asset %s: %w", name, err)
}

requiredObj, err := resourceread.ReadGenericWithUnstructured(raw)
if err != nil {
return nil, fmt.Errorf("failed to decode asset %s: %w", name, err)
}

switch t := requiredObj.(type) {
case *admissionregistrationv1.ValidatingWebhookConfiguration:
return c.mutateValidatingWebhookConfiguration(ctx, raw, t)
case *admissionregistrationv1.MutatingWebhookConfiguration:
return c.mutateMutatingWebhookConfiguration(ctx, raw, t)
}

return raw, nil
}
}

type webhookPolicy struct {
Name string
FailurePolicy *admissionregistrationv1.FailurePolicyType
}

func (c *staticResourceInstallerController) mutateValidatingWebhookConfiguration(ctx context.Context, raw []byte, obj *admissionregistrationv1.ValidatingWebhookConfiguration) ([]byte, error) {
currentObj := &admissionregistrationv1.ValidatingWebhookConfiguration{}

if err := c.client.Get(ctx, client.ObjectKey{Name: obj.Name}, currentObj); err != nil && !apierrors.IsNotFound(err) {
return nil, fmt.Errorf("failed to get validating webhook configuration %s: %w", obj.Name, err)
} else if err != nil && apierrors.IsNotFound(err) {
// If the object doesn't currently exist, apply it initially with the failure policy set to ignore
// so that we don't block cluster operators during cluster bootstrap.
for i := range obj.Webhooks {
obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore)
}

data, err := k8syaml.Marshal(obj)
if err != nil {
return nil, fmt.Errorf("failed to marshal object: %w", err)
}

return data, nil
}
Comment on lines +205 to +220

⚠️ Potential issue | 🟠 Major

Avoid forcing Ignore on NotFound after bootstrap is complete.

Lines 207 and 261 return early with Ignore whenever the webhook object is missing. In an already-bootstrapped cluster this still weakens policy, even though Fail should be applied.

Suggested fix (gate NotFound path on bootstrap state)
-	} else if err != nil && apierrors.IsNotFound(err) {
+	} else if err != nil && apierrors.IsNotFound(err) {
+		bootstrapped, berr := c.clusterOperatorsBootstrapped(ctx)
+		if berr != nil {
+			return nil, fmt.Errorf("failed to check if cluster operators are bootstrapped: %w", berr)
+		}
+		if bootstrapped {
+			return raw, nil
+		}
 		// If the object doesn't currently exist, apply it initially with the failure policy set to ignore
 		// so that we don't block cluster operators during cluster bootstrap.
 		for i := range obj.Webhooks {
 			obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore)
 		}

Also applies to: 259-274

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` around lines 205 -
220, The NotFound branch currently forces obj.Webhooks[i].FailurePolicy =
ptr.To(admissionregistrationv1.Ignore) whenever the validating webhook is
missing; change it to only apply this "ignore on create" behavior when the
cluster is still bootstrapping (e.g., check a bootstrap-complete flag or method
on the controller such as c.isBootstrapped() or c.bootstrapComplete) and
otherwise return an error or proceed to create with the default/Fail policy;
update the same gating logic in the other NotFound handling block around lines
259-274 so both places only set FailurePolicy to Ignore if the controller
reports the cluster is not yet bootstrapped.


clusterBootstrapped, err := c.clusterBootstrapped(ctx,
util.SliceMap(obj.Webhooks, func(webhook admissionregistrationv1.ValidatingWebhook) webhookPolicy {
return webhookPolicy{
Name: webhook.Name,
FailurePolicy: webhook.FailurePolicy,
}
}),
util.SliceMap(currentObj.Webhooks, func(webhook admissionregistrationv1.ValidatingWebhook) webhookPolicy {
return webhookPolicy{
Name: webhook.Name,
FailurePolicy: webhook.FailurePolicy,
}
}))
if err != nil {
return nil, fmt.Errorf("failed to check if cluster is bootstrapped: %w", err)
}

if clusterBootstrapped {
return raw, nil
}

// Cluster isn't yet bootstrapped, force all webhooks to ignore failures so that we don't block cluster operators during cluster bootstrap.
for i := range obj.Webhooks {
obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore)
}

data, err := k8syaml.Marshal(obj)
if err != nil {
return nil, fmt.Errorf("failed to marshal object: %w", err)
}

return data, nil
}

func (c *staticResourceInstallerController) mutateMutatingWebhookConfiguration(ctx context.Context, raw []byte, obj *admissionregistrationv1.MutatingWebhookConfiguration) ([]byte, error) {
currentObj := &admissionregistrationv1.MutatingWebhookConfiguration{}

if err := c.client.Get(ctx, client.ObjectKey{Name: obj.Name}, currentObj); err != nil && !apierrors.IsNotFound(err) {
return nil, fmt.Errorf("failed to get validating webhook configuration %s: %w", obj.Name, err)

⚠️ Potential issue | 🟡 Minor

Fix incorrect resource type in error message.

Line 260 is in mutateMutatingWebhookConfiguration but the message says “validating webhook configuration”, which is misleading when debugging failures.

Suggested fix
-		return nil, fmt.Errorf("failed to get validating webhook configuration %s: %w", obj.Name, err)
+		return nil, fmt.Errorf("failed to get mutating webhook configuration %s: %w", obj.Name, err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` at line 260, The error
message in mutateMutatingWebhookConfiguration incorrectly says "validating
webhook configuration"; update the fmt.Errorf call that returns an error for
obj.Name to refer to "mutating webhook configuration" instead, i.e. change the
string literal passed to fmt.Errorf in mutateMutatingWebhookConfiguration so it
reads "failed to get mutating webhook configuration %s: %w" (preserving obj.Name
and err).

} else if err != nil && apierrors.IsNotFound(err) {
// If the object doesn't currently exist, apply it initially with the failure policy set to ignore
// so that we don't block cluster operators during cluster bootstrap.
for i := range obj.Webhooks {
obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore)
}

data, err := k8syaml.Marshal(obj)
if err != nil {
return nil, fmt.Errorf("failed to marshal object: %w", err)
}

return data, nil
}

clusterBootstrapped, err := c.clusterBootstrapped(ctx,
util.SliceMap(obj.Webhooks, func(webhook admissionregistrationv1.MutatingWebhook) webhookPolicy {
return webhookPolicy{
Name: webhook.Name,
FailurePolicy: webhook.FailurePolicy,
}
}),
util.SliceMap(currentObj.Webhooks, func(webhook admissionregistrationv1.MutatingWebhook) webhookPolicy {
return webhookPolicy{
Name: webhook.Name,
FailurePolicy: webhook.FailurePolicy,
}
}),
)
if err != nil {
return nil, fmt.Errorf("failed to check if cluster is bootstrapped: %w", err)
}

if clusterBootstrapped {
return raw, nil
}

// Cluster isn't yet bootstrapped, force all webhooks to ignore failures so that we don't block cluster operators during cluster bootstrap.
for i := range obj.Webhooks {
obj.Webhooks[i].FailurePolicy = ptr.To(admissionregistrationv1.Ignore)
}

data, err := k8syaml.Marshal(obj)
if err != nil {
return nil, fmt.Errorf("failed to marshal object: %w", err)
}

return data, nil
}

func (c *staticResourceInstallerController) clusterBootstrapped(ctx context.Context, webhooks, currentWebhooks []webhookPolicy) (bool, error) {
// First check if the existing policies match the desired policies.
// This means we already applied the manifest as it comes from the assets directly
// without any mutation. To do that, the cluster must already have been bootstrapped.
policiesMatch := true

for _, webhook := range webhooks {
for _, currentWebhook := range currentWebhooks {
if webhook.Name == currentWebhook.Name {
policiesMatch = policiesMatch && webhook.FailurePolicy != nil && currentWebhook.FailurePolicy != nil && *webhook.FailurePolicy == *currentWebhook.FailurePolicy
}
}
}

if policiesMatch {
return true, nil
}
Comment on lines +311 to +327

⚠️ Potential issue | 🟠 Major

clusterBootstrapped can return true for mismatched webhook sets.

At line 317, the nested comparison only updates policiesMatch when names match. If a required webhook name is missing from currentWebhooks, policiesMatch can stay true and the bootstrap gating is skipped.

Suggested fix (compare by name with existence checks)
 func (c *staticResourceInstallerController) clusterBootstrapped(ctx context.Context, webhooks, currentWebhooks []webhookPolicy) (bool, error) {
-	policiesMatch := true
-
-	for _, webhook := range webhooks {
-		for _, currentWebhook := range currentWebhooks {
-			if webhook.Name == currentWebhook.Name {
-				policiesMatch = policiesMatch && webhook.FailurePolicy != nil && currentWebhook.FailurePolicy != nil && *webhook.FailurePolicy == *currentWebhook.FailurePolicy
-			}
-		}
-	}
+	if len(webhooks) != len(currentWebhooks) {
+		return c.clusterOperatorsBootstrapped(ctx)
+	}
+
+	currentByName := map[string]webhookPolicy{}
+	for _, currentWebhook := range currentWebhooks {
+		currentByName[currentWebhook.Name] = currentWebhook
+	}
+
+	policiesMatch := true
+	for _, webhook := range webhooks {
+		currentWebhook, ok := currentByName[webhook.Name]
+		if !ok {
+			policiesMatch = false
+			break
+		}
+		policiesMatch = policiesMatch &&
+			webhook.FailurePolicy != nil &&
+			currentWebhook.FailurePolicy != nil &&
+			*webhook.FailurePolicy == *currentWebhook.FailurePolicy
+	}
 
 	if policiesMatch {
 		return true, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/staticresourceinstaller/controller.go` around lines 311 -
327, The clusterBootstrapped function currently only updates policiesMatch when
names match, so missing required webhook names in currentWebhooks can leave
policiesMatch true; fix by for each webhook in webhooks, search for a matching
name in currentWebhooks and if none is found set policiesMatch = false (and
continue), and if found compare failurePolicy pointers as before (require both
non-nil and equal) — update the logic in clusterBootstrapped to explicitly check
existence of a matching currentWebhook for every desired webhook
(webhooks/currentWebhooks/policiesMatch).
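The difference between the flagged loop and the suggested fix can be shown with a small, self-contained sketch. The types and function names here are hypothetical reductions of the PR's webhookPolicy comparison, not the actual code:

```go
package main

import "fmt"

type webhookPolicy struct {
	Name          string
	FailurePolicy *string
}

// policiesMatchBuggy reproduces the comparison the review flags: the flag is
// only updated when names match, so a webhook absent from current is silently
// ignored and the result stays true.
func policiesMatchBuggy(desired, current []webhookPolicy) bool {
	match := true
	for _, w := range desired {
		for _, cw := range current {
			if w.Name == cw.Name {
				match = match && w.FailurePolicy != nil && cw.FailurePolicy != nil && *w.FailurePolicy == *cw.FailurePolicy
			}
		}
	}
	return match
}

// policiesMatchFixed requires every desired webhook to exist in current with
// an equal non-nil policy, in the spirit of the suggested fix.
func policiesMatchFixed(desired, current []webhookPolicy) bool {
	if len(desired) != len(current) {
		return false
	}
	byName := map[string]webhookPolicy{}
	for _, cw := range current {
		byName[cw.Name] = cw
	}
	for _, w := range desired {
		cw, ok := byName[w.Name]
		if !ok || w.FailurePolicy == nil || cw.FailurePolicy == nil || *w.FailurePolicy != *cw.FailurePolicy {
			return false
		}
	}
	return true
}

func main() {
	fail := "Fail"
	desired := []webhookPolicy{{Name: "a.example.io", FailurePolicy: &fail}}
	current := []webhookPolicy{} // required webhook missing entirely

	fmt.Println(policiesMatchBuggy(desired, current)) // prints "true": the mismatch is missed
	fmt.Println(policiesMatchFixed(desired, current)) // prints "false"
}
```

The empty `current` slice never enters the inner loop, so the buggy version reports a match and the bootstrap check would be skipped.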


return c.clusterOperatorsBootstrapped(ctx)
}

func (c *staticResourceInstallerController) clusterOperatorsBootstrapped(ctx context.Context) (bool, error) {
if c.initialClusterOperatorsBootstrapped {
// We have previously seen all cluster operators bootstrapped since we started the controller.
return true, nil
}

// Check all cluster operators and wait for them all to be bootstrapped.
// Once they are bootstrapped, we can apply the manifest directly
// as it is within the assets folder.
clusterOperators := &configv1.ClusterOperatorList{}
if err := c.client.List(ctx, clusterOperators); err != nil {
return false, fmt.Errorf("failed to list cluster operators: %w", err)
}

for _, clusterOperator := range clusterOperators.Items {
if !clusterOperatorBootstrapped(clusterOperator) {
return false, nil
}
}

c.initialClusterOperatorsBootstrapped = true

return true, nil
}

func clusterOperatorBootstrapped(clusterOperator configv1.ClusterOperator) bool {
conditions := clusterOperator.Status.Conditions

available, ok := clusterOperatorCondition(conditions, configv1.OperatorAvailable)
if !ok {
return false
}

progressing, ok := clusterOperatorCondition(conditions, configv1.OperatorProgressing)
if !ok {
return false
}

degraded, ok := clusterOperatorCondition(conditions, configv1.OperatorDegraded)
if !ok {
return false
}

return available.Status == configv1.ConditionTrue &&
progressing.Status == configv1.ConditionFalse &&
degraded.Status == configv1.ConditionFalse
}

func clusterOperatorCondition(conditions []configv1.ClusterOperatorStatusCondition, conditionType configv1.ClusterStatusConditionType) (configv1.ClusterOperatorStatusCondition, bool) {
for _, condition := range conditions {
if condition.Type == conditionType {
return condition, true
}
}
return configv1.ClusterOperatorStatusCondition{}, false
}
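As a rough illustration of the helper pair above (clusterOperatorCondition and clusterOperatorBootstrapped), here is a self-contained sketch using simplified stand-in types; the real code uses configv1 from github.com/openshift/api:

```go
package main

import "fmt"

// condition is a simplified stand-in for configv1.ClusterOperatorStatusCondition.
type condition struct {
	Type   string
	Status string
}

// findCondition mirrors clusterOperatorCondition: linear scan by type.
func findCondition(conds []condition, t string) (condition, bool) {
	for _, c := range conds {
		if c.Type == t {
			return c, true
		}
	}
	return condition{}, false
}

// bootstrapped mirrors clusterOperatorBootstrapped: all three conditions must
// be present, with Available=True, Progressing=False, and Degraded=False.
func bootstrapped(conds []condition) bool {
	available, ok := findCondition(conds, "Available")
	if !ok {
		return false
	}
	progressing, ok := findCondition(conds, "Progressing")
	if !ok {
		return false
	}
	degraded, ok := findCondition(conds, "Degraded")
	if !ok {
		return false
	}

	return available.Status == "True" && progressing.Status == "False" && degraded.Status == "False"
}

func main() {
	healthy := []condition{{"Available", "True"}, {"Progressing", "False"}, {"Degraded", "False"}}
	missing := []condition{{"Available", "True"}} // Progressing/Degraded not yet reported

	fmt.Println(bootstrapped(healthy)) // prints "true"
	fmt.Println(bootstrapped(missing)) // prints "false": absent conditions count as not bootstrapped
}
```

Treating an absent condition as "not bootstrapped" is the conservative choice: an operator that has not yet reported status keeps the webhooks in Ignore mode.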
50 changes: 50 additions & 0 deletions pkg/controllers/staticresourceinstaller/controller_test.go
@@ -25,6 +25,8 @@ import (
admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

configv1 "github.com/openshift/api/config/v1"
"github.com/openshift/cluster-capi-operator/pkg/controllers"
"github.com/openshift/cluster-capi-operator/pkg/controllers/crdcompatibility/bindata"
)

@@ -100,5 +102,53 @@ var _ = Describe("StaticResourceInstaller Controller", Ordered, ContinueOnFailur
By("Verifying that the resource is recreated")
Eventually(kWithCtx(ctx).Object(vwc), 10*time.Second).WithContext(ctx).Should(HaveField("ObjectMeta.UID", Not(Equal(originalUID))))
})

It("should set the failure policy to ignore for webhooks when the cluster is not bootstrapped", func() {
// Check initially that the clusteroperator isn't reporting that it is bootstrapped.
co := &configv1.ClusterOperator{
ObjectMeta: metav1.ObjectMeta{
Name: controllers.ClusterOperatorName,
},
}
Eventually(kWithCtx(ctx).Object(co), 10*time.Second).WithContext(ctx).Should(HaveField("Status.Conditions", BeEmpty()))

By("Verifying that the failure policy is set to ignore")
Eventually(kWithCtx(ctx).ObjectList(&admissionregistrationv1.ValidatingWebhookConfigurationList{}), 10*time.Second).WithContext(ctx).Should(HaveField("Items", HaveEach(
HaveField("Webhooks", ConsistOf(HaveField("FailurePolicy", HaveValue(Equal(admissionregistrationv1.Ignore))))),
)))
})

It("should set the failure policy to fail once the cluster is bootstrapped", func() {
// Update the conditions on the cluster operator to report that it is bootstrapped.
co := &configv1.ClusterOperator{
ObjectMeta: metav1.ObjectMeta{
Name: controllers.ClusterOperatorName,
},
}
Eventually(kWithCtx(ctx).UpdateStatus(co, func() {
co.Status.Conditions = []configv1.ClusterOperatorStatusCondition{
{
Type: configv1.OperatorAvailable,
Status: configv1.ConditionTrue,
LastTransitionTime: metav1.NewTime(time.Now()),
},
{
Type: configv1.OperatorProgressing,
Status: configv1.ConditionFalse,
LastTransitionTime: metav1.NewTime(time.Now()),
},
{
Type: configv1.OperatorDegraded,
Status: configv1.ConditionFalse,
LastTransitionTime: metav1.NewTime(time.Now()),
},
}
}), 10*time.Second).WithContext(ctx).Should(Succeed())

By("Verifying that the failure policy is set to fail")
Eventually(kWithCtx(ctx).ObjectList(&admissionregistrationv1.ValidatingWebhookConfigurationList{}), 10*time.Second).WithContext(ctx).Should(HaveField("Items", HaveEach(
HaveField("Webhooks", ConsistOf(HaveField("FailurePolicy", HaveValue(Equal(admissionregistrationv1.Fail))))),
)))
})
})
})