Skip to content

Conversation

@pkoutsovasilis
Copy link
Contributor

@pkoutsovasilis pkoutsovasilis commented Nov 20, 2025

Overview

This PR implements support for multiple StackConfigPolicies (SCPs) targeting the same Elasticsearch cluster or Kibana instance, using a weight-based priority system for deterministic policy composition.

Key Features

Weight-Based Priority System

  • Policies are merged in order of weight (lower weight takes precedence)
  • Default weight: 0

Conflict Detection

Conflicts are detected across multiple dimensions and will prevent policy application:

Conflict Type Condition Result
Weight Conflict Two or more policies with identical weights target the same Elasticsearch/Kibana ❌ Conflict
SecretMount Name Conflict Different policies define SecretMount with same SecretName ❌ Conflict
SecretMount Path Conflict Different policies define SecretMount with same MountPath ❌ Conflict
Different Weights Policies have different weights and none of the above applies ✅ Pass - lower weight wins

Important: Even if two policies with the same weight have non-overlapping resources, they still conflict because the weight collision makes the merge order ambiguous.

Configuration Merging Behaviour

Different merge strategies are applied based on the configuration type:

  • Deep Merge (recursive merging):

    • ClusterSettings
    • Config
    • SnapshotLifecyclePolicies
    • SecurityRoleMappings
    • IndexLifecyclePolicies
    • IngestPipelines
    • IndexTemplates.ComposableIndexTemplates
    • IndexTemplates.ComponentTemplates
  • Top-Level Key Replacement (entire keys replaced):

    • SnapshotRepositories - each repository configuration is treated atomically
  • Union Merge (with conflict detection):

    • SecretMounts - conflicts on duplicate SecretName OR duplicate MountPath
    • SecureSettings - merges by SecretName+Key, lower weight wins (no conflicts)

Multi-Soft-Owner Secret Management

File Settings and Policy Config Secrets:

  • Now support multiple soft owners
  • Secrets are only deleted when all referencing soft-owners are removed
  • Uses eck.k8s.elastic.co/owner-refs annotation with JSON-encoded map of owner namespaced names

Secret Sources:

  • Remain single soft owner (existing behaviour unchanged)

This prevents secret leakage while enabling proper cleanup when policies are deleted.

Related Issues

@pkoutsovasilis pkoutsovasilis self-assigned this Nov 20, 2025
@pkoutsovasilis pkoutsovasilis added the >enhancement Enhancement of existing functionality label Nov 20, 2025
@prodsecmachine
Copy link
Collaborator

prodsecmachine commented Nov 20, 2025

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Copy link

github-actions bot commented Nov 20, 2025

🔍 Preview links for changed docs

@kvalliyurnatt
Copy link
Contributor

Just a general question/comment that came to my mind, do we want to have any limits on the number of SCPs that can be associated to a cluster ?
I was just trying to think what potential issues we might face when we have a large number of SCPs(if that is even a practical scenario) associated with a single ES cluster and if there is a practical maximum that we can enforce. One thing that came to my mind while thinking about scale was the annotation for soft owners, maybe we run into some kind of limit with the annotation map size? (I think the annotation limit is 256KB which I think should not be something to worry about ?)
Wondering if there are any other such things to consider.

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass, just looking at the code. I have not tested it yet. Will try to find some more time later today.

Comment on lines 102 to 103
sMntAggr := secretMountsAggregator{}
sSrcAggr := secretSourceAggregator{}
Copy link
Collaborator

@pebrc pebrc Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sMntAggr := secretMountsAggregator{}
sSrcAggr := secretSourceAggregator{}
secretMounts := secretMountsAggregator{}
secretSources := secretSourceAggregator{}

I find the variable names otherwise a bit hard to follow. secretMountsAggr if you want to keep the difference between the actual sources and the aggregator visibile in the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 3bf7a08

Comment on lines 218 to 221
// Check namespace restrictions; the policy must be in operator namespace or same namespace as the target object
if policy.Namespace != operatorNamespace && policy.Namespace != obj.GetNamespace() {
return false, nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the cheapest check first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my plan was to merge first this one and then stick with the definition of DoesPolicyMatchObject from there which has the namespace check first, any objection with the rebasing part @pebrc ? should I merge instead of rebasing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that I have merged main here (commit 0f5d39b) this should be fixed

Comment on lines 242 to 250
if dst == nil {
dst = &commonv1.Config{}
dstCanonicalConfig = settings.NewCanonicalConfig()
} else {
dstCanonicalConfig, err = settings.NewCanonicalConfigFrom(dst.DeepCopy().Data)
if err != nil {
return nil, err
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if dst == nil {
dst = &commonv1.Config{}
dstCanonicalConfig = settings.NewCanonicalConfig()
} else {
dstCanonicalConfig, err = settings.NewCanonicalConfigFrom(dst.DeepCopy().Data)
if err != nil {
return nil, err
}
}
var err error
if dst == nil {
return src.DeepCopy(), nil
}
dstCanonicalConfig, err = settings.NewCanonicalConfigFrom(dst.DeepCopy().Data)
if err != nil {
return nil, err
}

That also allows to get rid of the special casing for the single policy case at least for Kibana.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 3bf7a08

Comment on lines 173 to 203
mergedPolicies := 0
sSrcAggr := secretSourceAggregator{}
cfgPolicy := &configPolicy[policyv1alpha1.KibanaConfigPolicySpec]{
extractFunc: func(p *policyv1alpha1.StackConfigPolicy) policyv1alpha1.KibanaConfigPolicySpec {
return p.Spec.Kibana
},
mergeFunc: func(dst *policyv1alpha1.KibanaConfigPolicySpec, src policyv1alpha1.KibanaConfigPolicySpec, srcPolicy *policyv1alpha1.StackConfigPolicy) error {
if mergedPolicies == 0 {
// First policy: copy directly without merging/canonicalizing to avoid unnecessary differences
specCopy := src.DeepCopy()
// SecureSettings are aggregated by sSrcAggr
specCopy.SecureSettings = nil
*dst = *specCopy
} else {
var err error
if dst.Config, err = deepMergeConfig(dst.Config, src.Config); err != nil {
return err
}
}

sSrcAggr.aggregate(srcPolicy.Spec.Kibana.SecureSettings, srcPolicy)
mergedPolicies++
return nil
},
}

if err := merge(cfgPolicy, kbn, allPolicies, params.OperatorNamespace); err != nil {
return cfgPolicy, err
}
cfgPolicy.SecretSources = sSrcAggr.namespacedSecretSources
return cfgPolicy, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mergedPolicies := 0
sSrcAggr := secretSourceAggregator{}
cfgPolicy := &configPolicy[policyv1alpha1.KibanaConfigPolicySpec]{
extractFunc: func(p *policyv1alpha1.StackConfigPolicy) policyv1alpha1.KibanaConfigPolicySpec {
return p.Spec.Kibana
},
mergeFunc: func(dst *policyv1alpha1.KibanaConfigPolicySpec, src policyv1alpha1.KibanaConfigPolicySpec, srcPolicy *policyv1alpha1.StackConfigPolicy) error {
if mergedPolicies == 0 {
// First policy: copy directly without merging/canonicalizing to avoid unnecessary differences
specCopy := src.DeepCopy()
// SecureSettings are aggregated by sSrcAggr
specCopy.SecureSettings = nil
*dst = *specCopy
} else {
var err error
if dst.Config, err = deepMergeConfig(dst.Config, src.Config); err != nil {
return err
}
}
sSrcAggr.aggregate(srcPolicy.Spec.Kibana.SecureSettings, srcPolicy)
mergedPolicies++
return nil
},
}
if err := merge(cfgPolicy, kbn, allPolicies, params.OperatorNamespace); err != nil {
return cfgPolicy, err
}
cfgPolicy.SecretSources = sSrcAggr.namespacedSecretSources
return cfgPolicy, nil
secretSources := secretSourceAggregator{}
cfgPolicy := &configPolicy[policyv1alpha1.KibanaConfigPolicySpec]{
extractFunc: func(p *policyv1alpha1.StackConfigPolicy) policyv1alpha1.KibanaConfigPolicySpec {
return p.Spec.Kibana
},
mergeFunc: func(dst *policyv1alpha1.KibanaConfigPolicySpec, src policyv1alpha1.KibanaConfigPolicySpec, srcPolicy *policyv1alpha1.StackConfigPolicy) error {
var err error
if dst.Config, err = deepMergeConfig(dst.Config, src.Config); err != nil {
return err
}
secretSources.aggregate(srcPolicy.Spec.Kibana.SecureSettings, srcPolicy)
return nil
},
}
if err := merge(cfgPolicy, kbn, allPolicies, params.OperatorNamespace); err != nil {
return cfgPolicy, err
}
cfgPolicy.SecretSources = secretSources.namespacedSecretSources
return cfgPolicy, nil

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Special casing not necessary if the merge function optimises for the single policy case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 3bf7a08

Comment on lines 281 to 289
if dst == nil {
dst = &commonv1.Config{}
dstCanonicalConfig = settings.NewCanonicalConfig()
} else {
dstCanonicalConfig, err = settings.NewCanonicalConfigFrom(dst.DeepCopy().Data)
if err != nil {
return nil, err
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if dst == nil {
dst = &commonv1.Config{}
dstCanonicalConfig = settings.NewCanonicalConfig()
} else {
dstCanonicalConfig, err = settings.NewCanonicalConfigFrom(dst.DeepCopy().Data)
if err != nil {
return nil, err
}
}
if dst == nil {
return src.DeepCopy(), nil
}
dstCanonicalConfig, err = settings.NewCanonicalConfigFrom(dst.DeepCopy().Data)
if err != nil {
return nil, err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 3bf7a08

specCopy.SecureSettings = nil
// SecretMounts are aggregated by sMntAggr
specCopy.SecretMounts = nil
*dst = *specCopy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special casing can be removed if the merge function handles preserves the original structure when the destination is empty. I think the code becomes a little bit easier to reason about without the special cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 3bf7a08

for _, secretMount := range src {
if existingPolicy, exists := s.policiesBySecretName[secretMount.SecretName]; exists {
existingPolicyNsn := k8s.ExtractNamespacedName(existingPolicy)
err := fmt.Errorf("%w: secret with name %q is defined in policies %q, %q", errMergeConflict, secretMount.SecretName,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := fmt.Errorf("%w: secret with name %q is defined in policies %q, %q", errMergeConflict, secretMount.SecretName,
err := fmt.Errorf("%w: secret with name %q is defined in policy %q, %q", errMergeConflict, secretMount.SecretName,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in e3639c5

}
if existingPolicy, exists := s.policiesByMountPath[secretMount.MountPath]; exists {
existingPolicyNsn := k8s.ExtractNamespacedName(existingPolicy)
err := fmt.Errorf("%w: secret mount path %q is defined in policies %q, %q", errMergeConflict, secretMount.MountPath,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := fmt.Errorf("%w: secret mount path %q is defined in policies %q, %q", errMergeConflict, secretMount.MountPath,
err := fmt.Errorf("%w: secret mount path %q is defined in policy %q, %q", errMergeConflict, secretMount.MountPath,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in e3639c5

// aggregate merges source secure settings into the aggregator, organizing them by the source
// policy's namespace. Secret sources are sorted deterministically when multiple policies have
// been applied to ensure consistent results.
func (s *secretSourceAggregator) aggregate(src []commonv1.SecretSource, srcPolicy *policyv1alpha1.StackConfigPolicy) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it confusing that both aggregators use the same method name but do differnent things. Also one has a more functional style returning the intermediate results:

func (s *secretMountsAggregator) aggregate(
    dst []policyv1alpha1.SecretMount,
    src []policyv1alpha1.SecretMount,
    srcPolicy *policyv1alpha1.StackConfigPolicy,
) ([]policyv1alpha1.SecretMount, error)

while the other is more imperative:

func (s *secretSourceAggregator) aggregate(
    src []commonv1.SecretSource,
    srcPolicy *policyv1alpha1.StackConfigPolicy,
)

Can we maybe use different method names?

secretMountAggr.mergeInto(dst, src, policy) - emphasizes it modifies dst
secretSourceAggr.addFrom(src, policy) - emphasizes it adds to internal state

or make them both imperative/stateful:

secretMountAggr.add(src, policy) error
secretSourceAggr.add(src, policy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok in 3bf7a08 both of these now should be under the same functional style

// aggregate merges source secret mounts into destination, checking for conflicts on secret names
// and mount paths. Returns the merged slice of secret mounts sorted deterministically when
// multiple policies have been applied, or an error if conflicts are detected.
func (s *secretMountsAggregator) aggregate(dst []policyv1alpha1.SecretMount, src []policyv1alpha1.SecretMount, srcPolicy *policyv1alpha1.StackConfigPolicy) ([]policyv1alpha1.SecretMount, error) {
Copy link
Collaborator

@pebrc pebrc Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe break the long parameter lists into multiple lines (or my monitor is too small 😄 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broken in multiple lines in 3bf7a08

// This uses labels (reconciler.SoftOwnerKindLabel, reconciler.SoftOwnerNameLabel, reconciler.SoftOwnerNamespaceLabel)
// to store the ownership relationship, allowing the policy to manage
// the Secret's lifecycle without using Kubernetes OwnerReferences.
func setSingleSoftOwner(secret *corev1.Secret, policy policyv1alpha1.StackConfigPolicy) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only used in test code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the plan was to use this one also instead of the call to filesettings.SetSoftOwner but this one slipped through the cracks. Either way, I am gonna revisit this approach based on your comment below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 025c76f

}

// Check for multi-policy ownership (annotation-based)
if ownerRefsBytes, exists := obj.GetAnnotations()[SoftOwnerRefsAnnotation]; exists {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a bit awkward that some of the multi-ref functionality is here and some is in ownership.go?

Maybe move all the code here and make it generic?

// In pkg/controller/common/reconciler/secret.go

// SetMultipleSoftOwners adds multiple soft owner references to an object
func SetMultipleSoftOwners(obj metav1.Object, ownerKind string, owners []types.NamespacedName) error

// AddSoftOwner adds a soft owner to an object (handles single->multi transition)
func AddSoftOwner(obj metav1.Object, owner SoftOwnerRef) error

// RemoveSoftOwner removes a soft owner from an object
func RemoveSoftOwner(obj metav1.Object, owner types.NamespacedName) (remainingCount int, err error)

// IsSoftOwnedBy checks if an object is soft-owned by the given owner
func IsSoftOwnedBy(obj metav1.Object, ownerKind string, owner types.NamespacedName) (bool, error)

// SoftOwnerRefs already exists and reads them
func SoftOwnerRefs(obj metav1.Object) ([]SoftOwnerRef, error)

and then call it from the stack config controller, maybe with small wrappers to make the transformation from SCP to NamespacedName where necessary:

func setMultipleSoftOwners(secret *corev1.Secret, policies []policyv1alpha1.StackConfigPolicy) error {
    owners := make([]types.NamespacedName, len(policies))
    for i, p := range policies {
        owners[i] = k8s.ExtractNamespacedName(&p)
    }
    return reconciler.SetMultipleSoftOwners(secret, policyv1alpha1.Kind, owners)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relocated in 025c76f

Comment on lines 629 to 635
owned, err := isPolicySoftOwner(&s, softOwner)
if err != nil {
return err
}
if !owned {
continue
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
owned, err := isPolicySoftOwner(&s, softOwner)
if err != nil {
return err
}
if !owned {
continue
}
if owned, err := isPolicySoftOwner(&secret, softOwner); err != nil {
return err
} else if !owned {
continue
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 025c76f

{&dst.SnapshotRepositories, src.SnapshotRepositories, mergeConfig},
{&dst.SnapshotLifecyclePolicies, src.SnapshotLifecyclePolicies, deepMergeConfig},
{&dst.SecurityRoleMappings, src.SecurityRoleMappings, deepMergeConfig},
{&dst.IndexLifecyclePolicies, src.IndexLifecyclePolicies, deepMergeConfig},
Copy link
Collaborator

@pebrc pebrc Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a problem with null values here. I assume when multiple policies apply and we canonicalize the configs (this happened when I went from 2 to 3 SCPs and depends on the order of merging i.e. the weights):

   java.lang.IllegalStateException: Error processing state change request for file_settings, errors: org.elasticsearch.xcontent.XContentParseException: [1:1112] [reserved_state_chunk] failed to parse field [state]
          Caused by: org.elasticsearch.xcontent.XContentParseException: [1:1112] [state] failed to parse field [ilm]
          Caused by: org.elasticsearch.xcontent.XContentParseException: [1:131] [lifecycle_policy] failed to parse field [phases]
          Caused by: org.elasticsearch.xcontent.XContentParseException: [1:131] [phases] failed to parse field [delete]
          Caused by: org.elasticsearch.xcontent.XContentParseException: [1:131] [phase] failed to parse field [actions]
          Caused by: org.elasticsearch.xcontent.XContentParseException: [1:131] [actions] failed to parse field [delete]
          Caused by: org.elasticsearch.xcontent.XContentParseException: [1:131] [delete] Expected START_OBJECT but was: VALUE_NULL

Using the example from our docs https://www.elastic.co/docs/deploy-manage/deploy/cloud-on-k8s/elastic-stack-configuration-policies

    "ilm": {
      "ilm_test": {
        "phases": {
          "delete": {
            "actions": {
              "delete": {}
            },
            "min_age": "30d"
          },
          "warm": {
            "actions": {
              "forcemerge": {
                "max_num_segments": 1
              }
            },
            "min_age": "10d"
          }
        }
      }
    },

becomes

    "ilm": {
      "ilm_test": {
        "phases": {
          "delete": {
            "actions": {
              "delete": null
            },
            "min_age": "30d"
          },
          "warm": {
            "actions": {
              "forcemerge": {
                "max_num_segments": 1
              }
            },
            "min_age": "10d"
          }
        }
      }
    },

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need a truly shallow merge for everything that is not elasticsearch.yml and clusterSettings.

Something like:

func mergeConfig(dst *commonv1.Config, src *commonv1.Config) (*commonv1.Config, error) {
	if src == nil {
		return dst, nil
	}

	if dst == nil {
		// No destination, just copy source
		return src.DeepCopy(), nil
	}

	if dst.Data == nil {
		dst.Data = make(map[string]interface{})
	}

	// Shallow merge: copy top-level keys from src to dst
	// This preserves empty maps since we're not going through canonicalization
	if src.Data != nil {
		maps.Copy(dst.Data, src.Data)
	}

	return dst, nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching that @pebrc , I will run further tests and try to pinpoint if this originates from go-ucfg or yaml marshalling/unmarshalling. Now about using only shallow merges how would that look like if we use dot-separated keys the other configs aside elasticsearch.yml and clusterSettings? Also sorry if I missed it but what prevents the same thing from happening for elasticsearch.yml and clusterSettings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I investigated this further - the issue is in the go-ucfg library during Unpack, where empty maps are replaced with nil. I considered replacing go-ucfg entirely for canonicalization and deep merging (I had success with custom code), but didn't push those changes to avoid expanding this PR's scope. This might be worth a follow-up PR.

In 799744b, I changed the merging behavior to support canonicalization and deep merging only for Config and ClusterSettings fields. Everything else uses shallow merging. I also added support for merging the deprecated SecureSettings at the SCP spec root, which I missed when testing your SCPs @pebrc.

@pkoutsovasilis
Copy link
Contributor Author

buildkite test this -f p=gke,t=TestStackConfigPolicy*

@pkoutsovasilis
Copy link
Contributor Author

buildkite test this -f p=gke,t=TestStackConfigPolicy*

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement Enhancement of existing functionality v3.3.0 (next)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support multiple StackConfigPolicies per cluster

5 participants