
Conversation


@tjungblu commented Oct 8, 2025

API PR in openshift/api#2520
Feature Gate PR in openshift/api#2525

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Oct 8, 2025


openshift-ci-robot commented Oct 9, 2025

@tjungblu: This pull request references CNTRLPLANE-1575 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

API PR in openshift/api#2520
Feature Gate PR in openshift/api#2525

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@lance5890

I'm also curious about the event-ttl setting in OCP. Please correct me if I'm missing something: why not set the event-ttl in the APIServer resource (config.openshift.io/v1), like this:

type APIServerSpec struct {
	···
	// audit specifies the settings for audit configuration to be applied to all OpenShift-provided
	// API servers in the cluster.
	// +optional
	// +kubebuilder:default={profile: Default}
	Audit Audit `json:"audit"`
	···
	// +optional
	EventTTLMinutes int32 `json:"eventTTLMinutes,omitempty"`
}

Maybe we will set more parameters for all OpenShift-provided apiservers in the future, not only the event-ttl.


tjungblu commented Oct 9, 2025

Thanks for the review @lance5890. Events are only created through the kube-apiserver, so adding it to the others wouldn't make much sense.

As for other configuration values (we have debated the GOAWAY chance parameter recently), those would be more suitable for the general apiserver config CRD.


1. Allow customers to configure the event-ttl setting for kube-apiserver through the OpenShift API
2. Provide a reasonable range of values (5 minutes to 3 hours) that covers most customer needs
3. Maintain backward compatibility with the current default of 3 hours (180 minutes)
Contributor

Is this just so that we don't have to configure this value in CI? Given kube's default is 1 hour and we state there is little reason for a customer to need a longer period than 3 hours, could we also argue customers would generally want the default to be 1 hour, and that it's only 3 because of our internal processes?

Author

As usual, nothing was functionally specified about what the intended behavior is supposed to be.

I would also argue we could reduce the default to 1h safely and configure the CI for 3h. That would be a discussion for the apiserver folks to chime in, though.


If we have one customer running a batch job to export events every 2 hours, then automatically reducing the TTL to 1 hour breaks them.

Events are meant to be transient, and if you need to retain them, they can be replicated elsewhere. That sort of replication is not currently a part of the platform (maybe there are optional operators on top that can be configured to do this), and I'm afraid to know how many programs are reliant on events for doing "real work".

I would prefer to keep the scope as small as possible while satisfying the request to allow shorter event TTLs. Users with constrained deployments or high event creation rates will appreciate the escape valve. I'm not opposed to migrating the default eventually, but it carries more risk and is not necessary to satisfy the RFE. Besides, the steady-state number of events depends on both the arrival rate and the TTL; I suspect APF would be a more effective way to mitigate the worst-case number of events by limiting the arrival rate.
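
As a rough illustration (the rate here is hypothetical): steady-state count ≈ arrival rate × TTL, so a constant 1 event/s works out to roughly 10,800 retained events at a 3h TTL versus about 3,600 at 1h; APF caps the first factor, the TTL the second.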


## Proposal

We propose to add an `eventTTLMinutes` field to the `APIServer` resource in `config.openshift.io/v1` that allows customers to configure the event-ttl setting for kube-apiserver.
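
For illustration, a minimal sketch of how the field could look, mirroring the style of the existing Audit field shown earlier; the kubebuilder markers, the godoc wording, and the exact resource it lands on (config/v1 vs. the operator CRD, discussed below) are assumptions, not the final API:

type APIServerSpec struct {
	···
	// eventTTLMinutes specifies how long, in minutes, the kube-apiserver retains
	// events before they expire from etcd. Lower values reduce etcd storage usage
	// at the cost of a shorter debugging window and somewhat more expensive
	// compactions. When omitted, the current default of 180 minutes (3 hours) applies.
	// +kubebuilder:validation:Minimum=5
	// +kubebuilder:validation:Maximum=180
	// +optional
	EventTTLMinutes int32 `json:"eventTTLMinutes,omitempty"`
}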
Contributor

The API PR is currently against operator, not config, btw; which do you intend to proceed with?

Author

Good one. No, this is going to the KAS Operator CRD; it's an exclusively kube-apiserver setting.

Contributor

This isn't important for other API servers? Looking at the existing operator CR, there isn't any config there right now; might be worth checking with the API server folks where they think this should be landing.

Author

Only the kube-apiserver handles events and attaches leases to them, so it would only make sense there.

> might be worth checking with the API server folks where they think this should be landing

Yep, just pinged them on slack. Happy to move it wherever it makes most sense.

@sjenning commented Oct 9, 2025

Worth noting that, in Hypershift, the APIServer cluster config is part of the HostedCluster, but the KASO operator config is not, and we do not allow configuration inside the hosted cluster to modify how components in the hosted control plane (HCP) are configured.

@tjungblu is right that events are a kube-apiserver resource, not a generic API server thing. The event-ttl option only exists for kube-apiserver. I think we want to avoid expanding the surface of the config/v1 APIServer type with things that are not common across most/all API servers. At least, that's the stated intent:

// APIServer holds configuration (like serving certificates, client CA and CORS domains)
// shared by all API servers in the system, among them especially kube-apiserver
// and openshift-apiserver. The canonical name of an instance is 'cluster'.

Author

KubeAPIServerConfig looks like it could work; it seems abandoned aside from last year's addition of MinimumKubeletVersion.

@sjenning I think you're referring to this, right?
https://github.com/openshift/hypershift/blob/main/api/hypershift/v1beta1/hostedcluster_types.go#L1855-L1859

that would be:
https://github.com/openshift/api/blob/master/config/v1/types_apiserver.go#L37

if the others are OK with moving the setting there, we can also lift and shift this into Hypershift that way.

KubeAPIServerConfig in kubecontrolplane/v1 is the format of the config file; it's not a resource that we serve anywhere.

Contributor

I also think that in general config/v1 APIServer is meant to be used for settings shared among most API servers.

The operator/v1 KubeAPIServer resource seems like a good candidate for a setting that applies only to the kube-apiserver. This resource already contains the spec.logLevel field, which is used to set the log level for the operand.
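
A sketch of that alternative placement, assuming the field keeps the shape proposed above and sits next to the embedded operator settings (the embedding shown here follows the current operator/v1 types, but treat the exact shape as an assumption):

type KubeAPIServerSpec struct {
	// StaticPodOperatorSpec carries the shared operator settings such as logLevel.
	StaticPodOperatorSpec `json:",inline"`

	// eventTTLMinutes sets the kube-apiserver event retention in minutes (5-180).
	// +optional
	EventTTLMinutes int32 `json:"eventTTLMinutes,omitempty"`
}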

As a cluster administrator in a regulated environment, I want to configure a longer event retention period so that I can meet compliance requirements for audit trails and debugging.

#### Story 2: Storage Optimization
As a cluster administrator with limited etcd storage, I want to configure a shorter event retention period so that I can reduce etcd storage usage while maintaining sufficient event history for troubleshooting.
Contributor

Do we have any metrics in terms of bytes of data that events generate over time in a regular OpenShift cluster?

Author

I'll check Telemeter tomorrow; I think we may have the counts. Bytes are usually difficult to obtain because etcd does not expose those per CRD.

Contributor

I think it would be useful to understand the impact on the data, so if we can get some estimates of the impact of this change without too much effort, I suspect that will be useful for folks to know.

@tjungblu commented Oct 10, 2025

I hope I got the query right:

topk(10, cluster:usage:resources:sum{resource="events"})

The biggest cluster has about 3-4 million events in it, and then it levels off quickly. Putting it into quantiles:

[quantile chart of per-cluster event counts]

The vast majority of clusters probably won't notice changing the TTL to 5m with this few events in storage.

The biggest cluster with 3-4 million events also has not much else beyond events in it, so taking it as a sample against its five gigabytes of used etcd storage gives about 1.5 KB per event on average.

Looking into the arrival rate here, it is constant at about 0.2/s, with a one-to-two-hour window at 0.8/s. Not super large either.


#### Hypershift / Hosted Control Planes

This enhancement does not apply to Hypershift.
Contributor

Do we know if HyperShift also uses 3h by default? Do they have similar requests to be able to change the config?

Author

GitHub is down more than it is up right now, but they also use 3h:
https://github.com/openshift/hypershift/blob/69c68a4003cbeb0048c9c31d8b68bed6fc287970/control-plane-operator/controllers/hostedcontrolplane/v2/kas/config.go#L214

As for the other question, I'll ping Seth tomorrow on reviewing this.

@sjenning commented Oct 9, 2025

Hypershift generally follows KASO for default config and does use 3h for event-ttl, same as KASO.

While we don't have an explicit request for this in HCP, we should look to allow this for all topologies.

Author

I'll note this down; we can do the necessary Hypershift changes as well.

@tjungblu commented Oct 13, 2025

@sjenning I was just looking into how GOAWAY chance was implemented and it seems to be based on an annotation:
https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/kas/params.go#L121-L123

openshift/hypershift#6019

do you think this could be a viable implementation here, too? Given that, in the other thread, we wanted to keep it in the operator CRD for OCP.

Contributor

In general we shouldn't be adding new annotations. Is there any reason this couldn't be a proper structured API on whichever is the appropriate HyperShift object?

kind: APIServer
apiVersion: config.openshift.io/v1
spec:
  eventTTLMinutes: 60 # Integer value in minutes, e.g., 60, 180
Contributor

Are there any configuration subsections that it might make sense to group this into? How does this look in the upstream API server configuration file?

Author

This is just a plain command-line argument taking a duration in kube-apiserver:
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

--event-ttl duration     Default: 1h0m0s
Amount of time to retain events.

Downstream, I'm not sure; also something for the apiserver folks to chime in on.

Contributor

There is no resource/file for configuring kas upstream; settings are passed as CLI flags.

We could group this under ApiServerArguments, but I’m not sure if that would provide a good UX for administrators to set/use.

Contributor

I suspect then we just leave it where it is for now

Contributor

Yeah, for the downstream fork we have KubeAPIServerConfig, which is filled by the operator and then consumed by the forked kas.

I think the only alternative would be to put this into config/v1 APIServer and mark that field as kas-specific (unless we also consider a ConfigMap created by admins, but that would provide terrible UX/validation).

Author

Cool, then let's leave it where it is. We need to find a way to add this into Hypershift then.

Comment on lines 145 to 149
1. **etcd Compaction Bandwidth**: With faster expiring events, etcd will need to perform compaction more frequently to remove expired events. This increases the bandwidth usage for etcd compaction operations.

2. **etcd CPU Usage**: More frequent compaction operations will increase CPU usage on etcd nodes, as the compaction process requires CPU cycles to identify and remove expired events.

3. **Event Availability**: Events will be deleted more quickly, potentially reducing the time window available for debugging and troubleshooting.
Contributor

These impacts might be worth touching on briefly in the godoc for the field, in terms of trading lower storage usage for increased CPU usage. At the moment the godoc only mentions the positive side and doesn't explain the trade-off.

Author

I still need to quantify the order of magnitude here. I think this is too negligible to even notice, but I wasn't able to test it in depth yet.

Author

Having said that, the compactions do not become more frequent; they just become more expensive.

Is there a practical risk of decreasing event-ttl by 90%+ for a cluster with many events (i.e. how expensive can the first expiry cycle be)?

Author

Good scenario; let me also measure this. Compaction is always O(n), where n is the number of revisions to be compacted: either 10k or whatever we write in the 5-minute interval.

When we reduce the TTL to 1h, we would delete the pre-existing 3h of events and, after the first hour, the additional events that arrived since the TTL change. For the remaining two hours there would be an elevated rate of those deletes until the 3h backlog is fully expired; then it should settle into some steady state for the 1h TTL.

What could also be worrisome is any watchers that run on the events resource. I'm not sure whether a lease-based delete would trigger a delete notification; that seems quite wasteful, and I think the keys are just internally tombstoned in the index until compacted.


openshift-ci bot commented Oct 10, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

2. **Cluster Administrator** edits the operator configuration resource
3. **Cluster Administrator** sets the `eventTTLMinutes` field to the desired value in minutes (e.g., 60, 180)
4. **kube-apiserver-operator** detects the configuration change
5. **kube-apiserver-operator** updates the kube-apiserver deployment with the new configuration
Contributor

Yeah, I think in practice we will create an observer that sets apiServerArguments/event-ttl, which will eventually be propagated to the operator's observedConfig and later propagated as config to kas.

xref: https://github.com/openshift/cluster-kube-apiserver-operator/blob/6ea3bb4c477c88376edb91cffee23e8229a1da8a/pkg/operator/configobservation/apiserver/observe_goaway_chance.go#L16C34-L16C52
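
To make that concrete, a rough sketch of the observed-config fragment such an observer would produce, loosely modeled on the GOAWAY-chance observer linked above; the package name, the helper name, and the handling of an unset field are assumptions, not the actual implementation:

package apiserver

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// observedEventTTLConfig builds the observedConfig fragment that ends up as the
// --event-ttl argument of the kube-apiserver. In the real observer the minutes
// value would be read from the KubeAPIServer operator spec via the listers.
func observedEventTTLConfig(minutes int32) (map[string]interface{}, error) {
	observed := map[string]interface{}{}
	if minutes == 0 {
		// Field unset: leave the current default (3h) alone.
		return observed, nil
	}
	ttl := fmt.Sprintf("%dm", minutes)
	if err := unstructured.SetNestedStringSlice(observed, []string{ttl}, "apiServerArguments", "event-ttl"); err != nil {
		return nil, err
	}
	return observed, nil
}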

#### Impact of removing 3 gigabytes of events

To represent the worst case of removing 3 gigabytes of events, we have filled a 4.21 nightly cluster with 3 million events at the default TTL.
Then we configured a 5-minute TTL and watched the resource usage over the following three hours...
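
For reference, a minimal client-go sketch of how a cluster could be filled with synthetic events for a test like this; the namespace, involved object, and loop count here are arbitrary assumptions, not what was actually used:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	now := metav1.Now()
	for i := 0; i < 3_000_000; i++ {
		event := &corev1.Event{
			ObjectMeta:     metav1.ObjectMeta{GenerateName: "synthetic-", Namespace: "default"},
			InvolvedObject: corev1.ObjectReference{Kind: "Pod", Name: "synthetic-load", Namespace: "default"},
			Reason:         "SyntheticLoad",
			Message:        fmt.Sprintf("synthetic event %d", i),
			Type:           corev1.EventTypeNormal,
			FirstTimestamp: now,
			LastTimestamp:  now,
			Count:          1,
		}
		// Each Create lands in etcd with the configured event TTL attached as a lease.
		if _, err := client.CoreV1().Events("default").Create(context.TODO(), event, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}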
Contributor

It would be great if you could share some data, especially regarding CPU/memory usage of etcd/kas.

In general kas shouldn't be affected, since the watch cache should be/is (I will try to verify this) disabled for events.


openshift-ci bot commented Oct 13, 2025

@tjungblu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
