Commit e16826d (merge of pull request #1857 from tjungblu/CNTRLPLANE-1575)
CNTRLPLANE-1575: Add support for event-ttl in Kube API Server Operator

---
title: event-ttl
authors:
  - "@tjungblu"
  - "CursorAI"
reviewers:
  - benluddy
  - p0lyn0mial
  - dgrisonnet
approvers:
  - sjenning
api-approvers:
  - JoelSpeed
creation-date: 2025-10-08
last-updated: 2025-10-15
tracking-link:
  - https://issues.redhat.com/browse/OCPSTRAT-2095
  - https://issues.redhat.com/browse/CNTRLPLANE-1539
  - https://github.com/openshift/api/pull/2520
  - https://github.com/openshift/api/pull/2525
status: proposed
see-also:
replaces:
superseded-by:
---

# Event TTL Configuration

## Summary

This enhancement describes a configuration option in the operator API to configure the event-ttl setting for the kube-apiserver. The event-ttl setting controls how long events are retained in etcd before being automatically deleted.

Currently, OpenShift uses a default event-ttl of 3 hours (180 minutes), while upstream Kubernetes uses 1 hour. This enhancement allows customers to configure this value based on their specific requirements, within a range of 5 minutes to 3 hours (180 minutes); the default remains 180 minutes (3 hours).

## Motivation

The event-ttl setting in kube-apiserver controls the retention period for events in etcd. Events are automatically deleted after this duration to prevent etcd from growing indefinitely. Different customers have different requirements for event retention:

- Some customers need longer retention for compliance or debugging purposes
- Others may want shorter retention to reduce etcd storage usage
- The current fixed value of 3 hours may not suit all use cases

The maximum value of 3 hours (180 minutes) was chosen to align with the current OpenShift default value. While upstream Kubernetes uses 1 hour as the default, OpenShift's 3-hour default was established to support CI runs that may need to retain events for the entire duration of a test run. For customer use cases, the 3-hour maximum provides sufficient retention for compliance and debugging needs, while the 1-hour upstream default would be more appropriate for general customer workloads.

### Goals

1. Allow customers to configure the event-ttl setting for kube-apiserver through the OpenShift API
2. Provide a reasonable range of values (5 minutes to 3 hours) that covers most customer needs
3. Maintain backward compatibility with the current default of 3 hours (180 minutes)
4. Ensure the configuration is properly validated and applied

### Non-Goals

- Changing the default event-ttl value (will remain 3 hours/180 minutes)
- Supporting event-ttl values outside the recommended range (5-180 minutes)
- Modifying the underlying etcd compaction behavior beyond what the event-ttl setting provides

## Proposal

We propose to add an `eventTTLMinutes` field to the operator API that allows customers to configure the event-ttl setting for kube-apiserver.

### User Stories

#### Story 1: Storage Optimization

As a cluster administrator with limited etcd storage, I want to configure a shorter event retention period so that I can reduce etcd storage usage while maintaining sufficient event history for troubleshooting. Event data can consume significant etcd storage over time, and reducing the retention period can help manage storage growth.

#### Story 2: Default Behavior

As a cluster administrator, I want the current default behavior to be preserved so that existing clusters continue to work without changes.

### API Extensions

This enhancement modifies the operator API by adding a new `eventTTLMinutes` field.

### Workflow Description

The workflow for configuring event-ttl is straightforward:

1. **Cluster Administrator** accesses the OpenShift cluster via CLI or web console
2. **Cluster Administrator** edits the operator configuration resource
3. **Cluster Administrator** sets the `eventTTLMinutes` field to the desired value in minutes (e.g., 60, 180)
4. **kube-apiserver-operator** detects the configuration change
5. **kube-apiserver-operator** updates the kube-apiserver deployment with the new configuration
6. **kube-apiserver** restarts with the new event-ttl setting
7. **etcd** begins using the new event retention policy for future events

The configuration change takes effect immediately for new events, while existing events continue to use their original TTL until they expire.
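
For illustration, steps 2 and 3 can be combined into a single `oc patch` invocation (assuming the field is available, i.e. the `EventTTL` feature gate is enabled):

```shell
# Set the event TTL to 60 minutes on the KubeAPIServer operator resource.
oc patch kubeapiserver cluster --type=merge -p '{"spec":{"eventTTLMinutes":60}}'
```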

### Topology Considerations

#### Hypershift / Hosted Control Planes

For HyperShift, this enhancement will use an annotation-based approach on the `HostedCluster` resource, just as with other kube-apiserver configurations such as `goaway-chance`. Users can control the event-ttl setting by specifying the `hypershift.openshift.io/event-ttl-minutes` annotation. The control-plane-operator will read this annotation and update the kube-apiserver deployment in the hosted control plane accordingly (following the pattern described in [openshift/hypershift#6019](https://github.com/openshift/hypershift/pull/6019)). HyperShift continues to use the same 3-hour default as standalone OpenShift clusters unless overridden.
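
As a sketch, the annotation could be applied with `oc annotate` against the management cluster (the HostedCluster name and namespace below are hypothetical):

```shell
# Apply a 60-minute event TTL to a hosted control plane.
oc annotate hostedcluster example -n clusters hypershift.openshift.io/event-ttl-minutes=60
```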

#### Standalone Clusters

This enhancement is fully applicable to standalone OpenShift clusters. The event-ttl configuration will be applied to the kube-apiserver running in the control plane, affecting event retention in the cluster's etcd.

#### Single-node Deployments or MicroShift

For single-node OpenShift (SNO) deployments, this enhancement will work as expected. The event-ttl configuration will be applied to the kube-apiserver running on the single node.

For MicroShift, this enhancement is not directly applicable: MicroShift uses a different architecture and does not run the kube-apiserver operator, so the configuration approach described here will not work there. MicroShift also uses a 3-hour TTL by default.

### Implementation Details/Notes/Constraints

The proposed API looks like this:

```yaml
apiVersion: operator.openshift.io/v1
kind: KubeAPIServer
metadata:
  name: cluster
spec:
  eventTTLMinutes: 60 # Integer value in minutes, e.g., 60, 180
```

The `eventTTLMinutes` field will be an integer value representing minutes. The field will be validated to ensure it falls within the required range of 5-180 minutes. In the upstream Kubernetes API server configuration, `event-ttl` is typically set as a standalone parameter, so placing `eventTTLMinutes` directly under the operator spec without additional nesting maintains consistency with upstream patterns.
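
For intuition only: the operator would be expected to translate this field into the kube-apiserver's `--event-ttl` argument. A hypothetical rendering in the operator's configuration might look like the following (the exact structure is an implementation detail of the kube-apiserver-operator, not part of this proposal):

```yaml
# Hypothetical fragment showing how eventTTLMinutes: 60 might surface
# as a kube-apiserver argument.
apiServerArguments:
  event-ttl:
    - 60m
```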

The API design is based on the changes in [openshift/api PR #2520](https://github.com/openshift/api/pull/2520), and the feature gate implementation is in [openshift/api PR #2525](https://github.com/openshift/api/pull/2525). The API changes include:

```go
type KubeAPIServerSpec struct {
	StaticPodOperatorSpec `json:",inline"`

	// eventTTLMinutes specifies the amount of time that the events are stored before being deleted.
	// The TTL is allowed between 5 minutes minimum up to a maximum of 180 minutes (3 hours).
	//
	// Lowering this value will reduce the storage required in etcd. Note that this setting will only apply
	// to new events being created and will not update existing events.
	//
	// When omitted this means no opinion, and the platform is left to choose a reasonable default, which is subject to change over time.
	// The current default value is 3h (180 minutes).
	//
	// +openshift:enable:FeatureGate=EventTTL
	// +kubebuilder:validation:Minimum=5
	// +kubebuilder:validation:Maximum=180
	// +optional
	EventTTLMinutes int32 `json:"eventTTLMinutes,omitempty"`
}
```

### Impact of Lower TTL Values

etcd uses an optimized lease expiration mechanism where a lessor runs in the background, polling every 500ms for expired leases using a queue ordered by expiration time (not an O(N) iteration over all leases). The leader processes expired leases in parallel, and lease deletions are published via raft.
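
To make the "queue ordered by expiration time" point concrete, here is a toy Go sketch (not etcd's actual lessor code): expired leases are popped from the front of an expiry-ordered min-heap, so each poll only touches leases that have actually expired rather than scanning every lease.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

type lease struct {
	id     int64
	expiry time.Time
}

// leaseQueue is a min-heap ordered by expiry time.
type leaseQueue []lease

func (q leaseQueue) Len() int           { return len(q) }
func (q leaseQueue) Less(i, j int) bool { return q[i].expiry.Before(q[j].expiry) }
func (q leaseQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *leaseQueue) Push(x any)        { *q = append(*q, x.(lease)) }
func (q *leaseQueue) Pop() any {
	old := *q
	item := old[len(old)-1]
	*q = old[:len(old)-1]
	return item
}

// expired pops every lease whose TTL has elapsed; a poll loop would call
// this on a fixed interval (etcd's lessor polls every 500ms).
func expired(q *leaseQueue, now time.Time) []lease {
	var out []lease
	for q.Len() > 0 && !(*q)[0].expiry.After(now) {
		out = append(out, heap.Pop(q).(lease))
	}
	return out
}

func main() {
	q := &leaseQueue{}
	heap.Init(q)
	heap.Push(q, lease{id: 1, expiry: time.Now().Add(-time.Minute)}) // already expired
	heap.Push(q, lease{id: 2, expiry: time.Now().Add(time.Hour)})    // still live
	fmt.Println(expired(q, time.Now())) // only lease 1 is returned
}
```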

Setting the event-ttl to values lower than the OpenShift default of 3 hours will primarily impact:

1. **etcd Memory and Disk Usage**: Lower TTL values reduce the number of active leases in etcd, resulting in lower memory and disk space consumption for event storage.

2. **Raft Operations**: The number of expired leases per minute remains roughly the same, as it depends on the event arrival rate rather than on the TTL.

3. **Event Availability**: Events will be deleted more quickly, reducing the time window available for debugging and troubleshooting.

#### Fleet Analytics Data

Based on fleet analytics data, the storage impact of reducing event TTL can be quantified:

- **Largest Cluster**: ~3-4 million events with an average size of 1.5KB, i.e. roughly 4.5GB of event storage (3 million × 1.5KB)
  - Reducing the TTL from 3 hours to 1 hour (to one third) would reduce etcd event storage to approximately 1.5GB
- **Median Cluster**: ~1,391 events in storage
- **90th Percentile**: ~6,700 events in storage

This data shows that while the largest clusters would see significant storage savings (from ~4.5GB down to ~1.5GB for the biggest outlier), the majority of clusters have much smaller event footprints where the storage impact would be minimal. We expect even drastic lowering to have no observable impact on CPU or bandwidth for the majority of our clusters.

#### Impact of configuring 5m TTL

After filling etcd with approximately 4GB of events over 3 hours and then switching to a 5-minute TTL, we observed a sharp drop in etcd storage usage and memory consumption once the 3-hour events had expired and the storage had been compacted and defragmented.

CPU usage showed a slight initial increase followed by a reduction (measured across both etcd and apiserver components), while apiserver memory remained relatively stable. The compaction duration on etcd demonstrated the expected linear relationship with the number of keys being processed, confirming predictable performance characteristics under this workload.

We found no long-term increase in CPU or memory usage after configuring a 5m TTL compared with the existing default.

### Risks and Mitigations

**Risk**: Customers might set the value to 0, which would mean that events are never deleted.

**Mitigation**: The API validation and the operator ensure that values stay within a reasonable range and are never zero (5-180 minutes).

### Drawbacks

- Adds complexity to the configuration API
- Additional validation and error handling required

## Alternatives (Not Implemented)

1. **Hardcoded Values**: Keep the current fixed value of 3 hours
   - **Rejected**: Does not meet customer requirements for configurability

2. **Environment Variable**: Use environment variables instead of API configuration
   - **Rejected**: Less user-friendly and harder to manage

3. **Separate CRD**: Create a separate CRD for event configuration
   - **Rejected**: Overkill for a single setting; better to include it in the existing KubeAPIServer operator resource

## Test Plan

The test plan will include:

1. **Unit Tests**: Test the API validation and parsing logic
2. **E2E Tests**: Test that the event TTL is properly configured on all kube-apiserver instances after applying the setting
3. **Performance Tests**: Test the impact of different TTL values on etcd performance

## Tech Preview

The EventTTL feature is controlled by the `EventTTL` feature gate, which is enabled by default in both DevPreview and TechPreview feature sets. This allows the feature to be available for testing and evaluation without requiring additional configuration.

The EventTTL feature gate is implemented in [openshift/api PR #2525](https://github.com/openshift/api/pull/2525) and will be removed when the feature graduates to GA, as the functionality will become a standard part of the platform.

## Graduation Criteria

### Dev Preview -> Tech Preview

- API is implemented and validated
- Basic functionality works end-to-end
- Documentation is available
- Sufficient test coverage
- EventTTL feature gate is enabled in DevPreview and TechPreview feature sets

### Tech Preview -> GA

- More comprehensive testing (upgrade, downgrade, scale)
- Performance testing with various TTL values
- User feedback incorporated
- Documentation updated in openshift-docs
- EventTTL feature gate is removed as the feature becomes GA

### Removing a deprecated feature

This enhancement does not remove any existing features. It only adds new configuration options while maintaining backward compatibility with the existing default behavior.

## Upgrade / Downgrade Strategy

### Upgrade Strategy

- Existing clusters will continue to use the default 3-hour (180-minute) TTL
- No changes required for existing clusters
- New configuration option is available immediately

### Downgrade Strategy

- Configuration will be ignored by older versions
- No impact on cluster functionality
- Events will continue to use the default TTL (180 minutes)

## Version Skew Strategy

- The event-ttl setting is a kube-apiserver configuration
- No coordination required with other components
- Version skew is not a concern for this enhancement

## Operational Aspects of API Extensions

This enhancement modifies the operator API but does not add new API extensions. The impact is limited to:

- Configuration validation in the kube-apiserver-operator
- Application of the setting to the kube-apiserver deployment
- No impact on API availability or performance

## Support Procedures

### Detection

- Configuration can be verified by checking the operator configuration resource
- kube-apiserver logs will show the configured event-ttl value
- etcd metrics can be monitored for compaction frequency
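
As a sketch, the first two checks could look like the following (the pod label selector is an assumption about the standard openshift-kube-apiserver namespace layout):

```shell
# Read the configured value from the operator resource (empty when unset).
oc get kubeapiserver cluster -o jsonpath='{.spec.eventTTLMinutes}'

# Look for the event-ttl argument in the kube-apiserver logs.
oc logs -n openshift-kube-apiserver -l app=openshift-kube-apiserver | grep -i event-ttl
```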

### Troubleshooting

- If events are not being deleted as expected, check the event-ttl configuration
- Monitor etcd compaction metrics for unusual patterns

## Implementation History

- 2025-10-08: Initial enhancement proposal
