Commit c864d56

committed
kubectl events to beta
1 parent c8c1592 commit c864d56

3 files changed

+85
-186
lines changed

keps/prod-readiness/sig-cli/1440.yaml

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
11
kep-number: 1440
22
alpha:
33
approver: "@wojtek-t"
4+
beta:
5+
approver: "@wojtek-t"

keps/sig-cli/1440-kubectl-events/README.md

Lines changed: 77 additions & 182 deletions
@@ -12,6 +12,10 @@
1212
- [Risks and Mitigations](#risks-and-mitigations)
1313
- [Design Details](#design-details)
1414
- [Test Plan](#test-plan)
15+
- [Prerequisite testing updates](#prerequisite-testing-updates)
16+
- [Unit tests](#unit-tests)
17+
- [Integration tests](#integration-tests)
18+
- [e2e tests](#e2e-tests)
1519
- [Graduation Criteria](#graduation-criteria)
1620
- [Beta](#beta)
1721
- [GA](#ga)
@@ -35,17 +39,17 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
3539
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
3640
- [x] (R) KEP approvers have approved the KEP status as `implementable`
3741
- [x] (R) Design details are appropriately documented
38-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
39-
- [ ] e2e Tests for all Beta API Operations (endpoints)
40-
- [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
41-
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
42+
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
43+
- [x] e2e Tests for all Beta API Operations (endpoints)
44+
- [x] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
45+
- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
4246
- [x] (R) Graduation criteria is in place
4347
- [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
4448
- [x] (R) Production readiness review completed
4549
- [x] (R) Production readiness review approved
4650
- [x] "Implementation History" section is up-to-date for milestone
47-
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
48-
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
51+
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
52+
- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
4953

5054
[kubernetes.io]: https://kubernetes.io/
5155
[kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -102,7 +106,7 @@ Following is a list of long standing issues for `events`
102106

103107
### Goals
104108

105-
- Add an experimental `events` sub-command under the kubectl
109+
- Add a new `events` sub-command under `kubectl`
106110
- Address existing issues mentioned above
107111

108112
### Non-goals
@@ -122,7 +126,8 @@ and most importantly will extend the `kubectl get events` functionality to addre
122126

123127
### Risks and Mitigations
124128

125-
None.
129+
Users might access events they are not authorized to see. This should be mitigated by proper RBAC rules
130+
allowing access on a need-to-know basis.
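
A minimal sketch of the RBAC mitigation described above, assuming that a namespaced, read-only grant is what "need to know" means for a given cluster. The role name `event-reader`, the binding name `event-reader-binding`, the helper function names, and the `KUBECTL` override variable are illustrative and not part of this KEP. The sketch also assumes events are read from the core API group.

```shell
# Assumption: KUBECTL points at the kubectl binary to use (defaults to "kubectl").
KUBECTL="${KUBECTL:-kubectl}"

# grant_event_reader NAMESPACE USER
# Creates a Role that can only read events in NAMESPACE and binds it to USER.
# Role and binding names are hypothetical.
grant_event_reader() {
  ns="$1"
  user="$2"
  "$KUBECTL" create role event-reader \
    --namespace "$ns" --verb=get,list,watch --resource=events &&
  "$KUBECTL" create rolebinding event-reader-binding \
    --namespace "$ns" --role=event-reader --user="$user"
}

# can_read_events NAMESPACE USER
# Asks the API server whether USER may list events in NAMESPACE.
can_read_events() {
  "$KUBECTL" auth can-i list events --namespace "$1" --as "$2"
}
```

For example, `grant_event_reader dev jane` followed by `can_read_events dev jane` would be expected to report `yes`, provided the caller is itself allowed to create roles and to impersonate the user.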
126131

127132
## Design Details
128133

@@ -138,17 +143,41 @@ Additionally, the new command should support all the printing flags available in
138143

139144
### Test Plan
140145

141-
In addition to standard unit tests for kubectl, the events command will be released as a kubectl alpha subcommand, signaling users to expect instability. During the alpha phase we will gather feedback from users that we expect will improve the design of debug and identify the Critical User Journeys we should test prior to Alpha -> Beta graduation.
146+
[x] I/we understand the owners of the involved components may require updates to
147+
existing tests to make this code solid enough prior to committing the changes necessary
148+
to implement this enhancement.
149+
150+
##### Prerequisite testing updates
151+
152+
Before any additional functional updates we need to ensure the current functionality
153+
is properly covered by unit and integration (`test/cmd`) tests.
154+
Before promoting to beta at least a single e2e test should also be added in
155+
`k8s.io/kubernetes/test/e2e/kubectl/kubectl.go`.
156+
157+
##### Unit tests
158+
159+
- `k8s.io/kubectl/pkg/cmd/events`: `2022-09-21` - `36.6%`
160+
161+
##### Integration tests
162+
163+
- `k8s.io/kubernetes/test/cmd/events.sh`: [test-cmd.run_kubectl_events_tests](https://testgrid.k8s.io/sig-release-master-blocking#integration-master)
164+
165+
##### e2e tests
166+
167+
- missing
142168

143169
### Graduation Criteria
144170

145171
Once the experimental kubectl events command is implemented, this can be rolled out in multiple phases.
146172

147173
##### Beta
148-
- [ ] Gather the feedback, which will help improve the command
149-
- [ ] Extend with the new features based on feedback
174+
175+
- [x] Add e2e tests, increase unit coverage.
176+
- [x] Gather the feedback, which will help improve the command
177+
- [x] Extend with the new features based on feedback
150178

151179
##### GA
180+
152181
- [ ] Address all major issues and bugs raised by community members
153182

154183
### Upgrade / Downgrade Strategy
@@ -172,7 +201,7 @@ so there should be no problems with Version Skew.
172201
- Components depending on the feature gate:
173202
- [X] Other
174203
- Describe the mechanism:
175-
A new command in `kubectl alpha`
204+
A new sub-command in `kubectl`
176205
- Will enabling / disabling the feature require downtime of the control
177206
plane?
178207
No
@@ -198,249 +227,115 @@ There will be explicit command for retrieving events.
198227

199228
###### Are there any tests for feature enablement/disablement?
200229

201-
No, because it cannot be disabled or enabled in a single release
230+
No, because it cannot be disabled or enabled in a single release.
202231

203232
### Rollout, Upgrade and Rollback Planning
204233

205-
<!--
206-
This section must be completed when targeting beta to a release.
207-
-->
234+
None. Rolling out kubectl only requires shipping a new binary.
208235

209236
###### How can a rollout or rollback fail? Can it impact already running workloads?
210237

211-
<!--
212-
Try to be as paranoid as possible - e.g., what if some components will restart
213-
mid-rollout?
214-
215-
Be sure to consider highly-available clusters, where, for example,
216-
feature flags will be enabled on some API servers and not others during the
217-
rollout. Similarly, consider large clusters and how enablement/disablement
218-
will rollout across nodes.
219-
-->
238+
A wrong binary might be delivered.
220239

221240
###### What specific metrics should inform a rollback?
222241

223-
<!--
224-
What signals should users be paying attention to when the feature is young
225-
that might indicate a serious problem?
226-
-->
242+
There are no metrics to follow.
227243

228244
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
229245

230-
<!--
231-
Describe manual testing that was done and the outcomes.
232-
Longer term, we may want to require automated upgrade/rollback tests, but we
233-
are missing a bunch of machinery and tooling and can't do that now.
234-
-->
246+
The e2e tests that will be added with the beta promotion will allow us to verify that the command
247+
behaves correctly during upgrade and downgrade.
235248

236249
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
237250

238-
<!--
239-
Even if applying deprecation policies, they may still surprise some users.
240-
-->
251+
The `kubectl alpha events` command is being moved to `kubectl events`. Invoking the old
252+
location will print a warning that the command has moved.
241253

242254
### Monitoring Requirements
243255

244-
<!--
245-
This section must be completed when targeting beta to a release.
246-
-->
256+
None.
247257

248258
###### How can an operator determine if the feature is in use by workloads?
249259

250-
<!--
251-
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
252-
checking if there are objects with field X set) may be a last resort. Avoid
253-
logs or events for this purpose.
254-
-->
260+
There is no way for a cluster operator to differentiate between `kubectl get events` and `kubectl events`
261+
invocations, since both invoke a GET operation on the Events endpoint.
255262

256263
###### How can someone using this feature know that it is working for their instance?
257264

258-
<!--
259-
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
260-
for each individual pod.
261-
Pick one more of these and delete the rest.
262-
Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
263-
and operation of this feature.
264-
Recall that end users cannot usually observe component logs or access metrics.
265-
-->
266-
267-
- [ ] Events
268-
- Event Reason:
269-
- [ ] API .status
270-
- Condition name:
271-
- Other field:
272-
- [ ] Other (treat as last resort)
273-
- Details:
265+
`kubectl events` should return events similar to those returned by `kubectl get events`.
274266

275267
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
276268

277-
<!--
278-
This is your opportunity to define what "normal" quality of service looks like
279-
for a feature.
280-
281-
It's impossible to provide comprehensive guidance, but at the very
282-
high level (needs more precise definitions) those may be things like:
283-
- per-day percentage of API calls finishing with 5XX errors <= 1%
284-
- 99% percentile over day of absolute value from (job creation time minus expected
285-
job creation time) for cron job <= 10%
286-
- 99.9% of /health requests per day finish with 200 code
287-
288-
These goals will help you determine what you need to measure (SLIs) in the next
289-
question.
290-
-->
269+
None.
291270

292271
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
293272

294-
<!--
295-
Pick one more of these and delete the rest.
296-
-->
297-
298-
- [ ] Metrics
299-
- Metric name:
300-
- [Optional] Aggregation method:
301-
- Components exposing the metric:
302-
- [ ] Other (treat as last resort)
303-
- Details:
273+
- [x] Other (treat as last resort)
274+
- Details: invoking `kubectl events` returns data in a timely manner
304275

305276
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
306277

307-
<!--
308-
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
309-
implementation difficulties, etc.).
310-
-->
278+
None.
311279

312280
### Dependencies
313281

314-
<!--
315-
This section must be completed when targeting beta to a release.
316-
-->
282+
None.
317283

318284
###### Does this feature depend on any specific services running in the cluster?
319285

320-
<!--
321-
Think about both cluster-level services (e.g. metrics-server) as well
322-
as node-level agents (e.g. specific version of CRI). Focus on external or
323-
optional services that are needed. For example, if this feature depends on
324-
a cloud provider API, or upon an external software-defined storage or network
325-
control plane.
326-
327-
For each of these, fill in the following—thinking about running existing user workloads
328-
and creating new ones, as well as about cluster-level services (e.g. DNS):
329-
- [Dependency name]
330-
- Usage description:
331-
- Impact of its outage on the feature:
332-
- Impact of its degraded performance or high-error rates on the feature:
333-
-->
286+
None.
334287

335288
### Scalability
336289

337-
<!--
338-
For alpha, this section is encouraged: reviewers should consider these questions
339-
and attempt to answer them.
340-
341-
For beta, this section is required: reviewers must answer these questions.
342-
343-
For GA, this section is required: approvers should be able to confirm the
344-
previous answers based on experience in the field.
345-
-->
346-
347290
###### Will enabling / using this feature result in any new API calls?
348291

349-
<!--
350-
Describe them, providing:
351-
- API call type (e.g. PATCH pods)
352-
- estimated throughput
353-
- originating component(s) (e.g. Kubelet, Feature-X-controller)
354-
Focusing mostly on:
355-
- components listing and/or watching resources they didn't before
356-
- API calls that may be triggered by changes of some Kubernetes resources
357-
(e.g. update of object X triggers new updates of object Y)
358-
- periodic API calls to reconcile state (e.g. periodic fetching state,
359-
heartbeats, leader election, etc.)
360-
-->
292+
No new API calls are expected compared with `kubectl get events`.
361293

362294
###### Will enabling / using this feature result in introducing new API types?
363295

364-
<!--
365-
Describe them, providing:
366-
- API type
367-
- Supported number of objects per cluster
368-
- Supported number of objects per namespace (for namespace-scoped objects)
369-
-->
296+
No.
370297

371298
###### Will enabling / using this feature result in any new calls to the cloud provider?
372299

373-
<!--
374-
Describe them, providing:
375-
- Which API(s):
376-
- Estimated increase:
377-
-->
300+
No.
378301

379302
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
380303

381-
<!--
382-
Describe them, providing:
383-
- API type(s):
384-
- Estimated increase in size: (e.g., new annotation of size 32B)
385-
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
386-
-->
304+
No.
387305

388306
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
389307

390-
<!--
391-
Look at the [existing SLIs/SLOs].
392-
393-
Think about adding additional work or introducing new steps in between
394-
(e.g. need to do X to start a container), etc. Please describe the details.
395-
396-
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
397-
-->
308+
No.
398309

399310
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
400311

401-
<!--
402-
Things to keep in mind include: additional in-memory state, additional
403-
non-trivial computations, excessive access to disks (including increased log
404-
volume), significant amount of data sent and/or received over network, etc.
405-
This through this both in small and large cases, again with respect to the
406-
[supported limits].
407-
408-
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
409-
-->
312+
No.
410313

411314
### Troubleshooting
412315

413-
<!--
414-
This section must be completed when targeting beta to a release.
415-
416-
The Troubleshooting section currently serves the `Playbook` role. We may consider
417-
splitting it into a dedicated `Playbook` document (potentially with some monitoring
418-
details). For now, we leave it here.
419-
-->
420-
421316
###### How does this feature react if the API server and/or etcd is unavailable?
422317

318+
Running `kubectl events` while the API server and/or etcd is unavailable will result
319+
in an error reported to the user stating that the cluster is not available.
320+
423321
###### What are other known failure modes?
424322

425-
<!--
426-
For each of them, fill in the following information by copying the below template:
427-
- [Failure mode brief description]
428-
- Detection: How can it be detected via metrics? Stated another way:
429-
how can an operator troubleshoot without logging into a master or worker node?
430-
- Mitigations: What can be done to stop the bleeding, especially for already
431-
running user workloads?
432-
- Diagnostics: What are the useful log messages and their required logging
433-
levels that could help debug the issue?
434-
Not required until feature graduated to beta.
435-
- Testing: Are there any tests for failure mode? If not, describe why.
436-
-->
323+
- [No events]
324+
- Detection: Invoking `kubectl events` does not return any events.
325+
- Mitigations: Use `kubectl get events` instead.
326+
- Diagnostics: Compare with the output of `kubectl get events`. It's possible that
327+
there are no events in the given namespace. Alternatively, try a different namespace
328+
with the `--namespace` flag.
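
The diagnostic step above can be sketched as a small helper that runs both commands against the same namespace so their output can be compared by eye. The function name `compare_event_commands` and the `KUBECTL` override variable are illustrative assumptions, not part of this KEP.

```shell
# Assumption: KUBECTL points at the kubectl binary to use (defaults to "kubectl").
KUBECTL="${KUBECTL:-kubectl}"

# compare_event_commands NAMESPACE
# Prints the output of `kubectl events` and `kubectl get events` for NAMESPACE,
# each under its own header line.
compare_event_commands() {
  ns="$1"
  echo "== kubectl events --namespace $ns =="
  "$KUBECTL" events --namespace "$ns"
  echo "== kubectl get events --namespace $ns =="
  "$KUBECTL" get events --namespace "$ns"
}
```

If both sections come back empty, the namespace most likely has no events; if only the first one is empty, the mitigation above (falling back to `kubectl get events`) applies.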
437329

438330
###### What steps should be taken if SLOs are not being met to determine the problem?
439331

332+
None.
333+
440334
## Implementation History
441335

442336
- *2020-01-16* - Initial KEP draft
443337
- *2021-09-06* - Updated KEP with the new template and marked it implementable for alpha implementation.
338+
- *2022-09-21* - Updated KEP for beta promotion.
444339

445340
## Alternatives
446341
