
Commit 2bb4f43

Behavior conformance KEP user stories
1 parent 3490def commit 2bb4f43

2 files changed

+240
-20
lines changed

keps/sig-architecture/960-conformance-behaviors/README.md

Lines changed: 237 additions & 19 deletions
@@ -9,6 +9,29 @@
99
- [Goals](#goals)
1010
- [Non-Goals](#non-goals)
1111
- [Proposal](#proposal)
12+
- [User Stories](#user-stories)
13+
- [Role: Developer](#role-developer)
14+
- [Promote a Non-Optional Feature to GA](#promote-a-non-optional-feature-to-ga)
15+
- [Creating a brand new feature, either required or optional](#creating-a-brand-new-feature-either-required-or-optional)
16+
- [Role: Kubernetes Vendor](#role-kubernetes-vendor)
17+
- [Evaluating a distribution for conformance](#evaluating-a-distribution-for-conformance)
18+
- [Identifying the profiles supported by a distribution](#identifying-the-profiles-supported-by-a-distribution)
19+
- [Role: CNCF Conformance Program](#role-cncf-conformance-program)
20+
- [Evaluate a Vendor Submission](#evaluate-a-vendor-submission)
21+
- [Role: CI Job](#role-ci-job)
22+
- [Identify a PR as requiring conformance review](#identify-a-pr-as-requiring-conformance-review)
23+
- [Evaluating a PR for conformance coverage](#evaluating-a-pr-for-conformance-coverage)
24+
- [Role: Behavior Approver](#role-behavior-approver)
25+
- [Review / approve new suites and behaviors](#review--approve-new-suites-and-behaviors)
26+
- [Verify behaviors follow the rules](#verify-behaviors-follow-the-rules)
27+
- [Role: Test Approver](#role-test-approver)
28+
- [Review / approve new tests](#review--approve-new-tests)
29+
- [Verify behavior coverage](#verify-behavior-coverage)
30+
- [Verify non-flakiness of tests](#verify-non-flakiness-of-tests)
31+
- [Verify test follows the rules](#verify-test-follows-the-rules)
32+
- [Role: SIG](#role-sig)
33+
- [Define expected behaviors for their area of responsibility](#define-expected-behaviors-for-their-area-of-responsibility)
34+
- [Solution Overview](#solution-overview)
1235
- [Representation of Behaviors](#representation-of-behaviors)
1336
- [Behavior and Test Generation Tooling](#behavior-and-test-generation-tooling)
1437
- [Handwritten Behaviour Scenarios](#handwritten-behaviour-scenarios)
@@ -118,35 +141,228 @@ tests and test scaffolding to quickly cover those behaviors.
118141

119142
## Proposal
120143

121-
The proposal consists of four deliverables:
144+
### User Stories
145+
146+
#### Role: Developer
147+
148+
##### Promote a Non-Optional Feature to GA
149+
150+
Conformance tests are required when promoting a non-optional feature to GA.
151+
152+
Today, the desired process consists of writing the tests as ordinary e2e tests,
153+
making sure they are not flaky by having them run for several weeks without
154+
flakes, and then including the promotion of those tests in the PR that promotes
155+
the feature. However, even without the test promotion, PRs that promote
156+
features are already large; for example:
157+
158+
* [Promote PodDisruptionBudget to
159+
GA](https://github.com/kubernetes/kubernetes/pull/81571) (91 files changed)
160+
* [Promote block volumes to
161+
GA](https://github.com/kubernetes/kubernetes/pull/88673) (46 files changed)
162+
* [Promote node lease to
163+
GA](https://github.com/kubernetes/kubernetes/pull/84351) (17 files changed)
164+
165+
Thus, today developers typically submit the test promotions in a separate PR, to
166+
avoid adding more changes and an additional review team that would further slow
167+
the merge. This makes it difficult to develop a CI job that
168+
prevents features from going to GA without conformance tests.
169+
170+
With the separation of behaviors and tests, the tasks a developer needs to
171+
complete are:
172+
173+
1. Define expected behaviors
174+
1. Get behaviors approved by the conformance-behavior-approvers
175+
1. Write tests to cover those behaviors
176+
1. Get tests approved by the conformance-test-approvers
177+
* Prove that the tests that will be conformance are not flaky
178+
* Promote the tests to conformance in that PR
179+
1. Create a PR that promotes the feature to GA
180+
181+
As a developer, I would like to be able to have as much of this completed and
182+
merged prior to the PR that promotes the feature to GA, in order to avoid
183+
additional reviews on that PR.
184+
185+
<<[UNRESOLVED context and discussion around solutions this use case ]>>
186+
@johnbelamaric
187+
One option: We could get it all to a state where it's all done, but behaviors are
188+
marked as "PENDING". The promo to GA would still require touching the behaviors,
189+
to flip the status from PENDING to ACTIVE, but it should be a formality at that
190+
point. Promo to "conformance" for the tests could have already been done just
191+
with the PENDING status so it won't count yet. Other ideas?
192+
@jefftree
193+
I was thinking something along the same lines. One thing to note is that this is
194+
promoting a set of tests (that cover a set of behaviors) rather than a set of
195+
behaviors themselves. Tests could cover existing behaviors (is this a correct
196+
assumption?) so it might make more sense to have the switch on the tests rather
197+
than the behaviors side.
198+
<<[/UNRESOLVED]>>
199+
200+
<<[UNRESOLVED @spiffxp: should this capture preconditions for testing: ]>>
201+
* All behaviors present
202+
* All behaviors covered by tests
203+
* Tests should have been around to verify non-flakiness
204+
* Tests should have been reviewed by conformance reviewer to make sure they meet
205+
the criteria - can we front load this?
206+
<<[/UNRESOLVED]>>
207+
208+
##### Creating a brand new feature, either required or optional
209+
During creation of an alpha or beta feature, conformance tests are not required,
210+
nor are conformance behaviors. However, at the beta stage, the expectation
211+
should be to have some quality end-to-end tests, and so we may want to allow the
212+
definition of the behaviors at that time too. Tasks then would be similar to
213+
some of those for GA promotion:
214+
1. Define expected behaviors
215+
1. Get provisional behaviors approved by the conformance-behavior-approvers
216+
1. Write tests to cover those behaviors
217+
218+
<<[UNRESOLVED]>>
219+
@johnbelamaric
220+
Ideally we could avoid the provisional behavior approval. Maybe we can have a
221+
way to have a separate behaviors area for beta stuff? Or maybe we just don't
222+
have this at all for beta, and it waits till GA. The reason I bring up making
223+
behaviors now is because the initial idea of `kubetestgen` was to support these
224+
steps: creation of behaviors, and creation of standard e2e tests for those
225+
behaviors.
226+
@jefftree
227+
Getting this list of behaviors approved is something that needs to eventually be
228+
done before hitting GA. I don't know how much these behaviors would change
229+
between beta and GA, but if they're relatively stable and mainly additive,
230+
starting the process early seems fine. Similar to your previous point, we should
231+
look to move some of these behavior approvals earlier in the process to avoid
232+
the chaos of reviews when a feature is going to GA.
233+
<<[/UNRESOLVED]>>
234+
235+
#### Role: Kubernetes Vendor
236+
237+
##### Evaluating a distribution for conformance
238+
* Must set up test cluster and run sonobuoy conformance tests
239+
* If successful, submit PR to CNCF. If failures exist, debug them
240+
241+
##### Identifying the profiles supported by a distribution
242+
* Must run a set of conformance tests for each profile supported
243+
244+
#### Role: CNCF Conformance Program
245+
246+
##### Evaluate a Vendor Submission
247+
* Must confirm that the version of the tests being run matches the version being
248+
certified
249+
* Must confirm the set of tests being run matches the set of tests for the
250+
version (+ profile(s)) being certified
251+
* Must confirm that all behaviors are covered by a test that executes, and that
252+
no tests fail (This isn’t done today: verify skew policy - confirm a cluster
253+
being certified for version 1.x also passes conformance tests for version
254+
1.x-1)
255+
256+
#### Role: CI Job
257+
258+
##### Identify a PR as requiring conformance review
259+
PR must touch a file in a conformance-specific directory
260+
* eg: update test/conformance/behaviors/..
261+
* eg: mv from test/e2e to test/conformance
262+
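As a rough illustration of the rule above, a CI job could flag a PR by checking its changed paths against the conformance directory. This is a minimal sketch, not the actual job: the function name `needsConformanceReview`, the `conformanceDir` constant, and the sample file paths are hypothetical, and it assumes the job already has the list of changed paths (for a move, both the old and the new path).

```go
package main

import (
	"fmt"
	"strings"
)

// conformanceDir is the directory named in the user story. Treating every
// path under it as requiring review is an assumption of this sketch.
const conformanceDir = "test/conformance/"

// needsConformanceReview reports whether any changed path lies under
// conformanceDir. For a file move, callers pass both the old and new path.
func needsConformanceReview(changedPaths []string) bool {
	for _, p := range changedPaths {
		if strings.HasPrefix(p, conformanceDir) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical PRs: one touches only e2e tests, one updates a behavior file.
	fmt.Println(needsConformanceReview([]string{"test/e2e/apps/job.go"}))
	fmt.Println(needsConformanceReview([]string{"test/conformance/behaviors/sig-node/lifecycle.yaml"}))
}
```

A prefix check keeps the rule cheap to evaluate and easy to audit, at the cost of flagging every change under the directory, including ones that may not need a conformance reviewer.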
263+
##### Evaluating a PR for conformance coverage
264+
* Must be able to confirm for each behavior that at least one test exercises a
265+
given behavior
266+
* Must be able to list all expected behaviors for conformance
267+
* Coverage is defined by (exercised behaviors) / (expected behaviors)
268+
* May be able to list set of tests that exercise a given behavior
269+
* Should not bother gating or paying attention too closely to coverage until we
270+
have locked (expected behaviors) in place.
271+
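The coverage definition above, (exercised behaviors) / (expected behaviors), can be sketched as follows. This is only an illustration under stated assumptions: the function name `coverage` and the shape of its inputs (a list of expected behavior IDs and a map from test name to the behavior IDs it claims to cover) are hypothetical and not taken from the KEP's tooling.

```go
package main

import "fmt"

// coverage returns the fraction of expected behaviors exercised by at least
// one test, plus the expected behaviors that no test exercises (in the order
// given). covers maps a test name to the behavior IDs it exercises; IDs that
// are not in expected are ignored. An empty expected list yields 0.
func coverage(expected []string, covers map[string][]string) (float64, []string) {
	exercised := make(map[string]bool)
	for _, ids := range covers {
		for _, id := range ids {
			exercised[id] = true
		}
	}
	var uncovered []string
	hit := 0
	for _, id := range expected {
		if exercised[id] {
			hit++
		} else {
			uncovered = append(uncovered, id)
		}
	}
	if len(expected) == 0 {
		return 0, nil
	}
	return float64(hit) / float64(len(expected)), uncovered
}

func main() {
	// Hypothetical behavior IDs and test names.
	expected := []string{"pod/create", "pod/delete", "pod/update", "pod/watch"}
	covers := map[string][]string{
		"test-a": {"pod/create", "pod/delete"},
		"test-b": {"pod/delete", "pod/unlisted"},
	}
	ratio, uncovered := coverage(expected, covers)
	fmt.Println(ratio, uncovered)
}
```

Returning the uncovered behaviors alongside the ratio supports the "list all expected behaviors" requirement and makes a failing check actionable.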
272+
#### Role: Behavior Approver
273+
274+
##### Review / approve new suites and behaviors
275+
* Must verify that the listed behaviors are common across cluster providers and
276+
can be supported in new cluster providers.
277+
* Must be able to identify if all of the expected behaviors are listed; this may
278+
mean seeing API definitions and configuration parameters, if those are
279+
expected to be part of the defined behaviors.
280+
* Must be able to identify if any behaviors are LinuxOnly
281+
282+
##### Verify behaviors follow the rules
283+
* The minimal set of behaviors for a given resource must include the basic
284+
functioning of the API CRUD operations, and of the resulting changes in
285+
cluster / data plane state.
286+
* Must be able to verify behaviors do not rely on features that are deprecated
287+
(or pending deprecation, eg: componentstatus)
288+
* May strive to minimize the number of behaviors that rely on a specific NodeOS
289+
290+
#### Role: Test Approver
291+
292+
##### Review / approve new tests
293+
* Should be able to reject addition of a new test if there is no associated
294+
behavior
295+
* Should require a behavior approver if a new behavior is added at the same time a
296+
test is added
297+
298+
##### Verify behavior coverage
299+
* Must be able to confirm the test in question actually exercises the
300+
described/linked behavior(s)
301+
* Should NOT require that all test code map directly to behavior(s) (eg: “it looks
302+
like you’re exercising the Foo api, is there a Foo behavior that should be
303+
associated with this test?”).
304+
* When promoting to Conformance, must be able to verify no feature flags or
305+
additional configuration is necessary to enable the feature
306+
* Must verify that at most one test is linked to a given behavior. That
307+
is, implicitly covered behaviors should NOT be listed as covered behaviors.
308+
There should be a single explicit test for any given behavior.
309+
* One test may cover multiple behaviors
310+
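The "at most one test per behavior, but one test may cover several behaviors" rule above lends itself to a mechanical check. The sketch below is an assumption about how such a check might look: the function name `duplicateLinks` and the input shape (a map from test name to the behavior IDs it lists) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateLinks returns every behavior ID that is linked from more than one
// test, mapped to the sorted names of those tests. A test that lists the same
// ID twice is counted once. An empty result means the rule holds.
func duplicateLinks(covers map[string][]string) map[string][]string {
	byBehavior := make(map[string][]string)
	for test, ids := range covers {
		seen := make(map[string]bool)
		for _, id := range ids {
			if seen[id] {
				continue
			}
			seen[id] = true
			byBehavior[id] = append(byBehavior[id], test)
		}
	}
	dups := make(map[string][]string)
	for id, tests := range byBehavior {
		if len(tests) > 1 {
			sort.Strings(tests)
			dups[id] = tests
		}
	}
	return dups
}

func main() {
	// Hypothetical links: test-a legitimately covers two behaviors, but
	// pod/delete is linked from two different tests.
	covers := map[string][]string{
		"test-a": {"pod/create", "pod/delete"},
		"test-b": {"pod/delete"},
	}
	fmt.Println(duplicateLinks(covers))
}
```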
311+
##### Verify non-flakiness of tests
312+
* May be able to identify known anti-patterns in the test code (eg: watches that
313+
break down at scale, arbitrary sleeps)
314+
* When promoting to Conformance, test MUST have sufficient history to prove
315+
non-flakiness (eg: today, we link to testgrid and confirm that it looks good…
316+
we don’t mandate specific thresholds, and we don’t mandate specific cluster
317+
configurations)
318+
319+
##### Verify test follows the rules
320+
* Must be able to confirm all associated Behavior(s) are eligible for
321+
Conformance
322+
* Must be able to confirm the test(s) in question exercise only GA APIs
323+
* Must be able to confirm the test(s) in question do NOT require access to
324+
kubelet APIs to pass
325+
* Must not depend on specific Events (nor their contents) to pass
326+
* Must not depend on optional Condition fields
327+
* etc.
328+
329+
#### Role: SIG
330+
331+
##### Define expected behaviors for their area of responsibility
332+
* Should be able to enumerate list of behaviors for a given API/resource
333+
* Should be able to enumerate list of behaviors for a given feature (eg:
334+
[Feature:Foo] suite of tests)
335+
* Should be able to enumerate list of behaviors for a given set of e2e tests
336+
owned by the SIG
337+
338+
### Solution Overview
339+
The proposed solution consists of four deliverables:
122340
* A machine readable format to define conforming behaviors.
123341
* Tooling to generate lists of behaviors from the API schemas.
124-
* Tooling to generate tests and test scaffolding to evaluate those behaviors.
125342
* Tooling to compare the implemented tests to the list of behaviors and
126343
calculate coverage.
344+
* Tooling to generate tests and test scaffolding to evaluate those behaviors.
127345

128346
### Representation of Behaviors
129347

130348
Behaviors will be captured in prose, which is in turn embedded in a YAML file
131-
along with meta-data about the behavior.
349+
along with meta-data about the behavior. More details on exactly what defines
350+
a behavior are documented in the [behaviors
351+
README](https://git.k8s.io/kubernetes/test/conformance/behaviors/README.md).
132352

133353
Behaviors must be captured in the repository and agreed upon as required for
134-
conformance. Behaviors are broken into feature areas, and there are multiple
135-
test suites (named sets of tests) for each feature area. Some of these suites
354+
conformance. Behaviors are broken out by owning SIG, and there are multiple
355+
test suites (named sets of tests) for each SIG. Some of these suites
136356
may be machine-generated based upon the API schema, whereas others are
137357
handwritten. Keeping the generated and handwritten suites in separate files
138358
allows regeneration of the auto-discovered behavior suites. Some areas may be
139359
defined by API group and Kind, while others will be subjectively defined by
140360
subject-matter experts.
141361

142-
Validation and conformance designations are made on a per-suite basis,
143-
not a per-behavior basis. There may be multiple suites in a feature area
144-
that are required for validation and/or conformance.
145-
146362
The grouping at the suite level should be defined based upon subjective
147363
judgement of how behaviors relate to one another, along with an understanding
148364
that all behaviors in a given suite may be required to function for a given
149-
cluster to pass validation for that suite.
365+
cluster to pass conformance for that suite.
150366

151367
Typical suites defined for any given feature will include:
152368
* API spec. This suite is generated from the API schema and represents
@@ -159,28 +375,30 @@ Typical suites defined for any given feature will include:
159375
and other features.
160376

161377
Each suite may be stored in a separate file in a directory for the specific
162-
area. For example, a "Pods" area would be structured as a `pods` directory with
378+
SIG. For example, a "sig-node" directory may have
163379
these files:
164380
* `api-generated.yaml` describing the set of behaviors auto-generated from the
165381
API specification.
166382
* `lifecycle.yaml` describing the set of behaviors expected from the Pod
167383
lifecycle.
384+
* `readiness-gates.yaml` describing the set of behaviors expected for Pod
385+
readiness gates functionality.
168386

169387
Behavior files are reviewed separately from the tests themselves, with separate
170388
OWNERs files corresponding to those tests. This may be captured in a directory
171389
structure such as:
172390

173391
```
174-
test/conformance
175-
├── behaviors
176-
│ ├── OWNERS # no-parent: true, approvers: behavior-approvers
177-
│ └── {area}
178-
│ ├── OWNERS # optional: reviewers: area-experts
179-
│ └── {suite}.yaml
180-
├── OWNERS # approvers: test-approvers
181-
└── tests.yaml # promotion updates this file; tests MUST map to a behavior
392+
test/conformance/behaviors
393+
├── OWNERS # no-parent: true, approvers: behavior-approvers
394+
├── {sig}
395+
│ ├── OWNERS # optional: reviewers: area-experts
396+
│ └── {suite}.yaml
182397
```
183398

399+
The relationship between tests and behaviors is captured in the conformance test
400+
metadata, which contains a list of behavior IDs covered by the test.
401+
184402
The structure of the behavior YAML files is described by these Go types:
185403

186404
```go

keps/sig-architecture/960-conformance-behaviors/kep.yaml

Lines changed: 3 additions & 1 deletion
@@ -3,6 +3,8 @@ kep-number: 960
33
authors:
44
- "@johnbelamaric"
55
- "@hh"
6+
- "@spiffxp"
7+
- "@jefftree"
68
owning-sig: sig-architecture
79
participating-sigs:
810
- sig-testing
@@ -16,5 +18,5 @@ approvers:
1618
- "@smarterclayton"
1719
editor: TBD
1820
creation-date: 2019-04-12
19-
last-updated: 2020-03-24
21+
last-updated: 2020-04-03
2022
status: implementable
