collaborate across teams to resolve test maintenance issues.

[build-master-canary](https://testgrid.k8s.io/sig-release-master-informing#build-master-canary)

### Breaking PRs
31
- - [ Use buildx in favor of ` FROM --platform ` syntax
32
- ] ( https://github.com/kubernetes/kubernetes/pull/98529 )
31
+ - [ Use buildx in favor of ` FROM --platform `
32
+ syntax ] ( https://github.com/kubernetes/kubernetes/pull/98529 )
33
33
- [ Switch to ` docker buildx ` for conformance
34
34
image] ( https://github.com/kubernetes/kubernetes/pull/98569 )
35
35
36
36
## Investigation

1. Desire to move from Google-owned infrastructure to Kubernetes community
   infrastructure. Thus the introduction of a **canary** build job to test
   building and pushing artifacts with the new infrastructure.

1. Desire to move off of the `bootstrap.py` job (currently used for the canary
   job) to `krel` tooling.

1. A separate job existed (`ci-kubernetes-build-no-bootstrap`) that was doing
   the same thing as the canary job, but with `krel` tooling.

1. The `no-bootstrap` job was running smoothly, so we [updated to use it for
   the canary job](https://github.com/kubernetes/test-infra/pull/20663).

1. Right before the update, we [switched to using buildx for multi-arch
   images](https://github.com/kubernetes/kubernetes/pull/98529).

1. The job started failing, which showed up in [some interesting
   ways](https://kubernetes.slack.com/archives/C09QZ4DQB/p1612269558032700).

1. Triage begins! An issue was
   [opened](https://github.com/kubernetes/kubernetes/issues/98646) and the
   release management team was pinged in Slack.

1. The `build-master`
   [job](https://testgrid.k8s.io/sig-release-master-blocking#build-master) was
   still passing though... interesting.

1. Both jobs eventually call `make release`, so the environment must be
   different.

1. Let's look inside!

   ```
   docker run -it --entrypoint /bin/bash gcr.io/k8s-testimages/bootstrap:v20210130-12516b2
   ```

   ```
   docker run -it gcr.io/k8s-staging-releng/k8s-ci-builder:v20201128-v0.6.0-6-g6313f696-default /bin/bash
   ```
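
   Inside each container shell, the things worth comparing are the Docker
   version, the buildx version (`docker buildx version`), and whether
   `DOCKER_CLI_EXPERIMENTAL` is set. A minimal sketch of that comparison,
   with the buildx output lines hand-recorded (the exact strings are
   illustrative; the versions are the ones this investigation eventually
   turned up):

   ```shell
   # Hand-recorded `docker buildx version` output from each image.
   bootstrap_buildx='github.com/docker/buildx v0.5.1'
   ci_builder_buildx='github.com/docker/buildx v0.4.2'

   # Diff the two: if they disagree, print just the version fields.
   if [ "$bootstrap_buildx" != "$ci_builder_buildx" ]; then
     echo "buildx differs: ${bootstrap_buildx##* } vs ${ci_builder_buildx##* }"
   fi
   ```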

1. A few directions we could go here:

   1. Update the `k8s-ci-builder` image to use a newer version of Docker

   1. Update the `k8s-ci-builder` image to ensure that
      `DOCKER_CLI_EXPERIMENTAL=enabled` is set

   1. Update the `release.sh` script to set `DOCKER_CLI_EXPERIMENTAL=enabled`

1. Making the `release.sh` script more flexible serves the community better
   because it allows for building in more environments. It would also be good
   to update the `k8s-ci-builder` image for this specific case.
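
   A minimal sketch of that `release.sh`-style change (the fallback pattern
   here is an assumption, not the exact patch): respect a caller-provided
   value and default to `enabled` otherwise, which is what makes the script
   usable across more environments.

   ```shell
   unset DOCKER_CLI_EXPERIMENTAL  # start clean for a reproducible demo

   # Enable experimental CLI features (needed for `docker buildx` on older
   # Docker releases) unless the caller has already chosen a value.
   export DOCKER_CLI_EXPERIMENTAL="${DOCKER_CLI_EXPERIMENTAL:-enabled}"
   echo "DOCKER_CLI_EXPERIMENTAL=${DOCKER_CLI_EXPERIMENTAL}"
   # → DOCKER_CLI_EXPERIMENTAL=enabled
   ```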

1. And we get a new
   [failure](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-build-canary/1356704759045689344/build-log.txt)!

1. Let's see what is going on in those images again...

1. Why would this cause an error in one but not the other if we have
   `DOCKER_CLI_EXPERIMENTAL=enabled`?
   ([this](https://github.com/docker/buildx/pull/403) is why)

1. In the meantime, we went ahead and [re-enabled the bootstrap
   job](https://github.com/kubernetes/test-infra/pull/20712) (consumers of
   those images need them!)

1. Decided to [increase logging
   verbosity](https://github.com/kubernetes/kubernetes/pull/98568) on failures
   to see if that would give us a clue into what was going wrong (and to
   remove those annoying `quiet currently not implemented` warnings).

1. The job turns green! But how?

1. [Buildx](https://github.com/docker/buildx) is versioned separately from
   Docker itself. It turns out that the `--quiet` flag warning was [actually
   an error](https://github.com/docker/buildx/pull/403) until `v0.5.1` of
   Buildx.

1. The `build-master` job was running with buildx `v0.5.1` while the `krel`
   job was running with `v0.4.2`. This meant the quiet flag was causing an
   error in the `krel` job, and removing it alleviated the error.
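
   The guard this implies can be sketched in shell (`version_ge` is a
   hypothetical helper, not part of the actual release tooling): only pass
   `--quiet` when buildx is at least `v0.5.1`.

   ```shell
   # Succeeds when $1 >= $2 in version order (GNU sort -V).
   version_ge() {
     [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
   }

   buildx_version="v0.4.2"   # e.g. parsed from `docker buildx version`
   if version_ge "$buildx_version" "v0.5.1"; then
     quiet_flag="--quiet"
   else
     quiet_flag=""           # older buildx treats --quiet as an error
   fi
   echo "quiet_flag='${quiet_flag}'"
   # → quiet_flag=''
   ```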

1. Finished up by once again [removing the `bootstrap`
   job](https://github.com/kubernetes/test-infra/pull/20731).

### Fixes

Brand new to the project?

Setup already and interested in maintaining tests?
Check out [this video](https://www.youtube.com/watch?v=Ewp8LNY_qTg) from
Jordan Liggitt, who describes strategies and tactics to deflake flaky tests
([Jordan's show notes for that
talk](https://gist.github.com/liggitt/6a3a2217fa5f846b52519acfc0ffece0))

Here's how the CI Signal Team actively monitors CI during a release cycle:

- [A Tour of CI on the Kubernetes