
Commit 7e31e74: Merge pull request #5300 from saschagrunert/platforms
Add guide for shipping alternative platforms

1 file changed: 128 additions, 0 deletions

# Shipping alternate platforms in Kubernetes release artifacts

The default Kubernetes platform is Linux/amd64. This platform is fully
tested, and the build and release systems initially supported only it. A
while ago we started an [effort to support multiple architectures][0]. As
part of this effort, we added support in our build/release pipelines for
the arm, arm64, ppc64le, and s390x architectures on operating systems such
as Linux, Windows, and macOS.

[0]: https://github.com/kubernetes/kubernetes/issues/38067

The main focus was to make binaries and container images available for
these architectures and operating systems, and to enable interested
contributors to take these artifacts and set up CI jobs that adequately
test these platforms, in particular by running conformance tests on them.

The goal of this document is to provide a starting point for adding new
platforms to Kubernetes from a SIG Architecture perspective. It does not
cover release mechanics or supportability in terms of functionality.
# Step 1: crawling (Build)

- docker based build infrastructure should support this architecture

The above implicitly requires the following:

- golang should support the architecture out-of-the-box.
- All our dependencies, whether vendored or run separately, should support
  this platform out-of-the-box.

In other words, anyone in the community should be able to use our build
infra to generate all artifacts required to stand up Kubernetes.

More information about how to build Kubernetes can be found in [the build
documentation][1].

[1]: https://github.com/kubernetes/kubernetes/tree/3f7c09e/build#building-kubernetes
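
For example, it should be possible to cross-compile components for the new
platform straight from the dockerized build. A minimal sketch, assuming the
`build/run.sh` wrapper and the `KUBE_BUILD_PLATFORMS` variable described in
the build documentation linked above (`linux/arm64` is just an illustrative
target):

```bash
# Cross-compile a single component for one target platform inside the
# dockerized build container:
build/run.sh make kubectl KUBE_BUILD_PLATFORMS=linux/arm64

# Produce a full set of release artifacts (binaries, tarballs and
# container images) for all configured platforms:
make quick-release
```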

# Step 2: walking (Test)

It is not enough for builds to work; they bit-rot quickly as we vendor in
new changes, update the versions of things we use, and so on. So we need a
good set of tests that exercise a wide battery of jobs on this new
architecture.

Good starting points from a testing perspective are:

- unit tests
- e2e tests
- node e2e tests
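
Unit tests and node e2e tests can be exercised with the standard make
targets, run natively on hardware or VMs of the target architecture. A
rough sketch, assuming a kubernetes/kubernetes checkout on an arm64 machine
with a container runtime installed:

```bash
# Run the unit tests natively on the target architecture:
make test

# Run the node e2e suite against the local machine acting as a node:
make test-e2e-node
```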

This will ensure that community members can rely on these architectures on
a consistent basis, and it will give folks who are making changes a signal
when they break things on a specific architecture.

This implies a set of folks who stand up and maintain both post-submit and
periodic tests, watch them closely, and raise the flag when things break.
They will also have to help debug and fix any platform-specific issues.

Creating custom testgrid dashboards can help to monitor platform-specific
tests.
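
As a hypothetical sketch, testgrid dashboards are defined as YAML in the
kubernetes/test-infra repository; all names and paths below are
placeholders rather than an existing dashboard:

```bash
# Append a hypothetical dashboard for an arm64 e2e job to the testgrid
# config in a kubernetes/test-infra checkout (placeholder names/paths;
# a matching test_group entry is also required):
cat <<'EOF' >> config/testgrids/kubernetes/sig-foo/config.yaml
dashboards:
- name: sig-foo-arm64
  dashboard_tab:
  - name: e2e-arm64
    test_group_name: ci-kubernetes-e2e-arm64
EOF
```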

# Step 3: running (Release)

With the first two steps we have a reasonable expectation that there is a
group of people taking care of an architecture and that it mostly works OK
("works on my machine!"); if things break, they get fixed quickly.

Getting to the next level is a big jump from here. We are talking about
real users who are literally betting their business on the work we are
doing here. So we need guarantees around the "can we really ship this!?"
question.

Specifically, we are talking about a set of CI jobs in the release-informing
and release-blocking tabs of our testgrid. The Kubernetes release team has
a "CI Signal" team that relies on the status of these jobs to either ship
or hold a release. Essentially, if things are mostly red with occasional
green, it would be prudent not to make this architecture part of the
release. CI jobs get added to release-informing first, and when they get to
a point where they work really well, they get promoted to release-blocking.
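
The jobs themselves are typically prow periodics defined in
kubernetes/test-infra. A heavily simplified, hypothetical sketch (names,
image, and schedule are placeholders; a real job needs working test
arguments and credentials):

```bash
# Add a hypothetical periodic e2e job for the new platform to a prow job
# config in a kubernetes/test-infra checkout (placeholder names/paths):
cat <<'EOF' >> config/jobs/kubernetes/sig-foo/arm64-periodics.yaml
periodics:
- name: ci-kubernetes-e2e-arm64
  interval: 6h
  decorate: true
  spec:
    containers:
    - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest-master
      command:
      - runner.sh
EOF
```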

The problem here is that once we start shipping something, users will start
to rely on it, whether we like it or not. So it becomes a question of trust
in the team that is taking care of a platform/architecture. Do we really
trust this team, not just for this release but on an ongoing basis? Do they
show up consistently when things break, and do they proactively work with
testing/release on ongoing efforts and try to apply them to their
architectures? It's very easy to set up a CI job as a one-time thing, tick
a box, and advocate to get something added. It's a totally different ball
game to be there consistently over time and show that you mean it. There
has to be a consistent body of people working on this over time (life
happens!).

What are we looking for here? A strong green CI signal for release managers
to cut a release, and for folks to be able to report problems and get them
addressed. This includes [conformance testing][2], as use of the Kubernetes
trademark is controlled through a conformance assurance process. So we are
looking for folks here to work with [the conformance sub-project][3] in
addition to testing and release.

[2]: https://github.com/cncf/k8s-conformance
[3]: http://bit.ly/sig-architecture-conformance
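
One common way to produce conformance results for a cncf/k8s-conformance
submission is Sonobuoy. A minimal sketch, assuming a running cluster on the
new platform and a working kubeconfig:

```bash
# Run the certified-conformance suite against the current cluster and
# wait for it to finish:
sonobuoy run --mode=certified-conformance --wait

# Fetch the results tarball, which contains the e2e.log and junit files
# needed for a submission:
sonobuoy retrieve .
```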

# Step 4: profit!

If you got this far, you really have made it! You have a clear engagement
with the community, you are working seamlessly with all the relevant SIGs,
you have your stuff in the Kubernetes release, and end users are adopting
your architecture. And having achieved conformance, you gain conditional
use of the Kubernetes trademark for your offerings.

# Rules of the game (Notes?)

- We should keep it easy for folks to get into Step 1.
- In Step 1, by default things should not build and should be switched off.
- Step 1 should not place undue burden on review or infrastructure (case in
  point: WINDOWS!).
- Once Step 2 is done, we could consider switching things on by default
  (but still not in release artifacts).
- Once Step 3 is done, binaries / images for an arch can ship with the
  release.
- Step 2 is at least the default e2e-gce equivalent, PLUS the node e2e
  tests. The more the better.
- Step 2 will involve 3rd party reporting to testgrid at the least.
- Step 2 may end up needing boskos etc. to run against clouds (with these
  arches) where we have credits.
- Step 3 is at least the conformance test suite. The more the better. Using
  community tools like prow/kubeadm is encouraged but not mandated.
- Step 4 is where we take this up to the CNCF trademark program. We should
  be in Step 3 for at least a year before we go to Step 4.
- If at any stage things bit-rot, we go back to a previous step, giving an
  opportunity for the community to step up.
