
Commit 4c14c6f

Add guide for shipping alternative platforms

This outline should help contributors get an idea of what needs to be done to
add additional supported platforms to Kubernetes.

Signed-off-by: Sascha Grunert <[email protected]>

1 parent 46f114f

1 file changed: 128 additions, 0 deletions

# Shipping alternate platforms in Kubernetes release artifacts

The default Kubernetes platform is Linux/amd64. This platform is fully tested,
and the build and release systems initially supported only it. A while ago we
started an [effort to support multiple architectures][0]. As part of this
effort, we added support in our build/release pipelines for the architectures
arm, arm64, ppc64le and s390x on different operating systems such as Linux,
Windows and macOS.

[0]: https://github.com/kubernetes/kubernetes/issues/38067

The main focus was to make binaries and container images available for these
architectures and operating systems, and to enable interested contributors to
take these artifacts and set up CI jobs that adequately test these platforms,
in particular by running conformance tests on them.

The target of this document is to provide a starting point for adding new
platforms to Kubernetes from a SIG Architecture perspective. It does not cover
release mechanics or supportability in terms of functionality.

# Step 1: crawling (Build)

- docker-based build infrastructure should support this architecture

The above implicitly requires the following:

- golang should support the architecture out-of-the-box.
- All our dependencies, whether vendored or run separately, should support this
  platform out-of-the-box.

In other words, anyone in the community should be able to use our build infra
to generate all artifacts required to stand up Kubernetes.

More information about how to build Kubernetes can be found in [the build
documentation][1].

[1]: https://github.com/kubernetes/kubernetes/tree/3f7c09e/build#building-kubernetes

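As a non-authoritative sketch of what using the build infra can look like in
practice (the commands come from the kubernetes/kubernetes build tooling
described in [1]; the exact targets and the `KUBE_BUILD_PLATFORMS` variable
should be verified against the current documentation), a contributor might
start with something like the following, using `linux/arm64` as a stand-in for
the target platform:

```bash
# Check whether the Go toolchain supports the target platform out of the box.
go tool dist list | grep linux/arm64

# Cross-compile a single component for the target platform inside the
# docker-based build container.
build/run.sh make kubectl KUBE_BUILD_PLATFORMS=linux/arm64

# Build all binaries for all platforms configured in the build.
build/run.sh make cross
```

If either the Go toolchain or the container images used by the build do not
know about the platform yet, that gap has to be closed before anything else in
this document applies.
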
# Step 2: walking (Test)

It is not enough for builds to work once; they bit-rot quickly as we vendor in
new changes, update versions of the things we use, and so on. So we need a good
set of tests that exercise a wide battery of jobs on this new architecture.

Good starting points from a testing perspective are the following (a rough
sketch of how these could be invoked is shown after this list):

- unit tests
- e2e tests
- node e2e tests

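This sketch is illustrative only; the make targets, flags and output paths are
taken from the kubernetes/kubernetes test tooling and may change, so treat them
as assumptions to check against the current developer documentation:

```bash
# Unit tests, run on (or emulated for) the target architecture.
make test

# Node e2e tests against the local machine.
make test-e2e-node

# Build the cluster e2e test binary; it can then be pointed at a cluster that
# was stood up from the cross-built artifacts (output path may differ),
# typically with a --ginkgo.focus expression to select a subset of tests.
make WHAT=test/e2e/e2e.test
_output/bin/e2e.test --provider=skeleton --kubeconfig="${HOME}/.kube/config"
```
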
This will ensure that community members can rely on these architectures on a
consistent basis, and it will give folks who are making changes a signal when
they break things on a specific architecture.

This implies a set of folks who stand up and maintain both post-submit and
periodic tests, watch them closely and raise the flag when things break. They
will also have to help debug and fix any platform-specific issues.

Creating custom testgrid dashboards can help to monitor platform-specific
tests.

# Step 3: running (Release)

With the first two steps done, we have a reasonable expectation that there is a
group of people taking care of an architecture, that it mostly works ("works on
my machine!"), and that when things break they get fixed quickly.

Getting to the next level is a big jump from here. We are talking about real
users who are literally betting their business on the work we are doing here.
So we need guarantees around the "can we really ship this!?" question.

Specifically, we are talking about a set of CI jobs in the release-informing
and release-blocking tabs of our testgrid. The Kubernetes release team has a
"CI signal" team that relies on the status of these jobs to either ship or hold
a release. Essentially, if things are mostly red with occasional green, it
would be prudent not to make this architecture part of the release. CI jobs get
added to release-informing first, and once they get to a point where they work
really well, they get promoted to release-blocking.

The problem here is that once we start shipping something, users will start to
rely on it, whether we like it or not. So it becomes a question of trust in the
team that is taking care of a platform/architecture. Do we really trust this
team, not just for this release but on an ongoing basis? Do they show up
consistently when things break? Do they proactively work with testing/release
on ongoing efforts and try to apply them to their architectures? It is very
easy to set up a CI job as a one-time thing, tick a box and advocate to get
something added. It is a totally different ball game to be there consistently
over time and show that you mean it. There has to be a consistent body of
people working on this over time (life happens!).

What are we looking for here? A strong green CI signal for release managers to
cut a release, and for folks to be able to report problems and have them
addressed. This includes [conformance testing][2], as use of the Kubernetes
trademark is controlled through a conformance assurance process. So we are
looking for folks here to work with [the conformance sub-project][3] in
addition to testing and release.

[2]: https://github.com/cncf/k8s-conformance
[3]: http://bit.ly/sig-architecture-conformance

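As a hedged example (the official process for submitting conformance results is
described in [2]; the flags and paths below are assumptions based on the e2e
tooling and should be double-checked), the conformance subset of the e2e suite
can be exercised against a cluster built from the platform's artifacts roughly
like this:

```bash
# Run only the conformance-tagged tests against an existing cluster that was
# stood up from artifacts built for the target platform.
_output/bin/e2e.test \
  --provider=skeleton \
  --kubeconfig="${HOME}/.kube/config" \
  --ginkgo.focus='\[Conformance\]'
```

Runs like this, reported continuously, are what eventually back the
release-informing and release-blocking testgrid tabs mentioned above.
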
# Step 4: profit!

If you got this far, you really have made it! You have a clear engagement with
the community, you are working seamlessly with all the relevant SIGs, your
artifacts ship as part of the Kubernetes release, and end users adopt your
architecture. Having achieved conformance, you gain conditional use of the
Kubernetes trademark relative to your offerings.

# Rules of the game (Notes?)

- We should keep it easy for folks to get into Step 1.
- In Step 1, things should not build by default and should be switched off.
- Step 1 should not place undue burden on review or infrastructure (case in
  point - WINDOWS!).
- Once Step 2 is done, we could consider switching things on by default (but
  still not in release artifacts).
- Once Step 3 is done, binaries/images for the architecture can ship with the
  release.
- Step 2 is at least the default e2e-gce equivalent, PLUS the node e2e tests.
  The more the better.
- Step 2 will involve third-party reporting to testgrid at the least.
- Step 2 may end up needing boskos etc. to run against clouds (with these
  architectures) where we have credits.
- Step 3 is at least the conformance test suite. The more the better. Using
  community tools like prow/kubeadm is encouraged but not mandated.
- Step 4 is where we take this up to the CNCF trademark program. A platform is
  expected to have been in Step 3 for at least a year before we go to Step 4.
- If at any stage things bit rot, we go back to a previous step, giving an
  opportunity for the community to step up.
