# Shipping alternate platforms in Kubernetes release artifacts

The default Kubernetes platform is Linux/amd64. This platform is fully tested,
and our build and release systems initially supported only that platform. A
while ago we started an [effort to support multiple architectures][0]. As part
of this effort, we added support in our build/release pipelines for the
architectures arm, arm64, ppc64le and s390x on different operating systems such
as Linux, Windows and macOS.

[0]: https://github.com/kubernetes/kubernetes/issues/38067

The main focus was to make binaries and container images available for these
architectures and operating systems, so that interested contributors can take
these artifacts and set up CI jobs to adequately test these platforms. In
particular, this includes the ability to run conformance tests on these
platforms.

The goal of this document is to provide a starting point for adding new
platforms to Kubernetes from a SIG Architecture perspective. It does not cover
release mechanics or supportability in terms of functionality.

# Step 1: crawling (Build)

- docker-based build infrastructure should support this architecture

This implicitly requires the following:

- golang should support the architecture out-of-the-box.
- All our dependencies, whether vendored or run separately, should support this
  platform out-of-the-box.

In other words, anyone in the community should be able to use our build infra to
generate all artifacts required to stand up Kubernetes.

More information about how to build Kubernetes can be found in [the build
documentation][1].

[1]: https://github.com/kubernetes/kubernetes/tree/3f7c09e/build#building-kubernetes
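
A quick way to sanity-check that the Go toolchain can target a candidate
platform is to cross-compile a trivial program for it. This is a minimal
sketch, not part of the Kubernetes build itself; the `linux/arm64` target is
just an example:

```go
// platformcheck.go reports the platform this binary was compiled for.
//
// Cross-compile it for a candidate target to verify out-of-the-box Go
// support, e.g.:
//
//	GOOS=linux GOARCH=arm64 go build platformcheck.go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Printf("built for %s/%s\n", runtime.GOOS, runtime.GOARCH)
}
```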

# Step 2: walking (Test)

It is not enough for builds to work once: they bit-rot quickly as we vendor in
new changes, update versions of things we use, and so on. So we need a good set
of tests that exercise a wide battery of jobs on this new architecture.

Good starting points from a testing perspective are:

- unit tests
- e2e tests
- node e2e tests
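
When enabling an existing suite on a new architecture, a common pattern is to
skip tests for features the platform does not support yet, so that a red run
signals a real regression rather than a known gap. This is a minimal sketch;
the test name and the skipped architectures are illustrative, not taken from
the Kubernetes tree:

```go
package example

import (
	"runtime"
	"testing"
)

// TestHugePagesAlignment is a hypothetical unit test for a feature that has
// not been validated on every architecture yet.
func TestHugePagesAlignment(t *testing.T) {
	// Skip on architectures where the feature has not been validated,
	// so the job stays green until someone does the enablement work.
	switch runtime.GOARCH {
	case "ppc64le", "s390x":
		t.Skipf("feature not validated on %s yet", runtime.GOARCH)
	}
	// ... actual test logic ...
}
```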

This will ensure that community members can rely on these architectures on a
consistent basis, and it will give folks who are making changes a signal when
they break things on a specific architecture.

This implies a set of folks who stand up and maintain both post-submit and
periodic tests, watch them closely, and raise the flag when things break. They
will also have to help debug and fix any platform-specific issues.

Creating custom testgrid dashboards can help to monitor platform-specific tests.

# Step 3: running (Release)

With the first two steps we have a reasonable expectation that there is a group
of people taking care of an architecture, that it mostly works ("works on my
machine!"), and that when things break they get fixed quickly.

Getting to the next level is a big jump from here. We are talking about real
users who are literally betting their business on the work we are doing here.
So we need guarantees around the "can we really ship this!?" question.

Specifically, we are talking about a set of CI jobs in the release-informing and
release-blocking tabs of our testgrid. The Kubernetes release team has a "CI
signal" team that relies on the status of these jobs to either ship or hold a
release. Essentially, if things are mostly red with occasional green, it would
be prudent not to make this architecture part of the release at all. CI jobs get
added to release-informing first, and when they get to a point where they work
really well, they get promoted to release-blocking.

The problem here is that once we start shipping something, users will start to
rely on it, whether we like it or not. So it becomes a question of trust in the
team that is taking care of a platform/architecture. Do we really trust this
team, not just for this release but on an ongoing basis? Do they show up
consistently when things break? Do they proactively work with testing/release on
ongoing efforts and try to apply them to their architectures? It's very easy to
set up a CI job as a one-time thing, tick a box and advocate to get something
added. It's a totally different ball game to be there consistently over time and
show that you mean it. There has to be a consistent body of people working on
this over time (life happens!).

What we are looking for here is a strong green CI signal that release managers
can use to cut a release, and for folks to be able to report problems and get
them addressed. This includes [conformance testing][2], as use of the Kubernetes
trademark is controlled through a conformance assurance process. So we are
looking for folks here to work with [the conformance subproject][3] in addition
to testing and release.

[2]: https://github.com/cncf/k8s-conformance
[3]: http://bit.ly/sig-architecture-conformance
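
Conformance tests are regular e2e tests whose names carry the `[Conformance]`
tag, which test runners can filter on. Below is a minimal sketch of that
pattern using the ginkgo framework; the SIG label and the behavior under test
are invented for illustration:

```go
package e2e

import (
	"github.com/onsi/ginkgo/v2"
)

var _ = ginkgo.Describe("[sig-example] Pods", func() {
	// The [Conformance] tag in the test name is what lets a runner select
	// the conformance suite, e.g. with a ginkgo focus regexp matching
	// "\[Conformance\]".
	ginkgo.It("should run a pod to completion [Conformance]", func() {
		// ... test logic exercising a live cluster ...
	})
})
```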

# Step 4: profit!

If you got this far, you really have made it! You have clear engagement with the
community, you are working seamlessly with all the relevant SIGs, your stuff
ships in the Kubernetes release, and end users adopt your architecture. And
having achieved conformance, you gain conditional use of the Kubernetes
trademark relative to your offerings.

# Rules of the game (Notes?)

- We should keep it easy for folks to get into Step 1.
- In Step 1, things should not build by default and should be switched off.
- Step 1 should not place undue burden on review or infrastructure (case in
  point - WINDOWS!).
- Once Step 2 is done, we could consider switching things on by default (but
  still not in release artifacts).
- Once Step 3 is done, binaries/images for the architecture can ship with the
  release.
- Step 2 is at least the default e2e-gce equivalent, PLUS the node e2e tests.
  The more the better.
- Step 2 will involve, at the least, third-party reporting to testgrid.
- Step 2 may end up needing boskos etc. to run against clouds (with these
  architectures) where we have credits.
- Step 3 is at least the conformance test suite. The more the better. Using
  community tools like prow/kubeadm is encouraged but not mandated.
- Step 4 is where we take this up to the CNCF trademark program. A platform
  should have been in Step 3 for at least a year before we go to Step 4.
- If at any stage things bit-rot, we go back to a previous step, giving the
  community an opportunity to step up.