|
| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: 'Using Go workspaces in Kubernetes' |
| 4 | +date: 2024-03-19T08:30:00-08:00 |
| 5 | +slug: go-workspaces-in-kubernetes |
| 6 | +canonicalUrl: https://www.kubernetes.dev/blog/2024/03/19/go-workspaces-in-kubernetes/ |
| 7 | +--- |
| 8 | + |
| 9 | +**Author:** Tim Hockin (Google) |
| 10 | + |
| 11 | +The [Go programming language](https://go.dev/) has played a huge role in the |
| 12 | +success of Kubernetes. As Kubernetes has grown, matured, and pushed the bounds |
| 13 | +of what "regular" projects do, the Go project team has also grown and evolved |
| 14 | +the language and tools. In recent releases, Go introduced a feature called |
| 15 | +"workspaces" which was aimed at making projects like Kubernetes easier to |
| 16 | +manage. |
| 17 | + |
| 18 | +We've just completed a major effort to adopt workspaces in Kubernetes, and the |
| 19 | +results are great. Our codebase is simpler and less error-prone, and we're no |
| 20 | +longer off on our own technology island. |
| 21 | + |
| 22 | +## GOPATH and Go modules |
| 23 | + |
| 24 | +Kubernetes is one of the most visible open source projects written in Go. The |
| 25 | +earliest versions of Kubernetes, dating back to 2014, were built with Go 1.3. |
| 26 | +Today, 10 years later, Go is up to version 1.22 — and let's just say that a |
| 27 | +_whole lot_ has changed. |
| 28 | + |
| 29 | +In 2014, Go development was entirely based on |
| 30 | +[`GOPATH`](https://go.dev/wiki/GOPATH). As a Go project, Kubernetes lived by the |
| 31 | +rules of `GOPATH`. In the buildup to Kubernetes 1.4 (mid 2016), we introduced a |
| 32 | +directory tree called `staging`. This allowed us to pretend to be multiple |
| 33 | +projects, but still exist within one git repository (which had advantages for |
| 34 | +development velocity). The magic of `GOPATH` allowed this to work. |
| 35 | + |
| 36 | +Kubernetes depends on several code-generation tools which have to find, read, |
| 37 | +and write Go code packages. Unsurprisingly, those tools grew to rely on |
| 38 | +`GOPATH`. This all worked pretty well until Go introduced modules in Go 1.11 |
| 39 | +(mid 2018). |
| 40 | + |
| 41 | +Modules were an answer to many issues around `GOPATH`. They gave more control to |
| 42 | +projects on how to track and manage dependencies, and were overall a great step |
| 43 | +forward. Kubernetes adopted them. However, modules had one major drawback — |
| 44 | +most Go tools could not work on multiple modules at once. This was a problem |
| 45 | +for our code-generation tools and scripts. |
| 46 | + |
| 47 | +Thankfully, Go offered a way to temporarily disable modules (`GO111MODULE` to |
| 48 | +the rescue). We could get the dependency tracking benefits of modules, but the |
| 49 | +flexibility of `GOPATH` for our tools. We even wrote helper tools to create fake |
| 50 | +`GOPATH` trees and played tricks with symlinks in our vendor directory (which |
| 51 | +holds a snapshot of our external dependencies), and we made it all work. |
| 52 | + |
| 53 | +And for the last 5 years it _has_ worked pretty well. That is, it worked well |
| 54 | +unless you looked too closely at what was happening. Woe be upon you if you |
| 55 | +had the misfortune to work on one of the code-generation tools, or the build |
| 56 | +system, or the ever-expanding suite of bespoke shell scripts we use to glue |
| 57 | +everything together. |
| 58 | + |
| 59 | +## The problems |
| 60 | + |
| 61 | +Like any large software project, we Kubernetes developers have all learned to |
| 62 | +deal with a certain amount of constant low-grade pain. Our custom `staging` |
| 63 | +mechanism let us bend the rules of Go; it was a little clunky, but when it |
| 64 | +worked (which was most of the time) it worked pretty well. When it failed, the |
| 65 | +errors were inscrutable and un-Googleable — nobody else was doing the silly |
| 66 | +things we were doing. Usually the fix was to re-run one or more of the `update-*` |
| 67 | +shell scripts in our aptly named `hack` directory. |
| 68 | + |
| 69 | +As time went on we drifted farther and farther from "regular" Go projects. At |
| 70 | +the same time, Kubernetes got more and more popular. For many people, |
| 71 | +Kubernetes was their first experience with Go, and it wasn't always a good |
| 72 | +experience. |
| 73 | + |
| 74 | +Our eccentricities also impacted people who consumed some of our code, such as |
| 75 | +our client library and the code-generation tools (which turned out to be useful |
| 76 | +in the growing ecosystem of custom resources). The tools only worked if you |
| 77 | +stored your code in a particular `GOPATH`-compatible directory structure, even |
| 78 | +though `GOPATH` had been replaced by modules more than four years prior. |
| 79 | + |
| 80 | +This state persisted because of the confluence of three factors: |
| 81 | +1. Most of the time it only hurt a little (punctuated with short moments of |
| 82 | + more acute pain). |
| 83 | +1. Kubernetes was still growing in popularity - we all had other, more urgent |
| 84 | + things to work on. |
| 85 | +1. The fix was not obvious, and whatever we came up with was going to be both |
| 86 | + hard and tedious. |
| 87 | + |
| 88 | +As a Kubernetes maintainer and long-timer, my fingerprints were all over the |
| 89 | +build system, the code-generation tools, and the `hack` scripts. While the pain |
| 90 | +of our mess may have been low _on average_, I was one of the people who felt it |
| 91 | +regularly. |
| 92 | + |
| 93 | +## Enter workspaces |
| 94 | + |
| 95 | +Along the way, the Go language team saw what we (and others) were doing and |
| 96 | +didn't love it. They designed a new way of stitching multiple modules together |
| 97 | +into a new _workspace_ concept. Once enrolled in a workspace, Go tools had |
| 98 | +enough information to work in any directory structure and across modules, |
| 99 | +without `GOPATH` or symlinks or other dirty tricks. |
| 100 | + |
| 101 | +When I first saw this proposal I knew that this was the way out. This was how |
| 102 | +to break the logjam. If workspaces was the technical solution, then I would |
| 103 | +put in the work to make it happen. |
| 104 | + |
| 105 | +## The work |
| 106 | + |
| 107 | +Adopting workspaces was deceptively easy. I very quickly had the codebase |
| 108 | +compiling and running tests with workspaces enabled. I set out to purge the |
| 109 | +repository of anything `GOPATH` related. That's when I hit the first real bump - |
| 110 | +the code-generation tools. |
| 111 | + |
| 112 | +We had about a dozen tools, totalling several thousand lines of code. All of |
| 113 | +them were built using an internal framework called |
| 114 | +[gengo](https://github.com/kubernetes/gengo), which was built on Go's own |
| 115 | +parsing libraries. There were two main problems: |
| 116 | + |
| 117 | +1. Those parsing libraries didn't understand modules or workspaces. |
| 118 | +1. `GOPATH` allowed us to pretend that Go _package paths_ and directories on |
| 119 | + disk were interchangeable in trivial ways. They are not. |
| 120 | + |
| 121 | +Switching to a |
| 122 | +[modules- and workspaces-aware parsing](https://pkg.go.dev/golang.org/x/tools/go/packages) |
| 123 | +library was the first step. Then I had to make a long series of changes to |
| 124 | +each of the code-generation tools. Critically, I had to find a way to do it |
| 125 | +that was possible for some other person to review! I knew that I needed |
| 126 | +reviewers who could cover the breadth of changes and reviewers who could go |
| 127 | +into great depth on specific topics like gengo and Go's module semantics. |
| 128 | +Looking at the history for the areas I was touching, I asked Joe Betz and Alex |
| 129 | +Zielenski (SIG API Machinery) to go deep on gengo and code-generation, Jordan |
| 130 | +Liggitt (SIG Architecture and all-around wizard) to cover Go modules and |
| 131 | +vendoring and the `hack` scripts, and Antonio Ojea (wearing his SIG Testing |
| 132 | +hat) to make sure the whole thing made sense. We agreed that a series of small |
| 133 | +commits would be easiest to review, even if the codebase might not actually |
| 134 | +work at each commit. |
| 135 | + |
| 136 | +Sadly, these were not mechanical changes. I had to dig into each tool to |
| 137 | +figure out where they were processing disk paths versus where they were |
| 138 | +processing package names, and where those were being conflated. I made |
| 139 | +extensive use of the [delve](https://github.com/go-delve/delve) debugger, which |
| 140 | +I just can't say enough good things about. |
| 141 | + |
| 142 | +One unfortunate result of this work was that I had to break compatibility. The |
| 143 | +gengo library simply did not have enough information to process packages |
| 144 | +outside of `GOPATH`. After discussion with gengo and Kubernetes maintainers, we |
| 145 | +agreed to make [gengo/v2](https://github.com/kubernetes/gengo/tree/master/v2). |
| 146 | +I also used this as an opportunity to clean up some of the gengo APIs and the |
| 147 | +tools' CLIs to be more understandable and not conflate packages and |
| 148 | +directories. For example you can't just string-join directory names and |
| 149 | +assume the result is a valid package name. |
| 150 | + |
| 151 | +Once I had the code-generation tools converted, I shifted attention to the |
| 152 | +dozens of scripts in the `hack` directory. One by one I had to run them, debug, |
| 153 | +and fix failures. Some of them needed minor changes and some needed to be |
| 154 | +rewritten. |
| 155 | + |
| 156 | +Along the way we hit some cases that Go did not support, like workspace |
| 157 | +vendoring. Kubernetes depends on vendoring to ensure that our dependencies are |
| 158 | +always available, even if their source code is removed from the internet (it |
| 159 | +has happened more than once!). After we discussed this with the Go team and looked |
| 160 | +at possible workarounds, they decided the right path was to |
| 161 | +[implement workspace vendoring](https://github.com/golang/go/issues/60056). |
| 162 | + |
| 163 | +The eventual Pull Request contained over 200 individual commits. |
| 164 | + |
| 165 | +## Results |
| 166 | + |
| 167 | +Now that this work has been merged, what does this mean for Kubernetes users? |
| 168 | +Pretty much nothing. No features were added or changed. This work was not |
| 169 | +about fixing bugs (and hopefully none were introduced). |
| 170 | + |
| 171 | +This work was mainly for the benefit of the Kubernetes project, to help and |
| 172 | +simplify the lives of the core maintainers. In fact, it would not be a lie to |
| 173 | +say that it was rather self-serving - my own life is a little bit better now. |
| 174 | + |
| 175 | +This effort, while unusually large, is just a tiny fraction of the overall |
| 176 | +maintenance work that needs to be done. Like any large project, we have lots of |
| 177 | +"technical debt" — tools that made point-in-time assumptions and need |
| 178 | +revisiting, internal APIs whose organization doesn't make sense, code which |
| 179 | +doesn't follow conventions which didn't exist at the time, and tests which |
| 180 | +aren't as rigorous as they could be, just to throw out a few examples. This |
| 181 | +work is often called "grungy" or "dirty", but in reality it's just an |
| 182 | +indication that the project has grown and evolved. I love this stuff, but |
| 183 | +there's far more than I can ever tackle on my own, which makes it an |
| 184 | +interesting way for people to get involved. As our unofficial motto goes: |
| 185 | +"chop wood and carry water". |
| 186 | + |
| 187 | +Kubernetes used to be a case-study of how _not_ to do large-scale Go |
| 188 | +development, but now our codebase is simpler (and in some cases faster!) and |
| 189 | +more consistent. Things that previously seemed like they _should_ work, but |
| 190 | +didn't, now behave as expected. |
| 191 | + |
| 192 | +Our project is now a little more "regular". Not completely so, but we're |
| 193 | +getting closer. |
| 194 | + |
| 195 | +## Thanks |
| 196 | + |
| 197 | +This effort would not have been possible without tons of support. |
| 198 | + |
| 199 | +First, thanks to the Go team for hearing our pain, taking feedback, and solving |
| 200 | +the problems for us. |
| 201 | + |
| 202 | +Special mega-thanks goes to Michael Matloob, on the Go team at Google, who |
| 203 | +designed and implemented workspaces. He guided me every step of the way, and |
| 204 | +was very generous with his time, answering all my questions, no matter how |
| 205 | +dumb. |
| 206 | + |
| 207 | +Writing code is just half of the work, so another special thanks to my |
| 208 | +reviewers: Jordan Liggitt, Joe Betz, Alexander Zielenski, and Antonio Ojea. |
| 209 | +These folks brought a wealth of expertise and attention to detail, and made |
| 210 | +this work smarter and safer. |