# 2020 WG K8s Infra Annual Report

## You and Your Role

**When did you become a chair and do you enjoy the role?**

- **bartsmykla**: February 2020, and I enjoy the role
- **dims**: along with spiffxp, been there right from the beginning. Lately having some conflicts with the meeting time, but definitely enjoying the process
- **spiffxp**: Was a chair (organizer?) at the group’s formation. I enjoy the role when I have time to dedicate to it.

**What do you find challenging?**

- **bartsmykla**: Our working group’s efforts touch multiple SIGs, and there are multiple places, tools, and repositories involved in moving things forward, so I sometimes feel overwhelmed and anxious about not understanding some of the tools (Prow, for example). What is also hard is that I don’t feel I have enough access and knowledge to speed things up with the Prow migration.
- **dims**: takes too long :) Finding/building a coalition is hard. Trying hard to avoid everything being done by a small set of folks, but not doing too well on that front.
- **spiffxp**: Prioritizing this group’s work, and the work necessary to tend to this group’s garden (by that I mean weeding/planting workstreams, building/smoothing onramps). Work that usually takes precedence is related to company-internal priorities, SIG Testing, and kubernetes/kubernetes fires. I often find myself unprepared for meetings unless I have been actively pushing a specific item in the interim. Very rarely am I sufficiently aware of the group’s activity as a whole to effectively drive it.

**Do you have goals for the group?**

- **bartsmykla**: My goal would be to improve documentation so that it is easier for people to understand what is happening where, and which tools and resources are being used for which efforts
- **dims**: breaking things up into small chunks that can be easily farmed out
- **spiffxp**: The thing I care most about is community ownership of prow.k8s.io, including on-call. If possible, I would like to see the group’s mission through to completion, ensuring that all project infrastructure is community-owned and maintained.

**Do you want to continue or find a replacement? If you feel that you aren’t ready to pass the baton, what would you like to accomplish before you do?**

- **bartsmykla**: I would like to continue
- **dims**: Happy to, if folks show up who can take over. Always on the lookout.
- **spiffxp**: I personally want to continue, but sometimes wonder if my best-effort availability is doing the group a disservice. If there’s a replacement and I’m the impediment, I’m happy to step down. The ideal replacement or a dedicated TL would have:
  - ability to craft and build consensus on operational policies, and lead their implementation
  - ability to identify cost hotspots, and lead or implement cost-reduction solutions
  - ability to identify security vulnerabilities or operational sharp edges (e.g. no backups, easy accidents), and lead or implement mitigations
  - familiarity with GCP and Kubernetes
  - ability to document/understand how existing project infra is wired together (e.g. could fix https://github.com/kubernetes/test-infra/issues/13063)

**Is there something we can provide that would better support you?**

- **bartsmykla**: I can’t think of anything right now
- **dims**: what spiffxp said!
- **spiffxp**: TBH I feel like a lot of what we need to make this group as active/healthy as I would like needs to come from us. For example, I don’t think a dedicated PM would help without a dedicated TL. I’m not sure how to more effectively motivate our contributing companies to prioritize this work. I have pined in the past for dedicated contractors paid by the CNCF for this, but I think that could just as easily be fulfilled by contributing companies agreeing to staff this.

**Do you have feedback for Steering? Suggestions for what we should work on?**

- **bartsmykla**: I can’t think of anything right now
- **dims**: yep, talking to the CNCF proactively and formulating a plan.
- **spiffxp**: I think there are three things Steering could help with:
  - Policy guidance from Steering on what is in-scope / out-of-scope for Kubernetes’ project-infrastructure budget (e.g. mirroring dependency/ecosystem projects like cert-manager [1], CI jobs). It might better drive billing requirements, and make it easier/quicker to decide what is appropriate to pursue. At the moment we’re using our best judgement, and I trust it, but I sometimes feel like we’re flying blind or making stuff up. As far as existing spend and budgeting, we don’t have quotas/forecasts/alerts; we’re mostly hoping everyone is on their best behavior until something seems outsized, at which point it’s case-by-case on what to do.
  - I think it would be helpful to get spend on platforms other than Google above-the-table, and driven through this group. I know how much money Google has provided, and I know where it’s being spent (though not to the granularity of per-SIG). I lack the equivalent for other companies “helping out” (e.g. AWS, Microsoft, DigitalOcean).
  - This is not a concrete request that can be acted upon now, but I anticipate we will want to reduce costs by ensuring that other clouds or large entities participate in mirroring Kubernetes artifacts.

## Working Group

**What was the initial mission of the group and if it's changed, how?**

The initial mission was to migrate Kubernetes project infrastructure to the CNCF, and to create teams and processes to support ongoing maintenance.

The scope has grown slightly, in that new infrastructure that previously didn't exist is now proposed and managed under this group. Examples include:
- binary-artifact-promotion (the project only had image promotion internally; it is now done externally, and we are attempting to expand it to binary artifacts)
- [running triage-party for SIG Release](https://github.com/kubernetes/k8s.io/issues/906) (didn't exist until this year)
- [build infrastructure for windows-based images](https://docs.google.com/document/d/16VBfsFMynA7tObzuZGPpw-sKDKfFc_T5W_E4IeEIaOQ/edit#bookmark=id.3w0g7fo9cp7m)
- [image vulnerability dashboard](https://docs.google.com/document/d/16VBfsFMynA7tObzuZGPpw-sKDKfFc_T5W_E4IeEIaOQ/edit#bookmark=id.s3by3vki8jer) (it's not clear to me whether even Google had this internally before)
- [sharding out / scaling up gitops-based Google Group management](https://docs.google.com/document/d/16VBfsFMynA7tObzuZGPpw-sKDKfFc_T5W_E4IeEIaOQ/edit#bookmark=id.ou5hk544r70m)

**What’s the current roadmap until completion?**

What has been migrated:
- DNS for kubernetes.io, k8s.io
- container images hosted on k8s.gcr.io
- node-perf-dash.k8s.io
- perf-dash.k8s.io
- publishing-bot
- slack-infra
- 288 / 1780 prow jobs
- GCB projects used to create kubernetes/kubernetes releases (except .deb/.rpm packages)
What remains (TODO: we need to update our issues to reflect this):
- migrate .deb/.rpm package building/hosting to community (this would be owned by SIG Release)
  - stop using the google-internal tool "rapture"
  - come up with signing keys the community agrees to host/trust
  - migrate apt.kubernetes.io to community
- stop using the google-containers GCP project (this would be owned by SIG Release)
  - gs://kubernetes-release, dl.k8s.io
  - [gs://kubernetes-release-dev](https://github.com/kubernetes/k8s.io/issues/846)
- stop using the k8s-prow GCP project (this would be owned by SIG Testing)
  - prow.k8s.io
  - ensure community-staffed on-call can support it
- stop using the k8s-prow-build GCP project (this would be owned by SIG Testing)
  - 288/1780 jobs migrated out thus far
  - ensure community-staffed on-call can support it
- [stop using the k8s-gubernator GCP project](https://github.com/kubernetes/k8s.io/issues/1308) (this would be owned by SIG Testing)
  - migrate/replace gubernator.k8s.io/pr (triage-party?), drop gubernator.k8s.io
  - [migrate kettle](https://github.com/kubernetes/k8s.io/issues/787)
  - [migrate k8s-gubernator:builds dataset](https://github.com/kubernetes/k8s.io/issues/1307)
  - [migrate triage.k8s.io](https://github.com/kubernetes/k8s.io/issues/1305)
  - [migrate gs://k8s-metrics](https://github.com/kubernetes/k8s.io/issues/1306)
- stop using the kubernetes-jenkins GCP project (this would be owned by SIG Testing)
  - gs://kubernetes-jenkins (all CI artifacts/logs for prow.k8s.io jobs)
  - sundry other GCS buckets (gs://k8s-kops-gce, gs://kubernetes-staging*)
- [stop using the k8s-federated-conformance GCP project](https://github.com/kubernetes/k8s.io/issues/1311) (this would be owned by SIG Testing)
  - migrate to the CNCF-owned k8s-conform project (rename/copy sundry GCS buckets, distribute new service account keys)
- [stop using the k8s-testimages GCP project](https://github.com/kubernetes/k8s.io/issues/1312) (this could be owned by either SIG Testing or SIG Release)
  - migrate images used by CI jobs (kubekins, bazel-krte, gcloud, etc.)
  - migrate test-infra components (kettle, greenhouse, etc.)
  - (this may push us toward [limited/lifecycle-based retention of images, which GCR does not natively have](https://github.com/kubernetes/k8s.io/issues/525))
- stop using the kubernetes-site GCP project (unsure; maybe SIG ContribEx or SIG Docs, depending)
  - ???
- ensure SIG ownership of all infra and services
  - must be supportable by non-Google community members
  - ensure critical contributor user journeys are well documented for each service

**Have you produced any artifacts, reports, white papers to date?**

We provide a [publicly viewable billing report](https://datastudio.google.com/u/0/reporting/14UWSuqD5ef9E4LnsCD9uJWTPv8MHOA3e) accessible to members of [email protected]. The project was given $3M/yr for 3 years, and our third year started ~August 2020. Our spend over the past 28 days has been ~$109k, which works out to ~$1.42M/yr. A very rough breakdown of the $109k:
- $74k - k8s-artifacts-prod* (~ k8s.gcr.io)
- $34k - k8s-infra-prow*, k8s-infra-e2e*, k8s-staging* (~ project CI thus far; follows kubernetes/kubernetes traffic)
- $0.7k - kubernetes-public (~ everything else)
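
As a sanity check, the arithmetic behind those figures can be reproduced directly. The following minimal Python sketch uses only the dollar amounts quoted above as inputs; the annualized run rate and budget share are derived:

```python
# Sanity-check of the spend figures quoted in this report.
days_observed = 28
spend_usd = 109_000            # ~$109k spent over the past 28 days
budget_per_year = 3_000_000    # $3M/yr provided to the project

annualized = spend_usd / days_observed * 365
print(f"annualized run rate: ${annualized / 1e6:.2f}M/yr")            # ~$1.42M/yr
print(f"share of yearly budget: {annualized / budget_per_year:.0%}")  # ~47%

# The rough breakdown should sum back to ~$109k:
breakdown_usd = {
    "k8s-artifacts-prod* (~ k8s.gcr.io)": 74_000,
    "k8s-infra-prow*, k8s-infra-e2e*, k8s-staging*": 34_000,
    "kubernetes-public": 700,
}
print(f"breakdown total: ${sum(breakdown_usd.values()):,}")           # $108,700
```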

**Is everything in your readme accurate? Are you posting meetings on YouTube?**

Our community [readme](https://github.com/kubernetes/community/tree/master/wg-k8s-infra) is accurate, if sparse. The [readme](https://github.com/kubernetes/k8s.io/blob/master/README.md) in k8s.io, which houses most of the actual infrastructure, is terse and slightly out of date (it is missing triage-party).

[We are having problems with our Zoom automation](https://github.com/kubernetes/community/issues/5199), causing [our YouTube playlist](https://www.youtube.com/playlist?list=PL69nYSiGNLP2Ghq7VW8rFbMFoHwvORuDL) to fall out of date; I noticed while writing this report and have gotten help backfilling. We're currently missing 2020-10-14.

**Do you have regular check-ins with your sponsoring SIGs?**

No formal reporting happens in either direction. Meetings/Slack/issues see active participation from @spiffxp (SIG Testing chair), and occasional participation from @justaugustus (SIG Release) and @nikhita (SIG Contributor Experience). We also see participation on Slack/issues/PRs from @dims (SIG Architecture), who has a schedule conflict.
