Commit 35d3a8d (merge of kubernetes#3079 from ii/MST-3000, parents c85afff + 4b14ce8): KEP-3000: Image Promotion and Distribution Policy
# KEP 3000: Image Promotion and Distribution Policy

<!-- toc -->
- [Summary](#summary)
- [Why a new domain?](#why-a-new-domain)
- [How can we help?](#how-can-we-help)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
  - [What is not in scope](#what-is-not-in-scope)
  - [What are good goals to shoot for](#what-are-good-goals-to-shoot-for)
- [Proposal](#proposal)
- [What exactly are you doing?](#what-exactly-are-you-doing)
  - [registry.k8s.io request handling](#registryk8sio-request-handling)
  - [Notes/Constraints/Caveats](#notesconstraintscaveats)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Alternatives / Background](#alternatives--background)
  - [How much is this going to save us?](#how-much-is-this-going-to-save-us)
<!-- /toc -->

## Summary

For a few years now, we have been using k8s.gcr.io in all our repositories as the default registry to download images from.

Distributing Kubernetes comes at a great cost: nearly USD $150k/month (mostly egress), paid for by donations.

Additionally, some of our community members are unable to access the official release container images because country-level firewalls do not let them connect to Google services.

Ideally, we can dramatically reduce cost and allow everyone in the world to download the container images released by our community.

We now routinely use the [image promoter process](https://github.com/kubernetes/enhancements/tree/master/keps/sig-release/1734-k8s-image-promoter) to promote images to the official Kubernetes container registry, using the infrastructure (GCR staging repos, etc.) provided by [sig-k8s-infra](https://github.com/kubernetes/k8s.io/tree/main/k8s.gcr.io).

## Why a new domain?

So far, the Kubernetes project has used GCP as its default infrastructure provider for everything: GCS, GCR, GKE-based prow clusters, etc. Google has also graciously sponsored much of our infrastructure cost. However, for about a year now we have watched our costs skyrocket, because much of the community's usage of this infrastructure comes from other cloud providers such as AWS and Azure. So, in conjunction with CNCF staff, we are putting together a plan to host copies of images and binaries nearer to where they are used, rather than incur cross-cloud egress costs.

One part of this plan is to set up a redirecting web service that can identify where traffic is coming from and redirect it to the nearest image layer/repository. This is why we are setting up a new service, using what we call an [oci-proxy](https://github.com/kubernetes-sigs/oci-proxy), for everyone to use. This redirector will identify traffic coming from, for example, a certain AWS region, and will then issue an HTTP redirect to a source in that AWS region. If we get traffic from GKE/GCP, or we don't know where the traffic is coming from, it will still redirect to the current infrastructure (k8s.gcr.io).

## How can we help?

When the Kubernetes master branch opens up for v1.25 development, we need to update all default URLs in our code and test harnesses to the new registry URL. As a team, sig-k8s-infra is signing up to ensure that this oci-proxy-based registry.k8s.io will be as robust and available as the current setup. As a backup, we will continue to run the current k8s.gcr.io as well, so do not worry about that going away. Turning on traffic to the new URL will help us monitor and fix things if/when they break, and we will be able to tune traffic and lower our costs of operation.

### Goals

- A policy and procedure for use by SIG Release to promote container images to multiple registries and mirrors.
- A solution that redirects to appropriate mirrors, to lower cost and allow access from any cloud or country globally.

### Non-Goals

Anything related to the creation of artifacts, BOMs, or staging buckets.

### What is not in scope

- Clouds other than AWS: currently we focus on AWS only. We are getting a lot of help from AWS, both on technical details and on targeted infrastructure costs, for standing up and running this infrastructure.

### What are good goals to shoot for

- In terms of cost reduction: monitor GCP infrastructure and get to the point where we fully avoid serving large binary image layers from GCR/GCS.
- We can add other AWS regions and clouds as needed, in a well-known, documented way.
- A seamless transition for the community from the old k8s.gcr.io to registry.k8s.io, with the same rock-solid stability we now have with k8s.gcr.io.

## Proposal

There are two intertwined concepts in this proposal.

First, the policy and procedures to promote/upload our container images to multiple providers. Our existing processes upload only to GCS buckets. Ideally, we extend the existing promotion software and process to push directly to multiple providers. Alternatively, we use a second process to synchronize container images from our existing production buckets to similar constructs at other providers.

Second, we require a registry and artifact URL-redirection solution that points clients at the local cloud provider or country.

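Either promotion approach ultimately copies each production image reference to a mirror registry under the same repository path. A minimal sketch of the reference rewriting involved is below; the destination registry name is a hypothetical placeholder, and the actual copy would be performed by the image promoter or a tool such as `crane`:

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorRef rewrites an image reference from the source registry to a
// mirror registry, preserving the repository path and tag/digest,
// e.g. k8s.gcr.io/pause:3.6 -> <mirror>/pause:3.6.
func mirrorRef(ref, srcRegistry, dstRegistry string) (string, error) {
	if !strings.HasPrefix(ref, srcRegistry+"/") {
		return "", fmt.Errorf("ref %q is not hosted on %s", ref, srcRegistry)
	}
	return dstRegistry + strings.TrimPrefix(ref, srcRegistry), nil
}

func main() {
	// "mirror.example.com" is a placeholder, not a real mirror endpoint.
	dst, err := mirrorRef("k8s.gcr.io/kube-apiserver:v1.24.0", "k8s.gcr.io", "mirror.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(dst) // mirror.example.com/kube-apiserver:v1.24.0
}
```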
## What exactly are you doing?

- We are setting up an AWS account, with an IAM role and S3 buckets in the AWS regions where we see a large percentage of image pull traffic.
- We will iterate on a sandbox URL (registry-sandbox.k8s.io) for our experiments, and ONLY promote things to registry.k8s.io when we have complete confidence.
  - Both registry and registry-sandbox serve traffic using oci-proxy on Google Cloud Run.
- oci-proxy will be updated to identify incoming traffic from AWS regions based on IP ranges, so we can route traffic to S3 buckets in that region. If a specific AWS region does not currently host S3 buckets, we will redirect to the nearest region which does (a tradeoff between storage and network costs).
- We will bulk-sync existing image layers (from GCS/GCR) to these S3 buckets as a starting point.
- We will update the image promoter to push to these S3 buckets in addition to the current setup.
- We will set up monitoring/reporting to check on the new costs we incur on the AWS infrastructure, and update our GCP monitoring to include the new components as well.
- We will have a plan in place for how we could add additional AWS regions in the future.
- We will have CI jobs that run against registry-sandbox.k8s.io to monitor stability before we promote code to registry.k8s.io.
- We will automate the deployment, monitoring, and testing of code landing in the oci-proxy repository.

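The "nearest region with buckets" fallback mentioned above can be sketched as a simple lookup table. The region names below are real AWS regions, but which regions host buckets, and the fallback mapping itself, are hypothetical assumptions for illustration, not the production topology:

```go
package main

import "fmt"

// regionsWithBuckets lists AWS regions where we (hypothetically)
// host S3 layer buckets.
var regionsWithBuckets = map[string]bool{
	"us-east-2":      true,
	"us-west-2":      true,
	"ap-southeast-1": true,
}

// fallbackRegion maps regions without buckets to a nearby region
// that has them (illustrative mapping only).
var fallbackRegion = map[string]string{
	"us-east-1":      "us-east-2",
	"ap-southeast-2": "ap-southeast-1",
}

// bucketRegionFor picks the S3 bucket region to redirect a client to,
// or "" if the proxy should fall back to the upstream registry.
func bucketRegionFor(clientRegion string) string {
	if regionsWithBuckets[clientRegion] {
		return clientRegion
	}
	if near, ok := fallbackRegion[clientRegion]; ok {
		return near
	}
	return ""
}

func main() {
	fmt.Println(bucketRegionFor("us-west-2"))    // served from the local region
	fmt.Println(bucketRegionFor("us-east-1"))    // nearest region with buckets
	fmt.Println(bucketRegionFor("eu-central-1")) // no mapping: upstream registry
}
```

A static table keeps the storage/network tradeoff explicit: adding a region is either a new bucket entry or a new fallback edge.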
### registry.k8s.io request handling

Requests to [registry.k8s.io](https://registry.k8s.io) follow this flow:

1. If it's a request for `/`: redirect to our wiki page about the project
2. If it's not a request for `/` and does not start with `/v2/`: 404 error
3. For registry API requests, all of which start with `/v2/`:

   - If it's not a blob request: redirect to the _Upstream Registry_
   - If it's not from a known AWS IP: redirect to the _Upstream Registry_
   - If it's from a known AWS IP AND a HEAD request for the layer succeeds in S3: redirect to S3
   - If it's from a known AWS IP AND the HEAD request fails: redirect to the _Upstream Registry_

Currently the _Upstream Registry_ is https://k8s.gcr.io.

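The flow above condenses into a single decision function. This is a sketch only: the boolean parameters stand in for the real checks (path parsing, IP-range lookup, and the S3 HEAD probe), and the wiki and S3 URLs are illustrative, not the actual oci-proxy values:

```go
package main

import (
	"fmt"
	"strings"
)

const upstreamRegistry = "https://k8s.gcr.io"

// route mirrors the request-handling flow: "/" goes to the wiki,
// non-/v2/ paths 404, and /v2/ requests redirect to S3 only when the
// request is a blob fetch from a known AWS IP and the layer exists in
// S3; everything else goes to the upstream registry.
// It returns "404" or a redirect target.
func route(path string, isBlob, isAWS, blobInS3 bool, s3URL string) string {
	switch {
	case path == "/":
		return "https://github.com/kubernetes/k8s.io/wiki" // wiki page (illustrative URL)
	case !strings.HasPrefix(path, "/v2/"):
		return "404"
	case !isBlob || !isAWS || !blobInS3:
		return upstreamRegistry + path
	default:
		return s3URL // known AWS IP and the HEAD check succeeded
	}
}

func main() {
	fmt.Println(route("/", false, false, false, ""))
	fmt.Println(route("/healthz", false, false, false, ""))
	fmt.Println(route("/v2/pause/blobs/sha256:abc", true, true, true,
		"https://example-bucket.s3.us-west-2.amazonaws.com/sha256:abc"))
	fmt.Println(route("/v2/pause/manifests/3.6", false, true, false, ""))
}
```

Keeping the decision pure like this makes the four branches directly testable without standing up a registry.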
### Notes/Constraints/Caveats

The primary purpose of this KEP is to reach consensus on an agreed policy and procedure, to unblock our community and move forward together.

There has been a lot of activity around the technology and tooling for both goals, but we need shared agreement on policy and procedure first.

### Risks and Mitigations

This is the primary pipeline for delivering Kubernetes worldwide. Ensuring appropriate SLAs and support, as well as artifact integrity, is crucial.

## Alternatives / Background

- Original KEP
  - https://github.com/kubernetes/enhancements/tree/master/keps/sig-release/1734-k8s-image-promoter
- Oras
  - https://github.com/oras-project/oras
- KubeCon Talk
  - https://www.youtube.com/watch?v=F2IFjz7sr9Q
- Apache has a widespread mirror network
  - @dims has experience here
  - http://ws.apache.org/mirrors.cgi
  - https://infra.apache.org/mirrors.html
- [Umbrella issue: k8s.gcr.io => registry.k8s.io solution k/k8s.io#1834](https://github.com/kubernetes/k8s.io/issues/1834)
- [ii/registry.k8s.io Implementation proposals](https://github.com/ii/registry.k8s.io#registryk8sio)
- [ii.nz/blog :: Building a data pipeline for displaying Kubernetes public artifact traffic](https://ii.nz/post/building-a-data-pipline-for-displaying-kubernetes-public-artifact-traffic/)

### How much is this going to save us?

Cost of K8s artifact hosting (Data Studio graphs):

![](https://i.imgur.com/LAn4UIE.png)

Analysis has been done on usage patterns by provider. AWS participated in this process and has a keen interest in helping drive down cost by providing artifacts directly to its clients that consume resources from the public registry.
Second file in the commit, `kep.yaml` (26 additions, 0 deletions):

title: Artifact Distribution Policy
kep-number: 3000
authors:
  - "@hh"
  - "@BobyMCbobs"
owning-sig: sig-release
participating-sigs:
  - sig-k8s-infra
status: provisional
creation-date: 2021-11-26
reviewers:
  - "@cpanato"
  - "@puerco"
  - "@spiffxp"
  - "@thockin"
approvers:
  - "@ameukam"
  - "@dims"
  - "@justaugustus"
  - "@saschagrunert"
stage: alpha
latest-milestone: "v1.24"
milestone:
  alpha: "v1.25"
  beta: "v1.26"
  stable: "v1.27"
