Commit a392225

Merge pull request #2414 from minrk/ovh-terraform

new OVH cluster

2 parents: 3189394 + 03e64d4
File tree: 11 files changed (+489 −7 lines)
.github/workflows/cd.yml — 18 additions & 1 deletion

@@ -234,6 +234,14 @@ jobs:
           helm_version: ""
           experimental: false

+        - federation_member: ovh2
+          binder_url: https://ovh2.mybinder.org
+          hub_url: https://hub.ovh2.mybinder.org
+          # image-prefix should match ovh registry config in secrets/config/ovh.yaml
+          chartpress_args: "--push --image-prefix=2lmrrh8f.gra7.container-registry.ovh.net/mybinder-chart/mybinder-"
+          helm_version: ""
+          experimental: false
+
     steps:
       - name: "Stage 0: Update env vars based on job matrix arguments"
         run: |

@@ -288,14 +296,23 @@ jobs:
           GIT_CRYPT_KEY: ${{ secrets.GIT_CRYPT_KEY }}

       # Action Repo: https://github.com/Azure/docker-login
-      - name: "Stage 3: Login to Docker regstry (OVH)"
+      - name: "Stage 3: Login to Docker registry (OVH)"
         if: matrix.federation_member == 'ovh'
         uses: azure/docker-login@v1
         with:
           login-server: 3i2li627.gra7.container-registry.ovh.net
           username: ${{ secrets.DOCKER_USERNAME_OVH }}
           password: ${{ secrets.DOCKER_PASSWORD_OVH }}

+      - name: "Stage 3: Login to Docker registry (OVH2)"
+        if: matrix.federation_member == 'ovh2'
+        uses: azure/docker-login@v1
+        with:
+          login-server: 2lmrrh8f.gra7.container-registry.ovh.net
+          username: ${{ secrets.DOCKER_USERNAME_OVH2 }}
+          # terraform output registry_chartpress_token
+          password: ${{ secrets.DOCKER_PASSWORD_OVH2 }}
+
       - name: "Stage 3: Run chartpress to update values.yaml"
         run: |
           chartpress ${{ matrix.chartpress_args || '--skip-build' }}

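The matrix entries above pair each federation member with its registry and chartpress arguments, and the `${{ matrix.chartpress_args || '--skip-build' }}` expression falls back to a build-skipping run for members that define no push arguments. A minimal sketch of that fallback semantics, in Python rather than GitHub Actions syntax (`select_chartpress_args` and the second matrix entry are hypothetical, not part of the repo):

```python
# Sketch of how the CD workflow's matrix drives per-member chartpress runs.
# The ovh2 entry mirrors cd.yml above; the other entry and the helper
# function are illustrative only.

FEDERATION_MATRIX = [
    {
        "federation_member": "ovh2",
        "chartpress_args": "--push --image-prefix=2lmrrh8f.gra7.container-registry.ovh.net/mybinder-chart/mybinder-",
    },
    {"federation_member": "example-member"},  # hypothetical: no push args set
]


def select_chartpress_args(entry):
    # Equivalent of the Actions expression
    # `${{ matrix.chartpress_args || '--skip-build' }}`:
    # use the member's args when set and non-empty, else skip image builds.
    return entry.get("chartpress_args") or "--skip-build"


for entry in FEDERATION_MATRIX:
    print(entry["federation_member"], "->", select_chartpress_args(entry))
```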
.github/workflows/test-helm-template.yaml — 2 additions & 0 deletions

@@ -47,6 +47,8 @@ jobs:
           k3s-channel: "v1.21"
         - release: ovh
           k3s-channel: "v1.20"
+        - release: ovh2
+          k3s-channel: "v1.24"
         - release: turing
           k3s-channel: "v1.21"

.gitignore — 1 addition & 0 deletions

@@ -19,3 +19,4 @@ travis/crypt-key
 env

 .terraform
+.terraform.lock.hcl

config/ovh2.yaml — new file, 125 additions

projectName: ovh2

userNodeSelector: &userNodeSelector
  mybinder.org/pool-type: users
coreNodeSelector: &coreNodeSelector
  mybinder.org/pool-type: core

binderhub:
  config:
    BinderHub:
      pod_quota: 10
      hub_url: https://hub.ovh2.mybinder.org
      badge_base_url: https://mybinder.org
      build_node_selector: *userNodeSelector
      sticky_builds: true
      image_prefix: 2lmrrh8f.gra7.container-registry.ovh.net/mybinder-builds/r2d-g5b5b759
    DockerRegistry:
      # Docker Registry uses harbor
      # ref: https://github.com/goharbor/harbor/wiki/Harbor-FAQs#api
      token_url: "https://2lmrrh8f.gra7.container-registry.ovh.net/service/token?service=harbor-registry"

  replicas: 1
  nodeSelector: *coreNodeSelector

  extraVolumes:
    - name: secrets
      secret:
        secretName: events-archiver-secrets
  extraVolumeMounts:
    - name: secrets
      mountPath: /secrets
      readOnly: true
  extraEnv:
    GOOGLE_APPLICATION_CREDENTIALS: /secrets/service-account.json

  ingress:
    hosts:
      - ovh2.mybinder.org

  jupyterhub:
    singleuser:
      nodeSelector: *userNodeSelector
    hub:
      nodeSelector: *coreNodeSelector

    proxy:
      chp:
        nodeSelector: *coreNodeSelector
        resources:
          requests:
            cpu: "1"
          limits:
            cpu: "1"
    ingress:
      hosts:
        - hub.ovh2.mybinder.org
      tls:
        - secretName: kubelego-tls-hub
          hosts:
            - hub.ovh2.mybinder.org
    scheduling:
      userPlaceholder:
        replicas: 5
      userScheduler:
        nodeSelector: *coreNodeSelector

imageCleaner:
  # Use 40GB as upper limit, size is given in bytes
  imageGCThresholdHigh: 40e9
  imageGCThresholdLow: 30e9
  imageGCThresholdType: "absolute"

cryptnono:
  enabled: false

grafana:
  nodeSelector: *coreNodeSelector
  ingress:
    hosts:
      - grafana.ovh2.mybinder.org
    tls:
      - hosts:
          - grafana.ovh2.mybinder.org
        secretName: kubelego-tls-grafana
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: prometheus
          orgId: 1
          type: prometheus
          url: https://prometheus.ovh2.mybinder.org
          access: direct
          isDefault: true
          editable: false
  persistence:
    storageClassName: csi-cinder-high-speed

prometheus:
  server:
    nodeSelector: *coreNodeSelector
    persistentVolume:
      size: 50Gi
    retention: 30d
    ingress:
      hosts:
        - prometheus.ovh2.mybinder.org
      tls:
        - hosts:
            - prometheus.ovh2.mybinder.org
          secretName: kubelego-tls-prometheus

ingress-nginx:
  controller:
    scope:
      enabled: true
    service:
      loadBalancerIP: 162.19.17.37

static:
  ingress:
    hosts:
      - static.ovh2.mybinder.org
    tls:
      secretName: kubelego-tls-static

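The config above leans on YAML anchors (`&userNodeSelector`, `&coreNodeSelector`) and aliases (`*userNodeSelector`, `*coreNodeSelector`) so every component pins to the same node pools from a single definition. A small illustration of that mechanism (using PyYAML, which is an assumption here, not part of this repo's tooling; the snippet below is a trimmed stand-in for the real config):

```python
import yaml  # PyYAML, assumed available for this illustration

# Trimmed stand-in for config/ovh2.yaml: one anchor, two aliases.
doc = """
userNodeSelector: &userNodeSelector
  mybinder.org/pool-type: users
binderhub:
  config:
    BinderHub:
      build_node_selector: *userNodeSelector
jupyterhub:
  singleuser:
    nodeSelector: *userNodeSelector
"""

cfg = yaml.safe_load(doc)

# Every alias expands to the same mapping the anchor defined,
# so all selectors stay in sync when the pool label changes.
assert cfg["binderhub"]["config"]["BinderHub"]["build_node_selector"] == {
    "mybinder.org/pool-type": "users"
}
assert cfg["jupyterhub"]["singleuser"]["nodeSelector"] == cfg["userNodeSelector"]
print("anchors expanded consistently")
```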
deploy.py — 35 additions & 4 deletions

@@ -76,7 +76,7 @@ def setup_auth_ovh(release, cluster):
     """
     print(f"Setup the OVH authentication for namespace {release}")

-    ovh_kubeconfig = os.path.join(ABSOLUTE_HERE, "secrets", "ovh-kubeconfig.yml")
+    ovh_kubeconfig = os.path.join(ABSOLUTE_HERE, "secrets", f"{release}-kubeconfig.yml")
     os.environ["KUBECONFIG"] = ovh_kubeconfig
     print(f"Current KUBECONFIG='{ovh_kubeconfig}'")
     stdout = subprocess.check_output(["kubectl", "config", "use-context", cluster])

@@ -124,7 +124,7 @@ def update_networkbans(cluster):
     # some members have special logic in ban.py,
     # in which case they must be specified on the command-line
     ban_command = [sys.executable, "secrets/ban.py"]
-    if cluster in {"turing-prod", "turing-staging", "turing", "ovh"}:
+    if cluster in {"turing-prod", "turing-staging", "turing", "ovh", "ovh2"}:
         ban_command.append(cluster)

     subprocess.check_call(ban_command)

@@ -245,13 +245,43 @@ def setup_certmanager():
     subprocess.check_call(helm_upgrade)


+def patch_coredns():
+    """Patch coredns resource allocation
+
+    OVH2 coredns does not have sufficient memory by default after our ban patches
+    """
+    print(BOLD + GREEN + "Patching coredns resources" + NC, flush=True)
+    subprocess.check_call(
+        [
+            "kubectl",
+            "set",
+            "resources",
+            "-n",
+            "kube-system",
+            "deployments/coredns",
+            "--limits",
+            "memory=250Mi",
+            "--requests",
+            "memory=200Mi",
+        ]
+    )
+
+
 def main():
     # parse command line args
     argparser = argparse.ArgumentParser()
     argparser.add_argument(
         "release",
         help="Release to deploy",
-        choices=["staging", "prod", "ovh", "turing-prod", "turing-staging", "turing"],
+        choices=[
+            "staging",
+            "prod",
+            "ovh",
+            "ovh2",
+            "turing-prod",
+            "turing-staging",
+            "turing",
+        ],
     )
     argparser.add_argument(
         "--name",

@@ -302,8 +332,9 @@ def main():

     # script is running on CI, proceed with auth and helm setup

-    if cluster == "ovh":
+    if cluster.startswith("ovh"):
         setup_auth_ovh(args.release, cluster)
+        patch_coredns()
     elif cluster in AZURE_RGs:
         setup_auth_turing(cluster)
     elif cluster in GCP_PROJECTS:

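The change to `main()` means any cluster whose name starts with `ovh` (so both `ovh` and the new `ovh2`) takes the OVH auth path and additionally gets the coredns resource patch. A condensed sketch of that dispatch logic (the set contents below are placeholders, not the real values in deploy.py):

```python
# Condensed sketch of deploy.py's cluster dispatch after this commit.
# Set contents are placeholders; the real AZURE_RGs / GCP_PROJECTS
# mappings live in deploy.py.
AZURE_RGs = {"turing-prod", "turing-staging", "turing"}
GCP_PROJECTS = {"staging", "prod"}


def choose_auth(cluster):
    """Return (auth path, setup steps) for a cluster name."""
    if cluster.startswith("ovh"):
        # "ovh" and "ovh2" both take this branch; ovh clusters
        # additionally get the coredns resource patch.
        return ("ovh", ["setup_auth_ovh", "patch_coredns"])
    elif cluster in AZURE_RGs:
        return ("azure", ["setup_auth_turing"])
    elif cluster in GCP_PROJECTS:
        return ("gcloud", ["setup_auth_gcloud"])
    raise ValueError(f"Unknown cluster: {cluster}")


print(choose_auth("ovh2"))
```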
secrets/ban.py — 8 Bytes (binary file not shown)

secrets/config/ovh2.yaml — 4.47 KB (binary file not shown)

secrets/ovh2-kubeconfig.yml — 9.46 KB (binary file not shown)

terraform/README.md — 18 additions & 2 deletions

@@ -1,6 +1,6 @@
 # Terraform deployment info

-Common configuration is in terraform/modules/mybinder
+Common configuration for GKE is in terraform/modules/mybinder

 most deployed things are in mybinder/resource.tf
 variables (mostly things that should differ in staging/prod) in mybinder/variables.tf

@@ -49,11 +49,27 @@ terraform output -json private_keys | jq '.["events-archiver"]' | pbcopy

 with key names: "events-archiver", "matomo", and "binderhub-builder" and paste them into the appropriate fields in `secrets/config/$deployment.yaml`.

-### Notes
+## Notes

 - requesting previously-allocated static ip via loadBalancerIP did not work.
   Had to manually mark LB IP as static via cloud console.

 - sql admin API needed to be manually enabled [here](https://console.developers.google.com/apis/library/sqladmin.googleapis.com)
 - matomo sql data was manually imported/exported via sql dashboard and gsutil in cloud console
 - events archive history was manually migrated via `gsutil -m rsync` in cloud console
+
+## OVH
+
+The new OVH cluster is also deployed via terraform in the `ovh` directory.
+This has a lot less to deploy than flagship GKE,
+but deploys a Harbor (container image) registry as well.
+
+### OVH Notes
+
+- credentials are in `terraform/secrets/ovh-creds.py`
+- token in credentials is owned by Min because OVH tokens are always owned by real OVH users, not per-project 'service accounts'.
+  The token only has permissions on the MyBinder cloud project, however.
+- the only manual creation step was the s3 bucket and user for terraform state; the rest is created with terraform
+- harbor registry on OVH is old, and this forces us to use an older harbor _provider_.
+  Once OVH upgrades harbor to at least 2.2 (2.4 expected in 2022-12), we should be able to upgrade the harbor provider and robot accounts.
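One invariant running through this commit is that the Harbor registry host (`2lmrrh8f.gra7.container-registry.ovh.net`) must agree across cd.yml's `login-server`, the chartpress `--image-prefix`, and the `image_prefix` and `token_url` in config/ovh2.yaml. A small, hypothetical sanity check for that invariant (not a script in the repo):

```python
# Hypothetical consistency check: verify the registry host embedded in
# each setting from this commit matches cd.yml's login-server for ovh2.
from urllib.parse import urlsplit

LOGIN_SERVER = "2lmrrh8f.gra7.container-registry.ovh.net"

settings = {
    "chartpress_image_prefix": "2lmrrh8f.gra7.container-registry.ovh.net/mybinder-chart/mybinder-",
    "binderhub_image_prefix": "2lmrrh8f.gra7.container-registry.ovh.net/mybinder-builds/r2d-g5b5b759",
    "token_url": "https://2lmrrh8f.gra7.container-registry.ovh.net/service/token?service=harbor-registry",
}


def registry_host(value):
    # token_url carries a scheme; image prefixes are bare host/path strings
    if "://" in value:
        return urlsplit(value).hostname
    return value.split("/", 1)[0]


for name, value in settings.items():
    assert registry_host(value) == LOGIN_SERVER, name
print("registry hosts consistent")
```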
