Commit 8c40087

Merge pull request #83 from sguyennet/dev/sgu/kubeflow-on-pci
feat: Kubeflow on PCI
2 parents b011e14 + 0ff6f2e commit 8c40087

40 files changed: 126915 additions, 0 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -48,3 +48,5 @@ ai/ai-endpoints/java-langchain4j-chatbot/target/test-classes/com/ovhcloud/exampl
 
 # Dot env files
 .env
+use-cases/kubeflow/ovhrc.sh
+use-cases/kubeflow/kubeconfig

use-cases/kubeflow/README.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
# Kubeflow on OVHcloud Public Cloud

This tutorial deploys Kubeflow in an OVHcloud Managed Kubernetes cluster, together with all the essential tools.

The Terraform configuration creates and configures:

* A private network
* A gateway
* A managed Kubernetes cluster
* A Public Cloud load balancer with a public IP
* A managed MySQL database
* An object storage bucket
* A Kubeflow deployment
* An NVIDIA GPU Operator to automatically install NVIDIA drivers on GPU nodes
* A Kyverno deployment to secure the workloads created by Kubeflow users
* An FQDN for Kubeflow
* Let's Encrypt TLS certificates for Kubeflow

![Kubeflow on OVHcloud Public Cloud](./img/kubeflow-public-cloud.png)

**Requirements:**

You need the following:
* [Terraform](https://www.terraform.io/) installed
* An [OVHcloud Public Cloud project](https://www.ovhcloud.com/en/public-cloud/)
* An [OVHcloud vRack private network](https://www.ovhcloud.com/en/network/vrack/)
* An [OVHcloud domain name](https://www.ovhcloud.com/en/domains/)

Because the infrastructure is configured on a private network, your Public Cloud project must be attached to a vRack.

## Configure the deployment

### Configure the Terraform providers

Create an OVHcloud API token:

https://api.ovh.com/createToken?GET=/\*&POST=/\*&PUT=/\*&DELETE=/\*

Configure Terraform with this token:

```bash
vim ovhrc.sh
```

```bash
export OVH_ENDPOINT="ovh-eu"
export OVH_APPLICATION_KEY="<your_application_key>"
export OVH_APPLICATION_SECRET="<your_application_secret>"
export OVH_CONSUMER_KEY="<your_consumer_key>"
export OVH_CLOUD_PROJECT_SERVICE="<your_public_cloud_project_ID>"
```

For better security, create a second OVHcloud credential dedicated to the DNS configuration, with limited permissions.

Create this API token at https://www.ovh.com/auth/api/createToken with the following permissions:

```
GET on /domain/zone
GET on /domain/zone/*/record
GET on /domain/zone/*/record/*
POST on /domain/zone/*/record
DELETE on /domain/zone/*/record/*
GET on /domain/zone/*/soa
POST on /domain/zone/*/refresh
```

```bash
vim ovhrc.sh
```

Add at the end of the file:

```bash
export TF_VAR_ovh_api_dns_application_key="<your_dns_application_key>"
export TF_VAR_ovh_api_dns_application_secret="<your_dns_application_secret>"
export TF_VAR_ovh_api_dns_consumer_key="<your_dns_consumer_key>"
```

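Before running Terraform, it can help to confirm that every variable defined in `ovhrc.sh` is set in the current shell. The following is a minimal sketch and not part of the repository: `check_ovh_env` is a hypothetical helper name, and its variable list simply mirrors the exports shown above.

```bash
# check_ovh_env: hypothetical helper (not part of the repository).
# Prints the name of every required variable that is unset or empty
# and returns non-zero if at least one is missing.
check_ovh_env() {
  local missing=0 name value
  for name in \
    OVH_ENDPOINT \
    OVH_APPLICATION_KEY \
    OVH_APPLICATION_SECRET \
    OVH_CONSUMER_KEY \
    OVH_CLOUD_PROJECT_SERVICE \
    TF_VAR_ovh_api_dns_application_key \
    TF_VAR_ovh_api_dns_application_secret \
    TF_VAR_ovh_api_dns_consumer_key
  do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `source ovhrc.sh && check_ovh_env` before any `terraform` command, so that a forgotten export is reported by name before Terraform starts.
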
Create a default Kubernetes configuration file if you don't already have one:

```bash
[ ! -f ~/.kube/config ] && { mkdir -p ~/.kube; touch ~/.kube/config; }
```

## Customize the deployment

Configure Terraform with your OVHcloud domain name:

```bash
vim terraform.tfvars
```

```hcl
ovh_dns_domain = "<your_ovh_domain_name>"
```

The configuration variables are listed in `variables.tf`. You can override their default values in `terraform.tfvars`.

## Deploy the stack

```bash
source ovhrc.sh
terraform init
terraform apply -target module.s3_user
terraform apply
```

## Access the Kubeflow UI

Get the Kubeflow URL:

```bash
# -raw prints the bare string, without the surrounding quotes
KUBEFLOW_URL=$(terraform output -raw kubeflow_url)
echo "$KUBEFLOW_URL"
```

Get the username and password:

```bash
KUBEFLOW_USER=$(terraform output -raw kubeflow_user)
KUBEFLOW_PASSWORD=$(terraform output -raw kubeflow_password)
echo "$KUBEFLOW_USER"
echo "$KUBEFLOW_PASSWORD"
```

You can now log in at the Kubeflow URL with this username and password.

The URL only becomes reachable once the DNS record has propagated, so you may have to wait before the page loads.

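Rather than retrying by hand, the wait for DNS propagation can be scripted. The sketch below is illustrative only: `url_host`, `resolve_host` and `wait_for_dns` are hypothetical helper names, and `resolve_host` assumes a Linux system where `getent` is available (on other systems, `dig` or `nslookup` could be swapped in).

```bash
# Hypothetical helpers (not part of the repository).

# url_host URL: print the host part of a URL. Also tolerates the
# surrounding quotes that `terraform output` prints without -raw.
url_host() {
  local url="$1"
  url="${url#\"}"; url="${url%\"}"   # drop surrounding quotes, if any
  url="${url#*://}"                  # drop the scheme
  url="${url%%/*}"                   # drop the path
  echo "${url%%:*}"                  # drop the port
}

# resolve_host HOST: succeed when HOST resolves (assumes getent exists).
resolve_host() {
  getent hosts "$1" > /dev/null 2>&1
}

# wait_for_dns HOST [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Poll until HOST resolves, or fail once the timeout is reached.
wait_for_dns() {
  local host="$1" timeout="${2:-600}" interval="${3:-10}" waited=0
  until resolve_host "$host"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for $host to resolve" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$host resolves"
}
```

For example, `wait_for_dns "$(url_host "$KUBEFLOW_URL")"` polls every 10 seconds for up to 10 minutes. Note that a successful lookup on your machine only shows that your local resolver sees the record; it does not guarantee the TLS certificate has been issued yet.
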
## Pipeline example

In the Kubeflow UI, in the menu on the left, click on `Experiments (KFP)` and create a new experiment.

Click on `Pipelines` and choose one of the existing pipelines (for example, `[Tutorial] Data passing in python components`).
To launch the pipeline, click on `Create Run` and choose the experiment in which to run it.

After a while, the status of the run should turn green. You can find the logs of the run in your object storage.

## Troubleshoot

### Access the Kubernetes cluster

```bash
terraform output -raw ovh_kube_cluster_kubeconfig > ./kubeconfig
kubectl get nodes --kubeconfig ./kubeconfig
```

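Passing `--kubeconfig` on every command gets repetitive. As a sketch (the helper name `use_local_kubeconfig` is hypothetical and not part of the repository), you can restrict the permissions of the file, since it contains cluster credentials, and export the standard `KUBECONFIG` environment variable instead:

```bash
# use_local_kubeconfig [FILE]: hypothetical helper.
# Restricts FILE to the current user and points kubectl at it through
# the KUBECONFIG environment variable. FILE defaults to ./kubeconfig.
use_local_kubeconfig() {
  local file="${1:-./kubeconfig}"
  if [ ! -s "$file" ]; then
    echo "kubeconfig not found or empty: $file" >&2
    return 1
  fi
  chmod 600 "$file"
  # Store an absolute path so it keeps working after a `cd`.
  KUBECONFIG="$(cd "$(dirname "$file")" && pwd)/$(basename "$file")"
  export KUBECONFIG
}
```

After calling `use_local_kubeconfig`, a plain `kubectl get nodes` in the same shell session targets the cluster created by Terraform.
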
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
# ExternalDNS with the OVH provider: manages the DNS records for the
# hosts declared on Istio gateways.
resource "helm_release" "external-dns" {
  name      = "external-dns"
  namespace = "external-dns"

  repository = "https://kubernetes-sigs.github.io/external-dns"
  chart      = "external-dns"
  version    = "1.12.2"

  create_namespace = true

  set {
    name  = "provider"
    value = "ovh"
  }

  set {
    name  = "domainFilters[0]"
    value = var.ovh_dns_domain
  }

  set {
    name  = "sources[0]"
    value = "istio-gateway"
  }

  # "sync" lets ExternalDNS delete records as well as create them.
  set {
    name  = "policy"
    value = "sync"
  }

  # Extra RBAC rule: read access to the Istio resources used as sources.
  set {
    name  = "rbac.additionalPermissions[0].apiGroups[0]"
    value = "networking.istio.io"
  }

  set {
    name  = "rbac.additionalPermissions[0].resources[0]"
    value = "gateways"
  }

  set {
    name  = "rbac.additionalPermissions[0].resources[1]"
    value = "virtualservices"
  }

  set {
    name  = "rbac.additionalPermissions[0].verbs[0]"
    value = "get"
  }

  set {
    name  = "rbac.additionalPermissions[0].verbs[1]"
    value = "watch"
  }

  set {
    name  = "rbac.additionalPermissions[0].verbs[2]"
    value = "list"
  }

  # OVHcloud API credentials restricted to the DNS zone.
  set {
    name  = "env[0].name"
    value = "OVH_APPLICATION_KEY"
  }

  set {
    name  = "env[0].value"
    value = var.ovh_api_dns_application_key
  }

  set {
    name  = "env[1].name"
    value = "OVH_APPLICATION_SECRET"
  }

  set {
    name  = "env[1].value"
    value = var.ovh_api_dns_application_secret
  }

  set {
    name  = "env[2].name"
    value = "OVH_CONSUMER_KEY"
  }

  set {
    name  = "env[2].value"
    value = var.ovh_api_dns_consumer_key
  }

  # Schedule on nodes labeled kubeflow=control-plane and tolerate
  # the matching NoSchedule taint.
  set {
    name  = "nodeSelector.kubeflow"
    value = "control-plane"
  }

  set {
    name  = "tolerations[0].effect"
    value = "NoSchedule"
  }

  set {
    name  = "tolerations[0].key"
    value = "kubeflow"
  }

  set {
    name  = "tolerations[0].operator"
    value = "Equal"
  }

  set {
    name  = "tolerations[0].value"
    value = "control-plane"
  }

  depends_on = [ovh_cloud_project_kube.ovh_kube_cluster, ovh_cloud_project_kube_nodepool.control_plane_pool]
}
