This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Commit a841776

Merge pull request #225 from panchul/gpu_misc
Updating container references, fixing docs to more portable links
2 parents: 081478f + d2493d3

File tree

6 files changed: +37 −20 lines


edge_k8s_gpu_sharing/deploy_infer.yaml

Lines changed: 3 additions & 2 deletions
@@ -22,6 +22,7 @@ spec:
   spec:
     containers:
     - name: my-infer
+      # !!! put your own image location instead
       image: myregistry.azurecr.io/rollingstone/myinfer:1.0
       ports:
       # we use only 5001, but the container exposes EXPOSE 5001 8883 8888
@@ -37,5 +38,5 @@ spec:
       # memory: "128Mi" #128 MB
       # cpu: "200m" # 200 millicpu (0.2 or 20% of the cpu)
       nvidia.com/gpu: 1
-  imagePullSecrets:
-  - name: secret4acr2infer
+  #imagePullSecrets:
+  # - name: secret4acr2infer
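For orientation, the two hunks above slot into an ordinary Deployment manifest. Below is a minimal sketch of how the edited pieces fit together; the field values come from the diff, while the surrounding structure (labels, selector, containerPort) is assumed, since the commit does not show it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-infer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-infer        # label/selector names are assumed, not shown in the diff
  template:
    metadata:
      labels:
        app: my-infer
    spec:
      containers:
      - name: my-infer
        # !!! put your own image location instead
        image: myregistry.azurecr.io/rollingstone/myinfer:1.0
        ports:
        - containerPort: 5001   # the container also exposes 8883 and 8888
        resources:
          limits:
            nvidia.com/gpu: 1   # request one whole GPU from the device plugin
      # The commit comments this out; re-enable it if your registry needs credentials:
      #imagePullSecrets:
      #- name: secret4acr2infer
```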

edge_k8s_gpu_sharing/deploy_infer2.yaml

Lines changed: 3 additions & 2 deletions
@@ -24,6 +24,7 @@ spec:
   spec:
     containers:
     - name: my-infer2
+      # !!! put your own image location instead
       image: myregistry.azurecr.io/rollingstone/myinfer:1.0
       env:
       - name: NVIDIA_VISIBLE_DEVICES
@@ -38,5 +39,5 @@ spec:
       limits:
       # not using gpu allocation via `limits`, using NVIDIA_VISIBLE_DEVICES env.
       # nvidia.com/gpu: 1
-  imagePullSecrets:
-  - name: secret4acr2infer
+  #imagePullSecrets:
+  # - name: secret4acr2infer
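Unlike `deploy_infer.yaml`, this manifest allocates no GPU through `resources.limits`; instead it points the container at a device directly via the `NVIDIA_VISIBLE_DEVICES` environment variable, which is how multiple pods can share one physical GPU. The hunk truncates the variable's value, so the `"0"` below is a placeholder device index, not a value taken from the diff:

```yaml
spec:
  containers:
  - name: my-infer2
    # !!! put your own image location instead
    image: myregistry.azurecr.io/rollingstone/myinfer:1.0
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "0"   # placeholder: expose GPU 0 to this container (value not shown in the diff)
    # not using gpu allocation via `limits`, using NVIDIA_VISIBLE_DEVICES env.
```

Because the scheduler never sees a `nvidia.com/gpu` request for these pods, it will happily place several of them on the same node; the trade-off is that nothing prevents the sharing workloads from contending for GPU memory.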

edge_k8s_gpu_sharing/deploy_infer3.yaml

Lines changed: 3 additions & 2 deletions
@@ -24,6 +24,7 @@ spec:
   spec:
     containers:
     - name: my-infer3
+      # !!! put your own image location instead
       image: myregistry.azurecr.io/rollingstone/myinfer:1.0
       env:
       - name: NVIDIA_VISIBLE_DEVICES
@@ -38,5 +39,5 @@ spec:
       limits:
       # not using gpu allocation via `limits`, using NVIDIA_VISIBLE_DEVICES env.
       # nvidia.com/gpu: 1
-  imagePullSecrets:
-  - name: secret4acr2infer
+  #imagePullSecrets:
+  # - name: secret4acr2infer

edge_k8s_gpu_sharing/deploy_infer_GPU_GREED.yaml

Lines changed: 8 additions & 7 deletions
@@ -7,21 +7,22 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: my-infer-GPU_GREED
+  name: my-infer-gpugreed
   labels:
-    app: my-infer-GPU_GREED
+    app: my-infer-gpugreed
 spec:
   replicas: 1
   selector:
     matchLabels:
-      app: my-infer-GPU_GREED
+      app: my-infer-gpugreed
   template:
     metadata:
       labels:
-        app: my-infer-GPU_GREED
+        app: my-infer-gpugreed
     spec:
       containers:
-      - name: my-infer-GPU_GREED
+      - name: my-infer-gpugreed
+        # !!! put your own image location instead
         image: myregistry.azurecr.io/rollingstone/myinfer:1.0
         ports:
         # we use only 5001, but the container exposes EXPOSE 5001 8883 8888
@@ -37,5 +38,5 @@ spec:
         # memory: "128Mi" #128 MB
         # cpu: "200m" # 200 millicpu (0.2 or 20% of the cpu)
         nvidia.com/gpu: 100
-  imagePullSecrets:
-  - name: secret4acr2infer
+  #imagePullSecrets:
+  # - name: secret4acr2infer
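Besides lower-casing the name (Kubernetes object names must be valid DNS labels, so `my-infer-GPU_GREED` was rejected), this manifest exists to demonstrate a scheduling failure: it requests far more GPUs than any node offers, so the pod stays `Pending` with an `Insufficient nvidia.com/gpu` event. The key fragment, with values from the diff:

```yaml
resources:
  limits:
    # deliberate over-request: no node has 100 GPUs, so the scheduler
    # reports "0/2 nodes are available: 100 Insufficient nvidia.com/gpu"
    nvidia.com/gpu: 100
```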

edge_k8s_gpu_sharing/kubernetes_gpu_sharing_edge.md

Lines changed: 1 addition & 2 deletions
@@ -48,7 +48,7 @@ in a deployment .yaml these entries would request an allocation of one gpu devic
     nvidia.com/gpu: 1
 ...
 
-To damonstrate this, let's deploy one of our previous models from [machine-learning-notebooks/deploying-on-k8s](../machine-learning-notebooks/deploying-on-k8s),
+To damonstrate this, let's deploy one of our previous models from [machine-learning-notebooks/deploying-on-k8s](../machine-learning-notebooks/deploying-on-k8s/Readme.md),
 you will need to run this notebook to create the container image: [machine-learning-notebooks/deploying-on-k8s/production-deploy-to-k8s-gpu.ipynb](../machine-learning-notebooks/deploying-on-k8s/production-deploy-to-k8s-gpu.ipynb).
 
 `deploy_infer.yaml` will look like this:
@@ -366,5 +366,4 @@ To clean the environment from what we created, we need to delete the deployments
 # Links
 
 - https://docs.microsoft.com/en-us/azure/databox-online/azure-stack-edge-gpu-connect-powershell-interface#view-gpu-driver-information
-
 - https://nvidia.github.io/gpu-operator/

edge_k8s_gpu_sharing/kubernetes_gpu_sharing_one_node.md

Lines changed: 19 additions & 5 deletions
@@ -4,10 +4,19 @@ This demo shows how to deploy multiple gpu-requiring workloads on a cluster with
 
 ## Pre-requisites
 
-Please follow the instructions in [Deploying model to Kubernetes](../deploying-on-k8s/README.md)
+To create a one-node gpu-capable Kubernetes cluster, you need a gpu-capable VM. During creation of the
+VMs, you need to specify a GPU-capable VM Size(either at Portal, or in your deployment template).
+
+Please follow the instructions in [Deploying model to Kubernetes](../machine-learning-notebooks/deploying-on-k8s/Readme.md)
 to make sure you have a GPU-capable node on your vm.
 
-Please see [NVIDIA webpage](https://docs.nvidia.com/datacenter/kubernetes/kubernetes-upstream/index.html#kubernetes-run-a-workload) if you have any problems. You should be able to run nvidia-smi:
+If you need to install docker, follow the instructions at [Nvidia cloud native containers](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
+
+And if you need to install the drivers, see [Azure VM driver setup](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup) or related. You might have to upgrade your system and/or drivers to work.
+
+Please see [NVIDIA webpage](https://docs.nvidia.com/datacenter/kubernetes/kubernetes-upstream/index.html#kubernetes-run-a-workload) if you have any problems.
+
+Before moving forward, you should be able to run nvidia-smi:
 
     $ sudo docker run --rm --runtime=nvidia nvidia/cuda nvidia-smi
     +-----------------------------------------------------------------------------+
@@ -27,7 +36,7 @@ Please see [NVIDIA webpage](https://docs.nvidia.com/datacenter/kubernetes/kubern
     | No running processes found |
     +-----------------------------------------------------------------------------+
 
-Once you installed `microk8s` as in our demo [Deploying model to Kubernetes](../machine-learning-notebooks/deploying-on-k8s/README.md),
+Once you installed `microk8s` as in our demo [Deploying model to Kubernetes](../machine-learning-notebooks/deploying-on-k8s/Readme.md),
 you should also be able to see `nvidia-smi` from within a pod:
 
     $ kubectl exec -it gpu-pod nvidia-smi
@@ -67,7 +76,7 @@ in a deployment .yaml these entries would request an allocation of one gpu devic
     nvidia.com/gpu: 1
 ...
 
-To damonstrate this, let's deploy one of our previous models from [machine-learning-notebooks/deploying-on-k8s](../machine-learning-notebooks/deploying-on-k8s),
+To damonstrate this, let's deploy one of our previous models from [machine-learning-notebooks/deploying-on-k8s](../machine-learning-notebooks/deploying-on-k8s/Readme.md),
 you will need to run this notebook to create the container image: [machine-learning-notebooks/deploying-on-k8s/production-deploy-to-k8s-gpu.ipynb](../machine-learning-notebooks/deploying-on-k8s/production-deploy-to-k8s-gpu.ipynb).
 
 `deploy_infer.yaml` will look like this:
@@ -153,6 +162,11 @@ indicate insufficient resource:
     Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 100 Insufficient nvidia.com/gpu.
     ...
 
+    $ kubectl get pods -n myasetest1
+    NAMESPACE    NAME                                 READY   STATUS    RESTARTS   AGE
+    myasetest1   my-infer-f79869b88-vfbnx             1/1     Running   0          41m
+    myasetest1   my-infer-gpugreed-5c88f68f6b-c9gd5   0/1     Pending   0          9m
+
 You can delete it like so:
 
     $ kubectl delete -f deploy_infer_GPU_GREED.yaml -n myasetest1
@@ -376,4 +390,4 @@ To clean the environment from what we created, we need to delete the deployments
 - https://docs.microsoft.com/en-us/azure/databox-online/azure-stack-edge-gpu-connect-powershell-interface#view-gpu-driver-information
 - https://nvidia.github.io/gpu-operator/
 - https://github.com/NVIDIA/k8s-device-plugin/blob/examples/workloads/pod.yml
-- [Deploying model to Kubernetes](../machine-learning-notebooks/deploying-on-k8s/README.md)
+- [Deploying model to Kubernetes](../machine-learning-notebooks/deploying-on-k8s/Readme.md)
