articles/machine-learning/how-to-deploy-kubernetes-extension.md

## Prerequisites
* An AKS cluster is up and running in Azure.
* If you have not previously used cluster extensions, you need to [register the KubernetesConfiguration service provider](../aks/dapr.md#register-the-kubernetesconfiguration-service-provider).
* Or an Arc Kubernetes cluster is up and running. Follow instructions in [connect existing Kubernetes cluster to Azure Arc](../azure-arc/kubernetes/quickstart-connect-cluster.md).
* If the cluster is an Azure RedHat OpenShift Service (ARO) cluster or OpenShift Container Platform (OCP) cluster, you must satisfy other prerequisite steps as documented in the [Reference for configuring Kubernetes cluster](./reference-kubernetes.md#prerequisites-for-aro-or-ocp-clusters) article.
* The Kubernetes cluster must have a minimum of 4 vCPU cores and 8-GB memory.
* A cluster running behind an outbound proxy server or firewall needs extra [network configurations](./how-to-access-azureml-behind-firewall.md#kubernetes-compute).
* Install or upgrade Azure CLI to version 2.24.0 or higher.
You can use the AzureML CLI command `k8s-extension create` to deploy the AzureML extension.
|`allowInsecureConnections`|`True` or `False`, default `False`. **Can** be set to `True` to use inference HTTP endpoints for development or test purposes. |N/A| Optional | Optional |
|`inferenceRouterServiceType`|`loadBalancer`, `nodePort` or `clusterIP`. **Required** if `enableInference=True`. | N/A|**✓**|**✓**|
|`internalLoadBalancerProvider`| This config is currently only applicable to Azure Kubernetes Service (AKS) clusters. Set to `azure` to allow the inference router to use an internal load balancer. | N/A| Optional | Optional |
|`sslSecret`| The name of the Kubernetes secret in the `azureml` namespace. This config is used to store `cert.pem` (PEM-encoded TLS/SSL cert) and `key.pem` (PEM-encoded TLS/SSL key), which are required for inference HTTPS endpoint support when `allowInsecureConnections` is set to `False`. For a sample YAML definition of `sslSecret`, see [Configure sslSecret](./how-to-secure-kubernetes-online-endpoint.md#configure-sslsecret). Use this config or a combination of the `sslCertPemFile` and `sslKeyPemFile` protected config settings. |N/A| Optional | Optional |
|`sslCname`|A TLS/SSL CNAME used by the inference HTTPS endpoint. **Required** if `allowInsecureConnections=False`.| N/A | Optional | Optional|
|`inferenceRouterHA`|`True` or `False`, default `True`. By default, the AzureML extension deploys three inference router replicas for high availability, which requires at least three worker nodes in a cluster. Set to `False` if your cluster has fewer than three worker nodes; in that case, only one inference router service is deployed. | N/A| Optional | Optional |
|`nodeSelector`| By default, the deployed Kubernetes resources are randomly deployed to one or more nodes of the cluster, and DaemonSet resources are deployed to ALL nodes. If you want to restrict the extension deployment to specific nodes with labels `key1=value1` and `key2=value2`, use `nodeSelector.key1=value1` and `nodeSelector.key2=value2` correspondingly. | Optional| Optional | Optional |
|`installNvidiaDevicePlugin`|`True` or `False`, default `False`. The [NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin#nvidia-device-plugin-for-kubernetes) is required for ML workloads on NVIDIA GPU hardware. By default, AzureML extension deployment doesn't install the NVIDIA Device Plugin, regardless of whether the Kubernetes cluster has GPU hardware. You can set this setting to `True` to install it, but make sure to fulfill the [Prerequisites](https://github.com/NVIDIA/k8s-device-plugin#prerequisites). | Optional |Optional |Optional |
|`installPromOp`|`True` or `False`, default `True`. The AzureML extension needs the Prometheus operator to manage Prometheus. Set to `False` to reuse an existing Prometheus operator. For more information, see [reusing the prometheus operator](./how-to-troubleshoot-kubernetes-extension.md#prometheus-operator).| Optional| Optional | Optional |
|`installVolcano`|`True` or `False`, default `True`. The AzureML extension needs the Volcano scheduler to schedule jobs. Set to `False` to reuse an existing Volcano scheduler. For more information, see [reusing volcano scheduler](./how-to-troubleshoot-kubernetes-extension.md#volcano-scheduler).| Optional| N/A | Optional |
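The Kubernetes secret referenced by the `sslSecret` setting above can be sketched as follows. This is an assumed shape based on the linked sample: the secret lives in the `azureml` namespace and carries the base64-encoded PEM files under the `cert.pem` and `key.pem` keys (placeholder values shown, not real certificate material):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: <secret-name>
  namespace: azureml
type: Opaque
data:
  # base64-encoded contents of the PEM-encoded TLS/SSL certificate
  cert.pem: <base64-encoded cert.pem>
  # base64-encoded contents of the PEM-encoded TLS/SSL key
  key.pem: <base64-encoded key.pem>
```

You would then pass the secret's name through the `sslSecret` config setting when creating the extension.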
If you plan to deploy the AzureML extension for a real-time inference workload, note the following requirements:
* The `azureml-fe` router service is required for real-time inference support, and you need to specify the `inferenceRouterServiceType` config setting for `azureml-fe`. `azureml-fe` can be deployed with one of the following `inferenceRouterServiceType` values:
  * Type `LoadBalancer`. Exposes `azureml-fe` externally using a cloud provider's load balancer. To specify this value, ensure that your cluster supports load balancer provisioning. Note that most on-premises Kubernetes clusters might not support an external load balancer.
  * Type `NodePort`. Exposes `azureml-fe` on each node's IP at a static port. You can then contact `azureml-fe` from outside the cluster by requesting `<NodeIP>:<NodePort>`. Using `NodePort` also allows you to set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
  * Type `ClusterIP`. Exposes `azureml-fe` on a cluster-internal IP, which makes `azureml-fe` reachable only from within the cluster. For `azureml-fe` to serve inference requests coming from outside the cluster, you must set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
* To ensure high availability of the `azureml-fe` routing service, AzureML extension deployment by default creates three replicas of `azureml-fe` for clusters with three or more nodes. If your cluster has **fewer than three nodes**, set `inferenceLoadbalancerHA=False`.
* Also consider using **HTTPS** to restrict access to model endpoints and secure the data that clients submit. To do so, specify either the `sslSecret` config setting or the combination of the `sslKeyPemFile` and `sslCertPemFile` config-protected settings.
* By default, AzureML extension deployment expects config settings for **HTTPS** support. For development or testing purposes, **HTTP** support is conveniently provided through the config setting `allowInsecureConnections=True`.
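Putting several of these settings together, an extension deployment command might look like the following sketch. The extension name, cluster name, and resource group are placeholders; `--cluster-type` is `connectedClusters` for an Arc-connected cluster (use `managedClusters` for AKS), and `allowInsecureConnections=True` is shown only because this example targets development or testing:

```azurecli
az k8s-extension create \
  --name <extension-name> \
  --extension-type Microsoft.AzureML.Kubernetes \
  --config enableTraining=True enableInference=True \
           inferenceRouterServiceType=LoadBalancer \
           allowInsecureConnections=True \
  --cluster-type connectedClusters \
  --cluster-name <your-connected-cluster-name> \
  --resource-group <resource-group> \
  --scope cluster
```

For a production deployment you would instead supply the HTTPS-related settings (`sslCname` plus `sslSecret`, or `sslCertPemFile`/`sslKeyPemFile`) and omit `allowInsecureConnections`.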
After the AzureML extension deployment completes, you can use `kubectl get deployments` to verify the resources created in the cluster:
|Resource name |Resource type |Training |Inference |Training and Inference| Description | Communication with cloud|
|--|--|--|--|--|--|--|
|relayserver|Kubernetes deployment|**✓**|**✓**|**✓**|The relay server is only created for an Arc Kubernetes cluster, and **not** in an AKS cluster. The relay server works with Azure Relay to communicate with the cloud services.|Receives requests for job creation and model deployment from the cloud service; syncs job status with the cloud service.|
|gateway|Kubernetes deployment|**✓**|**✓**|**✓**|The gateway is used to communicate and send data back and forth.|Sends node and cluster resource information to the cloud services.|
|aml-operator|Kubernetes deployment|**✓**|N/A|**✓**|Manages the lifecycle of training jobs.| Token exchange with the cloud token service for authentication and authorization of Azure Container Registry.|
|metrics-controller-manager|Kubernetes deployment|**✓**|**✓**|**✓**|Manages the configuration for Prometheus.|N/A|
> [!NOTE]
>
> `azureml-fe` can reach 5K requests per second (QPS) with good latency, with an overhead not exceeding 3 ms on average and 15 ms at the 99th percentile.
>
> If you have RPS requirements higher than 10K, consider the following options:
>
> * Increase resource requests/limits for `azureml-fe` pods; by default, each pod has a resource limit of 2 vCPU and 1.2-GB memory.
> * Increase the number of instances for `azureml-fe`. By default, AzureML creates 3 or 1 `azureml-fe` instances per cluster.
>   * This instance count depends on your configuration of `inferenceRouterHA` for the [AzureML extension](how-to-deploy-kubernetes-extension.md#review-azureml-extension-configuration-settings).
>   * The increased instance count can't be persisted, since it's overwritten with your configured value once the extension is upgraded.
> * Reach out to Microsoft experts for help.
## Understand connectivity requirements for AKS inferencing cluster
In short, a `nodeSelector` lets you specify which node a pod should run on.
## Default instance type
By default, a `defaultinstancetype` with the following definition is created when you attach a Kubernetes cluster to an AzureML workspace:
- No `nodeSelector` is applied, meaning the pod can get scheduled on any node.
- The workload's pods are assigned default resource requests of 0.1 CPU cores, 500Mi memory, and 0 GPUs.
- Resource use by the workload's pods is limited to 2 CPU cores and 8-GB memory:
```yaml
resources:
  requests:
    cpu: "100m"
    memory: "500Mi"
  limits:
    cpu: "2"
    memory: "8Gi"
    nvidia.com/gpu: null
```
The following steps create an instance type with the labeled behavior:
- Pods will be assigned resource requests of `700m` CPU and `1500Mi` memory.
- Pods will be assigned resource limits of `1` CPU, `2Gi` memory and `1` NVIDIA GPU.
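The steps above correspond to an `InstanceType` custom resource along the following lines. The `amlarc.azureml.com/v1alpha1` API version is the one used by the AzureML extension's instance type samples, and the instance type name and `mylabel: mylabelvalue` node selector are illustrative placeholders:

```yaml
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
  name: myinstancetypename
spec:
  nodeSelector:
    mylabel: mylabelvalue
  resources:
    requests:
      cpu: "700m"
      memory: "1500Mi"
    limits:
      cpu: "1"
      nvidia.com/gpu: 1
      memory: "2Gi"
```

You would apply it with `kubectl apply -f` against the cluster attached to your workspace.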
Creation of custom instance types must meet the following parameters and definition rules, otherwise the instance type creation will fail:

| Parameter | Required | Description |
| --- | --- | --- |
| name | required | String values, which must be unique in the cluster.|
| CPU request | required | String values, which cannot be 0 or empty. <br>CPU can be specified in millicores; for example, `100m`. Can also be specified as full numbers; for example, `"1"` is equivalent to `1000m`.|
| Memory request | required | String values, which cannot be 0 or empty. <br>Memory can be specified as a full number + suffix; for example, `1024Mi` for 1024 MiB.|
| CPU limit | required | String values, which cannot be 0 or empty. <br>CPU can be specified in millicores; for example, `100m`. Can also be specified as full numbers; for example, `"1"` is equivalent to `1000m`.|
| Memory limit | required | String values, which cannot be 0 or empty. <br>Memory can be specified as a full number + suffix; for example, `1024Mi` for 1024 MiB.|
| GPU | optional | Integer values, which can only be specified in the `limits` section. <br>For more information, see the Kubernetes [documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins). |
| nodeSelector | optional | Map of string keys and values. |
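To make the quantity formats in the parameters above concrete, here is a small Python sketch that converts Kubernetes-style CPU and memory quantity strings into plain numbers. This helper is purely illustrative and is not part of the AzureML extension or SDK:

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as '100m' or '1' into cores."""
    if quantity.endswith("m"):
        # Millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory_mi(quantity: str) -> float:
    """Convert a Kubernetes memory quantity with a binary suffix into MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    # A bare number is plain bytes.
    return float(quantity) / (1024 * 1024)


print(parse_cpu("700m"))          # 0.7
print(parse_memory_mi("1500Mi"))  # 1500.0
```

For example, the instance type above requests `700m` CPU (0.7 cores) and `1500Mi` memory (1500 MiB).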
It's also possible to create multiple instance types at once:
If a training or inference workload is submitted without an instance type, it uses the default instance type.
To select an instance type for a training job using CLI (V2), specify its name as part of the `resources` properties section in the job YAML. For example:
```yaml
command: python -c "print('Hello world!')"
environment:
  image: library/python:latest
compute: azureml:<Kubernetes-compute_target_name>
resources:
  instance_type: <instance_type_name>
```
In the above example, replace `<Kubernetes-compute_target_name>` with the name of your Kubernetes compute target and replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system uses `defaultinstancetype` to submit the job.
### Select instance type to deploy model
To select an instance type for a model deployment using CLI (V2), specify its name for the `instance_type` property in the deployment YAML. For example:
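A minimal deployment YAML carrying the `instance_type` property might look like the following sketch. The deployment and endpoint names, model path, and scoring script are placeholder assumptions, not values from this article:

```yaml
name: blue
endpoint_name: <endpoint_name>
model:
  path: ./model/sklearn_mnist_model.pkl
code_configuration:
  code: ./script/
  scoring_script: score.py
instance_type: <instance_type_name>
instance_count: 1
```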
In the above example, replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system uses `defaultinstancetype` to deploy the model.
To select an instance type for a model deployment using SDK (V2), specify its name for the `instance_type` property in the `KubernetesOnlineDeployment` class. For example:
```python
from azure.ai.ml.entities import (
    KubernetesOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)

model = Model(path="./model/sklearn_mnist_model.pkl")

# Specify the instance type for the deployment
# (<endpoint_name> and <instance_type_name> are placeholders).
deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name="<endpoint_name>",
    model=model,
    instance_type="<instance_type_name>",
    instance_count=1,
)
```
In the above example, replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system will use `defaultinstancetype` to deploy the model.
## Next steps
- [AzureML inference router and connectivity requirements](./how-to-kubernetes-inference-routing-azureml-fe.md)
- [Secure AKS inferencing environment](./how-to-secure-kubernetes-inferencing-environment.md)