
Commit df8c0e0

Merge pull request #217723 from jiaochenlu/update-instancetype
update TaintTolerance and InstanceType
2 parents 396e819 + 92b2d75 commit df8c0e0

7 files changed (+172, -53 lines)

articles/machine-learning/how-to-deploy-kubernetes-extension.md

Lines changed: 7 additions & 7 deletions
@@ -31,7 +31,7 @@ In this article, you can learn:
 * An AKS cluster is up and running in Azure.
   * If you have not previously used cluster extensions, you need to [register the KubernetesConfiguration service provider](../aks/dapr.md#register-the-kubernetesconfiguration-service-provider).
 * Or an Arc Kubernetes cluster is up and running. Follow instructions in [connect existing Kubernetes cluster to Azure Arc](../azure-arc/kubernetes/quickstart-connect-cluster.md).
-  * If the cluster is an Azure RedHat OpenShift Service (ARO) cluster or OpenShift Container Platform (OCP) cluster, you must satisfy other prerequisite steps as documented in the [Reference for configuring Kuberenetes cluster](./reference-kubernetes.md#prerequisites-for-aro-or-ocp-clusters) article.
+  * If the cluster is an Azure RedHat OpenShift Service (ARO) cluster or OpenShift Container Platform (OCP) cluster, you must satisfy other prerequisite steps as documented in the [Reference for configuring Kubernetes cluster](./reference-kubernetes.md#prerequisites-for-aro-or-ocp-clusters) article.
 * The Kubernetes cluster must have minimum of 4 vCPU cores and 8-GB memory.
 * Cluster running behind an outbound proxy server or firewall needs extra [network configurations](./how-to-access-azureml-behind-firewall.md#kubernetes-compute)
 * Install or upgrade Azure CLI to version 2.24.0 or higher.
@@ -56,10 +56,10 @@ You can use AzureML CLI command `k8s-extension create` to deploy AzureML extensi
 | `allowInsecureConnections` |`True` or `False`, default `False`. **Can** be set to `True` to use inference HTTP endpoints for development or test purposes. |N/A| Optional | Optional |
 | `inferenceRouterServiceType` |`loadBalancer`, `nodePort` or `clusterIP`. **Required** if `enableInference=True`. | N/A| **✓** | **✓** |
 | `internalLoadBalancerProvider` | This config is only applicable for Azure Kubernetes Service(AKS) cluster now. Set to `azure` to allow the inference router using internal load balancer. | N/A| Optional | Optional |
-|`sslSecret`| The name of Kubernetes secret in `azureml` namespace to store `cert.pem` (PEM-encoded TLS/SSL cert) and `key.pem` (PEM-encoded TLS/SSL key), required for inference HTTPS endpoint support, when ``allowInsecureConnections`` is set to False. You can find a sample YAML definition of sslSecret [here](./reference-kubernetes.md#sample-yaml-definition-of-kubernetes-secret-for-tlsssl). Use this config or combination of `sslCertPemFile` and `sslKeyPemFile` protected config settings. |N/A| Optional | Optional |
-|`sslCname` |An TLS/SSL CName is used by inference HTTPS endpoint. **Required** if `allowInsecureConnections=False` | N/A | Optional | Optional|
+|`sslSecret`| The name of the Kubernetes secret in the `azureml` namespace. This config is used to store `cert.pem` (PEM-encoded TLS/SSL cert) and `key.pem` (PEM-encoded TLS/SSL key), which are required for inference HTTPS endpoint support when ``allowInsecureConnections`` is set to `False`. For a sample YAML definition of `sslSecret`, see [Configure sslSecret](./how-to-secure-kubernetes-online-endpoint.md#configure-sslsecret). Use this config or a combination of `sslCertPemFile` and `sslKeyPemFile` protected config settings. |N/A| Optional | Optional |
+|`sslCname` |A TLS/SSL CNAME used by the inference HTTPS endpoint. **Required** if `allowInsecureConnections=False` | N/A | Optional | Optional|
 | `inferenceRouterHA` |`True` or `False`, default `True`. By default, AzureML extension will deploy three inference router replicas for high availability, which requires at least three worker nodes in a cluster. Set to `False` if your cluster has fewer than three worker nodes, in this case only one inference router service is deployed. | N/A| Optional | Optional |
-|`nodeSelector` | By default, the deployed kubernetes resources are randomly deployed to one or more nodes of the cluster, and daemonset resources are deployed to ALL nodes. If you want to restrict the extension deployment to specific nodes with label `key1=value1` and `key2=value2`, use `nodeSelector.key1=value1`, `nodeSelector.key2=value2` correspondingly. | Optional| Optional | Optional |
+|`nodeSelector` | By default, the deployed Kubernetes resources are randomly deployed to one or more nodes of the cluster, and DaemonSet resources are deployed to ALL nodes. If you want to restrict the extension deployment to specific nodes with label `key1=value1` and `key2=value2`, use `nodeSelector.key1=value1`, `nodeSelector.key2=value2` correspondingly. | Optional| Optional | Optional |
 |`installNvidiaDevicePlugin` | `True` or `False`, default `False`. [NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin#nvidia-device-plugin-for-kubernetes) is required for ML workloads on NVIDIA GPU hardware. By default, AzureML extension deployment won't install NVIDIA Device Plugin regardless Kubernetes cluster has GPU hardware or not. User can specify this setting to `True`, to install it, but make sure to fulfill [Prerequisites](https://github.com/NVIDIA/k8s-device-plugin#prerequisites). | Optional |Optional |Optional |
 |`installPromOp`|`True` or `False`, default `True`. AzureML extension needs prometheus operator to manage prometheus. Set to `False` to reuse the existing prometheus operator. For more information about reusing the existing prometheus operator, refer to [reusing the prometheus operator](./how-to-troubleshoot-kubernetes-extension.md#prometheus-operator)| Optional| Optional | Optional |
 |`installVolcano`| `True` or `False`, default `True`. AzureML extension needs volcano scheduler to schedule the job. Set to `False` to reuse existing volcano scheduler. For more information about reusing the existing volcano scheduler, refer to [reusing volcano scheduler](./how-to-troubleshoot-kubernetes-extension.md#volcano-scheduler) | Optional| N/A | Optional |
@@ -80,8 +80,8 @@ If you plan to deploy AzureML extension for real-time inference workload and wan
 
 * `azureml-fe` router service is required for real-time inference support and you need to specify `inferenceRouterServiceType` config setting for `azureml-fe`. `azureml-fe` can be deployed with one of following `inferenceRouterServiceType`:
   * Type `LoadBalancer`. Exposes `azureml-fe` externally using a cloud provider's load balancer. To specify this value, ensure that your cluster supports load balancer provisioning. Note most on-premises Kubernetes clusters might not support external load balancer.
-  * Type `NodePort`. Exposes `azureml-fe` on each Node's IP at a static port. You'll be able to contact `azureml-fe`, from outside of cluster, by requesting `<NodeIP>:<NodePort>`. Using `NodePort` also allows you to set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
-  * Type `ClusterIP`. Exposes `azureml-fe` on a cluster-internal IP, and it makes `azureml-fe` only reachable from within the cluster. For `azureml-fe` to serve inference requests coming outside of cluster, it requires you to set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
+  * Type `NodePort`. Exposes `azureml-fe` on each Node's IP at a static port. You'll be able to contact `azureml-fe`, from outside of the cluster, by requesting `<NodeIP>:<NodePort>`. Using `NodePort` also allows you to set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
+  * Type `ClusterIP`. Exposes `azureml-fe` on a cluster-internal IP, and it makes `azureml-fe` only reachable from within the cluster. For `azureml-fe` to serve inference requests coming from outside the cluster, it requires you to set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
 * To ensure high availability of `azureml-fe` routing service, AzureML extension deployment by default creates three replicas of `azureml-fe` for clusters having three nodes or more. If your cluster has **less than 3 nodes**, set `inferenceLoadbalancerHA=False`.
 * You also want to consider using **HTTPS** to restrict access to model endpoints and secure the data that clients submit. For this purpose, you would need to specify either `sslSecret` config setting or combination of `sslKeyPemFile` and `sslCertPemFile` config-protected settings.
 * By default, AzureML extension deployment expects config settings for **HTTPS** support. For development or testing purposes, **HTTP** support is conveniently provided through config setting `allowInsecureConnections=True`.
@@ -167,7 +167,7 @@ Upon AzureML extension deployment completes, you can use `kubectl get deployment
 
 |Resource name |Resource type |Training |Inference |Training and Inference| Description | Communication with cloud|
 |--|--|--|--|--|--|--|
-|relayserver|Kubernetes deployment|**&check;**|**&check;**|**&check;**|Relayserver is only created for Arc Kubernetes cluster, and **not** in AKS cluster. Relayserver works with Azure Relay to communicate with the cloud services.|Receive the request of job creation, model deployment from cloud service; sync the job status with cloud service.|
+|relayserver|Kubernetes deployment|**&check;**|**&check;**|**&check;**|Relay server is only created for Arc Kubernetes cluster, and **not** in AKS cluster. Relay server works with Azure Relay to communicate with the cloud services.|Receive the request of job creation, model deployment from cloud service; sync the job status with cloud service.|
 |gateway|Kubernetes deployment|**&check;**|**&check;**|**&check;**|The gateway is used to communicate and send data back and forth.|Send nodes and cluster resource information to cloud services.|
 |aml-operator|Kubernetes deployment|**&check;**|N/A|**&check;**|Manage the lifecycle of training jobs.| Token exchange with the cloud token service for authentication and authorization of Azure Container Registry.|
 |metrics-controller-manager|Kubernetes deployment|**&check;**|**&check;**|**&check;**|Manage the configuration for Prometheus|N/A|
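
For context on the `sslSecret` setting updated above, here's a minimal sketch of such a Kubernetes secret, assuming base64-encoded PEM data; the name `my-ssl-secret` is a placeholder, and the linked reference articles carry the authoritative sample:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-ssl-secret   # placeholder; pass this name via the sslSecret config setting
  namespace: azureml    # the extension expects the secret in the azureml namespace
type: Opaque
data:
  cert.pem: <base64 of the PEM-encoded TLS/SSL cert>
  key.pem: <base64 of the PEM-encoded TLS/SSL key>
```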

articles/machine-learning/how-to-kubernetes-inference-routing-azureml-fe.md

Lines changed: 10 additions & 5 deletions
@@ -98,14 +98,19 @@ concurrentRequests = targetRps * reqTime / targetUtilization
 replicas = ceil(concurrentRequests / maxReqPerContainer)
 ```
 
+### Performance of azureml-fe
+
+`azureml-fe` can reach 5K requests per second (QPS) with good latency, with an average overhead of no more than 3 ms and no more than 15 ms at the 99th percentile.
+
+
 >[!Note]
 >
->`azureml-fe` can reach to 5K requests per second (QPS) with good latency, with no more than 3ms overhead in average, and 15ms at 99% percentile.
->
->If you have RPS requirements higher than 10K, consider following options:
+>If you have RPS requirements higher than 10K, consider the following options:
 >
->* Increase resource requests/limits for `azureml-fe` pods, by default it has 2 vCPU and 1.2G memory resource limit.
->* Increase number of instances for `azureml-fe`, by default AzureML creates 3 `azureml-fe` instances per cluster.
+>* Increase resource requests/limits for `azureml-fe` pods; by default it has a 2 vCPU and 1.2G memory resource limit.
+>* Increase the number of instances for `azureml-fe`. By default, AzureML creates 3 (or 1) `azureml-fe` instances per cluster.
+>  * This instance count depends on your configuration of `inferenceRouterHA` in the [AzureML extension](how-to-deploy-kubernetes-extension.md#review-azureml-extension-configuration-settings).
+>  * The increased instance count isn't persisted; it will be overwritten with your configured value once the extension is upgraded.
 >* Reach out to Microsoft experts for help.
 
 ## Understand connectivity requirements for AKS inferencing cluster
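
To make the sizing formulas in the hunk above concrete, here's a worked example with illustrative numbers (not taken from the article): a target of 500 RPS, a request time of 0.1 second, 70% target utilization, and at most 10 concurrent requests per container:

```
concurrentRequests = 500 * 0.1 / 0.7 ≈ 71.4
replicas = ceil(71.4 / 10) = 8
```

So this hypothetical load would call for 8 replicas.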

articles/machine-learning/how-to-manage-kubernetes-instance-types.md

Lines changed: 78 additions & 20 deletions
@@ -7,7 +7,7 @@ ms.author: bozhlin
 ms.reviewer: ssalgado
 ms.service: machine-learning
 ms.subservice: core
-ms.date: 08/31/2022
+ms.date: 11/09/2022
 ms.topic: how-to
 ms.custom: build-spring-2022, cliv2, sdkv2, event-tier1-build-2022
 ---
@@ -26,17 +26,19 @@ In short, a `nodeSelector` lets you specify which node a pod should run on. The
 
 ## Default instance type
 
-By default, a `defaultinstancetype` with following definition is created when you attach Kuberenetes cluster to AzureML workspace:
+By default, a `defaultinstancetype` with the following definition is created when you attach a Kubernetes cluster to an AzureML workspace:
 - No `nodeSelector` is applied, meaning the pod can get scheduled on any node.
-- The workload's pods are assigned default resources with 0.6 cpu cores, 1536Mi memory and 0 GPU:
+- The workload's pods are assigned a default resource request of 0.1 CPU cores, 500Mi memory, and 0 GPU.
+- Resource use by the workload's pods is limited to 2 CPU cores and 8 GB memory:
+
 ```yaml
 resources:
   requests:
-    cpu: "0.6"
-    memory: "1536Mi"
+    cpu: "100m"
+    memory: "500Mi"
   limits:
-    cpu: "0.6"
-    memory: "1536Mi"
+    cpu: "2"
+    memory: "8Gi"
     nvidia.com/gpu: null
 ```
 
@@ -77,13 +79,18 @@ The following steps will create an instance type with the labeled behavior:
 - Pods will be assigned resource requests of `700m` CPU and `1500Mi` memory.
 - Pods will be assigned resource limits of `1` CPU, `2Gi` memory and `1` NVIDIA GPU.
 
-> [!NOTE]
-> - NVIDIA GPU resources are only specified in the `limits` section as integer values. For more information,
-  see the Kubernetes [documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins).
-> - CPU and memory resources are string values.
-> - CPU can be specified in millicores, for example `100m`, or in full numbers, for example `"1"`
-  is equivalent to `1000m`.
-> - Memory can be specified as a full number + suffix, for example `1024Mi` for 1024 MiB.
+Custom instance types must follow these parameter and definition rules; otherwise, creation of the instance type fails:
+
+| Parameter | Required | Description |
+| --- | --- | --- |
+| name | required | String value, which must be unique in the cluster.|
+| CPU request | required | String value, which cannot be 0 or empty. <br>CPU can be specified in millicores; for example, `100m`. It can also be specified as a full number; for example, `"1"` is equivalent to `1000m`.|
+| Memory request | required | String value, which cannot be 0 or empty. <br>Memory can be specified as a full number + suffix; for example, `1024Mi` for 1024 MiB.|
+| CPU limit | required | String value, which cannot be 0 or empty. <br>CPU can be specified in millicores; for example, `100m`. It can also be specified as a full number; for example, `"1"` is equivalent to `1000m`.|
+| Memory limit | required | String value, which cannot be 0 or empty. <br>Memory can be specified as a full number + suffix; for example, `1024Mi` for 1024 MiB.|
+| GPU | optional | Integer value, which can only be specified in the `limits` section. <br>For more information, see the Kubernetes [documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins). |
+| nodeSelector | optional | Map of string keys and values. |
+
 
 It's also possible to create multiple instance types at once:
 
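To ground the rules in the table above, here's a minimal sketch of a single custom instance type matching the labeled behavior described in this hunk. The name `myinstancetype` and the node label are placeholders, and the `amlarc.azureml.com/v1alpha1` API version and `InstanceType` kind are assumed from the AzureML instance-type custom resource:

```yaml
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
  name: myinstancetype        # placeholder; must be unique in the cluster
spec:
  nodeSelector:               # optional; restricts pods to nodes carrying this label
    mylabel: mylabelvalue
  resources:
    requests:                 # CPU and memory requests are required string values
      cpu: "700m"
      memory: "1500Mi"
    limits:                   # GPU may only appear under limits, as an integer
      cpu: "1"
      nvidia.com/gpu: 1
      memory: "2Gi"
```

Applied with `kubectl apply -f my_instance_type.yaml`, such an instance type becomes selectable by name in job and deployment definitions, as the next hunks show.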
@@ -128,23 +135,45 @@ If a training or inference workload is submitted without an instance type, it us
 
 ### Select instance type to submit training job
 
+#### [Azure CLI](#tab/select-instancetype-to-trainingjob-with-cli)
+
 To select an instance type for a training job using CLI (V2), specify its name as part of the
 `resources` properties section in job YAML. For example:
+
 ```yaml
 command: python -c "print('Hello world!')"
 environment:
   image: library/python:latest
-compute: azureml:<compute_target_name>
+compute: azureml:<Kubernetes-compute_target_name>
 resources:
   instance_type: <instance_type_name>
 ```
 
-In the above example, replace `<compute_target_name>` with the name of your Kubernetes compute
-target and `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system will use `defaultinstancetype` to submit job.
+#### [Python SDK](#tab/select-instancetype-to-trainingjob-with-sdk)
+
+To select an instance type for a training job using SDK (V2), specify its name for the `instance_type` property in the `command` class. For example:
+
+```python
+from azure.ai.ml import command
+
+# define the command
+command_job = command(
+    command="python -c \"print('Hello world!')\"",
+    environment="AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest",
+    compute="<Kubernetes-compute_target_name>",
+    instance_type="<instance_type_name>"
+)
+```
+---
+
+In the above example, replace `<Kubernetes-compute_target_name>` with the name of your Kubernetes compute
+target and replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system will use `defaultinstancetype` to submit the job.
 
 ### Select instance type to deploy model
 
-To select an instance type for a model deployment using CLI (V2), specify its name for `instance_type` property in deployment YAML. For example:
+#### [Azure CLI](#tab/select-instancetype-to-modeldeployment-with-cli)
+
+To select an instance type for a model deployment using CLI (V2), specify its name for the `instance_type` property in the deployment YAML. For example:
 
 ```yaml
 name: blue
@@ -161,9 +190,38 @@ environment:
   image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
 ```
 
-In the above example, replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system will use `defaultinstancetype` to deploy model.
+#### [Python SDK](#tab/select-instancetype-to-modeldeployment-with-sdk)
+
+To select an instance type for a model deployment using SDK (V2), specify its name for the `instance_type` property in the `KubernetesOnlineDeployment` class. For example:
+
+```python
+from azure.ai.ml.entities import KubernetesOnlineDeployment, Model, Environment, CodeConfiguration
+
+model = Model(path="./model/sklearn_mnist_model.pkl")
+env = Environment(
+    conda_file="./model/conda.yml",
+    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1",
+)
+
+# define the deployment
+blue_deployment = KubernetesOnlineDeployment(
+    name="blue",
+    endpoint_name="<endpoint name>",
+    model=model,
+    environment=env,
+    code_configuration=CodeConfiguration(
+        code="./script/", scoring_script="score.py"
+    ),
+    instance_count=1,
+    instance_type="<instance_type_name>",
+)
+```
+---
+
+In the above example, replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system will use `defaultinstancetype` to deploy the model.
+
 
 ## Next steps
 
 - [AzureML inference router and connectivity requirements](./how-to-kubernetes-inference-routing-azureml-fe.md)
-- [Secure AKS inferencing environment](./how-to-secure-kubernetes-inferencing-environment.md)
+- [Secure AKS inferencing environment](./how-to-secure-kubernetes-inferencing-environment.md)
