articles/machine-learning/how-to-deploy-kubernetes-extension.md

## Prerequisites
* An AKS cluster is up and running in Azure.
* If you have not previously used cluster extensions, you need to [register the KubernetesConfiguration service provider](../aks/dapr.md#register-the-kubernetesconfiguration-service-provider).
* Or an Arc Kubernetes cluster is up and running. Follow instructions in [connect existing Kubernetes cluster to Azure Arc](../azure-arc/kubernetes/quickstart-connect-cluster.md).
* If the cluster is an Azure RedHat OpenShift Service (ARO) cluster or OpenShift Container Platform (OCP) cluster, you must satisfy other prerequisite steps as documented in the [Reference for configuring Kubernetes cluster](./reference-kubernetes.md#prerequisites-for-aro-or-ocp-clusters) article.
* The Kubernetes cluster must have a minimum of 4 vCPU cores and 8-GB memory.
* A cluster running behind an outbound proxy server or firewall needs extra [network configurations](./how-to-access-azureml-behind-firewall.md#kubernetes-compute).
* Install or upgrade Azure CLI to version 2.24.0 or higher.
You can use the AzureML CLI command `k8s-extension create` to deploy the AzureML extension.
|`allowInsecureConnections`|`True` or `False`, default `False`. **Can** be set to `True` to use inference HTTP endpoints for development or test purposes. |N/A| Optional | Optional |
|`inferenceRouterServiceType`|`loadBalancer`, `nodePort` or `clusterIP`. **Required** if `enableInference=True`. | N/A|**✓**|**✓**|
|`internalLoadBalancerProvider`| This config is currently only applicable to Azure Kubernetes Service (AKS) clusters. Set to `azure` to allow the inference router to use an internal load balancer. | N/A| Optional | Optional |
|`sslSecret`| The name of the Kubernetes secret in the `azureml` namespace. This config is used to store `cert.pem` (PEM-encoded TLS/SSL cert) and `key.pem` (PEM-encoded TLS/SSL key), which are required for inference HTTPS endpoint support when `allowInsecureConnections` is set to `False`. For a sample YAML definition of `sslSecret`, see [Configure sslSecret](./how-to-secure-kubernetes-online-endpoint.md#configure-sslsecret). Use this config or a combination of the `sslCertPemFile` and `sslKeyPemFile` protected config settings. |N/A| Optional | Optional |
|`sslCname`|A TLS/SSL CNAME used by the inference HTTPS endpoint. **Required** if `allowInsecureConnections=False`.| N/A | Optional | Optional|
|`inferenceRouterHA`|`True` or `False`, default `True`. By default, the AzureML extension deploys three inference router replicas for high availability, which requires at least three worker nodes in a cluster. Set to `False` if your cluster has fewer than three worker nodes; in that case, only one inference router service is deployed. | N/A| Optional | Optional |
|`nodeSelector`| By default, the deployed Kubernetes resources are randomly deployed to one or more nodes of the cluster, and DaemonSet resources are deployed to ALL nodes. If you want to restrict the extension deployment to specific nodes with labels `key1=value1` and `key2=value2`, use `nodeSelector.key1=value1` and `nodeSelector.key2=value2` correspondingly. | Optional| Optional | Optional |
|`installNvidiaDevicePlugin`|`True` or `False`, default `False`. The [NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin#nvidia-device-plugin-for-kubernetes) is required for ML workloads on NVIDIA GPU hardware. By default, AzureML extension deployment doesn't install the NVIDIA Device Plugin, regardless of whether the Kubernetes cluster has GPU hardware. You can set this setting to `True` to install it, but make sure to fulfill the [Prerequisites](https://github.com/NVIDIA/k8s-device-plugin#prerequisites). | Optional |Optional |Optional |
|`installPromOp`|`True` or `False`, default `True`. The AzureML extension needs the Prometheus operator to manage Prometheus. Set to `False` to reuse an existing Prometheus operator. For more information, see [reusing the prometheus operator](./how-to-troubleshoot-kubernetes-extension.md#prometheus-operator).| Optional| Optional | Optional |
|`installVolcano`|`True` or `False`, default `True`. The AzureML extension needs the Volcano scheduler to schedule jobs. Set to `False` to reuse an existing Volcano scheduler. For more information, see [reusing volcano scheduler](./how-to-troubleshoot-kubernetes-extension.md#volcano-scheduler).| Optional| N/A | Optional |
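The Kubernetes secret referenced by the `sslSecret` setting above can be sketched as follows. This is an assumed shape based on the linked sample: the secret lives in the `azureml` namespace and carries the base64-encoded PEM files under the `cert.pem` and `key.pem` keys (placeholder values shown, not real certificate material):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: <secret-name>
  namespace: azureml
type: Opaque
data:
  # base64-encoded contents of the PEM-encoded TLS/SSL certificate
  cert.pem: <base64-encoded cert.pem>
  # base64-encoded contents of the PEM-encoded TLS/SSL key
  key.pem: <base64-encoded key.pem>
```

You would then pass the secret's name through the `sslSecret` config setting when creating the extension.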
If you plan to deploy the AzureML extension for a real-time inference workload, note the following requirements:
* The `azureml-fe` router service is required for real-time inference support, and you need to specify the `inferenceRouterServiceType` config setting for `azureml-fe`. `azureml-fe` can be deployed with one of the following `inferenceRouterServiceType` values:
  * Type `LoadBalancer`. Exposes `azureml-fe` externally using a cloud provider's load balancer. To specify this value, ensure that your cluster supports load balancer provisioning. Note that most on-premises Kubernetes clusters might not support an external load balancer.
  * Type `NodePort`. Exposes `azureml-fe` on each node's IP at a static port. You can then contact `azureml-fe` from outside the cluster by requesting `<NodeIP>:<NodePort>`. Using `NodePort` also allows you to set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
  * Type `ClusterIP`. Exposes `azureml-fe` on a cluster-internal IP, which makes `azureml-fe` reachable only from within the cluster. For `azureml-fe` to serve inference requests coming from outside the cluster, you must set up your own load balancing solution and TLS/SSL termination for `azureml-fe`.
* To ensure high availability of the `azureml-fe` routing service, AzureML extension deployment by default creates three replicas of `azureml-fe` for clusters with three or more nodes. If your cluster has **fewer than three nodes**, set `inferenceLoadbalancerHA=False`.
* Also consider using **HTTPS** to restrict access to model endpoints and secure the data that clients submit. To do so, specify either the `sslSecret` config setting or the combination of the `sslKeyPemFile` and `sslCertPemFile` config-protected settings.
* By default, AzureML extension deployment expects config settings for **HTTPS** support. For development or testing purposes, **HTTP** support is conveniently provided through the config setting `allowInsecureConnections=True`.
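Putting several of these settings together, an extension deployment command might look like the following sketch. The extension name, cluster name, and resource group are placeholders; `--cluster-type` is `connectedClusters` for an Arc-connected cluster (use `managedClusters` for AKS), and `allowInsecureConnections=True` is shown only because this example targets development or testing:

```azurecli
az k8s-extension create \
  --name <extension-name> \
  --extension-type Microsoft.AzureML.Kubernetes \
  --config enableTraining=True enableInference=True \
           inferenceRouterServiceType=LoadBalancer \
           allowInsecureConnections=True \
  --cluster-type connectedClusters \
  --cluster-name <your-connected-cluster-name> \
  --resource-group <resource-group> \
  --scope cluster
```

For a production deployment you would instead supply the HTTPS-related settings (`sslCname` plus `sslSecret`, or `sslCertPemFile`/`sslKeyPemFile`) and omit `allowInsecureConnections`.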
After the AzureML extension deployment completes, you can use `kubectl get deployments` to verify the resources created in the cluster:
|Resource name |Resource type |Training |Inference |Training and Inference| Description | Communication with cloud|
|--|--|--|--|--|--|--|
|relayserver|Kubernetes deployment|**✓**|**✓**|**✓**|The relay server is only created for an Arc Kubernetes cluster, and **not** in an AKS cluster. The relay server works with Azure Relay to communicate with the cloud services.|Receives requests for job creation and model deployment from the cloud service; syncs job status with the cloud service.|
|gateway|Kubernetes deployment|**✓**|**✓**|**✓**|The gateway is used to communicate and send data back and forth.|Sends node and cluster resource information to the cloud services.|
|aml-operator|Kubernetes deployment|**✓**|N/A|**✓**|Manages the lifecycle of training jobs.| Token exchange with the cloud token service for authentication and authorization of Azure Container Registry.|
|metrics-controller-manager|Kubernetes deployment|**✓**|**✓**|**✓**|Manages the configuration for Prometheus.|N/A|
> [!NOTE]
>
> `azureml-fe` can reach 5K requests per second (QPS) with good latency, with an overhead not exceeding 3 ms on average and 15 ms at the 99th percentile.
>
> If you have RPS requirements higher than 10K, consider the following options:
>
> * Increase resource requests/limits for `azureml-fe` pods; by default, each pod has a resource limit of 2 vCPU and 1.2-GB memory.
> * Increase the number of instances for `azureml-fe`. By default, AzureML creates 3 or 1 `azureml-fe` instances per cluster.
>   * This instance count depends on your configuration of `inferenceRouterHA` for the [AzureML extension](how-to-deploy-kubernetes-extension.md#review-azureml-extension-configuration-settings).
>   * The increased instance count can't be persisted, since it's overwritten with your configured value once the extension is upgraded.
> * Reach out to Microsoft experts for help.
## Understand connectivity requirements for AKS inferencing cluster
In short, a `nodeSelector` lets you specify which node a pod should run on.
## Default instance type
By default, a `defaultinstancetype` with the following definition is created when you attach a Kubernetes cluster to an AzureML workspace:
- No `nodeSelector` is applied, meaning the pod can get scheduled on any node.
- The workload's pods are assigned default resource requests of 0.1 CPU cores, 500Mi memory, and 0 GPUs.
- Resource use by the workload's pods is limited to 2 CPU cores and 8-GB memory:
```yaml
resources:
  requests:
    cpu: "100m"
    memory: "500Mi"
  limits:
    cpu: "2"
    memory: "8Gi"
    nvidia.com/gpu: null
```
The following steps create an instance type with the labeled behavior:
- Pods will be assigned resource requests of `700m` CPU and `1500Mi` memory.
- Pods will be assigned resource limits of `1` CPU, `2Gi` memory and `1` NVIDIA GPU.
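The steps above correspond to an `InstanceType` custom resource along the following lines. The `amlarc.azureml.com/v1alpha1` API version is the one used by the AzureML extension's instance type samples, and the instance type name and `mylabel: mylabelvalue` node selector are illustrative placeholders:

```yaml
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
  name: myinstancetypename
spec:
  nodeSelector:
    mylabel: mylabelvalue
  resources:
    requests:
      cpu: "700m"
      memory: "1500Mi"
    limits:
      cpu: "1"
      nvidia.com/gpu: 1
      memory: "2Gi"
```

You would apply it with `kubectl apply -f` against the cluster attached to your workspace.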
Creation of custom instance types must meet the following parameters and definition rules, otherwise the instance type creation will fail:

| Parameter | Required | Description |
| --- | --- | --- |
| name | required | String values, which must be unique in the cluster.|
| CPU request | required | String values, which cannot be 0 or empty. <br>CPU can be specified in millicores; for example, `100m`. Can also be specified as full numbers; for example, `"1"` is equivalent to `1000m`.|
| Memory request | required | String values, which cannot be 0 or empty. <br>Memory can be specified as a full number + suffix; for example, `1024Mi` for 1024 MiB.|
| CPU limit | required | String values, which cannot be 0 or empty. <br>CPU can be specified in millicores; for example, `100m`. Can also be specified as full numbers; for example, `"1"` is equivalent to `1000m`.|
| Memory limit | required | String values, which cannot be 0 or empty. <br>Memory can be specified as a full number + suffix; for example, `1024Mi` for 1024 MiB.|
| GPU | optional | Integer values, which can only be specified in the `limits` section. <br>For more information, see the Kubernetes [documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins). |
| nodeSelector | optional | Map of string keys and values. |
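To make the quantity formats in the parameters above concrete, here is a small Python sketch that converts Kubernetes-style CPU and memory quantity strings into plain numbers. This helper is purely illustrative and is not part of the AzureML extension or SDK:

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as '100m' or '1' into cores."""
    if quantity.endswith("m"):
        # Millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory_mi(quantity: str) -> float:
    """Convert a Kubernetes memory quantity with a binary suffix into MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    # A bare number is plain bytes.
    return float(quantity) / (1024 * 1024)


print(parse_cpu("700m"))          # 0.7
print(parse_memory_mi("1500Mi"))  # 1500.0
```

For example, the instance type above requests `700m` CPU (0.7 cores) and `1500Mi` memory (1500 MiB).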
It's also possible to create multiple instance types at once:
If a training or inference workload is submitted without an instance type, it uses the default instance type.
To select an instance type for a training job using CLI (V2), specify its name as part of the `resources` properties section in the job YAML. For example:
```yaml
command: python -c "print('Hello world!')"
environment:
  image: library/python:latest
compute: azureml:<Kubernetes-compute_target_name>
resources:
  instance_type: <instance_type_name>
```
In the above example, replace `<Kubernetes-compute_target_name>` with the name of your Kubernetes compute target and replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system uses `defaultinstancetype` to submit the job.
### Select instance type to deploy model
To select an instance type for a model deployment using CLI (V2), specify its name for the `instance_type` property in the deployment YAML. For example:
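A minimal deployment YAML carrying the `instance_type` property might look like the following sketch. The deployment and endpoint names, model path, and scoring script are placeholder assumptions, not values from this article:

```yaml
name: blue
endpoint_name: <endpoint_name>
model:
  path: ./model/sklearn_mnist_model.pkl
code_configuration:
  code: ./script/
  scoring_script: score.py
instance_type: <instance_type_name>
instance_count: 1
```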
In the above example, replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system uses `defaultinstancetype` to deploy the model.
To select an instance type for a model deployment using SDK (V2), specify its name for the `instance_type` property in the `KubernetesOnlineDeployment` class. For example:
```python
from azure.ai.ml.entities import (
    KubernetesOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)

model = Model(path="./model/sklearn_mnist_model.pkl")

# Specify the instance type for the deployment
# (<endpoint_name> and <instance_type_name> are placeholders).
deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name="<endpoint_name>",
    model=model,
    instance_type="<instance_type_name>",
    instance_count=1,
)
```
In the above example, replace `<instance_type_name>` with the name of the instance type you wish to select. If there's no `instance_type` property specified, the system will use `defaultinstancetype` to deploy the model.
## Next steps
- [AzureML inference router and connectivity requirements](./how-to-kubernetes-inference-routing-azureml-fe.md)
- [Secure AKS inferencing environment](./how-to-secure-kubernetes-inferencing-environment.md)