The front-end component (azureml-fe) that routes incoming inference requests to deployed services automatically scales as needed. Scaling of azureml-fe is based on the AKS cluster purpose and size (number of nodes). The cluster purpose and nodes are configured when you [create or attach an AKS cluster](how-to-create-attach-kubernetes.md). There is one azureml-fe service per cluster, which may be running on multiple pods.
> [!IMPORTANT]
> When using a cluster configured as __dev-test__, the self-scaler is **disabled**. Even for FastProd/DenseProd clusters, the self-scaler is only enabled when telemetry shows that it's needed.
Azureml-fe scales both up (vertically) to use more cores, and out (horizontally) to use more pods. The decision to scale up is based on the time it takes to route incoming inference requests. If this time exceeds the threshold, a scale-up occurs. If the time to route incoming requests continues to exceed the threshold, a scale-out occurs.
Scaling down and in is based on CPU usage. If the CPU usage threshold is met, the front end is first scaled down. If CPU usage drops to the scale-in threshold, a scale-in operation happens. Scaling up and out occurs only if enough cluster resources are available.
When scaling up or down, azureml-fe pods are restarted to apply the CPU/memory changes. Inferencing requests aren't affected by the restarts.
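The scaling rules above can be sketched as a simple decision function. This is a toy illustration only: the function, parameter names, and threshold values are all hypothetical, and the real thresholds used by azureml-fe are internal.

```python
def scale_decision(route_latency_ms, cpu_usage, *,
                   latency_threshold_ms=100.0,   # hypothetical routing-latency threshold
                   scale_down_cpu=0.5,           # hypothetical scale-down CPU threshold
                   scale_in_cpu=0.3,             # hypothetical scale-in CPU threshold
                   latency_still_high=False):
    """Toy model of the azureml-fe scaling rules described above.

    Scale up (more cores) when routing latency exceeds the threshold;
    scale out (more pods) if latency stays high. Scale down (fewer cores)
    when CPU usage is low; scale in (fewer pods) when it drops even lower.
    """
    if route_latency_ms > latency_threshold_ms:
        return "scale-out" if latency_still_high else "scale-up"
    if cpu_usage <= scale_in_cpu:
        return "scale-in"
    if cpu_usage <= scale_down_cpu:
        return "scale-down"
    return "no-op"
```

Note that the scale-in threshold sits below the scale-down threshold, matching the order described above: the front end is first scaled down, and only scaled in if CPU usage keeps dropping.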
<a id="connectivity"></a>
## Understand connectivity requirements for AKS inferencing cluster
For general AKS connectivity requirements, see [Control egress traffic for cluster nodes in Azure Kubernetes Service](../aks/limit-egress-traffic.md).
For information on accessing Azure Machine Learning services from behind a firewall, see [How to access Azure Machine Learning behind a firewall](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/machine-learning/how-to-access-azureml-behind-firewall.md).
### Overall DNS resolution requirements
DNS resolution within an existing VNet is under your control; for example, through a firewall or custom DNS server. The following hosts must be reachable:
| Host | Purpose |
| ----- | ----- |
|`api.azureml.ms`| Azure Active Directory (AAD) authentication |
|`ingest-vienna<region>.kusto.windows.net`| Kusto endpoint for uploading telemetry |
|`<leaf-domain-label + auto-generated suffix>.<region>.cloudapp.azure.com`| Endpoint domain name, if autogenerated by Azure Machine Learning. If you use a custom domain name, you don't need this entry. |
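As a quick sanity check, you can verify that the hosts in the table above resolve from inside your VNet. This is an illustrative sketch: the `required_hosts` helper and its parameters are hypothetical, and the resolution check needs network access from the environment you're validating.

```python
import socket

def required_hosts(region, leaf_domain_label=None):
    """Build the list of hosts from the table above for a given Azure region.

    `leaf_domain_label` (including the auto-generated suffix) is only needed
    when the endpoint domain name was autogenerated by Azure Machine Learning.
    """
    hosts = [
        "api.azureml.ms",                            # AAD authentication
        f"ingest-vienna{region}.kusto.windows.net",  # Kusto telemetry endpoint
    ]
    if leaf_domain_label:
        hosts.append(f"{leaf_domain_label}.{region}.cloudapp.azure.com")
    return hosts

def unresolvable(hosts):
    """Return the hosts that fail DNS resolution (requires network access)."""
    failed = []
    for host in hosts:
        try:
            socket.gethostbyname(host)
        except socket.gaierror:
            failed.append(host)
    return failed
```

Run `unresolvable(required_hosts(...))` from a machine inside the VNet; any host it returns points at a firewall or custom DNS rule that needs attention.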
### Connectivity requirements in chronological order: from cluster creation to model deployment