You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/machine-learning/how-to-attach-kubernetes-anywhere.md
+16-16Lines changed: 16 additions & 16 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -30,7 +30,7 @@ Azure Machine Learning Kubernetes compute supports two kinds of Kubernetes clust
30
30
31
31
| Compute | Location | Description |
32
32
| --- | --- | --- |
33
-
|**[AKS cluster](https://azure.microsoft.com/services/kubernetes-service/)**| Within Azure | With your self-managed AKS cluster in Azure, you can gain security and controls to meet compliance requirement and flexibility to manage your team's machine learning workload. |
33
+
|**[AKS cluster](https://azure.microsoft.com/en-us/products/kubernetes-service/)**| Within Azure | With your self-managed AKS cluster in Azure, you can gain security and controls to meet compliance requirement and flexibility to manage your team's machine learning workload. |
34
34
|**[Arc Kubernetes cluster](/azure/azure-arc/kubernetes/overview)**| Outside Azure | With Arc Kubernetes cluster, you can train or deploy models in any on-premises or multicloud infrastructure, or the edge. |
35
35
36
36
With a simple cluster extension deployment on AKS or Arc Kubernetes cluster, Kubernetes cluster is seamlessly supported in Machine Learning to run training or inference workload. It's easy to enable and use an existing Kubernetes cluster for Machine Learning workload with the following process:
@@ -43,7 +43,7 @@ With a simple cluster extension deployment on AKS or Arc Kubernetes cluster, Kub
43
43
44
44
- Step 4: Use the Kubernetes compute target from the CLI v2, SDK v2, or the Azure Machine Learning studio UI.
45
45
46
-
Here are the roles and responsibilities in this process:
46
+
Here are the primary responsibilities in this process:
47
47
48
48
- The **IT-operation team** is responsible for Steps 1, 2, and 3. This team prepares an AKS or Arc Kubernetes cluster, deploys the Machine Learning cluster extension, and attaches the Kubernetes cluster to the Machine Learning workspace. In addition to these essential compute setup steps, the IT-operation team also uses familiar tools, such as the Azure CLI or kubectl, to complete the following tasks for the Data-science team:
49
49
@@ -61,13 +61,13 @@ With Arc Kubernetes cluster, you can build, train, and deploy models in any on-p
61
61
62
62
| Usage pattern | Location of data | Goals and requirements | Scenario configuration |
63
63
| --- | --- | --- | --- |
64
-
| Train model in cloud, deploy model on-premises | Cloud | - Use cloud compute to support elastic compute needs or special hardware such as a GPU. <br> - Model deployment must be on-premises for security, compliance, or latency requirements. | - Azure-managed compute in cloud <br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises <br> - Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining |
65
-
| Train model on-premises and cloud, deploy to both cloud and on-premises | Cloud | - Combine on-premises investments with cloud scalability. <br> - Bring cloud and on-premises compute under single pane of glass. <br> - Access single source of truth for data in cloud and replicate on-premises (lazily on usage or proactively). <br> - Enable cloud compute primary usage when on-premises resources aren't available (in use or in maintenance) or don't meet specific hardware requirements (GPU). | - Azure-managed compute in cloud. br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises <br> - Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining |
66
-
| Train model on-premises, deploy model in cloud | On-premises | - Store data on-premises to meet data-residency requirements. <br> - Deploy model in the cloud for global-service access or to enable compute elasticity for scale and throughput. | - Azure-managed compute in cloud <br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises <br> - Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining |
67
-
| Bring your own AKS in Azure | Cloud | - Gain more security and controls. <br> - Establish all private IP machine learning to prevent data exfiltration. | - AKS cluster behind an Azure virtual network <br> - Private endpoints in the same virtual network for Azure Machine Learning workspace and associated resources <br> Fully automated machine learning operations |
68
-
| Full machine learning lifecycle on-premises | On-premises |Secure sensitive data or proprietary IP, such as machine learning models, code, and scripts. | - Outbound proxy server connection on-premises <br> - Azure ExpressRoute and Azure Arc private link to Azure resources <br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations |
64
+
| Train model in cloud, deploy model on-premises | Cloud | - _Use cloud compute to support elastic compute needs or special hardware such as a GPU._ <br> - _Model deployment must be on-premises for security, compliance, or latency requirements._| - Azure-managed compute in cloud <br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises <br> - Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining |
65
+
| Train model on-premises and cloud, deploy to both cloud and on-premises | Cloud | - _Combine on-premises investments with cloud scalability._ <br> - _Bring cloud and on-premises compute under single pane of glass._ <br> - _Access single source of truth for data in cloud and replicate on-premises (lazily on usage or proactively)._ <br> - _Enable cloud compute primary usage when on-premises resources aren't available (in use or in maintenance) or don't meet specific hardware requirements (GPU)._| - Azure-managed compute in cloud. br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises <br> - Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining |
66
+
| Train model on-premises, deploy model in cloud | On-premises | - _Store data on-premises to meet data-residency requirements._ <br> - _Deploy model in the cloud for global-service access or to enable compute elasticity for scale and throughput._| - Azure-managed compute in cloud <br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations in hybrid mode, including training and model deployment steps that transition seamlessly between cloud and on-premises <br> - Repeatable, all assets properly tracked, model retrained as needed, deployment updated automatically after retraining |
67
+
| Bring your own AKS in Azure | Cloud | - _Gain more security and controls._ <br> - _Establish all private IP machine learning to prevent data exfiltration._| - AKS cluster behind an Azure virtual network <br> - Private endpoints in the same virtual network for Azure Machine Learning workspace and associated resources <br> Fully automated machine learning operations |
68
+
| Full machine learning lifecycle on-premises | On-premises |_Secure sensitive data or proprietary IP, such as machine learning models, code, and scripts._| - Outbound proxy server connection on-premises <br> - Azure ExpressRoute and Azure Arc private link to Azure resources <br> - Customer-managed Kubernetes on-premises <br> - Fully automated machine learning operations |
69
69
70
-
### Limitations for a Kubernetes compute
70
+
### Limitations for Kubernetes compute target
71
71
72
72
A `KubernetesCompute` target in Azure Machine Learning workloads (training and model inference) has the following limitations:
73
73
@@ -103,20 +103,20 @@ In consideration of these differences, and the overall Machine Learning evolutio
103
103
104
104
For more information, explore the following articles:
105
105
106
-
-[Kubernetes version and region availability](./reference-kubernetes.md#supported-kubernetes-version-and-region)
107
-
-[Custom data storage](./reference-kubernetes.md#azure-machine-learning-jobs-connect-with-custom-data-storage)
106
+
-[Review supported Kubernetes versions and regions](./reference-kubernetes.md#supported-kubernetes-version-and-region)
107
+
-[Connect Machine Learning jobs with custom data storage](./reference-kubernetes.md#azure-machine-learning-jobs-connect-with-custom-data-storage)
108
108
109
109
## Machine learning examples
110
110
111
-
Machine learning examples are available in the [Azure Machine Learning (https://github.com/Azure/azureml-examples.git)](https://github.com/Azure/azureml-examples) repository on GitHub. In any example, replace the compute target name with your Kubernetes compute target, and run the sample.
111
+
Machine learning examples are available in the [Azure Machine Learning (azureml-examples)](https://github.com/Azure/azureml-examples) repository on GitHub. In any example, replace the compute target name with your Kubernetes compute target, and run the sample.
112
112
113
113
Here are several options:
114
114
115
-
- Training job samples with the CLI v2: [https://github.com/Azure/azureml-examples/tree/main/cli/jobs](https://github.com/Azure/azureml-examples/tree/main/cli/jobs)
116
-
-Model deployment with online endpoint samples and the CLI v2: [https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/kubernetes](https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/kubernetes)
117
-
-Batch endpoint samples with the CLI v2: [https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/batch](https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/batch)
118
-
-Training job samples with the SDK v2: [https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs)
119
-
-Model deployment with online endpoint samples and the SDK v2: [https://github.com/Azure/azureml-examples/tree/main/sdk/python/endpoints/online/kubernetes](https://github.com/Azure/azureml-examples/tree/main/sdk/python/endpoints/online/kubernetes)
115
+
-[Training job samples with the CLI v2](https://github.com/Azure/azureml-examples/tree/main/cli/jobs)
116
+
-[Training job samples with the SDK v2](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs)
117
+
-[Model deployment with online endpoint samples and the CLI v2](https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/kubernetes)
118
+
-[Model deployment with online endpoint samples and the SDK v2](https://github.com/Azure/azureml-examples/tree/main/sdk/python/endpoints/online/kubernetes)
119
+
-[Batch endpoint samples with the CLI v2](https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/batch)
Copy file name to clipboardExpand all lines: articles/machine-learning/how-to-troubleshoot-kubernetes-extension.md
+2-2Lines changed: 2 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -90,8 +90,8 @@ This table shows how to troubleshoot the error codes returned by the HealthCheck
90
90
|E40007 | INVALID_SSL_SETTING | The SSL key or certificate isn't valid. The CNAME should be compatible with the certificate. |
91
91
|E45002 | PROMETHEUS_CONFLICT | The Prometheus Operator installed is conflict with your existing Prometheus Operator. For more information, see [Prometheus operator](#prometheus-operator) |
92
92
|E45003 | BAD_NETWORK_CONNECTIVITY | You need to meet [network-requirements](./how-to-access-azureml-behind-firewall.md#scenario-use-kubernetes-compute).|
93
-
|E45004 | AZUREML_FE_ROLE_CONFLICT |Azure Machine Learning extension isn't supported in the [legacy AKS](./how-to-attach-kubernetes-anywhere.md#kubernetescompute-and-legacy-akscompute). To install Azure Machine Learning extension, you need to [delete the legacy azureml-fe components](v1/how-to-create-attach-kubernetes.md#delete-azureml-fe-related-resources).|
94
-
|E45005 | AZUREML_FE_DEPLOYMENT_CONFLICT | Azure Machine Learning extension isn't supported in the [legacy AKS](./how-to-attach-kubernetes-anywhere.md#kubernetescompute-and-legacy-akscompute). To install Azure Machine Learning extension, you need to run the command below this form to delete the legacy azureml-fe components, more detail you can referto [here](v1/how-to-create-attach-kubernetes.md#update-the-cluster).|
93
+
|E45004 | AZUREML_FE_ROLE_CONFLICT |Azure Machine Learning extension isn't supported in the [legacy AKS](./how-to-attach-kubernetes-anywhere.md#comparison-of-kubernetescompute-and-legacy-akscompute-targets). To install Azure Machine Learning extension, you need to [delete the legacy azureml-fe components](v1/how-to-create-attach-kubernetes.md#delete-azureml-fe-related-resources).|
94
+
|E45005 | AZUREML_FE_DEPLOYMENT_CONFLICT | Azure Machine Learning extension isn't supported in the [legacy AKS](./how-to-attach-kubernetes-anywhere.md#comparison-of-kubernetescompute-and-legacy-akscompute-targets). To install Azure Machine Learning extension, you need to run the command below this form to delete the legacy azureml-fe components, more detail you can referto [here](v1/how-to-create-attach-kubernetes.md#update-the-cluster).|
95
95
96
96
Commands to delete the legacy azureml-fe components in the AKS cluster:
0 commit comments