
Commit 2705895

Added inferencing KIs

1 parent eab501f


4 files changed, 92 additions, 2 deletions


articles/machine-learning/known-issues/azure-machine-learning-known-issues.md

Lines changed: 3 additions & 1 deletion
@@ -26,7 +26,9 @@ Select the **Title** to view more information about that specific known issue.
 |Compute | [Provisioning error when creating a compute instance with A10 SKU](compute-a10-sku-not-supported.md) | August 14, 2023 |
 |Compute | [Idleshutdown property in Bicep template causes error](compute-idleshutdown-bicep.md) | August 14, 2023 |
 |Compute | [Slowness in compute instance terminal from a mounted path](compute-slowness-terminal-mounted-path.md)| August 14, 2023|
-|Compute| [Creating compute instance after a workspace move results in an Etag conflict error.](workspace-move-compute-instance-same-name.md)| August 14, 2023 |
+|Compute| [Creating compute instance after a workspace move results in an Etag conflict error.](workspace-move-compute-instance-same-name.md)| August 14, 2023 |
+|Inferencing| [Invalid certificate error during deployment with an AKS cluster](inferencing-invalid-certificate.md)| September 26, 2023 |
+|Inferencing| [Existing Kubernetes compute cannot be updated with the `az ml compute attach` command](inferencing-updating-kubernetes-compute-appears-to-succeed.md) | September 26, 2023 |


 ## Next steps
articles/machine-learning/known-issues/inferencing-invalid-certificate.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+---
+title: Known issue - Inferencing | Invalid certificate error during deployment
+titleSuffix: Azure Machine Learning
+description: During machine learning deployments with an AKS cluster, you may receive an invalid certificate error.
+author: s-polly
+ms.author: scottpolly
+ms.topic: troubleshooting
+ms.service: machine-learning
+ms.subservice: core
+ms.date: 08/04/2023
+ms.custom: known-issue
+---
+
+# Known issue - Invalid certificate error during deployment with an AKS cluster
+
+During machine learning deployments using an AKS cluster, you may receive an invalid certificate error, such as `{"code":"BadRequest","statusCode":400,"message":"The request is invalid.","details":[{"code":"KubernetesUnaccessible","message":"Kubernetes error: AuthenticationException. Reason: InvalidCertificate"}],`
+
+
+
+[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
+
+**Status:** Open
+
+**Problem area:** Inferencing
+
+## Symptoms
+
+Azure Machine Learning deployments with an AKS cluster fail with the error:
+
+`{"code":"BadRequest","statusCode":400,"message":"The request is invalid.","details":[{"code":"KubernetesUnaccessible","message":"Kubernetes error: AuthenticationException. Reason: InvalidCertificate"}],`
+and the following error is shown in the MMS logs:
+
+`K8sReadNamespacedServiceAsync failed with AuthenticationException: System.Security.Authentication.AuthenticationException: The remote certificate was rejected by the provided RemoteCertificateValidationCallback. at System.Net.Security.SslStream.SendAuthResetSignal(ProtocolToken message, ExceptionDispatchInfo exception) at System.Net.Security.SslStream.CompleteHandshake(SslAuthenticationOptions sslAuthenticationOptions) at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm) at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)`
+
+## Cause
+
+This error occurs because the certificate for AKS clusters created before January 2021 does not include the `Subject Key Identifier` value, which prevents the required `Authority Key Identifier` value from being generated.
+
+## Solutions and workarounds
+
+There are two options to resolve this issue:
+- Rotate the AKS certificate for the cluster. For more information, see [Certificate rotation in Azure Kubernetes Service (AKS)](../../aks/certificate-rotation.md).
+- Wait 5 hours for the certificate to be updated automatically; the issue should then be resolved.
+
+## Next steps
+
+- [About known issues](azureml-known-issues.md)
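The symptom described in the invalid-certificate article above can also be recognized in code. The following is a minimal sketch. It assumes the deployment error body has already been parsed into a Python dict shaped like the `BadRequest` payload the article quotes. The helper name `is_invalid_certificate_error` is hypothetical, and the check relies only on the fields shown in the article.

```python
import json
from typing import Any


def is_invalid_certificate_error(error: dict[str, Any]) -> bool:
    """Return True if a parsed deployment error matches the AKS invalid-certificate issue.

    Hypothetical helper: it checks only the fields quoted in the known-issue
    article (top-level "code", plus "code"/"message" of each entry in "details").
    """
    if error.get("code") != "BadRequest":
        return False
    for detail in error.get("details", []):
        # The article quotes code "KubernetesUnaccessible" with reason "InvalidCertificate".
        if (
            detail.get("code") == "KubernetesUnaccessible"
            and "Reason: InvalidCertificate" in detail.get("message", "")
        ):
            return True
    return False


if __name__ == "__main__":
    # Payload copied from the article. The article's text is cut off after "}],",
    # so the closing "}" here is added only to make the JSON parse.
    raw = (
        '{"code":"BadRequest","statusCode":400,"message":"The request is invalid.",'
        '"details":[{"code":"KubernetesUnaccessible",'
        '"message":"Kubernetes error: AuthenticationException. Reason: InvalidCertificate"}]}'
    )
    print(is_invalid_certificate_error(json.loads(raw)))
```

Matching on the `Reason: InvalidCertificate` substring, rather than on the whole message, keeps the check tolerant of differences in the surrounding text.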
articles/machine-learning/known-issues/inferencing-updating-kubernetes-compute-appears-to-succeed.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+---
+title: Known issue - Inferencing | Existing Kubernetes compute cannot be updated
+titleSuffix: Azure Machine Learning
+description: Updating an attached Kubernetes compute with the az ml compute attach command appears to succeed but doesn't.
+author: s-polly
+ms.author: scottpolly
+ms.topic: troubleshooting
+ms.service: machine-learning
+ms.subservice: core
+ms.date: 08/04/2023
+ms.custom: known-issue
+---
+
+# Known issue - Existing Kubernetes compute cannot be updated with the `az ml compute attach` command
+
+Updating an attached Kubernetes compute with the `az ml compute attach` command appears to succeed, but the compute isn't updated.
+
+
+[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
+
+**Status:** Open
+
+**Problem area:** Inferencing
+
+## Symptoms
+
+When you run the command `az ml compute attach --resource-group <rgname> --workspace-name <ws name> --type Kubernetes --name <existing attached compute name(aaaaa)> --resource-id "<resource URI>" --namespace <deployment name>`, the workspace UI displays a message indicating that the compute was updated successfully. However, the compute isn't changed.
+
+## Cause
+
+Using the `az ml compute attach` command to update an existing Kubernetes compute isn't currently supported. At this time, the CLI v2 and SDK v2 don't allow updating any configuration of an existing Kubernetes compute.
+
+
+## Next steps
+
+- [About known issues](azureml-known-issues.md)
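The workspace UI reports success even when the Kubernetes compute is unchanged. One way to confirm the behavior described in the article above is to save the compute's JSON before and after running `az ml compute attach` and compare the two snapshots. For example, you can capture each snapshot with `az ml compute show --name <name> --resource-group <rg> --workspace-name <ws> -o json`. The following is a minimal sketch. The `changed_fields` helper and the sample fields are hypothetical and don't reflect the full schema of that command's output.

```python
import json
from typing import Any

_MISSING = object()  # Sentinel so a key present on only one side counts as a change.


def changed_fields(before: dict[str, Any], after: dict[str, Any]) -> list[str]:
    """Return the sorted top-level keys whose values differ between two compute snapshots."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k, _MISSING) != after.get(k, _MISSING))


if __name__ == "__main__":
    # Hypothetical snapshots standing in for `az ml compute show ... -o json` output
    # captured before and after `az ml compute attach`; real output has more fields.
    before = json.loads('{"name": "aaaaa", "type": "kubernetes", "namespace": "default"}')
    after = json.loads('{"name": "aaaaa", "type": "kubernetes", "namespace": "default"}')
    diff = changed_fields(before, after)
    print(diff if diff else "No configuration change detected")
```

Comparing only top-level keys keeps the sketch short. A nested configuration change shows up as a change to its enclosing top-level key.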

articles/machine-learning/toc.yml

Lines changed: 6 additions & 1 deletion
@@ -702,7 +702,6 @@
 href: how-to-secure-rag-workflows.md
 - name: RAG cloud to local
 href: how-to-retrieval-augmented-generation-cloud-to-local.md
-
 - name: Responsibly develop & monitor
 items:
 - name: Responsible AI overview
@@ -1384,6 +1383,12 @@
 href: ./known-issues/workspace-move-compute-instance-same-name.md
 - name: Application Sharing Policy isn't supported
 href: ./known-issues/application-sharing-policy-not-supported.md
+- name: Inferencing known issues
+items:
+- name: Existing Kubernetes compute cannot be updated
+href: ./known-issues/inferencing-updating-kubernetes-compute-appears-to-succeed.md
+- name: Invalid certificate error during deployment
+href: ./known-issues/inferencing-invalid-certificate.md
 - name: Samples
 items:
 - name: Jupyter Notebooks
