Commit 1fafb26

authored
Merge pull request #252827 from s-polly/stp_known_issues_9-25
Added inferencing KIs
2 parents 77682f8 + d9a167b commit 1fafb26

10 files changed

+101
-19
lines changed

articles/machine-learning/known-issues/application-sharing-policy-not-supported.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,11 +13,11 @@ ms.custom: known-issue
1313

1414
# Known issue - The ApplicationSharingPolicy property isn't supported for compute instances
1515

16+
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
17+
1618
Configuring the `applicationSharingPolicy` property for a compute instance has no effect because that property isn't supported.
1719

1820

19-
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
20-
2121
**Status:** Open
2222

2323
**Problem area:** Compute

articles/machine-learning/known-issues/azure-machine-learning-known-issues.md

Lines changed: 3 additions & 1 deletion
@@ -26,7 +26,9 @@ Select the **Title** to view more information about that specific known issue.
2626
|Compute | [Provisioning error when creating a compute instance with A10 SKU](compute-a10-sku-not-supported.md) | August 14, 2023 |
2727
|Compute | [Idleshutdown property in Bicep template causes error](compute-idleshutdown-bicep.md) | August 14, 2023 |
2828
|Compute | [Slowness in compute instance terminal from a mounted path](compute-slowness-terminal-mounted-path.md)| August 14, 2023|
29-
|Compute| [Creating compute instance after a workspace move results in an Etag conflict error.](workspace-move-compute-instance-same-name.md)| August 14, 2023 |
29+
|Compute| [Creating compute instance after a workspace move results in an Etag conflict error.](workspace-move-compute-instance-same-name.md)| August 14, 2023 |
30+
|Inferencing| [Invalid certificate error during deployment with an AKS cluster](inferencing-invalid-certificate.md)| September 26, 2023 |
31+
|Inferencing| [Existing Kubernetes compute can't be updated with `az ml compute attach` command](inferencing-updating-kubernetes-compute-appears-to-succeed.md) | September 26, 2023 |
3032

3133

3234
## Next steps

articles/machine-learning/known-issues/compute-a10-sku-not-supported.md

Lines changed: 2 additions & 1 deletion
@@ -13,11 +13,12 @@ ms.custom: known-issue
1313

1414
# Known issue - Provisioning error when creating a compute instance with A10 SKU
1515

16+
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
17+
1618
While trying to create a compute instance with A10 SKU, you'll encounter a provisioning error.
1719

1820
:::image type="content" source="media/compute-a10-sku-not-supported/ci-a10.png" alt-text="A screenshot showing the provisioning error message.":::
1921

20-
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
2122

2223
**Status:** Open
2324

articles/machine-learning/known-issues/compute-idleshutdown-bicep.md

Lines changed: 2 additions & 3 deletions
@@ -13,13 +13,12 @@ ms.custom: known-issue, devx-track-bicep
1313

1414
# Known issue - Idleshutdown property in Bicep template causes error
1515

16+
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
17+
1618
When you create an Azure Machine Learning compute instance through Bicep compiled using [MSBuild/NuGet](../../azure-resource-manager/bicep/msbuild-bicep-file.md), using the `idleTimeBeforeShutdown` property as described in the [Microsoft.MachineLearningServices workspaces/computes API reference](/azure/templates/microsoft.machinelearningservices/workspaces/computes?pivots=deployment-language-bicep) results in an error.
1719

1820

1921

20-
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
21-
22-
2322
**Status:** Open
2423

2524

articles/machine-learning/known-issues/compute-slowness-terminal-mounted-path.md

Lines changed: 2 additions & 4 deletions
@@ -13,11 +13,9 @@ ms.custom: known-issue
1313

1414
# Known issue - Slowness in compute instance terminal from a mounted path
1515

16-
While using the compute instance terminal inside a mounted path of a data folder, any commands executed from the terminal result in slowness. This issue is restricted to the terminal; running the commands from SDK using a notebook works as expected.
17-
18-
1916
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
20-
<!--- Choose the correct include --->
17+
18+
While using the compute instance terminal inside a mounted path of a data folder, any commands executed from the terminal result in slowness. This issue is restricted to the terminal; running the commands from SDK using a notebook works as expected.
2119

2220
**Status:** Open
2321

articles/machine-learning/known-issues/inferencing-invalid-certificate.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
---
2+
title: Known issue - Invalid certificate error during deployment
3+
titleSuffix: Azure Machine Learning
4+
description: During machine learning deployments with an AKS cluster, you may receive an invalid certificate error.
5+
author: s-polly
6+
ms.author: scottpolly
7+
ms.topic: troubleshooting
8+
ms.service: machine-learning
9+
ms.subservice: core
10+
ms.date: 08/04/2023
11+
ms.custom: known-issue
12+
---
13+
14+
# Known issue - Invalid certificate error during deployment with an AKS cluster
15+
16+
[!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)]
17+
18+
During machine learning deployments using an AKS cluster, you may receive an invalid certificate error, such as `{"code":"BadRequest","statusCode":400,"message":"The request is invalid.","details":[{"code":"KubernetesUnaccessible","message":"Kubernetes error: AuthenticationException. Reason: InvalidCertificate"}]`.
19+
20+
21+
**Status:** Open
22+
23+
**Problem area:** Inferencing
24+
25+
## Symptoms
26+
27+
Azure Machine Learning deployments with an AKS cluster fail with the error:
28+
29+
`{"code":"BadRequest","statusCode":400,"message":"The request is invalid.","details":[{"code":"KubernetesUnaccessible","message":"Kubernetes error: AuthenticationException. Reason: InvalidCertificate"}],`
30+
and the following error is shown in the MMS logs:
31+
32+
`K8sReadNamespacedServiceAsync failed with AuthenticationException: System.Security.Authentication.AuthenticationException: The remote certificate was rejected by the provided RemoteCertificateValidationCallback. at System.Net.Security.SslStream.SendAuthResetSignal(ProtocolToken message, ExceptionDispatchInfo exception) at System.Net.Security.SslStream.CompleteHandshake(SslAuthenticationOptions sslAuthenticationOptions) at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm) at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)`
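When triaging deployment failures, it can help to check programmatically whether an error payload matches this known issue. The following is a minimal Python sketch, not part of the article or of any Azure SDK: the function name `is_invalid_certificate_error` is hypothetical, and the sample payload is assembled from the fields shown in the error above, with closing braces added on the assumption that nothing else follows.

```python
import json


def is_invalid_certificate_error(payload: str) -> bool:
    """Return True if the payload reports the AKS InvalidCertificate failure.

    Hypothetical helper: it only looks for the `KubernetesUnaccessible`
    code and the `InvalidCertificate` reason quoted in the symptoms.
    """
    try:
        error = json.loads(payload)
    except json.JSONDecodeError:
        # Logged payloads are often truncated, so fall back to a substring check.
        return "KubernetesUnaccessible" in payload and "InvalidCertificate" in payload

    if not isinstance(error, dict):
        return False
    details = error.get("details")
    if not isinstance(details, list):
        return False
    for detail in details:
        if (
            isinstance(detail, dict)
            and detail.get("code") == "KubernetesUnaccessible"
            and "InvalidCertificate" in str(detail.get("message", ""))
        ):
            return True
    return False


# Sample payload built from the fields shown above (closing braces are assumed).
sample = (
    '{"code":"BadRequest","statusCode":400,"message":"The request is invalid.",'
    '"details":[{"code":"KubernetesUnaccessible","message":'
    '"Kubernetes error: AuthenticationException. Reason: InvalidCertificate"}]}'
)
print(is_invalid_certificate_error(sample))
```

The substring fallback is deliberate: the payload as quoted in logs may be cut off, in which case strict JSON parsing fails even though the error is recognizable.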
33+
34+
## Cause
35+
36+
This error occurs because the certificate for AKS clusters created before January 2021 does not include the `Subject Key Identifier` value, which prevents the required `Authority Key Identifier` value from being generated.
37+
38+
## Solutions and workarounds
39+
40+
There are two options to resolve this issue:
41+
- Rotate the AKS certificate for the cluster. See [Certificate Rotation in Azure Kubernetes Service (AKS) - Azure Kubernetes Service](../../aks/certificate-rotation.md) for more information.
42+
- Wait 5 hours for the certificate to be updated automatically, after which the issue should be resolved.
43+
44+
## Next steps
45+
46+
- [About known issues](azure-machine-learning-known-issues.md)

articles/machine-learning/known-issues/inferencing-updating-kubernetes-compute-appears-to-succeed.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
---
2+
title: Known issue - Existing Kubernetes compute can't be updated
3+
titleSuffix: Azure Machine Learning
4+
description: Updating a Kubernetes attached compute instance using the az ml compute attach command appears to succeed but doesn't.
5+
author: s-polly
6+
ms.author: scottpolly
7+
ms.topic: troubleshooting
8+
ms.service: machine-learning
9+
ms.subservice: core
10+
ms.date: 08/04/2023
11+
ms.custom: known-issue
12+
---
13+
14+
# Known issue - Existing Kubernetes compute can't be updated with `az ml compute attach` command
15+
16+
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
17+
18+
Updating a Kubernetes attached compute instance using the `az ml compute attach` command appears to succeed but doesn't.
19+
20+
**Status:** Open
21+
22+
**Problem area:** Inferencing
23+
24+
## Symptoms
25+
26+
When running the command `az ml compute attach --resource-group <resource-group-name> --workspace-name <workspace-name> --type Kubernetes --name <existing-attached-compute-name> --resource-id "<cluster-resource-id>" --namespace <kubernetes-namespace>`, the CLI returns a success message indicating that the compute has been successfully updated. However, the compute isn't updated.
27+
28+
## Cause
29+
30+
The `az ml compute attach` command currently does not support updating existing Kubernetes compute.
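Because the CLI reports success even though nothing changes, one way to avoid being misled is to check whether the compute name is already in use before running `az ml compute attach`. The following is a rough Python sketch of such a guard, not an official workaround. The helper names `list_compute_names` and `compute_already_attached` are hypothetical, and the sketch assumes the Azure CLI with the `ml` extension is installed and signed in. Treating names that differ only by case as the same is also a conservative assumption.

```python
import json
import subprocess


def compute_already_attached(name: str, existing_names: list[str]) -> bool:
    """Return True if `name` matches a compute already present in the workspace.

    Assumption: names differing only by case are treated as the same compute.
    """
    wanted = name.casefold()
    return any(existing.casefold() == wanted for existing in existing_names)


def list_compute_names(resource_group: str, workspace: str) -> list[str]:
    """List compute names in a workspace by calling `az ml compute list`.

    Hypothetical wrapper; requires the Azure CLI and the `ml` extension.
    """
    result = subprocess.run(
        [
            "az", "ml", "compute", "list",
            "--resource-group", resource_group,
            "--workspace-name", workspace,
            "--query", "[].name",
            "--output", "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Pure check with made-up names, so it runs without the Azure CLI.
print(compute_already_attached("k8s-compute", ["cpu-cluster", "k8s-compute"]))
```

In practice, you would pass the result of `list_compute_names("<resource-group-name>", "<workspace-name>")` as `existing_names`. If the check returns `True`, the attach command would target an existing compute and, per this known issue, leave it unchanged.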
31+
32+
33+
## Next steps
34+
35+
- [About known issues](azure-machine-learning-known-issues.md)

articles/machine-learning/known-issues/jupyter-r-kernel-not-starting.md

Lines changed: 1 addition & 4 deletions
@@ -13,17 +13,14 @@ ms.custom: known-issue
1313

1414
# Known issue - Jupyter R Kernel doesn't start in new compute instance images
1515

16-
When trying to launch an R kernel in JupyterLab or a notebook in a new compute instance, the kernel fails to start with `Error: .onLoad failed in loadNamespace()`
17-
1816
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
1917

18+
When trying to launch an R kernel in JupyterLab or a notebook in a new compute instance, the kernel fails to start with `Error: .onLoad failed in loadNamespace()`.
2019

2120
**Status:** Open
2221

23-
2422
**Problem area:** Compute
2523

26-
2724
## Symptoms
2825

2926
After creating a new compute instance, try to launch R kernel in JupyterLab or a Jupyter notebook. The kernel fails to launch. You'll see the following messages in the Jupyter logs:

articles/machine-learning/known-issues/workspace-move-compute-instance-same-name.md

Lines changed: 2 additions & 3 deletions
@@ -13,11 +13,10 @@ ms.custom: known-issue
1313

1414
# Known issue - Creating compute instance after a workspace move results in an Etag conflict error.
1515

16-
After moving a workspace to a different subscription or resource group, creating a compute instance with the same name as a previous compute instance fails with an Etag conflict error.
16+
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
1717

18+
After moving a workspace to a different subscription or resource group, creating a compute instance with the same name as a previous compute instance fails with an Etag conflict error.
1819

19-
[!INCLUDE [dev v2](../includes/machine-learning-dev-v2.md)]
20-
<!--- Choose the correct include --->
2120

2221
**Status:** Open
2322

articles/machine-learning/toc.yml

Lines changed: 6 additions & 1 deletion
@@ -702,7 +702,6 @@
702702
href: how-to-secure-rag-workflows.md
703703
- name: RAG cloud to local
704704
href: how-to-retrieval-augmented-generation-cloud-to-local.md
705-
706705
- name: Responsibly develop & monitor
707706
items:
708707
- name: Responsible AI overview
@@ -1384,6 +1383,12 @@
13841383
href: ./known-issues/workspace-move-compute-instance-same-name.md
13851384
- name: Application Sharing Policy isn't supported
13861385
href: ./known-issues/application-sharing-policy-not-supported.md
1386+
- name: Inferencing known issues
1387+
items:
1388+
- name: Existing Kubernetes compute cannot be updated
1389+
href: ./known-issues/inferencing-updating-kubernetes-compute-appears-to-succeed.md
1390+
- name: Invalid certificate error during deployment
1391+
href: ./known-issues/inferencing-invalid-certificate.md
13871392
- name: Samples
13881393
items:
13891394
- name: Jupyter Notebooks
