Commit f48e694

Merge pull request #92430 from nishankgu/patch-16
Update resource-known-issues.md
2 parents 216c08d + 71d49c2 commit f48e694

1 file changed (+6, -0 lines changed)

articles/machine-learning/service/resource-known-issues.md

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,12 @@ ms.custom: seodec18

 This article helps you find and correct errors or failures encountered when using Azure Machine Learning.

+## Upcoming SR-IOV upgrade to NCv3 machines in AmlCompute
+
+Azure Compute will update the NCv3 SKUs starting in early November to support all MPI implementations and versions, as well as RDMA verbs for InfiniBand-equipped virtual machines. This will require a short downtime. [Read more about the SR-IOV upgrade](https://azure.microsoft.com/updates/sriov-availability-on-ncv3-virtual-machines-sku).
+
+As a customer of Azure Machine Learning's managed compute offering (AmlCompute), you are not required to make any changes at this time. Based on the [update schedule](https://azure.microsoft.com/updates/sr-iov-availability-schedule-on-ncv3-virtual-machines-sku), you will need to plan for a short break in your training. The service will update the VM images on your cluster nodes and automatically scale up your cluster. Once the upgrade completes, you may be able to use all other MPI distributions (like OpenMPI with PyTorch), in addition to getting higher InfiniBand bandwidth, lower latencies, and better distributed application performance.
+
 ## Visual interface issues

 Visual interface for machine learning service issues.
