docs: update low-precision training docs to reflect MS-AMP deprecation (#3929)
- Add prominent deprecation warnings for MS-AMP in both usage and concept guides
- Note specific compatibility issues (CUDA 12.x+, modern NCCL, PyTorch 2.2+)
- Recommend TransformersEngine and torchao as actively maintained alternatives
- Update code examples to prefer TE/torchao over MS-AMP
- Reorder Further Reading links to de-emphasize MS-AMP
Closes #3639
docs/source/concept_guides/low_precision_training.md (+8, −2)
@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
# Low precision training methods
The release of new kinds of hardware led to the emergence of new training paradigms that better utilize them. Currently, this is in the form of training
-in 8-bit precision using packages such as [TransformersEngine](https://github.com/NVIDIA/TransformerEngine) (TE) or [MS-AMP](https://github.com/Azure/MS-AMP/tree/main).
+in 8-bit precision using packages such as [TransformersEngine](https://github.com/NVIDIA/TransformerEngine) (TE), [torchao](https://github.com/pytorch/ao) (native PyTorch FP8), or the legacy [MS-AMP](https://github.com/Azure/MS-AMP/tree/main) (no longer maintained, see warning below).
For an introduction to the topics discussed today, we recommend reviewing the [low-precision usage guide](../usage_guides/low_precision_training) as this documentation will reference it regularly.
@@ -63,7 +63,7 @@ If we notice in the chart mentioned earlier, TE simply casts the computation lay
<Tip warning={true}>
-MS-AMP is no longer actively maintained and has known compatibility issues with newer CUDA versions (12.x+) and PyTorch builds. We recommend using `TransformersEngine` or `torchao` instead for FP8 training.
+**⚠️ Deprecated / Unmaintained:** MS-AMP is no longer actively maintained by Microsoft. The repository has not seen updates since 2023 and has known compatibility issues with CUDA 12.x+, modern NCCL versions, and recent PyTorch releases (2.2+). **We strongly recommend using `TransformersEngine` or `torchao` instead.** See the [usage guide](../usage_guides/low_precision_training) for migration instructions.
</Tip>
@@ -77,4 +77,10 @@ MS-AMP takes a different approach to `TransformersEngine` by providing three dif
## Combining the two
+<Tip warning={true}>
+
+Since MS-AMP is no longer maintained, this combination is not recommended for new projects.
+
+</Tip>
+
More experiments need to be performed but it's been noted that combining both MS-AMP and TransformersEngine can lead to the highest throughput by relying on NVIDIA's optimized FP8 operators and utilizing how MS-AMP reduces the memory overhead.
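The diff above repeatedly refers to "8-bit precision" training. As background (not part of this PR): the FP8 formats these backends use, E4M3 and E5M2, have far narrower dynamic range than FP16/BF16, which is why every backend pairs them with scaling machinery. A stdlib-only sketch deriving both ranges from their bit layouts:

```python
# Illustrative sketch (editor-added, not from the PR): dynamic range of the
# two FP8 formats used in FP8 training, derived from their bit layouts.

# E4M3: 1 sign, 4 exponent bits (bias 7), 3 mantissa bits.
# Only the all-ones pattern S.1111.111 is reserved (NaN), so the largest
# finite value is S.1111.110 -> (1 + 6/8) * 2**(15 - 7).
e4m3_max = (1 + 6 / 8) * 2 ** (15 - 7)

# E5M2: 1 sign, 5 exponent bits (bias 15), 2 mantissa bits, IEEE-style
# (all-ones exponent reserved for inf/NaN), so the largest exponent field
# is 30 -> (1 + 3/4) * 2**(30 - 15).
e5m2_max = (1 + 3 / 4) * 2 ** (30 - 15)

print(e4m3_max)  # 448.0
print(e5m2_max)  # 57344.0
```

E4M3 trades range for precision and E5M2 the reverse, which is why FP8 recipes typically keep activations/weights in E4M3 and gradients in E5M2.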
docs/source/usage_guides/low_precision_training.md (+16, −9)
@@ -30,7 +30,7 @@ What this will result in is some reduction in the memory used (as we've cut the
## Configuring the Accelerator
-Currently three different backends for FP8 are supported (`TransformersEngine`, `torchao`, and `MS-AMP`), each with different capabilities and configurations.
+Currently two actively maintained backends for FP8 are supported (`TransformersEngine` and `torchao`), each with different capabilities and configurations. A legacy `MS-AMP` backend also exists but is no longer recommended (see [below](#configuring-ms-amp) for details).
To use either, the same core API is used. Just pass `mixed_precision="fp8"` to either the [`Accelerator`], during `accelerate config` when prompted about mixed precision, or as part of your `config.yaml` file in the `mixed_precision` key:
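To make the `config.yaml` route concrete, here is a minimal illustrative excerpt (editor-added; only the `mixed_precision` key is the subject of this guide — the other keys are typical `accelerate config` defaults, shown for context):

```yaml
# Illustrative config.yaml excerpt; `mixed_precision: fp8` is the relevant key.
compute_environment: LOCAL_MACHINE
distributed_type: "NO"
mixed_precision: fp8
```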
@@ -43,10 +43,9 @@ To specify a backend (and customize other parts of the FP8 mixed precision setup
```{python}
from accelerate import Accelerator
-from accelerate.utils import MSAMPRecipeKwargs
-kwargs = [MSAMPRecipeKwargs()]
-# Or to specify the backend as `TransformersEngine` even if MS-AMP is installed
-# kwargs = [TERecipeKwargs()]
+from accelerate.utils import TERecipeKwargs, AORecipeKwargs
-MS-AMP is no longer actively maintained and has known compatibility issues with newer CUDA versions (12.x+) and PyTorch builds. We recommend using `TransformersEngine` or `torchao` instead for FP8 training.
+**⚠️ Deprecated / Unmaintained:** MS-AMP is no longer actively maintained by Microsoft. The [MS-AMP repository](https://github.com/Azure/MS-AMP) has not received updates since 2023 and has known compatibility issues:
+
+- Requires CUDA 11.x (does not support CUDA 12.x+)
+- Requires older NCCL versions incompatible with recent PyTorch releases
+- Does not support recent PyTorch versions (2.2+)
+
+**We strongly recommend using [`TransformersEngine`](#configuring-transformersengine) or [`torchao`](#configuring-torchao) instead for all new and existing FP8 training workflows.** Both are actively maintained and support modern CUDA/PyTorch versions. Native PyTorch FP8 support via `torchao` is particularly promising as a vendor-neutral solution.
+
+The MS-AMP backend is retained in Accelerate for legacy compatibility but may be removed in a future release.
</Tip>
-Of the two, `MS-AMP` is traditionally the easier one to configure as there is only a single argument: the optimization level.
+`MS-AMP` has a single configuration argument: the optimization level.
Currently two levels of optimization are supported in the Accelerate integration, `"O1"` and `"O2"` (using the letter 'o', not zero).
@@ -205,7 +212,7 @@ Find out more [here](https://github.com/huggingface/accelerate/tree/main/benchma
To learn more about training in FP8 please check out the following resources:
-* [Our concept guide](../concept_guides/low_precision_training) detailing more about both TransformersEngine and MS-AMP
+* [Our concept guide](../concept_guides/low_precision_training) detailing more about TransformersEngine, torchao, and MS-AMP