articles/machine-learning/reference-checkpoint-performance-for-large-models.md
5 additions & 5 deletions
@@ -1,5 +1,5 @@
 ---
-title: Optimize Checkpoint Performance for Large Model Training Jobs with Nebula (Preview)
+title: Optimize Checkpoint Performance for Large Models
 titleSuffix: Azure Machine Learning
 description: Learn how Nebula can save time, resources, and money for large model training applications
 services: machine-learning
@@ -9,7 +9,7 @@ ms.custom: ----, ----, ----

 author: ziqiwang
 ms.author: ziqiwang
-ms.date: 03/06/2023
+ms.date: 03/14/2023
 ms.reviewer: franksolomon
 ---

@@ -41,7 +41,7 @@ Checkpoints can help deal with these problems. Periodic checkpoints snapshot the

 When large model training operations experience failures and terminations, data scientists and researchers can restore the training process from a previously saved checkpoint. Unfortunately, the computation performed between that checkpoint and the termination is wasted, because training must re-execute operations to recover the unsaved, intermediate results. Shorter checkpoint intervals could solve this problem. The following diagram shows the time cost to restore a training process from checkpoints:

-:::image type="content" source="./media/reference-checkpoint-performance-with-nebula/checkpoint-time-flow-diagram.png" lightbox="./media/reference-checkpoint-performance-with-nebula/checkpoint-time-flow-diagram.png" alt-text="Screenshot that shows the time cost to restore a training process from checkpoints.":::
+:::image type="content" source="./media/reference-checkpoint-performance-for-large-models/checkpoint-time-flow-diagram.png" lightbox="./media/reference-checkpoint-performance-for-large-models/checkpoint-time-flow-diagram.png" alt-text="Screenshot that shows the time cost to restore a training process from checkpoints.":::

 However, the checkpoint save process itself generates large overhead. A TB-sized checkpoint save can often become a training process bottleneck, because the synchronized checkpoint process blocks training for hours. Checkpoint-related overhead takes up 12% of total training time on average, and can rise to 43% [(Maeng et al., 2021)](https://cs.stanford.edu/people/trippel/pubs/cpr-mlsys-21.pdf).

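To make that 12% overhead figure concrete, here is a back-of-the-envelope calculation. The job length, per-save duration, and save count below are invented for illustration only; they are not from the article:

```python
def checkpoint_overhead_fraction(total_hours: float, ckpt_hours_each: float, num_ckpts: int) -> float:
    """Fraction of wall-clock time spent blocked in synchronous checkpoint saves."""
    return (ckpt_hours_each * num_ckpts) / total_hours

# Assumed numbers: a 100-hour training job that saves a large
# checkpoint 6 times, blocking 2 hours per synchronous save.
frac = checkpoint_overhead_fraction(100.0, 2.0, 6)
print(f"{frac:.0%}")  # → 12%, matching the average overhead cited above
```

Shortening the checkpoint interval raises `num_ckpts` and pushes this fraction up, which is exactly the tension that asynchronous checkpointing targets.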
@@ -55,7 +55,7 @@ Nebula can

 * **Boost checkpoint speeds as much as 1,000 times** with a simple API that works asynchronously with your training process. Nebula can reduce checkpoint times from hours to seconds - a potential reduction of 95% to 99%.

-:::image type="content" source="media/reference-checkpoint-performance-with-nebula/nebula-checkpoint-time-savings.png" lightbox="media/reference-checkpoint-performance-with-nebula/nebula-checkpoint-time-savings.png" alt-text="Screenshot that shows the time savings benefit of Nebula.":::
+:::image type="content" source="media/reference-checkpoint-performance-for-large-models/nebula-checkpoint-time-savings.png" lightbox="media/reference-checkpoint-performance-for-large-models/nebula-checkpoint-time-savings.png" alt-text="Screenshot that shows the time savings benefit of Nebula.":::

 This example shows the checkpoint and end-to-end training time reduction for four checkpoint saves of Huggingface GPT2, GPT2-Large, and GPT2-XL training jobs. For the medium-sized Huggingface GPT2-XL checkpoint saves (20.6 GB), Nebula achieved a 96.9% time reduction for one checkpoint.

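The asynchronous pattern behind such speedups - snapshot the state quickly in the training loop, then persist it on a background worker so training continues - can be sketched generically in plain Python. This is an illustration of the technique only, not Nebula's actual implementation; `AsyncCheckpointer` and its methods are hypothetical names:

```python
import copy
import pickle
import threading

class AsyncCheckpointer:
    """Generic async checkpointing: copy state in the training thread (fast),
    then write it to disk on a background thread (slow) without blocking."""

    def __init__(self):
        self._worker = None

    def save(self, state: dict, path: str) -> None:
        # Fast, blocking part: snapshot the state in memory.
        snapshot = copy.deepcopy(state)
        # Allow at most one in-flight save, then persist in the background.
        self.wait()
        self._worker = threading.Thread(target=self._write, args=(snapshot, path))
        self._worker.start()

    @staticmethod
    def _write(snapshot: dict, path: str) -> None:
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

    def wait(self) -> None:
        # Block until any in-flight save has finished.
        if self._worker is not None:
            self._worker.join()

ckpt = AsyncCheckpointer()
ckpt.save({"step": 100, "weights": [0.1, 0.2]}, "ckpt_100.pkl")
# ... training continues here while the write happens ...
ckpt.wait()  # ensure the write finished before exiting
```

The training loop only pays for the in-memory copy; the expensive disk (or remote storage) write overlaps with subsequent training steps, which is how checkpoint times drop from the duration of the write to the duration of the snapshot.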
@@ -112,7 +112,7 @@ Nebula provides APIs to handle checkpoint saves. You can use these APIs in your
 ### View your checkpointing histories
 When your training job finishes, navigate to the Job `Name > Outputs + logs` pane. In the left panel, expand the **Nebula** folder, and select `checkpointHistories.csv` to see detailed information about Nebula checkpoint saves - duration, throughput, and checkpoint size.

-:::image type="content" source="./media/reference-checkpoint-performance-with-nebula/checkpoint-save-metadata.png" lightbox="./media/reference-checkpoint-performance-with-nebula/checkpoint-save-metadata.png" alt-text="Screenshot that shows metadata about the checkpoint saves.":::
+:::image type="content" source="./media/reference-checkpoint-performance-for-large-models/checkpoint-save-metadata.png" lightbox="./media/reference-checkpoint-performance-for-large-models/checkpoint-save-metadata.png" alt-text="Screenshot that shows metadata about the checkpoint saves.":::
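To post-process a downloaded `checkpointHistories.csv` programmatically, Python's standard `csv` module is enough. The column names and sample values below are assumptions for illustration; check the actual header of your file:

```python
import csv
import io

# Assumed layout for illustration; the real file's header may differ.
sample = io.StringIO(
    "checkpoint_name,duration_seconds,throughput_gbps,size_gb\n"
    "ckpt_100,1.2,17.2,20.6\n"
    "ckpt_200,1.1,18.7,20.6\n"
)

rows = list(csv.DictReader(sample))
avg_duration = sum(float(r["duration_seconds"]) for r in rows) / len(rows)
print(f"saves: {len(rows)}, mean duration: {avg_duration:.2f}s")
```

Replacing the `io.StringIO` sample with `open("checkpointHistories.csv")` reads the real file the same way.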