Commit 9f4977d

Fix detected issues . . .
1 parent afd0fa7 commit 9f4977d

4 files changed, +9 -9 lines changed
articles/machine-learning/reference-checkpoint-performance-with-Nebula.md

Lines changed: 9 additions & 9 deletions
@@ -39,7 +39,7 @@ When large model training operations experience failures and terminations, data

 To summarize, large model checkpoint management involves heavy job recover time and storage overheads.

-:::image type="content" source="media/quickstart-spark-jobs/checkpoint-time-flow-diagram.png" lightbox="media/reference-checkpoint-performance-with-Nebula/checkpoint-time-flow-diagram.png" alt-text="Screenshot that shows the time waste of duplicated data training.":::
+:::image type="content" source="./media/reference-checkpoint-performance-with-Nebula/checkpoint-time-flow-diagram.png" lightbox="./media/reference-checkpoint-performance-with-Nebula/checkpoint-time-flow-diagram.png" alt-text="Screenshot that shows the time waste of duplicated data training.":::

 ## Nebula to the Rescue

@@ -65,18 +65,18 @@ Nebula can

 and access them at any time with a few lines of code.

-**LARGER IMG_3 VERSION NEEDED**
+**LARGER img-3 VERSION NEEDED**

-:::image type="content" source="media/quickstart-spark-jobs/IMG_3.png" lightbox="media/reference-checkpoint-performance-with-Nebula/IMG_3.png" alt-text="LARGER IMG_3 VERSION NEEDED":::
+:::image type="content" source="media/reference-checkpoint-performance-with-Nebula/img-3.png" lightbox="media/reference-checkpoint-performance-with-Nebula/img-3.png" alt-text="LARGER img-3 VERSION NEEDED":::

-**LARGER IMG_4 VERSION NEEDED**
+**LARGER img-4 VERSION NEEDED**

-:::image type="content" source="media/quickstart-spark-jobs/IMG_4.png" lightbox="media/reference-checkpoint-performance-with-Nebula/IMG_4.png" alt-text="LARGER IMG_3 VERSION NEEDED":::
+:::image type="content" source="media/reference-checkpoint-performance-with-Nebula/img-4.png" lightbox="media/reference-checkpoint-performance-with-Nebula/img-4.png" alt-text="LARGER img-4 VERSION NEEDED":::


-**LARGER IMG_5 VERSION NEEDED**
+**LARGER img-5 VERSION NEEDED**

-:::image type="content" source="media/quickstart-spark-jobs/IMG_5.png" lightbox="media/reference-checkpoint-performance-with-Nebula/IMG_5.png" alt-text="LARGER IMG_5 VERSION NEEDED":::
+:::image type="content" source="media/reference-checkpoint-performance-with-Nebula/img-5.png" lightbox="media/reference-checkpoint-performance-with-Nebula/img-5.png" alt-text="LARGER img-5 VERSION NEEDED":::

 Nebula offers full compatibility with any distributed training framework that supports PyTorch, and any compute target that supports ACPT. Nebula is designed to work with different distributed training strategies. You can use Nebula with PyTorch, PyTorch Lightning, DeepSpeed, and more. You can also use it with different Azure Machine Learning compute target, such as AmlCompute or AKS.

@@ -91,7 +91,7 @@ Nebula offers full compatibility with any distributed training framework that su

 See [Manage training & deploy computes](./how-to-create-attach-compute-studio.md) to learn more about compute target creation

-* The required dependency included in an ACPT-curated (Azure Container for Pytorch) environment. See [Curated environments](resource-curated-environments#azure-container-for-pytorch-acpt-preview) to obtain the ACPT image. Learn how to use the curated environment [here](./how-to-use-environments.md)
+* The required dependency included in an ACPT-curated (Azure Container for Pytorch) environment. See [Curated environments](resource-curated-environments.md#azure-container-for-pytorch-acpt-preview) to obtain the ACPT image. Learn how to use the curated environment [here](./how-to-use-environments.md)

 * An Azure ML script run configuration file, which defines the
 - source directory
@@ -112,7 +112,7 @@ To save checkpoints with Nebula, you must modify your training scripts in two wa

 Similar to the way that the PyTorch `torch.save()` API works, Nebula provides checkpoint save APIs that you can use in your training scripts.

-You don't need to modify other steps to train your large model on Azure Machine Learning Platform. You only need to use the [Azure Container PyTorch (ACPT) curated environment](how-to-manage-environments-v2?tabs=cli#curated-environments)
+You don't need to modify other steps to train your large model on Azure Machine Learning Platform. You only need to use the [Azure Container PyTorch (ACPT) curated environment](how-to-manage-environments-v2.md?tabs=cli#curated-environments)


 ## Examples
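
Two of the fixes in this diff add a missing `.md` extension to a relative link target (`resource-curated-environments` and `how-to-manage-environments-v2`). A small script can flag that class of issue. The following is a minimal sketch, assuming the rule is simply "a relative Markdown link's file name must contain an extension before any `?query` or `#anchor`". The function name `find_extensionless_links` and the regular expression are hypothetical illustrations, not the validation tool that detected the issues fixed in this commit.

```python
import re
from typing import List

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_extensionless_links(markdown: str) -> List[str]:
    """Return relative link targets whose file name has no extension."""
    flagged = []
    for target in LINK_RE.findall(markdown):
        # Skip absolute URLs, mail links, and pure in-page anchors.
        if "://" in target or target.startswith(("#", "mailto:")):
            continue
        # Drop the ?query and #anchor parts to isolate the path.
        path = re.split(r"[?#]", target, maxsplit=1)[0]
        filename = path.rsplit("/", 1)[-1]
        if "." not in filename:
            flagged.append(target)
    return flagged


# Link targets copied from the removed (-) and added (+) lines of this diff.
before = "See [Curated environments](resource-curated-environments#azure-container-for-pytorch-acpt-preview)"
after = "See [Curated environments](resource-curated-environments.md#azure-container-for-pytorch-acpt-preview)"

print(find_extensionless_links(before))  # the old target is flagged
print(find_extensionless_links(after))   # the fixed target is not
```

The sketch only inspects inline links; the `:::image` source paths changed in the other hunks would need a separate check, for example one that confirms each referenced file exists on disk.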