
Commit 9e772b7

Latest changes . . .
1 parent 068faf2 · commit 9e772b7

File tree

1 file changed: +5 -5 lines changed


articles/machine-learning/reference-checkpoint-performance-with-Nebula.md

Lines changed: 5 additions & 5 deletions
@@ -24,7 +24,7 @@ Azure Container for PyTorch (ACPT) now includes **Nebula**, a fast, simple, disk
 To make Nebula available for your training jobs, import the `nebulaml` python package in your script. Nebula has full compatibility with different distributed PyTorch training strategies, including PyTorch Lightning, DeepSpeed, and more. The Nebula API offers a simple way to monitor and view checkpoint lifecycles. The APIs support various model types, and ensure checkpoint consistency and reliability.

 > [!IMPORTANT]
-> The `torch-nebula` package is not available in the public PyPI python package index. This package is only available in the Azure Container for PyTorch (ACPT) curated environment on Azure Machine Learning. To avoid problems, please don't try to install `torch-nebula` from PyPI, or the `pip` command.
+> The `nebulaml` package is not available in the public PyPI python package index. This package is only available in the Azure Container for PyTorch (ACPT) curated environment on Azure Machine Learning. To avoid problems, please don't try to install `nebulaml` from PyPI, or the `pip` command.

 In this document, you'll learn how to use Nebula with ACPT on Azure Machine Learning, to quickly checkpoint your model training jobs. Additionally, you'll learn how to view and manage Nebula checkpoint data. You'll also learn how to resume the model training jobs from the last available checkpoint if Azure Machine Learning suffers interruption, failure, or termination.
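The paragraph above describes the import that makes Nebula available. As a minimal sketch, assuming the ACPT curated environment (since `nebulaml` is not on PyPI), the import and a hypothetical initialization might look like this; the `nm.init()` call and its parameters are modeled on the config keys that appear later in this diff and are an assumption, not something this commit shows:

```python
# Minimal sketch, assuming the ACPT curated environment on Azure Machine Learning;
# nebulaml is not on PyPI, so this import fails elsewhere.
import nebulaml as nm  # alias chosen to match nm.list_checkpoints() in the next hunk

# Hypothetical initialization: the parameter names mirror the persistent_storage_path /
# persistent_time_interval config keys shown in the Lightning hunk further down.
nm.init(persistent_storage_path="<YOUR STORAGE PATH>",
        persistent_time_interval=10)
```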

@@ -170,7 +170,7 @@ To enable full Nebula compatibility with PyTorch-based training scripts, modify
 ## List all checkpoints
 ckpts = nm.list_checkpoints()
 ## Get Latest checkpoint path
-latest_ckpt_path = tn.get_latest_checkpoint_path("checkpoint", persisted_storage_path)
+latest_ckpt_path = ml.get_latest_checkpoint_path("checkpoint", persisted_storage_path)
 ```

 # [Using DeepSpeed](#tab/DEEPSPEED)
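As a hedged sketch of how the helper renamed in this hunk might be used to resume training: note that the diff mixes the `nm` and `ml` prefixes, so the alias below is a guess matching the added line, and the `torch.load()` call is an assumption about the saved format:

```python
import torch
import nebulaml as ml  # assumed alias; the context line above uses `nm` instead

persisted_storage_path = "<YOUR STORAGE PATH>"  # hypothetical: the path used at init

ckpts = ml.list_checkpoints()  # enumerate the checkpoints Nebula has saved so far
latest_ckpt_path = ml.get_latest_checkpoint_path("checkpoint", persisted_storage_path)
state = torch.load(latest_ckpt_path)  # assumption: the checkpoint is torch-loadable
```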
@@ -205,16 +205,16 @@ latest_ckpt_path = tn.get_latest_checkpoint_path("checkpoint", persisted_storage
 config_params["persistent_storage_path"] = "<YOUR STORAGE PATH>"
 config_params["persistent_time_interval"] = 10

-nebula_checkpoint_callback = tn.NebulaCallback(
+nebula_checkpoint_callback = ml.NebulaCallback(
     ****, # Original ModelCheckpoint params
     config_params=config_params, # customize the config of init nebula
 )
 ```

-Next, add `tn.NebulaCheckpointIO()` as a plugin to your `Trainer`, and modify the `trainer.save_checkpoint()` storage parameters as shown:
+Next, add `ml.NebulaCheckpointIO()` as a plugin to your `Trainer`, and modify the `trainer.save_checkpoint()` storage parameters as shown:

 ```python
-trainer = Trainer(plugins=[tn.NebulaCheckpointIO()], # add NebulaCheckpointIO as a plugin
+trainer = Trainer(plugins=[ml.NebulaCheckpointIO()], # add NebulaCheckpointIO as a plugin
     callbacks=[nebula_checkpoint_callback]) # use NebulaCallback as a plugin
 ```
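Assembled for context, the Lightning wiring this hunk touches might read as follows once combined; `ml` remains an assumed alias, and the argument standing in for the diff's `****` placeholder is purely hypothetical:

```python
import nebulaml as ml                          # assumed alias, as in the hunks above
from pytorch_lightning import Trainer

config_params = dict()
config_params["persistent_storage_path"] = "<YOUR STORAGE PATH>"
config_params["persistent_time_interval"] = 10

# The **** placeholder in the diff stands for your usual ModelCheckpoint params;
# every_n_train_steps is shown here only as a hypothetical example of one.
nebula_checkpoint_callback = ml.NebulaCallback(
    every_n_train_steps=100,                   # hypothetical ModelCheckpoint-style arg
    config_params=config_params,               # customize the config of init nebula
)

trainer = Trainer(
    plugins=[ml.NebulaCheckpointIO()],         # NebulaCheckpointIO handles checkpoint I/O
    callbacks=[nebula_checkpoint_callback],    # NebulaCallback decides when to checkpoint
)
```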
