Commit 583bd69

Corrected singlefile path, added detail
1 parent: afe9386

File tree: 1 file changed, +3 -3 lines changed

training/README.md

Lines changed: 3 additions & 3 deletions
@@ -31,7 +31,7 @@ torchrun --nnodes=24 --node_rank=0 --nproc_per_node=8 \
 --report_interval=100 \
 --checkpoint_interval=20000 \
 ```
-To reproduce the exact model as Bamba-9B, you can find the training configs [here](data/README.md).
+To reproduce the exact model as Bamba-9B, or train using your own data or models, further config details are [here](data/README.md).
 
 ## Continuing Training
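
A minimal sketch of how the save-side flags in this hunk fit together: the launch below writes sharded checkpoints every 20000 steps under the given save path, which the next hunk then references when resuming. The entrypoint name `main_training.py` and the single-node settings are illustrative placeholders; only `--report_interval`, `--checkpoint_interval`, and `--ckpt_save_path` come from the README itself.

```bash
# Illustrative launch; main_training.py is a placeholder for the repo's
# actual training entrypoint, and node/GPU counts are reduced for brevity
# (the README example uses --nnodes=24).
torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 \
    main_training.py \
    --report_interval=100 \
    --checkpoint_interval=20000 \
    --ckpt_save_path="/path/to/prev/ckpt"
```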

@@ -41,10 +41,10 @@ Training can be continued from a completed run's final saved checkpoint in multi
 3. Restore the entire model, optimizer, and dataloader state.
 
 If the completed run was configured with `--ckpt_save_path="/path/to/prev/ckpt"`, then a single
-`consolidated.00.pth` file containing the model weights only is created at the root level, while
+`consolidated.00.pth` file containing the final model weights only is created under `"/path/to/prev/ckpt/pth"`, while
 sharded checkpoint files which also capture the optimizer and dataloader state exist under
 `"/path/to/prev/ckpt/checkpoints"`. The three scenarios above are then achieved by specifying:
-1. **Model Only**: `--ckpt_load_path="/path/to/prev/ckpt/consolidated.00.pth"`
+1. **Model Only**: `--ckpt_load_path="/path/to/prev/ckpt/pth/consolidated.00.pth"`
 2. **Model + Optimizer**: `--ckpt_load_path="/path/to/prev/ckpt/"`
 3. **Model + Optimizer + Dataloader**: `--ckpt_load_path="/path/to/prev/ckpt/" --resuming_dataset`
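
For reference, a minimal sketch of the three resume invocations described above, using the corrected `pth/` location for the consolidated weights. The entrypoint name and any torchrun arguments other than the checkpoint flags are illustrative placeholders, not taken from this diff.

```bash
# Illustrative resume commands; main_training.py is a placeholder entrypoint.

# 1. Model only: load the consolidated weights file from the pth/ subdirectory.
torchrun --nproc_per_node=8 main_training.py \
    --ckpt_load_path="/path/to/prev/ckpt/pth/consolidated.00.pth"

# 2. Model + optimizer: point at the run directory so the sharded checkpoints
#    under checkpoints/ (which also carry optimizer state) are loaded.
torchrun --nproc_per_node=8 main_training.py \
    --ckpt_load_path="/path/to/prev/ckpt/"

# 3. Model + optimizer + dataloader: additionally restore the dataloader state.
torchrun --nproc_per_node=8 main_training.py \
    --ckpt_load_path="/path/to/prev/ckpt/" --resuming_dataset
```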
