1 file changed: +3 −3 lines changed

@@ -31,7 +31,7 @@ torchrun --nnodes=24 --node_rank=0 --nproc_per_node=8 \
 --report_interval=100 \
 --checkpoint_interval=20000 \
 ```
-To reproduce the exact model as Bamba-9B, you can find the training configs [here](data/README.md).
+To reproduce the exact model as Bamba-9B, or to train with your own data or models, further config details are available [here](data/README.md).
 
 ## Continuing Training
 
@@ -41,10 +41,10 @@ Training can be continued from a completed run's final saved checkpoint in multi
 3. Restore the entire model, optimizer, and dataloader state.
 
 If the completed run was configured with `--ckpt_save_path="/path/to/prev/ckpt"`, then a single
-`consolidated.00.pth` file containing the model weights only is created at the root level, while
+`consolidated.00.pth` file containing the final model weights only is created under `"/path/to/prev/ckpt/pth"`, while
 sharded checkpoint files which also capture the optimizer and dataloader state exist under
 `"/path/to/prev/ckpt/checkpoints"`. The three scenarios above are then achieved by specifying:
-1. **Model Only**: `--ckpt_load_path="/path/to/prev/ckpt/consolidated.00.pth"`
+1. **Model Only**: `--ckpt_load_path="/path/to/prev/ckpt/pth/consolidated.00.pth"`
 2. **Model + Optimizer**: `--ckpt_load_path="/path/to/prev/ckpt/"`
 3. **Model + Optimizer + Dataloader**: `--ckpt_load_path="/path/to/prev/ckpt/" --resuming_dataset`
 
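For concreteness, a resume invocation for scenario 3 might look like the sketch below. It reuses the `torchrun` launch line and the flags that appear in the diff (`--ckpt_load_path`, `--resuming_dataset`, `--ckpt_save_path`, `--report_interval`, `--checkpoint_interval`); the training script name `train.py` and all paths are placeholders, not names confirmed by the repository.

```bash
# Sketch only, not taken verbatim from this repository: resume a completed run while
# restoring the model, optimizer, and dataloader state (scenario 3 above). "train.py",
# the checkpoint paths, and the save location are placeholders; all other training
# flags are assumed to match the original run's configuration.
torchrun --nnodes=24 --node_rank=0 --nproc_per_node=8 \
    train.py \
    --ckpt_load_path="/path/to/prev/ckpt/" \
    --resuming_dataset \
    --ckpt_save_path="/path/to/new/ckpt" \
    --report_interval=100 \
    --checkpoint_interval=20000
```

Dropping `--resuming_dataset` gives scenario 2, and pointing `--ckpt_load_path` at `"/path/to/prev/ckpt/pth/consolidated.00.pth"` gives scenario 1.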