It looks like `save_checkpoint` expects the `get_model_parallel_*` API on
the `mpu` object, so this adds it to the slim Ulysses `mpu` version.
This fixes the following failure in the HF Trainer:
```
[rank1]: File "/code/users/stas/github/transformers-alst-integration/src/transformers/trainer.py", line 3248, in _save_optimizer_and_scheduler
[rank1]: self.model_wrapped.save_checkpoint(output_dir)
[rank1]: File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/engine.py", line 3497, in save_checkpoint
[rank1]: self._save_checkpoint(save_dir,
[rank1]: File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/engine.py", line 3709, in _save_checkpoint
[rank1]: save_path = self._get_ckpt_name(save_dir, tag)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/code/users/stas/github/DeepSpeed/deepspeed/runtime/engine.py", line 3039, in _get_ckpt_name
[rank1]: mp_rank = 0 if self.mpu is None else self.mpu.get_model_parallel_rank()
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: AttributeError: module 'deepspeed.runtime.sequence_parallel.parallel_state_sp' has no attribute 'get_model_parallel_rank'. Did you mean: 'get_sequence_parallel_rank'?
```
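As the traceback shows, `_get_ckpt_name` calls `self.mpu.get_model_parallel_rank()`, which the sequence-parallel `parallel_state_sp` module does not provide. A minimal sketch of the kind of shim this implies is below; since Ulysses uses sequence parallelism only, the model-parallel degree is effectively 1, so trivial values satisfy the checkpointing code. The function bodies here are illustrative assumptions, not the actual patch.

```python
# Hypothetical sketch of model-parallel shims for the slim Ulysses mpu
# (deepspeed.runtime.sequence_parallel.parallel_state_sp). Under Ulysses
# there is no tensor/model parallelism, so the MP degree is 1 and the
# values save_checkpoint expects are trivial.

def get_model_parallel_rank() -> int:
    # No model parallelism under Ulysses SP: every rank is MP rank 0.
    return 0

def get_model_parallel_world_size() -> int:
    # The model-parallel group size is 1 when only sequence
    # parallelism is in use.
    return 1
```

With these aliases in place, `mp_rank = self.mpu.get_model_parallel_rank()` in `_get_ckpt_name` resolves to 0 and checkpoint saving proceeds as it would without an `mpu`.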
Signed-off-by: Stas Bekman <[email protected]>