The "deepspeed" parameter of DeepSpeed OneBitAdam/ZeroOneAdam #13795
Unanswered. BlinkDL asked this question in DDP / multi-GPU / multi-node.
Replies: 1 comment
Hi @BlinkDL, did you ever solve this issue? I'm currently trying to retrain RWKV-4-7B using ZeroOneAdam and hit exactly the same issue.
I am currently using the DeepSpeed strategy, and I'd like to try DeepSpeed's OneBitAdam/ZeroOneAdam, but initialize it in Python code (instead of the JSON config).
However, OneBitAdam/ZeroOneAdam take a "deepspeed" parameter, and you can't pass None, because the optimizer will access deepspeed.mpu:
https://deepspeed.readthedocs.io/en/latest/optimizers.html#zerooneadam-gpu
Where can I find this "deepspeed" object inside DeepSpeedStrategy, so that I can pass it to the optimizer?
I am using the following code to initialize DeepSpeedStrategy: