Updating support of Megatron-LM #3842
Conversation
…lm.py and found out that it is self.expert_tensor_parallel_size that causes the OOM issue
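For illustration, a minimal sketch of pinning that value while debugging memory use. Treating expert_tensor_parallel_size as a MegatronLMPlugin field is an assumption based on the self.expert_tensor_parallel_size reference in this PR's diff; the other arguments and values are placeholders, not recommendations.

```python
# Hypothetical sketch, not this PR's final API: `expert_tensor_parallel_size`
# as a plugin field is assumed from the `self.expert_tensor_parallel_size`
# reference in the review diff; the other arguments and values are illustrative.
from accelerate.utils import MegatronLMPlugin

plugin = MegatronLMPlugin(
    tp_degree=4,                    # tensor parallel degree
    pp_degree=2,                    # pipeline parallel degree
    expert_tensor_parallel_size=1,  # assumed field; keep small while chasing the OOM
)
```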
@SunMarc This is Peng. I am working on using accelerate and Megatron-LM for fine-tuning GLM4.6 models. This PR includes some updates to accelerate to better support new features and configurations from Megatron. Would you please help take a look? Thank you!
Of course, I will have a look this week!
Thank you! Take your time and have a nice holiday ahead if you are in the US!
SunMarc
left a comment
Thanks a lot, we don't maintain Megatron that much as it is quite complex for users, but happy to have this PR!
src/accelerate/utils/dataclasses.py
Outdated
| "attention_dropout": self.attention_dropout, | ||
| "hidden_dropout": self.hidden_dropout, | ||
| "attention_softmax_in_fp32": self.attention_softmax_in_fp32, | ||
| # "expert_tensor_parallel_size": self.expert_tensor_parallel_size, |
I don't find this config particularly useful, so I left it commented out. Let me uncomment it. Thanks for the comment!
@SunMarc, fixed it with a new commit. Would you please take another look? Thanks!
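For reference, a small self-contained sketch of the shape of this change. The keys come from the review diff above; the stand-in dataclass, its defaults, and the method name are illustrative rather than the actual MegatronLMPlugin code in src/accelerate/utils/dataclasses.py, which carries many more entries.

```python
from dataclasses import dataclass

# Illustrative stand-in only; mirrors the keys visible in the review diff above.
@dataclass
class _MegatronArgsSketch:
    attention_dropout: float = 0.1
    hidden_dropout: float = 0.1
    attention_softmax_in_fp32: bool = True
    expert_tensor_parallel_size: int = 1

    def to_megatron_args(self) -> dict:
        return {
            "attention_dropout": self.attention_dropout,
            "hidden_dropout": self.hidden_dropout,
            "attention_softmax_in_fp32": self.attention_softmax_in_fp32,
            # Previously commented out; re-enabled after the review discussion.
            "expert_tensor_parallel_size": self.expert_tensor_parallel_size,
        }
```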
@bot /style
Style bot fixed some files and pushed the changes.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
This PR makes sure that accelerate continues to support using Megatron-LM as a backend for training large-scale LLMs.
In detail:
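At a high level, the PR keeps the accelerate-to-Megatron-LM integration path working. As a rough illustration of that workflow, here is a minimal sketch assuming accelerate's documented MegatronLMPlugin and Accelerator arguments; the parallelism degrees and other values are placeholders, not part of this PR.

```python
# Minimal sketch, assuming accelerate's documented Megatron-LM integration;
# the degrees and other values below are placeholders, not recommendations.
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

megatron_plugin = MegatronLMPlugin(
    tp_degree=2,          # tensor parallelism
    pp_degree=2,          # pipeline parallelism
    num_micro_batches=4,
    gradient_clipping=1.0,
)
accelerator = Accelerator(megatron_lm_plugin=megatron_plugin)

# The model, optimizer, and dataloaders would then go through the usual
# accelerator.prepare(...) call before the training loop.
```

In practice this is typically paired with a Megatron-LM-enabled accelerate launch configuration.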
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.