feat: support moe hf chkpt #486
Thanks for making a pull request! 😃
12f2345 to 9bd210f
@kmehant can you fix the lint error?
f049c36 to e2ad0f0
dushyantbehl
left a comment
LGTM
willmj
left a comment
Thanks Mehant! This looks good to me, but could we update the notes section on Fast MoE to include that this conversion now happens automatically?
Also, could we add a small unit test here (maybe similar to the e2e tests we have in test_sft_trainer) to make sure this conversion happens correctly and that inference can be run on the model? That would be good, but it may be hard to find a small MoE model to test with.
d658990 to 33bbc92
@willmj I have addressed your comments, thanks.
a55356e to 573f7a4
willmj
left a comment
LGTM, thanks!
Signed-off-by: Mehant Kammakomati <[email protected]>
Description of the change
This change adds support for automatic Hugging Face (HF) checkpointing in MoE expert-parallel (EP) training runs that use fms-acceleration.
Depends on foundation-model-stack/fms-acceleration#133
Related issue number
#480