feat: Support granite 4 preview architecture for MoE kernels, EP, and fast kernels #143
Conversation
…cture Signed-off-by: Mehant Kammakomati <[email protected]>
Signed-off-by: Mehant Kammakomati <[email protected]>
Force-pushed from bc5fbf3 to 7dd0ed1
```
# versions above 0.45.1 to support torch 2.6
# exact version is used since upper bound is not known
bitsandbytes == 0.45.1
```
In this case, isn't it better to just lower-bound? If so, the comment line `# exact version is used since upper bound is not known` is not needed.
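A sketch of the lower-bounded pin being suggested here; the merged fix itself is not shown in this excerpt:

```
# versions above 0.45.1 support torch 2.6; no known upper bound
bitsandbytes >= 0.45.1
```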
Fixed @fabianlim
fabianlim left a comment
better not to have mamba installs by default
Force-pushed from 98034ef to a446a5c
Signed-off-by: Mehant Kammakomati <[email protected]>
fabianlim left a comment
I'm ok with the changes. Just wondering what caused the loss regression.
willmj left a comment
LGTM
It was happening only on older models with the padding-free setting on, and not on newer models with padding-free on, so it needs investigation. We will check it as available cycles allow and update the attached issue.
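For context on the padding-free setting discussed above, a minimal sketch of what padding-free batching does, illustrated with transformers' `DataCollatorWithFlattening` (used here for illustration only; this repo's actual plumbing may differ):

```python
# Minimal sketch: padding-free batching concatenates variable-length
# sequences into one row instead of padding them to a common length.
# Assumes a transformers version that ships DataCollatorWithFlattening.
from transformers import DataCollatorWithFlattening

collator = DataCollatorWithFlattening()
batch = collator([
    {"input_ids": [1, 2, 3], "labels": [1, 2, 3]},
    {"input_ids": [4, 5], "labels": [4, 5]},
])

# All tokens land in a single row; position_ids restart at each sequence
# boundary so the attention kernel can keep the documents separate.
print(batch["input_ids"])     # tensor([[1, 2, 3, 4, 5]])
print(batch["position_ids"])  # tensor([[0, 1, 2, 0, 1]])
```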
Extends support to GraniteMoeHybridForCausalLM for all of the fast-MoE features (fast kernels and expert parallelism, EP) as well as padding-free; see the sketch below.
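As a quick orientation, a hedged sketch of how the newly supported architecture resolves in transformers (the checkpoint id below is hypothetical and used only for illustration; any GraniteMoeHybrid checkpoint would do):

```python
# Sketch: confirm a checkpoint maps to the architecture this PR targets.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "ibm-granite/granite-4.0-tiny-preview"  # hypothetical id

config = AutoConfig.from_pretrained(model_id)
print(config.model_type)  # expected: "granitemoehybrid"

# Loading through AutoModelForCausalLM yields GraniteMoeHybridForCausalLM,
# the class the fast-MoE kernels, EP, and padding-free paths now cover.
model = AutoModelForCausalLM.from_pretrained(model_id)
print(type(model).__name__)  # GraniteMoeHybridForCausalLM
```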
Summary of changes
Final regression plots, including past models
Outliers
Three classes of outliers can be identified (tracked in issue #147).
Loss regression
Models
For ibm-granite/granite-3.0-3b-a800m-instruct and ibm-research/moe-7b-1b-active-shared-experts, all padding-free runs regressed from the previous bench, showing larger losses than the previous bench loss. However, it is not clear that this is tied to padding-free, since other models in the benchmark set did not regress with padding-free on.
All outliers
Additional failed runs compared to the previous benchmark
Reason: OOM
Granite 4 preview acceleration over baselines (mamba kernels + accelerations added as part of this PR).