
Conversation

kmehant (Collaborator) commented May 12, 2025

Extends support to GraniteMoeHybridForCausalLM for all the fast MoE features (fast kernels and expert parallelism, EP) as well as padding-free.
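For context, a padding-free fine-tuning setup for a GraniteMoeHybrid checkpoint looks roughly like the sketch below. This is not the plugin code added in this PR; it only illustrates the padding-free path using the stock Hugging Face `DataCollatorWithFlattening`, and the checkpoint name is an assumption.

```python
# Minimal sketch of a padding-free setup for a GraniteMoeHybrid model.
# NOTE: illustrative only -- not the fms-acceleration plugin code from this PR;
# the checkpoint name below is an assumption.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorWithFlattening,
)

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed GraniteMoeHybrid checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Padding-free training relies on an attention kernel that understands
# flattened (document-packed) batches, e.g. flash-attention 2.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# DataCollatorWithFlattening concatenates the samples of a batch into one
# sequence and emits position_ids, so no padding tokens are needed.
collator = DataCollatorWithFlattening()
```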

Summary of changes

Final regression plots, which include past models

[Plots: compare-mem_nvidia_mem_reserved, compare-train_loss, compare-train_tokens_per_second]

Outliers

Three classes of outliers can be identified:

  1. increased throughput
  2. increased memory consumption
  3. increased loss

issue: #147

Loss regression

For the models ibm-granite/granite-3.0-3b-a800m-instruct and ibm-research/moe-7b-1b-active-shared-experts, all padding-free runs regressed from the previous benchmark, showing larger losses. However, it is not clear whether this is related to padding-free, since other models in the benchmark set did not regress with padding-free enabled.

[Screenshot: loss-regression runs]

All outliers

[Screenshot: all outlier runs]

Additional failed runs compared to the previous benchmark

Reason: OOM

[Screenshot: failed runs (OOM)]

Granite 4 preview acceleration over baselines (mamba kernels + accelerations added as part of this PR)

[Screenshot: Granite 4 preview acceleration over baselines]

Signed-off-by: Mehant Kammakomati <[email protected]>
kmehant force-pushed the G4moeplugin branch 3 times, most recently from bc5fbf3 to 7dd0ed1 on June 13, 2025 at 09:17
kmehant marked this pull request as ready for review on June 13, 2025 at 09:33
kmehant requested a review from fabianlim as a code owner on June 13, 2025 at 09:33
kmehant requested a review from willmj on June 13, 2025 at 09:33
# versions above 0.45.1 to support torch 2.6
# exact version is used since upper bound is not known

bitsandbytes == 0.45.1
fabianlim (Contributor) commented:

In this case, isn't it better to just lower-bound? And if so, the line `# exact version is used since upper bound is not known` is not needed.
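For illustration only, the lower-bound form suggested here would look something like the following (the exact requirement line that landed in the PR is not shown in this excerpt):

```
# versions >= 0.45.1 support torch 2.6; upper bound unknown, so pin only a lower bound
bitsandbytes >= 0.45.1
```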

kmehant (Collaborator, Author) replied:

Fixed @fabianlim

fabianlim (Contributor) left a comment:

Better not to have the mamba installs by default.
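One way to act on this, sketched below with setuptools, is to move the mamba kernel dependencies behind an optional extra so they are not pulled in by default. The package and extra names are assumptions for illustration, not necessarily what this repository uses.

```python
# Hedged sketch: keep mamba kernels out of the default install by exposing
# them as an optional extra. Names below are illustrative assumptions.
from setuptools import setup

setup(
    name="fms-acceleration-moe",               # illustrative package name
    install_requires=[
        "torch",                               # core dependency, always installed
    ],
    extras_require={
        # heavy GPU-specific kernels, installed only on request:
        #   pip install "fms-acceleration-moe[mamba]"
        "mamba": ["mamba-ssm", "causal-conv1d"],
    },
)
```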

kmehant force-pushed the G4moeplugin branch 4 times, most recently from 98034ef to a446a5c on June 13, 2025 at 18:32
Signed-off-by: Mehant Kammakomati <[email protected]>
fabianlim (Contributor) left a comment:

I'm OK with the changes. Just wondering what caused the loss regression.

willmj (Collaborator) left a comment:

LGTM

kmehant merged commit 9122a76 into foundation-model-stack:main on Jun 16, 2025
7 checks passed
kmehant (Collaborator, Author) commented Jun 16, 2025

> Just wondering what caused the loss regression.

It was happening only on older models with the padding-free setting, and not for newer models with padding-free on, so it needs investigation. We will check it as cycles allow and update the attached issue.
