
Conversation

kmehant (Collaborator) commented May 12, 2025

Extends support to GraniteMoeHybridForCausalLM for all the fast MoE features (fast kernels and expert parallelism, EP) as well as padding-free.
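For context, a padding-free fine-tuning setup for a GraniteMoeHybrid checkpoint looks roughly like the sketch below. This is not the plugin code added in this PR; it only illustrates the padding-free path using the stock Hugging Face `DataCollatorWithFlattening`, and the checkpoint name is an assumption.

```python
# Minimal sketch of a padding-free setup for a GraniteMoeHybrid model.
# NOTE: illustrative only -- not the fms-acceleration plugin code from this PR;
# the checkpoint name below is an assumption.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorWithFlattening,
)

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed GraniteMoeHybrid checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Padding-free training relies on an attention kernel that understands
# flattened (document-packed) batches, e.g. flash-attention 2.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# DataCollatorWithFlattening concatenates the samples of a batch into one
# sequence and emits position_ids, so no padding tokens are needed.
collator = DataCollatorWithFlattening()
```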

Summary of changes

Final regression plots, which include past models

[Plots: compare-mem_nvidia_mem_reserved, compare-train_loss, compare-train_tokens_per_second]

Outliers

Three classes of outliers can be identified:

  1. increased throughput
  2. increased memory consumption
  3. increased loss

issue: #147

Loss regression

For the models ibm-granite/granite-3.0-3b-a800m-instruct and ibm-research/moe-7b-1b-active-shared-experts, all padding-free runs regressed from the previous benchmark, showing larger losses. However, it is not clear whether this is related to padding-free, since other models in the benchmark set did not regress with padding-free enabled.

[Screenshot: loss-regression runs]

All outliers

[Screenshot: all outlier runs]

Additional failed runs compared to the previous benchmark

Reason: OOM

[Screenshot: failed runs (OOM)]

Granite 4 preview acceleration over baselines (mamba kernels + accelerations added as part of this PR)

[Screenshot: Granite 4 preview acceleration over baselines]

Signed-off-by: Mehant Kammakomati <[email protected]>
kmehant force-pushed the G4moeplugin branch 3 times, most recently from bc5fbf3 to 7dd0ed1 on June 13, 2025 at 09:17
kmehant marked this pull request as ready for review on June 13, 2025 at 09:33
kmehant requested a review from fabianlim as a code owner on June 13, 2025 at 09:33
kmehant requested a review from willmj on June 13, 2025 at 09:33
# versions above 0.45.1 to support torch 2.6
# exact version is used since upper bound is not known

bitsandbytes == 0.45.1
fabianlim (Contributor) commented:

In this case, isn't it better to just lower-bound? And if so, the line `# exact version is used since upper bound is not known` is not needed.
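For illustration only, the lower-bound form suggested here would look something like the following (the exact requirement line that landed in the PR is not shown in this excerpt):

```
# versions >= 0.45.1 support torch 2.6; upper bound unknown, so pin only a lower bound
bitsandbytes >= 0.45.1
```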

kmehant (Collaborator, Author) replied:

Fixed @fabianlim

fabianlim (Contributor) left a comment:

Better not to have the mamba installs by default.
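One way to act on this, sketched below with setuptools, is to move the mamba kernel dependencies behind an optional extra so they are not pulled in by default. The package and extra names are assumptions for illustration, not necessarily what this repository uses.

```python
# Hedged sketch: keep mamba kernels out of the default install by exposing
# them as an optional extra. Names below are illustrative assumptions.
from setuptools import setup

setup(
    name="fms-acceleration-moe",               # illustrative package name
    install_requires=[
        "torch",                               # core dependency, always installed
    ],
    extras_require={
        # heavy GPU-specific kernels, installed only on request:
        #   pip install "fms-acceleration-moe[mamba]"
        "mamba": ["mamba-ssm", "causal-conv1d"],
    },
)
```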

kmehant force-pushed the G4moeplugin branch 4 times, most recently from 98034ef to a446a5c on June 13, 2025 at 18:32
Signed-off-by: Mehant Kammakomati <[email protected]>
fabianlim (Contributor) left a comment:

I'm OK with the changes. Just wondering what caused the loss regression.

willmj (Collaborator) left a comment:

LGTM

kmehant merged commit 9122a76 into foundation-model-stack:main on Jun 16, 2025
7 checks passed
kmehant (Collaborator, Author) commented Jun 16, 2025

> Just wondering what caused the loss regression.

It was happening only on older models with the padding-free setting, and not for newer models with padding-free on, so it needs investigation. We will check it as cycles allow and update the attached issue.
