Conversation

@kmehant (Collaborator) commented Feb 18, 2025

Extend support to GraniteMoeSharedForCausalLM for all the fast MoE features (fast kernels and expert parallelism, EP) as well as padding-free.

Summary of changes

  • extend support to GraniteMoeSharedForCausalLM
  • update a100_80gb_moe.csv | bench for Granite MoE (MoE and MoE-shared Granite models)
  • update a100_80gb_moe.csv | bench for Mixtral
  • update requirements_moe.txt
  • regression test for the above runs.

Final regression plots, including Granite MoE and Mixtral.

[Plots: compare-mem_nvidia_mem_reserved (memory increased), compare-train_loss (loss holds parity), compare-train_tokens_per_second (throughput increased)]

Outliers

[Screenshot: outlier entries, 2025-02-21 12:48 PM]

OOM experiments

Mixtral OOM due to extra memory consumption

Run configuration (metric columns were empty because the run OOMed):

  model_name_or_path: mistralai/Mixtral-8x7B-Instruct-v0.1
  framework_config: none
  num_gpus: 8
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 16
  torch_dtype: bfloat16
  mem_nvidia_mem_reserved: 78783.5
  epoch, train_loss, train_runtime, train_samples_per_second, train_steps_per_second, train_tokens_per_second: (empty)

Also note:

  • ibm-research/moe-7b-1b-active-shared-experts has a number of experts that is not divisible by 4, so ep4 (expert parallelism of degree 4) does not apply; see the sketch below
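For illustration, a minimal sketch of the divisibility constraint (hypothetical expert counts, not the model's actual configuration):

# The expert-parallel degree must evenly divide the number of experts so that
# every expert-parallel rank owns the same number of experts.
def ep_degree_applies(num_experts: int, ep_degree: int) -> bool:
    return num_experts % ep_degree == 0

print(ep_degree_applies(8, 4))   # True  -> ep4 applies
print(ep_degree_applies(10, 4))  # False -> ep4 does not apply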

Reference issue on regression - #128

@kmehant kmehant force-pushed the support-sharedmoe-granite branch from 321f6e9 to acd6d19 on February 18, 2025 13:35
@kmehant kmehant marked this pull request as ready for review February 18, 2025 14:02
@kmehant kmehant requested a review from fabianlim as a code owner February 18, 2025 14:02
@fabianlim fabianlim requested a review from willmj February 18, 2025 14:03
@kmehant kmehant force-pushed the support-sharedmoe-granite branch 2 times, most recently from 270123d to 0615a81 on February 18, 2025 15:34
Signed-off-by: Mehant Kammakomati <[email protected]>
@kmehant kmehant force-pushed the support-sharedmoe-granite branch from 0615a81 to 353b623 on February 18, 2025 15:45
logic="APPEND",
),
),
ModelPatcherRule(
Contributor:

Refer to how Granite does it.

packing: False
adam_epsilon: 1e-8
model_name_or_path:
- 'ibm-research/moe-7b-1b-active-shared-experts'
Contributor:

You shouldn't copy and paste this; just add to model_name_or_path in the existing scenarios:

model_name_or_path:
  - 'ibm-granite/granite-3.0-3b-a800m-instruct'
  - 'ibm-research/moe-7b-1b-active-shared-experts'

- moe-scattermoe-granite-ep4-padding-free-foak
arguments:
learning_rate: 5e-5
torch_dtype: bfloat16
Contributor:

Same here, don't copy and paste; if it's all the same arguments there is no need.

I don't understand why you need this different bench.

Collaborator Author:

I wanted to run only the MoE shared model, so I had to copy and paste. Is there a way I can sub-select a model along with a scenario using scenariofilter? Apologies if I have missed that.

Contributor:

Sorry, you can't. If you want to do ad hoc testing, just comment out the other models you don't want to test. For the official bench we need to update all models; this is because we version only one set of requirements for reproducibility, and we can't have partial benches running, otherwise there will be inconsistency.

[
"--gradient_accumulation_steps",
str(effective_batch_size // num_gpus // pdtbs),
str(1 if gas == 0 else gas),
Contributor:

OK, this works, but I don't understand why you need it, because your benches use the same parameters as the existing ones, and we don't run into this issue.
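For context, a minimal sketch of why the value is clamped to 1 (hypothetical helper name and numbers, not the benchmark code itself):

# Integer division rounds down to 0 when the effective batch size is smaller
# than num_gpus * per_device_train_batch_size, so the result is clamped to 1.
def gradient_accumulation_steps(effective_batch_size: int, num_gpus: int, pdtbs: int) -> int:
    gas = effective_batch_size // num_gpus // pdtbs
    return 1 if gas == 0 else gas

assert gradient_accumulation_steps(128, 8, 4) == 4  # 128 // 8 // 4
assert gradient_accumulation_steps(8, 8, 4) == 1    # would be 0, clamped to 1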

Collaborator Author:

One thing I noticed is that the experiments continue silently even when some of them have failed, and the benchmark report gets generated regardless.

Contributor:

Each job is independent. The bench will run all jobs, and the jobs that fail will have empty reports.

@willmj (Collaborator) left a comment:

Once Fabian's comments have been addressed, this looks good to me except for a minor nit; I would be interested to see benchmarks as well if available.

Signed-off-by: Mehant Kammakomati <[email protected]>
@fabianlim (Contributor) commented:

@kmehant yes, the code changes look good; what is missing now is the updating of the benches. So you need to run the full bench for scenarios-moe.yaml now in order to update a100_80gb_moe.csv and requirements_moe.txt.

  • Please run the bench using the documented instructions; running the actual bench in ad hoc ways causes inconsistencies.
  • There are two suites, accelerated-moe-scatter and accelerated-moe-scatter-mixtral; their commands are separate, and then the results are stitched together. If this is too troublesome, you can split them: take out the Mixtral benches and put them in a separate requirements and CSV file (maybe call them moe_mixtral.csv, requirements_moe_mixtral.txt, etc.).
  • Then run one full bench for accelerated-moe-scatter for both models and update accordingly.

Signed-off-by: Mehant Kammakomati <[email protected]>
@kmehant (Collaborator Author) commented Feb 20, 2025

  • update a100_80gb_moe.csv | bench for Granite MoE (MoE and MoE-shared Granite models)
  • update a100_80gb_moe.csv | bench for Mixtral
  • update requirements_moe.txt
  • regression test for the above runs.

@fabianlim (Contributor) commented Feb 20, 2025

@willmj and @anhuong, can either of you help guide @kmehant on how to produce the regression plots?

@kmehant when updating the benches we want to make sure we do not suffer any worsening in performance compared to the current bench, so we have some convenience scripts to produce plots like those you see in the PR. We will then record these regressions as a form of record keeping.
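For illustration, a minimal sketch (hypothetical file names and metric columns, not the repository's actual convenience script) of how a reference benchmark CSV could be compared against a new one to produce such plots:

# Minimal sketch: plot reference vs. new benchmark values for a few metrics.
# File names below are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

ref = pd.read_csv("a100_80gb_moe_reference.csv")
new = pd.read_csv("a100_80gb_moe.csv")

for metric in ["mem_nvidia_mem_reserved", "train_loss", "train_tokens_per_second"]:
    fig, ax = plt.subplots()
    ax.plot(ref[metric].values, marker="o", label="reference")
    ax.plot(new[metric].values, marker="x", label="new")
    ax.set_title(f"compare-{metric}")
    ax.set_xlabel("experiment index")
    ax.set_ylabel(metric)
    ax.legend()
    fig.savefig(f"compare-{metric}.png")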

@kmehant (Collaborator Author) commented Feb 20, 2025

#126 (comment)

Got it @fabianlim, do you have those scripts handy?

@kmehant (Collaborator Author) commented Feb 20, 2025

@fabianlim

Memory use increased: [plot compare-mem_nvidia_mem_reserved]

Train loss parity holds: [plot compare-train_loss]

Throughput increased in the new benchmark: [plot compare-train_tokens_per_second]

@kmehant (Collaborator Author) commented Feb 20, 2025

Outliers

[Screenshot: outlier entries, 2025-02-20 2:31 PM]

@fabianlim (Contributor) commented:

@kmehant please take note of my comment here:

  • you must preserve the Mixtral bench, you cannot delete it

@kmehant (Collaborator Author) commented Feb 20, 2025

@fabianlim apologies for the confusion. I was sharing intermediate results and regressions. I am tracking it here; Mixtral is currently running, and I will update the regression and results once done. Thank you.

Signed-off-by: Mehant Kammakomati <[email protected]>
@kmehant (Collaborator Author) commented Feb 21, 2025

Final regression plots, including Mixtral.

[Plots: compare-mem_nvidia_mem_reserved (memory increased), compare-train_loss (loss holds parity), compare-train_tokens_per_second (throughput increased)]

Outliers

[Screenshot: outlier entries, 2025-02-21 12:48 PM]

Comment on lines 16 to 21
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration&subdirectory=plugins/framework
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_aadp&subdirectory=plugins/attention-and-distributed-packing
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_foak&subdirectory=plugins/fused-ops-and-kernels
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_moe&subdirectory=plugins/accelerated-moe
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_peft&subdirectory=plugins/accelerated-peft
fms-hf-tuning @ git+https://github.com/foundation-model-stack/fms-hf-tuning.git@fdc7527510692ada03e4303df1549cebc5139b31
Collaborator Author:

I did a pip freeze to update this; however, it would show my forks of fms-acceleration. Should we keep it as is?

Contributor:

No, please delete.

Collaborator Author:

Updated the requirements file, thanks.

tox.ini Outdated

# install the flash attn at the last
pip install flash-attn
pip install flash-attn --no-build-isolation
Contributor:

Does this really need to change?

Collaborator Author:

I had a "torch module not found" issue with tox. I had to do this.

@fabianlim (Contributor) commented Feb 21, 2025:

I'm a bit confused, because I believe torch is installed with fms_hf_tuning and our tox file installs that, and tox is completely reproducible. So if you had seen this, we must have seen it before.

Contributor:

How about this: maybe you can leave out this commit and make an issue saying you want to add it in.

Collaborator Author:

Sure I can do that.

@fabianlim (Contributor) left a comment:

@kmehant can you put your final regression plots into the main description? Otherwise I think it looks good. Let's have @willmj take one look at it before merge.

@kmehant kmehant force-pushed the support-sharedmoe-granite branch from 69a684f to 0a04dcc on February 21, 2025 07:34
@kmehant kmehant force-pushed the support-sharedmoe-granite branch from 0a04dcc to d4256f6 on February 21, 2025 07:35
Signed-off-by: Mehant Kammakomati <[email protected]>
@kmehant kmehant requested a review from willmj February 21, 2025 07:37
@willmj (Collaborator) left a comment:

LGTM

@kmehant (Collaborator Author) commented Feb 21, 2025

@fabianlim @willmj requesting merge.

@fabianlim (Contributor) commented:

@kmehant sorry, I just noticed you only have 2 Mixtral entries, but we used to have 3. Can you check this?
[Screenshot of the previous Mixtral bench entries]

@kmehant (Collaborator Author) commented Feb 21, 2025

#126 (comment)

The FSDP variant which was previously part of the bench got OOMed, @fabianlim.

@fabianlim (Contributor) commented:

All the OOMs need to be recorded; can you do the same in the main comment @kmehant, and please open an issue and link it here for tracking.

@kmehant (Collaborator Author) commented Feb 22, 2025

#126 (comment)

@fabianlim completed!

@fabianlim fabianlim merged commit c959c6b into foundation-model-stack:main Feb 24, 2025
7 checks passed