Extend support to GraniteMoeSharedForCausalLM architecture #126
Conversation
        logic="APPEND",
    ),
),
ModelPatcherRule(
Refer to how granite does it.
packing: False
adam_epsilon: 1e-8
model_name_or_path:
  - 'ibm-research/moe-7b-1b-active-shared-experts'
You shouldn't copy and paste this; just add to model_name_or_path in the existing scenarios:

model_name_or_path:
  - 'ibm-granite/granite-3.0-3b-a800m-instruct'
  - 'ibm-research/moe-7b-1b-active-shared-experts'
- moe-scattermoe-granite-ep4-padding-free-foak
arguments:
  learning_rate: 5e-5
  torch_dtype: bfloat16
Same here: don't copy and paste. If it's all the same arguments, there is no need.
I don't understand why you need this different bench.
I wanted to run only for the MoE shared model, so I had to copy-paste. Is there a way I can sub-select a model along with a scenario with scenariofilter? Apologies if I have missed that.
Sorry, you can't. If you want to do ad hoc testing, just comment out the other models you don't want to test. For the official bench we need to update all models, because we only version one set of requirements for reproducibility, and we can't have partial benches running; otherwise there will be inconsistency.
scripts/benchmarks/benchmark.py
[
    "--gradient_accumulation_steps",
    str(effective_batch_size // num_gpus // pdtbs),
    str(1 if gas == 0 else gas),
OK, this works, but I don't understand why you need it, because your benches use the same parameters as the existing ones, and we don't run into this issue.
Yes, @fabianlim. I am not sure how your benches got past this value error: https://github.com/foundation-model-stack/fms-hf-tuning/blob/fb3ace8397223932e176de604703c54a14e1ebf0/tuning/sft_trainer.py#L136-L139
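For context, a minimal sketch of the computation under discussion (an illustration only, with hypothetical numbers; the variable names follow the diff above rather than the exact benchmark.py code):

```python
# Sketch of the gradient-accumulation computation shown in the diff above.
# Hypothetical values, chosen only to show how the integer division can floor to 0.
effective_batch_size = 128
num_gpus = 8
pdtbs = 32  # per-device train batch size

gas = effective_batch_size // num_gpus // pdtbs  # 128 // 8 // 32 == 0
args = [
    "--gradient_accumulation_steps",
    str(1 if gas == 0 else gas),  # clamp to at least 1 accumulation step
]
print(args)  # ['--gradient_accumulation_steps', '1']
```

When num_gpus * pdtbs already covers the effective batch size, the floor division yields 0, which the sft_trainer check linked above rejects; clamping to 1 avoids that value error.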
One thing I noticed: the experiments continue silently even when some of them have failed, and the benchmark report gets generated regardless.
Each job is independent. The bench will run all jobs, and jobs that fail will have empty reports.
willmj left a comment:
Once Fabian's comments have been addressed, this looks good to me except for a minor nit. I would be interested to see benchmarks as well, if available.
@kmehant yes, the code changes look good; what is missing now is the updating of the benches. So you need to run the full bench for scenarios-moe.yaml now in order to update a100_80gb_moe.csv and requirements_moe.txt.
@willmj and @anhuong can either of you help guide @kmehant on how to produce the regression plots? @kmehant when updating the benches we want to make sure we do not suffer any worsening in performance compared to the current bench, so we have some convenience scripts to produce plots like those you see in the PR. Then we will record these regressions as a form of record keeping.
Got it @fabianlim, do you have those scripts handy?
- Use of memory increased
- Train loss parity exists
- Throughput increased in the new benchmark
@fabianlim apologies for the confusion. I was sharing intermediate results and regression. I am tracking it here; Mixtral is currently running, and I will update the regression and results once done. Thank you.
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration&subdirectory=plugins/framework
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_aadp&subdirectory=plugins/attention-and-distributed-packing
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_foak&subdirectory=plugins/fused-ops-and-kernels
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_moe&subdirectory=plugins/accelerated-moe
-e git+https://github.com/kmehant/fms-acceleration.git@d0560e0c652dc2f72fdad07a2fb7af8fe67bbf08#egg=fms_acceleration_peft&subdirectory=plugins/accelerated-peft
fms-hf-tuning @ git+https://github.com/foundation-model-stack/fms-hf-tuning.git@fdc7527510692ada03e4303df1549cebc5139b31
I did a pip freeze to update this; however, this shows my forks of fms-acceleration. Should we keep it as is?
No, please delete.
Updated the requirements file. Thanks.
tox.ini
# install the flash attn at the last
pip install flash-attn
pip install flash-attn --no-build-isolation
Does this really need to change?
I ran into a "torch module not found" issue with tox, so I had to do this.
I'm a bit confused, because I believe torch is installed with fms_hf_tuning and our tox file installs that, and tox is completely reproducible. So if you had seen this, we must have seen it before.
How about this: maybe you can leave out this commit and open an issue saying you want to add it in.
Sure I can do that.
willmj left a comment: LGTM
@fabianlim @willmj requesting merge.
@kmehant sorry, I just noticed you only have 2 Mixtral entries, but we used to have 3. Can you check this?
The FSDP variant that was previously part of the bench got OOMed, @fabianlim.
All the OOMs need to be recorded; can you do the same in the main comment @kmehant, and please open an issue and link it here for tracking.
@fabianlim completed!









Extend support to GraniteMoeSharedForCausalLM for all the fast moe features (fast kernels and EP) and padding free.
Summary of changes
Final regression plots that include granite moe and mixtral.
Outliers
OOM experiments
Mixtral OOM due to extra memory consumption
Also note:
ibm-research/moe-7b-1b-active-shared-experts has a number of experts that is not divisible by 4, so ep4 does not apply.
Reference issue on regression - #128
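To illustrate the divisibility constraint noted above, a minimal sketch (the helper function and the example expert counts are hypothetical, not taken from the model configs):

```python
# Sketch of the expert-parallel (EP) divisibility requirement described above:
# the model's expert count must split evenly across the EP degree.
def ep_degree_applies(num_experts: int, ep_degree: int) -> bool:
    """Return True if experts can be sharded evenly across ep_degree groups."""
    return num_experts % ep_degree == 0

print(ep_degree_applies(10, 4))  # False -> an ep4 scenario does not apply
print(ep_degree_applies(8, 4))   # True  -> ep4 places 2 experts per group
```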