Add decilm modelling code #505
base: feature/compress
Conversation
Signed-off-by: Daniel Korzekwa <[email protected]>
Some high-level questions
Why is tokenization_mistral.py needed? Perhaps we should hold off on it until we need it?
This is not used, AFAIK. For now, I am not skipping any files from deci_lm_hf_code; it will be much easier to sync with the internal code in the meantime.
Can we get rid of the transformers_4_44_2 files? The rest of the ModelOpt features only support transformers>=4.48.
Also, does DeciLM only support two transformers versions at the moment, 4.44.2 and 4.51.3?
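For context, a minimal sketch of how a transformers version floor could be checked at import time; the 4.48 floor comes from the comment above, while the packaging-based guard itself is only an illustrative assumption, not code from this PR:

import transformers
from packaging import version

# The rest of ModelOpt reportedly requires transformers>=4.48 (see comment above);
# fail fast with a clear message if an older version is installed.
_MIN_TF_VERSION = "4.48"
if version.parse(transformers.__version__) < version.parse(_MIN_TF_VERSION):
    raise ImportError(
        f"transformers>={_MIN_TF_VERSION} is required, found {transformers.__version__}"
    )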
Similar comment as for tokenization_mistral.py
Do you know the motivation for keeping files for both transformers versions instead of importing directly from the transformers package? Do we make any changes in these transformers_4_44_2_*.py files?
I do not know; let's talk to the people who implemented it.
Why do we need the megatron_lm-related files here? Perhaps we should hold off on them until we need them?
They provide support for hybrid model compression.
Is the hybrid model not in DeciLM format?
The DeciLM model code uses the megatron_lm mixer class for Mamba support.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@             Coverage Diff              @@
##        feature/compress     #505   +/-  ##
=============================================
  Coverage          73.40%   73.40%
=============================================
  Files                180      180
  Lines              18127    18127
=============================================
  Hits               13306    13306
  Misses              4821     4821

View full report in Codecov by Sentry.
Can DeciLM not use a standard HF AutoTokenizer?
I do not know; I created an issue to investigate: issues/41
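For reference, a minimal sketch of the standard-tokenizer path being asked about; the checkpoint name is a placeholder, and whether trust_remote_code is needed depends on what the DeciLM checkpoint actually ships:

from transformers import AutoTokenizer

# Hypothetical checkpoint id; replace with the actual DeciLM checkpoint path.
# If the checkpoint ships standard tokenizer files (tokenizer.json,
# tokenizer_config.json), AutoTokenizer should resolve it without the custom
# tokenization_mistral.py; trust_remote_code=True is only required if the repo
# registers a custom tokenizer class.
tokenizer = AutoTokenizer.from_pretrained(
    "path/or/hub-id-of-decilm-checkpoint", trust_remote_code=True
)
print(tokenizer("Hello, world!").input_ids)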
from .transformers_4_44_2__activations import ACT2FN
from .transformers_4_44_2__cache_utils import Cache, StaticCache
from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter
from .transformers_4_44_2__modeling_flash_attention_utils_backward_compat import (
    _flash_attention_forward,
)
from .transformers_4_44_2__modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
    MoeCausalLMOutputWithPast,
    MoeModelOutputWithPast,
    QuestionAnsweringModelOutput,
    SequenceClassifierOutputWithPast,
    TokenClassifierOutput,
)
from .transformers_4_44_2__modeling_rope_utils import ROPE_INIT_FUNCTIONS
from .transformers_4_44_2__pytorch_utils import ALL_LAYERNORM_LAYERS
from .transformers_4_51_3__modeling_llama4_attention import Llama4TextAttention, Llama4TextConfig
I looked at these files in Puzzletron. They are static, one-time copies that are never modified after copying, so we can safely remove unnecessary stuff from them and not have to worry about syncing with the Puzzletron GitLab.
Let's discuss.
The Mamba code is used in other places in Puzzletron, so it is not an easy change.
Also, supporting multiple models is the #1 priority; making the code simpler will only hide some problems.
For sure, let's not do it now. I have 10 MRs waiting in the queue; I am creating an issue to consider it (issue/42).
from .megatron_lm__mamba_mixer import MambaMixerMegatron
from .transformers_4_44_2__activations import ACT2FN
from .transformers_4_44_2__cache_utils import Cache, StaticCache
from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter
Do you know who originally copied all these transformers files into DeciLM? I want to understand why there is a need to copy files from a specific transformers version instead of just doing from transformers.modeling_attn_mask_utils import AttentionMaskConverter (from the pip-installed transformers version).
One reason I can think of is that if transformers moves these functions around in its codebase, our imports will fail. But that is the case for the rest of ModelOpt as well, where we import things from transformers or torch; if newer versions break backward compatibility, we just fix our imports. We can always pin a transformers version in requirements, and that can be upgraded from time to time without needing to copy-paste and maintain full files from the transformers repo.
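To make the suggestion concrete, a minimal sketch of the direct-import approach with a fallback to the vendored copy; the vendored module name is taken from this PR, and the try/except pattern itself is only an illustrative assumption:

try:
    # Preferred: import from the pip-installed transformers version, as suggested above.
    from transformers.modeling_attn_mask_utils import AttentionMaskConverter
except ImportError:
    # Fallback to the file copied from transformers 4.44.2 if a future
    # transformers release moves or renames the symbol.
    from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter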
We can change these imports from the locally copied files to the transformers package, check whether our Llama pruning example still works, and get rid of these files, which would greatly simplify everything for us.
Just trying it to see if it works is a bad idea; our integration tests are not extensive. Let's talk to the people who implemented it and understand it better.
from .transformers_4_44_2__cache_utils import Cache as Cache_4_44_2
from .transformers_4_44_2__cache_utils import SinkCache, SlidingWindowCache, StaticCache
from .transformers_4_51_3__cache_utils import HybridChunkedCache
I see all of these except SinkCache in the latest transformers (https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py), and we can just import them directly.
SinkCache was deprecated some time ago in transformers: huggingface/transformers#38399
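A minimal sketch of what the direct cache imports could look like, assuming a recent transformers release; the fallbacks to the vendored copies are illustrative assumptions for as long as DeciLM still needs those symbols:

# Available from the pip-installed transformers package in recent releases.
from transformers.cache_utils import Cache, SlidingWindowCache, StaticCache

try:
    # HybridChunkedCache exists in newer transformers releases (used by Llama 4).
    from transformers.cache_utils import HybridChunkedCache
except ImportError:
    from .transformers_4_51_3__cache_utils import HybridChunkedCache

try:
    # SinkCache was deprecated upstream (huggingface/transformers#38399) and may be absent.
    from transformers.cache_utils import SinkCache
except ImportError:
    from .transformers_4_44_2__cache_utils import SinkCache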
Let's have a discussion with other people first about using copied transformers code.
What does this PR do?
Add decilm modelling code