Conversation

@danielkorzekwa (Author):

What does this PR do?

Add DeciLM modelling code.

Signed-off-by: Daniel Korzekwa <[email protected]>
@danielkorzekwa requested review from a team as code owners, November 4, 2025 10:31
@danielkorzekwa requested review from kevalmorabia97 and removed request for a team, November 4, 2025 10:31
@kevalmorabia97 (Collaborator) left a comment:

Some high-level questions

@kevalmorabia97 (Collaborator), Nov 4, 2025:

Why is tokenization_mistral.py needed? Perhaps we should hold off on that until we need it?

@danielkorzekwa (Author):

This is not used, as far as I know. For now, I am not skipping any files from deci_lm_hf_code; it will be much easier to sync with the internal code in the meantime.

@kevalmorabia97 (Collaborator):

Can we get rid of the transformers_4_44_2 files? The rest of the ModelOpt features only support transformers>=4.48.

Also, does DeciLM only support two transformers versions at the moment, 4.44.2 and 4.51.3?

@danielkorzekwa (Author):

Same answer as for tokenization_mistral.py.

@kevalmorabia97 (Collaborator):

Do you know the motivation for keeping files for both transformers versions instead of importing directly from the transformers package? Do we make any changes in these transformers_4_44_2_*.py files?

@danielkorzekwa (Author):

I do not know; let's talk to the people who implemented it.

@kevalmorabia97 (Collaborator), Nov 4, 2025:

Why do we need the megatron_lm-related files here? Perhaps we should hold off on those until we need them?

@danielkorzekwa (Author):

Support for hybrid model compression.

@kevalmorabia97 (Collaborator):

Is the hybrid model not in DeciLM format?

@danielkorzekwa (Author):

The DeciLM model code uses the megatron_lm mixer class for Mamba support.

codecov bot commented Nov 4, 2025:

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.40%. Comparing base (1c12fd8) to head (418890e).

Additional details and impacted files
@@                Coverage Diff                @@
##           feature/compress     #505   +/-   ##
=================================================
  Coverage             73.40%   73.40%           
=================================================
  Files                   180      180           
  Lines                 18127    18127           
=================================================
  Hits                  13306    13306           
  Misses                 4821     4821           


@kevalmorabia97 (Collaborator):

Can DeciLM not use a standard HF AutoTokenizer?
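
For reference, the standard flow would look roughly like this (a sketch; the checkpoint name is a placeholder, and trust_remote_code is only needed if the repo ships custom tokenizer code):

from transformers import AutoTokenizer

# Placeholder checkpoint path, for illustration only.
tokenizer = AutoTokenizer.from_pretrained(
    "nvidia/some-decilm-checkpoint",
    trust_remote_code=True,  # pulls in custom tokenizer code if the repo defines one
)
inputs = tokenizer("Hello world", return_tensors="pt")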

@danielkorzekwa (Author):

I do not know; I have created an issue to investigate: issues/41.

Comment on lines +58 to +75
from .transformers_4_44_2__activations import ACT2FN
from .transformers_4_44_2__cache_utils import Cache, StaticCache
from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter
from .transformers_4_44_2__modeling_flash_attention_utils_backward_compat import (
    _flash_attention_forward,
)
from .transformers_4_44_2__modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
    MoeCausalLMOutputWithPast,
    MoeModelOutputWithPast,
    QuestionAnsweringModelOutput,
    SequenceClassifierOutputWithPast,
    TokenClassifierOutput,
)
from .transformers_4_44_2__modeling_rope_utils import ROPE_INIT_FUNCTIONS
from .transformers_4_44_2__pytorch_utils import ALL_LAYERNORM_LAYERS
from .transformers_4_51_3__modeling_llama4_attention import Llama4TextAttention, Llama4TextConfig
@kevalmorabia97 (Collaborator):

I looked at these files in Puzzletron. They are static, one-time copied files that are never modified after copying, so we can surely remove unnecessary stuff from them and not have to worry about syncing with the Puzzletron GitLab.

@danielkorzekwa (Author):

Let's discuss.

@danielkorzekwa (Author):

The Mamba code is used in other places in Puzzletron, so it is not such an easy change. Also, supporting multiple models is the #1 priority; making the code simpler will only hide some problems.

For sure, let's not do it now. I have 10 MRs waiting in the queue; I am creating an issue to consider it (issue/42).

from .megatron_lm__mamba_mixer import MambaMixerMegatron
from .transformers_4_44_2__activations import ACT2FN
from .transformers_4_44_2__cache_utils import Cache, StaticCache
from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter
@kevalmorabia97 (Collaborator):

Do you know the original person who copied all these transformers files into DeciLM? I want to understand why there is a need to copy files from a specific transformers version instead of just doing from transformers.modeling_attn_mask_utils import AttentionMaskConverter (from the pip-installed transformers version).

One reason I can think of is that if transformers moves these functions around in the codebase, our imports will fail. But that is the case for the rest of ModelOpt as well, where we import something from transformers or torch, and if newer versions break backward compatibility, we just fix our imports. We can always pin a transformers version in requirements, and that can be upgraded from time to time without needing to copy-paste and maintain full files from the transformers repo.

@kevalmorabia97 (Collaborator):

We can change these imports from the local copied files to the transformers package, check whether our Llama pruning example still works, and then get rid of these files, which would greatly simplify everything for us.
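
Roughly, the swap would look like this (a sketch, assuming the pip-installed transformers still exposes these symbols at the same module paths):

# Before: pinned one-time copies shipped inside deci_lm_hf_code
# from .transformers_4_44_2__activations import ACT2FN
# from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter

# After: import directly from the installed transformers package
from transformers.activations import ACT2FN
from transformers.modeling_attn_mask_utils import AttentionMaskConverter
from transformers.modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
)
from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS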

@danielkorzekwa (Author):

Just trying it to see if it works is a bad idea; our integration tests are not extensive. Let's talk to the people who implemented it and understand this better.

Comment on lines +26 to +28
from .transformers_4_44_2__cache_utils import Cache as Cache_4_44_2
from .transformers_4_44_2__cache_utils import SinkCache, SlidingWindowCache, StaticCache
from .transformers_4_51_3__cache_utils import HybridChunkedCache
@kevalmorabia97 (Collaborator):

I see all of these except SinkCache in the latest transformers: https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py, and we can just import them directly.

SinkCache was deprecated some time ago in transformers: huggingface/transformers#38399
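
The direct imports would look something like this (a sketch, assuming a recent transformers release, roughly >=4.51, where HybridChunkedCache is available):

from transformers.cache_utils import (
    Cache,
    HybridChunkedCache,
    SlidingWindowCache,
    StaticCache,
)
# SinkCache has no direct drop-in here, since it was deprecated upstream.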

@danielkorzekwa (Author):

Let's have a discussion with the other people first about using copied transformers code.
