Add decilm modelling code #505
base: feature/compress
Conversation
Signed-off-by: Daniel Korzekwa <[email protected]>
Some high-level questions
Why is tokenization_mistral.py needed? Perhaps we should hold off on it until we need it?
This is not used, AFAIK. For now, I am not skipping any files from deci_lm_hf_code; it will be much easier to sync with the internal code in the meantime.
Can we get rid of the transformers_4_44_2 files? The rest of the ModelOpt features only support transformers>=4.48.
Also, does DeciLM only support two transformers versions at the moment, 4.44.2 and 4.51.3?
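For context, a minimal sketch of how a transformers version floor could be checked at import time; the 4.48 floor comes from the comment above, while the packaging-based guard itself is only an illustrative assumption, not code from this PR:

import transformers
from packaging import version

# The rest of ModelOpt reportedly requires transformers>=4.48 (see comment above);
# fail fast with a clear message if an older version is installed.
_MIN_TF_VERSION = "4.48"
if version.parse(transformers.__version__) < version.parse(_MIN_TF_VERSION):
    raise ImportError(
        f"transformers>={_MIN_TF_VERSION} is required, found {transformers.__version__}"
    )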
Similar comment as for tokenization_mistral.py
Do you know the motivation for keeping files for both transformers versions instead of importing directly from the transformers package? Do we make any changes in these transformers_4_44_2_*.py files?
I do not know; let's talk to the people who implemented it.
Why do we need the megatron_lm-related files here? Perhaps we should hold off on them until we need them?
They provide support for hybrid model compression.
Is the hybrid model not in DeciLM format?
The DeciLM model code uses the megatron_lm mixer class for Mamba support.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@             Coverage Diff              @@
##        feature/compress     #505   +/-  ##
=============================================
  Coverage          73.40%   73.40%
=============================================
  Files                180      180
  Lines              18127    18127
=============================================
  Hits               13306    13306
  Misses              4821     4821

View full report in Codecov by Sentry.
Can DeciLM not use a standard HF AutoTokenizer?
I do not know; I created an issue to investigate: issues/41
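For reference, a minimal sketch of the standard-tokenizer path being asked about; the checkpoint name is a placeholder, and whether trust_remote_code is needed depends on what the DeciLM checkpoint actually ships:

from transformers import AutoTokenizer

# Hypothetical checkpoint id; replace with the actual DeciLM checkpoint path.
# If the checkpoint ships standard tokenizer files (tokenizer.json,
# tokenizer_config.json), AutoTokenizer should resolve it without the custom
# tokenization_mistral.py; trust_remote_code=True is only required if the repo
# registers a custom tokenizer class.
tokenizer = AutoTokenizer.from_pretrained(
    "path/or/hub-id-of-decilm-checkpoint", trust_remote_code=True
)
print(tokenizer("Hello, world!").input_ids)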
from .transformers_4_44_2__activations import ACT2FN
from .transformers_4_44_2__cache_utils import Cache, StaticCache
from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter
from .transformers_4_44_2__modeling_flash_attention_utils_backward_compat import (
    _flash_attention_forward,
)
from .transformers_4_44_2__modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
    MoeCausalLMOutputWithPast,
    MoeModelOutputWithPast,
    QuestionAnsweringModelOutput,
    SequenceClassifierOutputWithPast,
    TokenClassifierOutput,
)
from .transformers_4_44_2__modeling_rope_utils import ROPE_INIT_FUNCTIONS
from .transformers_4_44_2__pytorch_utils import ALL_LAYERNORM_LAYERS
from .transformers_4_51_3__modeling_llama4_attention import Llama4TextAttention, Llama4TextConfig
I looked at these files in Puzzletron. They are static, one-time copies that are never modified after copying, so we can safely remove unnecessary stuff from them and not have to worry about syncing with the Puzzletron GitLab.
Let's discuss.
The Mamba code is used in other places in Puzzletron, so it is not an easy change.
Also, supporting multiple models is the #1 priority; making the code simpler will only hide some problems.
For sure, let's not do it now. I have 10 MRs waiting in the queue; I am creating an issue to consider it (issue/42).
from .megatron_lm__mamba_mixer import MambaMixerMegatron
from .transformers_4_44_2__activations import ACT2FN
from .transformers_4_44_2__cache_utils import Cache, StaticCache
from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter
Do you know who originally copied all these transformers files into DeciLM? I want to understand why there is a need to copy files from a specific transformers version instead of just doing from transformers.modeling_attn_mask_utils import AttentionMaskConverter (from the pip-installed transformers version).
One reason I can think of is that if transformers moves these functions around in its codebase, our imports will fail. But that is the case for the rest of ModelOpt as well, where we import things from transformers or torch; if newer versions break backward compatibility, we just fix our imports. We can always pin a transformers version in requirements, and that can be upgraded from time to time without needing to copy-paste and maintain full files from the transformers repo.
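To make the suggestion concrete, a minimal sketch of the direct-import approach with a fallback to the vendored copy; the vendored module name is taken from this PR, and the try/except pattern itself is only an illustrative assumption:

try:
    # Preferred: import from the pip-installed transformers version, as suggested above.
    from transformers.modeling_attn_mask_utils import AttentionMaskConverter
except ImportError:
    # Fallback to the file copied from transformers 4.44.2 if a future
    # transformers release moves or renames the symbol.
    from .transformers_4_44_2__modeling_attn_mask_utils import AttentionMaskConverter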
We can change these imports from the locally copied files to the transformers package, check whether our Llama pruning example still works, and get rid of these files, which would greatly simplify everything for us.
Just trying it to see if it works is a bad idea; our integration tests are not extensive. Let's talk to the people who implemented it and understand it better.
from .transformers_4_44_2__cache_utils import Cache as Cache_4_44_2
from .transformers_4_44_2__cache_utils import SinkCache, SlidingWindowCache, StaticCache
from .transformers_4_51_3__cache_utils import HybridChunkedCache
I see all of these except SinkCache in the latest transformers (https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py), and we can just import them directly.
SinkCache was deprecated some time ago in transformers: huggingface/transformers#38399
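A minimal sketch of what the direct cache imports could look like, assuming a recent transformers release; the fallbacks to the vendored copies are illustrative assumptions for as long as DeciLM still needs those symbols:

# Available from the pip-installed transformers package in recent releases.
from transformers.cache_utils import Cache, SlidingWindowCache, StaticCache

try:
    # HybridChunkedCache exists in newer transformers releases (used by Llama 4).
    from transformers.cache_utils import HybridChunkedCache
except ImportError:
    from .transformers_4_51_3__cache_utils import HybridChunkedCache

try:
    # SinkCache was deprecated upstream (huggingface/transformers#38399) and may be absent.
    from transformers.cache_utils import SinkCache
except ImportError:
    from .transformers_4_44_2__cache_utils import SinkCache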
Let's have a discussion with other people first about using copied transformers code.
What does this PR do?
Add decilm modelling code