
feat(rebase + unify subfunctions): rebase to transformers v4.57.3 and align the subfunction changes. #882

Draft
vbaddi wants to merge 14 commits into quic:main from vbaddi:feat/rebase_transformers_unify_subfunctions

vbaddi (Contributor) commented Mar 25, 2026

This PR rebases dev/rebase_transformers_v4_57_3 onto main and consolidates our transformers-rebase changes with the subfunction/KV alignment from PR #880, which keeps the branch simpler and unifies the subfunction approach.

What changed

  • KV/subfunction alignment:
    • Applied PR #880-style wrapper changes ("Rope Fix for a single subfunction signature") to the causal model families to reduce divergence from mainline.
    • Removed local resolve_kv_seq_len usage from remaining wrappers (grok_1, molmo) to match the cache-native pattern used elsewhere.
    • Removed now-unused helper resolve_kv_seq_len from QEfficient/utils/_utils.py.
  • Unit test updates:
    • Added a new quickcheck unit test for use_onnx_subfunctions=True that validates decoder-block subfunction cardinality per causal model.
    • Important: the test counts only decoder-block functions (obtained via get_submodules_for_export()), not every ONNX helper function, so the assertion tracks the intended behavior.
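The counting rule in the last bullet can be sketched as follows. This is a minimal, hypothetical illustration, not the test added in this PR: it assumes each exported ONNX local-function name contains the decoder-block class name, and that the set of block class names comes from something like get_submodules_for_export(). The helper count_decoder_block_functions and all sample names are made up for illustration.

```python
from typing import Iterable, Set


def count_decoder_block_functions(
    function_names: Iterable[str], block_class_names: Set[str]
) -> int:
    """Count distinct ONNX functions that correspond to decoder blocks.

    Hypothetical helper: assumes a decoder-block function name contains the
    block class name (simple substring match). Helper functions that match no
    block class are ignored, so they do not inflate the count.
    """
    matched = set()
    for name in function_names:
        if any(cls in name for cls in block_class_names):
            matched.add(name)
    return len(matched)


# Illustrative names only; real exported function names may differ.
exported = [
    "QEffLlamaDecoderLayer",  # hypothetical decoder-block function
    "RotaryHelper",           # hypothetical helper, not a decoder block
    "MaskHelper",             # hypothetical helper, not a decoder block
]
print(count_decoder_block_functions(exported, {"QEffLlamaDecoderLayer"}))  # 1
```

Filtering by block class name, rather than counting every entry, is what keeps helper functions from changing the single-versus-multiple verdict.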

Decoder-block subfunction status (causal model list)

  • Single decoder-block subfunction: falcon, gpt2, gptj, granite, llama, mistral, mpt, olmo2, phi3, qwen2
  • Multiple decoder-block subfunctions: codegen, gpt_oss, mixtral, phi (Phi-1), starcoder2
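The status list above can be encoded as a table of expectations for a quickcheck-style assertion. A minimal sketch; the names EXPECTED_CARDINALITY and check_cardinality are hypothetical, and the model keys simply mirror the two bullets above.

```python
# Expected decoder-block subfunction cardinality per causal model family,
# mirroring the status list in this PR description (hypothetical structure).
SINGLE = {"falcon", "gpt2", "gptj", "granite", "llama", "mistral", "mpt", "olmo2", "phi3", "qwen2"}
MULTIPLE = {"codegen", "gpt_oss", "mixtral", "phi", "starcoder2"}

EXPECTED_CARDINALITY = {
    **{model: "single" for model in SINGLE},
    **{model: "multiple" for model in MULTIPLE},
}


def check_cardinality(model_type: str, num_block_functions: int) -> bool:
    """Return True if the observed decoder-block function count matches the expectation."""
    expected = EXPECTED_CARDINALITY[model_type]
    if expected == "single":
        return num_block_functions == 1
    return num_block_functions > 1


print(check_cardinality("llama", 1))    # True
print(check_cardinality("mixtral", 1))  # False
```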

Tests verified

python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto
Result after subfunction-count + KV-helper cleanup: 75 passed, 1 skipped

vbaddi added 5 commits March 25, 2026 07:47
- Pin transformers to 4.57.3
- Keep QEff cache internals self-owned (CacheLayerMixin/Cache adapter path), with legacy interop.
- Update model kv_seq_len calls to use cross-version cache-length resolution.
- Add small quantizer compatibility guards (AWQ/update_dtype paths).

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
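The cache-length resolution mentioned in this commit follows the usual transformers pattern of asking the cache object for its current length instead of computing it through a separate helper. A minimal sketch, assuming a cache that exposes get_seq_length(layer_idx) as transformers' Cache classes do; ToyCache and total_kv_len are hypothetical stand-ins so the example runs without torch or transformers.

```python
from typing import List, Optional


class ToyCache:
    """Hypothetical stand-in for a transformers Cache that tracks per-layer key lengths."""

    def __init__(self, num_layers: int) -> None:
        self.lengths: List[int] = [0] * num_layers

    def get_seq_length(self, layer_idx: int = 0) -> int:
        # Same method name as transformers' Cache.get_seq_length.
        return self.lengths[layer_idx]

    def update(self, num_new_tokens: int, layer_idx: int) -> None:
        self.lengths[layer_idx] += num_new_tokens


def total_kv_len(q_len: int, cache: Optional[ToyCache], layer_idx: int) -> int:
    """KV length seen by attention: new tokens plus tokens already cached."""
    past = cache.get_seq_length(layer_idx) if cache is not None else 0
    return q_len + past


cache = ToyCache(num_layers=2)
print(total_kv_len(8, cache, layer_idx=0))  # prefill: 8
cache.update(8, layer_idx=0)
print(total_kv_len(1, cache, layer_idx=0))  # decode step: 9
print(total_kv_len(1, None, layer_idx=0))   # no cache: 1
```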
…ve unused kv helper

- Add a causal-LM unit quickcheck that exports with use_onnx_subfunctions=True and
    asserts decoder-block subfunction cardinality (single vs multi) per model
    expectations.
- Count only decoder block functions derived from get_submodules_for_export(), not
    all ONNX helper functions.
- Remove unused resolve_kv_seq_len from QEfficient/utils/_utils.py after migrating
    wrappers away from it.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>

 query_states, key_states = qeff_apply_rotary_pos_emb(
-    query_states, key_states, cos_cached, sin_cached, position_ids[1:], self.rope_scaling["mrope_section"]
+    query_states, key_states, cos, sin, position_ids[1:], self.rope_scaling["mrope_section"]
Comment by the PR author (Contributor):

nit: Let's keep this consistent as *_cached
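For context on the mrope_section argument in this snippet: in upstream transformers' Qwen2-VL code, multimodal RoPE splits the rotary channel dimension into temporal, height, and width sections, and each section takes its cos/sin values from the matching position stream. The pure-Python sketch below illustrates only that channel selection. It is not QEfficient's qeff_apply_rotary_pos_emb, and the list-of-rows layout (three streams for a single token) is an assumption made to keep it dependency-free.

```python
from typing import List, Sequence


def select_mrope_channels(
    per_stream: Sequence[Sequence[float]], mrope_section: Sequence[int]
) -> List[float]:
    """Merge per-stream cos (or sin) values for one token into a single row.

    per_stream holds three rows (temporal, height, width), each of length
    head_dim == 2 * sum(mrope_section). The section list is repeated twice
    because the rotary table duplicates its frequency half across head_dim.
    """
    sections = list(mrope_section) * 2
    head_dim = sum(sections)
    if len(per_stream) != 3 or any(len(row) != head_dim for row in per_stream):
        raise ValueError("expected 3 rows of length 2 * sum(mrope_section)")

    merged: List[float] = []
    start = 0
    for i, width in enumerate(sections):
        stream = per_stream[i % 3]  # chunk i comes from stream t, h, or w in turn
        merged.extend(stream[start : start + width])
        start += width
    return merged


# Toy example: mrope_section=[1, 1, 2] gives head_dim 8; values tag the source stream.
t, h, w = [0.0] * 8, [1.0] * 8, [2.0] * 8
print(select_mrope_channels([t, h, w], [1, 1, 2]))
# [0.0, 1.0, 2.0, 2.0, 0.0, 1.0, 2.0, 2.0]
```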

def __qeff_init__(self):
    self.rotary_emb = QEffQwen2_5_VLRotaryEmbedding(config=self.config)
    QEffQwen2_5_VLRotaryEmbedding._max_seq_len_cached = self.config.max_position_embeddings
    self.rotary_emb._set_cos_sin_cache(
Comment by the PR author (Contributor):

nit: is this done for all models too?
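The __qeff_init__ snippet precomputes a rotary cos/sin cache up to config.max_position_embeddings. A minimal sketch of what such a cache typically holds, following the standard RoPE formulation used by Llama-style rotary embeddings (inverse frequencies base**(-2i/dim), multiplied by each position, then duplicated across the head dimension). The function name build_cos_sin_cache and the default base of 10000 are assumptions; the real _set_cos_sin_cache may differ, for example in rope scaling or dtype handling.

```python
import math
from typing import List, Tuple


def build_cos_sin_cache(
    dim: int, max_seq_len: int, base: float = 10000.0
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (cos, sin) tables of shape [max_seq_len][dim] for standard RoPE."""
    if dim % 2:
        raise ValueError("rotary dim must be even")
    # inv_freq[i] = 1 / base**(2i / dim) for i in [0, dim / 2)
    inv_freq = [1.0 / (base ** (2 * i / dim)) for i in range(dim // 2)]
    cos_cached, sin_cached = [], []
    for pos in range(max_seq_len):
        freqs = [pos * f for f in inv_freq]
        emb = freqs + freqs  # duplicate halves, like torch.cat((freqs, freqs), dim=-1)
        cos_cached.append([math.cos(x) for x in emb])
        sin_cached.append([math.sin(x) for x in emb])
    return cos_cached, sin_cached


cos_cached, sin_cached = build_cos_sin_cache(dim=4, max_seq_len=3)
print(len(cos_cached), len(cos_cached[0]))  # 3 4
print(cos_cached[0])  # position 0: [1.0, 1.0, 1.0, 1.0]
```

Precomputing the table once up to the maximum position keeps the exported graph free of trigonometric recomputation, at the cost of memory proportional to max_seq_len * dim.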

abhishek-singh591 and others added 3 commits March 26, 2026 06:07
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
qcdipankar marked this pull request as draft March 26, 2026 08:21

Labels: enhancement (New feature or request)

3 participants