
rebase(transformers): align modeling wrappers, cache_utils and other changes to v5.3.0 and restore PyTorch/ORT parity#876

Draft
vbaddi wants to merge 4 commits into quic:main from vbaddi:test/rebase-transformers

Conversation


@vbaddi vbaddi commented Mar 21, 2026

  • Rebased downstream wrapper stack to transformers v5.3.0 and aligned coupled deps (huggingface-hub, peft, diffusers) in project config.
  • Updated model wrapper compatibility paths across causal/VLM/audio/export flows to match upstream v5 APIs while preserving downstream public behavior.
  • Hardened cache compatibility layer and runtime glue for mixed legacy/new cache semantics used by downstream generation/export paths.
  • Fixed attention/mask/rotary call-path mismatches introduced by upstream API changes (including model-specific signature updates).
  • Updated AWQ/quantizer and export compatibility paths to remain ONNX-safe.
  • Validation evidence:
python -m pytest -q tests/test_model_quickcheck.py -n 16
Result: 26 passed.
  • QAic verification pending.
  • E2E CI readout pending.

cc: @quic-rishinr @quic-hemagnih @asmigosw @anujgupt-github
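The "mixed legacy/new cache semantics" bullet above refers to bridging the older tuple-of-(key, value)-pairs `past_key_values` format and the newer object-based Cache API. A minimal sketch of that kind of compatibility shim is below; the class and method names (`CompatCache`, `from_legacy`, `to_legacy`) are illustrative assumptions, not the identifiers actually used in this PR:

```python
# Illustrative legacy/new cache bridge: legacy code passes a tuple of
# (key, value) pairs, one per layer; newer code expects a cache object
# that can grow per layer and convert back for export paths.
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class LayerCache:
    """Per-layer key/value storage (tensors in practice; any object here)."""
    keys: List[Any] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)


class CompatCache:
    """Object-style cache that round-trips to/from the legacy tuple format."""

    def __init__(self) -> None:
        self.layers: List[LayerCache] = []

    @classmethod
    def from_legacy(cls, legacy: Tuple[Tuple[Any, Any], ...]) -> "CompatCache":
        cache = cls()
        for k, v in legacy:
            cache.layers.append(LayerCache(keys=[k], values=[v]))
        return cache

    def to_legacy(self) -> Tuple[Tuple[Any, Any], ...]:
        # Export/ONNX paths often want the flat tuple form back.
        return tuple((layer.keys[-1], layer.values[-1]) for layer in self.layers)
```

In the real wrapper stack the conversion would carry tensors and interact with `transformers`' Cache classes; the point of the sketch is only the round-trip contract that generation and export paths can rely on.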

vbaddi added 4 commits March 21, 2026 18:07
…T parity

- Rebased downstream wrapper stack to transformers==5.3.0 and aligned coupled deps
    (huggingface-hub, peft, diffusers) in project config.
- Updated model wrapper compatibility paths across causal/VLM/audio/export flows
    to match upstream v5 APIs while preserving downstream public behavior.
- Hardened cache compatibility layer and runtime glue for mixed legacy/new cache
    semantics used by downstream generation/export paths.
- Fixed attention/mask/rotary call-path mismatches introduced by upstream API
    changes (including model-specific signature updates).
- Updated AWQ/quantizer and export compatibility paths to remain ONNX-safe.
- Resolved MoE/export edge cases (including Mixtral/gpt_oss) to keep
    HF PyTorch -> downstream PyTorch -> ONNXRuntime token parity.
- Validation evidence:
    pyenv activate qeff.mainline
    python -m pytest -q tests/test_model_quickcheck.py -n 16
    Result: 26 passed.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
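The token-parity claim in the commit above (HF PyTorch -> downstream PyTorch -> ONNXRuntime) is typically checked by greedy-decoding the same prompt through each backend and comparing the emitted token IDs. A small backend-agnostic helper for that comparison might look like this (a sketch; `first_divergence` is a hypothetical name, not a utility from this repository):

```python
from typing import Sequence


def first_divergence(ref: Sequence[int], cand: Sequence[int]) -> int:
    """Return the index of the first differing token between two greedy
    decodes, or -1 if the sequences are identical.

    A length mismatch counts as divergence at the shorter length.
    """
    for i, (a, b) in enumerate(zip(ref, cand)):
        if a != b:
            return i
    if len(ref) != len(cand):
        return min(len(ref), len(cand))
    return -1
```

In practice `ref` would come from upstream HF PyTorch generation and `cand` from the downstream wrapper or an ONNXRuntime session; parity holds when the helper returns -1 for every test prompt.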
…odeling_qeff

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi vbaddi force-pushed the test/rebase-transformers branch from ec1d7c1 to 92ba255 on March 21, 2026 18:20
@vbaddi vbaddi marked this pull request as draft March 21, 2026 18:35

Labels

enhancement New feature or request
