
rebase(transformers): align modeling wrappers, cache_utils and other changes to v5.3.0 and restore PyTorch/ORT parity#876

Draft
vbaddi wants to merge 4 commits into quic:main from vbaddi:test/rebase-transformers

Conversation


@vbaddi vbaddi commented Mar 21, 2026

  • Rebased downstream wrapper stack to transformers v5.3.0 and aligned coupled deps (huggingface-hub, peft, diffusers) in project config.
  • Updated model wrapper compatibility paths across causal/VLM/audio/export flows to match upstream v5 APIs while preserving downstream public behavior.
  • Hardened cache compatibility layer and runtime glue for mixed legacy/new cache semantics used by downstream generation/export paths.
  • Fixed attention/mask/rotary call-path mismatches introduced by upstream API changes (including model-specific signature updates).
  • Updated AWQ/quantizer and export compatibility paths to remain ONNX-safe.
  • Validation evidence:
python -m pytest -q tests/test_model_quickcheck.py -n 16
Result: 26 passed.
  • QAic verification pending.
  • E2E CI readout pending.

cc: @quic-rishinr @quic-hemagnih @asmigosw @anujgupt-github
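The "mixed legacy/new cache semantics" bullet above refers to bridging the older tuple-of-(key, value)-pairs `past_key_values` format and the newer object-based Cache API. A minimal sketch of that kind of compatibility shim is below; the class and method names (`CompatCache`, `from_legacy`, `to_legacy`) are illustrative assumptions, not the identifiers actually used in this PR:

```python
# Illustrative legacy/new cache bridge: legacy code passes a tuple of
# (key, value) pairs, one per layer; newer code expects a cache object
# that can grow per layer and convert back for export paths.
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class LayerCache:
    """Per-layer key/value storage (tensors in practice; any object here)."""
    keys: List[Any] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)


class CompatCache:
    """Object-style cache that round-trips to/from the legacy tuple format."""

    def __init__(self) -> None:
        self.layers: List[LayerCache] = []

    @classmethod
    def from_legacy(cls, legacy: Tuple[Tuple[Any, Any], ...]) -> "CompatCache":
        cache = cls()
        for k, v in legacy:
            cache.layers.append(LayerCache(keys=[k], values=[v]))
        return cache

    def to_legacy(self) -> Tuple[Tuple[Any, Any], ...]:
        # Export/ONNX paths often want the flat tuple form back.
        return tuple((layer.keys[-1], layer.values[-1]) for layer in self.layers)
```

In the real wrapper stack the conversion would carry tensors and interact with `transformers`' Cache classes; the point of the sketch is only the round-trip contract that generation and export paths can rely on.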

vbaddi added 4 commits March 21, 2026 18:07
…T parity

- Rebased downstream wrapper stack to transformers==5.3.0 and aligned coupled deps
    (huggingface-hub, peft, diffusers) in project config.
- Updated model wrapper compatibility paths across causal/VLM/audio/export flows
    to match upstream v5 APIs while preserving downstream public behavior.
- Hardened cache compatibility layer and runtime glue for mixed legacy/new cache
    semantics used by downstream generation/export paths.
- Fixed attention/mask/rotary call-path mismatches introduced by upstream API
    changes (including model-specific signature updates).
- Updated AWQ/quantizer and export compatibility paths to remain ONNX-safe.
- Resolved MoE/export edge cases (including Mixtral/gpt_oss) to keep
    HF PyTorch -> downstream PyTorch -> ONNXRuntime token parity.
- Validation evidence:
    pyenv activate qeff.mainline
    python -m pytest -q tests/test_model_quickcheck.py -n 16
    Result: 26 passed.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
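The token-parity claim in the commit above (HF PyTorch -> downstream PyTorch -> ONNXRuntime) is typically checked by greedy-decoding the same prompt through each backend and comparing the emitted token IDs. A small backend-agnostic helper for that comparison might look like this (a sketch; `first_divergence` is a hypothetical name, not a utility from this repository):

```python
from typing import Sequence


def first_divergence(ref: Sequence[int], cand: Sequence[int]) -> int:
    """Return the index of the first differing token between two greedy
    decodes, or -1 if the sequences are identical.

    A length mismatch counts as divergence at the shorter length.
    """
    for i, (a, b) in enumerate(zip(ref, cand)):
        if a != b:
            return i
    if len(ref) != len(cand):
        return min(len(ref), len(cand))
    return -1
```

In practice `ref` would come from upstream HF PyTorch generation and `cand` from the downstream wrapper or an ONNXRuntime session; parity holds when the helper returns -1 for every test prompt.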
…odeling_qeff

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi vbaddi force-pushed the test/rebase-transformers branch from ec1d7c1 to 92ba255 on March 21, 2026 18:20
@vbaddi vbaddi marked this pull request as draft March 21, 2026 18:35

Labels

enhancement New feature or request
