
COPY OF 2345#2354

Draft
xiaoyu-work wants to merge 27 commits into main from
xiaoyu/qwen3-vl

Conversation

@xiaoyu-work
Collaborator

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

hanbitmyths and others added 27 commits February 26, 2026 11:19
- graph_surgeries.py: add QwenVL-specific graph surgery passes for
  vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models,
  handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains
  in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
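For context on what the rtn_quantization.py pass computes, here is a minimal sketch of round-to-nearest (RTN) weight quantization. This simplified per-tensor symmetric version is illustrative only, not Olive's actual code; the function names are made up for this example.

```python
# Hedged sketch of symmetric round-to-nearest (RTN) quantization.
# Function names are illustrative, not Olive's rtn_quantization.py API.

def rtn_quantize(weights, bits=4):
    """Scale by the max magnitude, round each weight to the nearest
    integer level, and clamp to the signed range for `bits`."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def rtn_dequantize(q, scale):
    """Reconstruct approximate weights from integer levels."""
    return [v * scale for v in q]

weights = [0.9, -0.35, 0.1, -0.7]
q, scale = rtn_quantize(weights, bits=4)
recon = rtn_dequantize(q, scale)
# Per-weight reconstruction error is bounded by scale / 2.
```

Parameterizing `bits` (as the commit above does for 8-bit Gather) just changes `qmax`; the rounding and clamping logic is otherwise identical.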
…surgery passes

- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
- Apply ruff format to 4 files (cast_chain_elimination.py,
  rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with
  axis-aware _pack_int4_along_axis that correctly packs zero_point when
  k_blocks is small (e.g. 1), avoiding ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized
  always uses quantize_axis=data_rank-1, not pass_config['axis']
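To illustrate the reshape bug described above: packing two unsigned 4-bit values into one byte must happen pairwise along a single axis, because a global flatten packs across rows and then cannot be reshaped back when one dimension (such as `k_blocks`) is 1. The sketch below is a simplified stand-in for the axis-aware helper, not Olive's `_pack_int4_along_axis` itself.

```python
# Hedged sketch of axis-aware int4 packing on nested lists (the real
# pass operates on numpy arrays; names here are illustrative).
# Element 2i goes in the low nibble, element 2i+1 in the high nibble.

def pack_int4_pairs(row):
    """Pack a flat list of uint4 values (0..15) pairwise into bytes,
    zero-padding an odd-length row."""
    if len(row) % 2:
        row = row + [0]
    return [(row[i] & 0xF) | ((row[i + 1] & 0xF) << 4)
            for i in range(0, len(row), 2)]

def pack_int4_along_last_axis(tensor):
    """Pack the innermost axis of a nested-list 'tensor' of uint4s."""
    if tensor and isinstance(tensor[0], list):
        return [pack_int4_along_last_axis(t) for t in tensor]
    return pack_int4_pairs(tensor)

# zero_point with k_blocks == 1: each row packs independently into one
# padded byte, so shape (3, 1) stays (3, 1). A global flatten would
# pack values from different rows into the same byte and then raise
# a ValueError trying to reshape back to a per-row layout.
zero_point = [[7], [8], [15]]
packed = pack_int4_along_last_axis(zero_point)
```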
The upstream tuning_strategies.md page no longer exists, causing the
Sphinx linkcheck to fail with -W (warnings-as-errors).
Address PR review feedback from @devang-ml and @justinchuby: use
onnxscript.optimizer.optimize() instead of an ORT InferenceSession with
session.enable_cast_chain_elimination enabled to eliminate redundant Cast chains.

- Remove onnxruntime dependency from cast_chain_elimination pass
- Use onnxscript.optimizer.optimize() with TypeInferenceError fallback
  (same pattern as OnnxPeepholeOptimizer)
- Update test comment to reflect onnxscript optimizer
- Verified: numerically identical outputs (0.00 max abs diff)
- Verified: no eval regression (69% on AI2D 100 samples)
Resolve conflict in olive/passes/onnx/common.py: take upstream fix
from PR #2355 (ByteSize EncodeError handling).
…n elimination

Use a custom CastCastRoundTrip rewrite rule instead of the full
onnxscript.optimizer.optimize() call. The rewrite rule specifically
targets round-trip Cast chains (e.g. fp32->fp16->fp32) by checking
that the final cast type matches the original input type, and replaces
them with Identity.

This is simpler, faster, and avoids the TypeInferenceError fallback
that was needed with the full optimizer. The onnxscript rewrite()
function also runs RemoveUnusedNodesPass and RemoveUnusedOpsetsPass
automatically.

Validated: weights identical, 0.00 max abs diff, eval 69% unchanged.
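The rewrite rule described above can be sketched in plain Python. This is a conceptual model on simplified node objects, not onnxscript's IR or the actual CastCastRoundTrip rule: it collapses Cast(Cast(x, to=T2), to=T1) into Identity(x) whenever T1 matches x's original type, then drops the dead inner Cast, mirroring what RemoveUnusedNodesPass does.

```python
# Conceptual sketch of the round-trip Cast rewrite (simplified node
# objects, not onnxscript's actual rewriter API).
from dataclasses import dataclass

@dataclass
class Node:
    op: str            # e.g. "Cast", "Identity"
    inputs: list       # input value names
    output: str        # output value name
    to_type: str = ""  # target dtype for Cast nodes

def rewrite_cast_round_trips(nodes, input_types):
    """Replace Cast chains that restore the original input type with
    Identity, keeping the outer output name so no rewiring is needed."""
    by_output = {n.output: n for n in nodes}
    rewritten = []
    for n in nodes:
        prod = by_output.get(n.inputs[0])
        if (n.op == "Cast" and prod is not None and prod.op == "Cast"
                and input_types.get(prod.inputs[0]) == n.to_type):
            rewritten.append(Node("Identity", [prod.inputs[0]], n.output))
        else:
            rewritten.append(n)
    # Dead-node elimination (mirrors onnxscript's RemoveUnusedNodesPass);
    # assumes the last node's output is the graph output.
    consumed = {i for n in rewritten for i in n.inputs} | {rewritten[-1].output}
    return [n for n in rewritten if n.output in consumed]

# x (fp32) -> Cast(fp16) -> Cast(fp32) collapses to a single Identity.
nodes = [Node("Cast", ["x"], "x_h", "float16"),
         Node("Cast", ["x_h"], "y", "float32")]
result = rewrite_cast_round_trips(nodes, {"x": "float32"})
```

Checking the final cast type against the original input type is what keeps this safe: a genuine downcast chain like fp32 -> fp16 -> bf16 does not match and is left alone.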
Move _ensure_com_microsoft_opset and eliminate_cast_chains into
ModelOptimizer class. Add fix_com_microsoft_opset and
cast_chain_elimination config flags to OnnxPeepholeOptimizer.

Remove standalone OnnxCastChainElimination pass, its olive_config
entry, and its test file. Move tests into test_peephole_optimizer.py.

Per devang-ml's review: consolidate into existing pass to avoid
introducing a new one.
Add onnxscript_optimize, onnxoptimizer_optimize, and
fuse_reshape_operations config flags (default True for backward
compatibility). This allows recipe configs to disable the default
optimizations and only run opset fixup + cast chain elimination,
producing byte-identical models to the old standalone pass.
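A recipe entry exercising these flags might look like the fragment below. The flag names (onnxscript_optimize, onnxoptimizer_optimize, fuse_reshape_operations, fix_com_microsoft_opset, cast_chain_elimination) are taken from the commit messages above; the surrounding structure is illustrative, not a verified Olive pass-config schema.

```json
{
  "type": "OnnxPeepholeOptimizer",
  "onnxscript_optimize": false,
  "onnxoptimizer_optimize": false,
  "fuse_reshape_operations": false,
  "fix_com_microsoft_opset": true,
  "cast_chain_elimination": true
}
```

With the three default optimizations disabled, only the opset fixup and cast chain elimination run, reproducing the behavior of the removed standalone pass.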
