
COPY OF 2345#2354

Draft
xiaoyu-work wants to merge 27 commits into main from
xiaoyu/qwen3-vl

Conversation

@xiaoyu-work
Collaborator

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

hanbitmyths and others added 27 commits February 26, 2026 11:19
- graph_surgeries.py: add QwenVL-specific graph surgery passes for
  vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models,
  handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains
  in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
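For context on what the rtn_quantization.py pass computes, here is a minimal sketch of round-to-nearest (RTN) weight quantization. This simplified per-tensor symmetric version is illustrative only, not Olive's actual code; the function names are made up for this example.

```python
# Hedged sketch of symmetric round-to-nearest (RTN) quantization.
# Function names are illustrative, not Olive's rtn_quantization.py API.

def rtn_quantize(weights, bits=4):
    """Scale by the max magnitude, round each weight to the nearest
    integer level, and clamp to the signed range for `bits`."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def rtn_dequantize(q, scale):
    """Reconstruct approximate weights from integer levels."""
    return [v * scale for v in q]

weights = [0.9, -0.35, 0.1, -0.7]
q, scale = rtn_quantize(weights, bits=4)
recon = rtn_dequantize(q, scale)
# Per-weight reconstruction error is bounded by scale / 2.
```

Parameterizing `bits` (as the commit above does for 8-bit Gather) just changes `qmax`; the rounding and clamping logic is otherwise identical.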
…surgery passes

- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
- Apply ruff format to 4 files (cast_chain_elimination.py,
  rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with
  axis-aware _pack_int4_along_axis that correctly packs zero_point when
  k_blocks is small (e.g. 1), avoiding ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized
  always uses quantize_axis=data_rank-1, not pass_config['axis']
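To illustrate the reshape bug described above: packing two unsigned 4-bit values into one byte must happen pairwise along a single axis, because a global flatten packs across rows and then cannot be reshaped back when one dimension (such as `k_blocks`) is 1. The sketch below is a simplified stand-in for the axis-aware helper, not Olive's `_pack_int4_along_axis` itself.

```python
# Hedged sketch of axis-aware int4 packing on nested lists (the real
# pass operates on numpy arrays; names here are illustrative).
# Element 2i goes in the low nibble, element 2i+1 in the high nibble.

def pack_int4_pairs(row):
    """Pack a flat list of uint4 values (0..15) pairwise into bytes,
    zero-padding an odd-length row."""
    if len(row) % 2:
        row = row + [0]
    return [(row[i] & 0xF) | ((row[i + 1] & 0xF) << 4)
            for i in range(0, len(row), 2)]

def pack_int4_along_last_axis(tensor):
    """Pack the innermost axis of a nested-list 'tensor' of uint4s."""
    if tensor and isinstance(tensor[0], list):
        return [pack_int4_along_last_axis(t) for t in tensor]
    return pack_int4_pairs(tensor)

# zero_point with k_blocks == 1: each row packs independently into one
# padded byte, so shape (3, 1) stays (3, 1). A global flatten would
# pack values from different rows into the same byte and then raise
# a ValueError trying to reshape back to a per-row layout.
zero_point = [[7], [8], [15]]
packed = pack_int4_along_last_axis(zero_point)
```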
The upstream tuning_strategies.md page no longer exists, causing the
Sphinx linkcheck to fail with -W (warnings-as-errors).
Address PR review feedback from @devang-ml and @justinchuby: use
onnxscript.optimizer.optimize() instead of an ORT InferenceSession with
session.enable_cast_chain_elimination enabled to eliminate redundant Cast chains.

- Remove onnxruntime dependency from cast_chain_elimination pass
- Use onnxscript.optimizer.optimize() with TypeInferenceError fallback
  (same pattern as OnnxPeepholeOptimizer)
- Update test comment to reflect onnxscript optimizer
- Verified: numerically identical outputs (0.00 max abs diff)
- Verified: no eval regression (69% on AI2D 100 samples)
Resolve conflict in olive/passes/onnx/common.py: take upstream fix
from PR #2355 (ByteSize EncodeError handling).
…n elimination

Use a custom CastCastRoundTrip rewrite rule instead of the full
onnxscript.optimizer.optimize() call. The rewrite rule specifically
targets round-trip Cast chains (e.g. fp32->fp16->fp32) by checking
that the final cast type matches the original input type, and replaces
them with Identity.

This is simpler, faster, and avoids the TypeInferenceError fallback
that was needed with the full optimizer. The onnxscript rewrite()
function also runs RemoveUnusedNodesPass and RemoveUnusedOpsetsPass
automatically.

Validated: weights identical, 0.00 max abs diff, eval 69% unchanged.
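The rewrite rule described above can be sketched in plain Python. This is a conceptual model on simplified node objects, not onnxscript's IR or the actual CastCastRoundTrip rule: it collapses Cast(Cast(x, to=T2), to=T1) into Identity(x) whenever T1 matches x's original type, then drops the dead inner Cast, mirroring what RemoveUnusedNodesPass does.

```python
# Conceptual sketch of the round-trip Cast rewrite (simplified node
# objects, not onnxscript's actual rewriter API).
from dataclasses import dataclass

@dataclass
class Node:
    op: str            # e.g. "Cast", "Identity"
    inputs: list       # input value names
    output: str        # output value name
    to_type: str = ""  # target dtype for Cast nodes

def rewrite_cast_round_trips(nodes, input_types):
    """Replace Cast chains that restore the original input type with
    Identity, keeping the outer output name so no rewiring is needed."""
    by_output = {n.output: n for n in nodes}
    rewritten = []
    for n in nodes:
        prod = by_output.get(n.inputs[0])
        if (n.op == "Cast" and prod is not None and prod.op == "Cast"
                and input_types.get(prod.inputs[0]) == n.to_type):
            rewritten.append(Node("Identity", [prod.inputs[0]], n.output))
        else:
            rewritten.append(n)
    # Dead-node elimination (mirrors onnxscript's RemoveUnusedNodesPass);
    # assumes the last node's output is the graph output.
    consumed = {i for n in rewritten for i in n.inputs} | {rewritten[-1].output}
    return [n for n in rewritten if n.output in consumed]

# x (fp32) -> Cast(fp16) -> Cast(fp32) collapses to a single Identity.
nodes = [Node("Cast", ["x"], "x_h", "float16"),
         Node("Cast", ["x_h"], "y", "float32")]
result = rewrite_cast_round_trips(nodes, {"x": "float32"})
```

Checking the final cast type against the original input type is what keeps this safe: a genuine downcast chain like fp32 -> fp16 -> bf16 does not match and is left alone.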
Move _ensure_com_microsoft_opset and eliminate_cast_chains into
ModelOptimizer class. Add fix_com_microsoft_opset and
cast_chain_elimination config flags to OnnxPeepholeOptimizer.

Remove standalone OnnxCastChainElimination pass, its olive_config
entry, and its test file. Move tests into test_peephole_optimizer.py.

Per devang-ml's review: consolidate into existing pass to avoid
introducing a new one.
Add onnxscript_optimize, onnxoptimizer_optimize, and
fuse_reshape_operations config flags (default True for backward
compatibility). This allows recipe configs to disable the default
optimizations and only run opset fixup + cast chain elimination,
producing byte-identical models to the old standalone pass.
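A recipe entry exercising these flags might look like the fragment below. The flag names (onnxscript_optimize, onnxoptimizer_optimize, fuse_reshape_operations, fix_com_microsoft_opset, cast_chain_elimination) are taken from the commit messages above; the surrounding structure is illustrative, not a verified Olive pass-config schema.

```json
{
  "type": "OnnxPeepholeOptimizer",
  "onnxscript_optimize": false,
  "onnxoptimizer_optimize": false,
  "fuse_reshape_operations": false,
  "fix_com_microsoft_opset": true,
  "cast_chain_elimination": true
}
```

With the three default optimizations disabled, only the opset fixup and cast chain elimination run, reproducing the behavior of the removed standalone pass.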
