Draft
- graph_surgeries.py: add QwenVL-specific graph surgery passes for vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization to multimodal models and handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register the new passes
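The vision encoder exclusion amounts to filtering node names before quantization. A minimal sketch, assuming exclusion is done by glob-matching ONNX node names; the pattern list and the `select_nodes_to_quantize` helper are hypothetical and are not the patterns rtn_quantization.py actually uses:

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion globs for vision-encoder nodes (illustrative only).
VISION_EXCLUDE_PATTERNS = ["*visual*", "*vision*", "*patch_embed*"]


def select_nodes_to_quantize(node_names, exclude_patterns=VISION_EXCLUDE_PATTERNS):
    """Return the node names that match none of the exclusion glob patterns."""
    return [
        name
        for name in node_names
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if not any(fnmatchcase(name, pattern) for pattern in exclude_patterns)
    ]


nodes = [
    "/model/layers.0/mlp/MatMul",
    "/visual/blocks.0/attn/MatMul",
    "/visual/patch_embed/MatMul",
    "/lm_head/MatMul",
]
print(select_nodes_to_quantize(nodes))
# ['/model/layers.0/mlp/MatMul', '/lm_head/MatMul']
```

Matching on names keeps the language-model MatMuls eligible for RTN while leaving the vision tower in full precision.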
…surgery passes
- rtn_quantization.py: Thread a bits parameter through the quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
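ReciprocalMulToDiv relies on the identity a * (1 / b) = a / b, so one Div can replace a Reciprocal plus a Mul. A minimal sketch on a toy node list; the `Node` dataclass and `reciprocal_mul_to_div` function are hypothetical stand-ins rather than the surgery in graph_surgeries.py, and folding only when the Reciprocal has a single consumer is an assumption of this sketch:

```python
from dataclasses import dataclass


@dataclass
class Node:
    op_type: str
    inputs: list
    outputs: list


def reciprocal_mul_to_div(nodes):
    """Rewrite Mul(a, Reciprocal(b)) or Mul(Reciprocal(b), a) as Div(a, b)."""
    producers = {out: n for n in nodes for out in n.outputs}
    consumers = {}
    for n in nodes:
        for name in n.inputs:
            consumers.setdefault(name, []).append(n)

    removed = set()  # ids of Reciprocal nodes folded into a Div
    result = []
    for n in nodes:
        if n.op_type == "Mul":
            for idx in (0, 1):  # Mul is commutative: check both inputs
                rec = producers.get(n.inputs[idx])
                # Fold only if the Reciprocal feeds this Mul and nothing else.
                if (
                    rec is not None
                    and rec.op_type == "Reciprocal"
                    and len(consumers[rec.outputs[0]]) == 1
                ):
                    other = n.inputs[1 - idx]
                    n = Node("Div", [other, rec.inputs[0]], n.outputs)
                    removed.add(id(rec))
                    break
        result.append(n)
    return [n for n in result if id(n) not in removed]


out = reciprocal_mul_to_div([Node("Reciprocal", ["y"], ["r"]), Node("Mul", ["x", "r"], ["z"])])
print([(n.op_type, n.inputs, n.outputs) for n in out])
# [('Div', ['x', 'y'], ['z'])]
```

The rewrite is algebraically exact but not always bitwise identical in floating point, since a * (1 / b) rounds twice while a / b rounds once.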
…author (TD002), fix formatting
- Apply ruff format to 4 files (cast_chain_elimination.py, rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace the global flatten+pack with an axis-aware _pack_int4_along_axis that correctly packs zero_point when k_blocks is small (e.g. 1), avoiding a ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized always uses quantize_axis=data_rank-1, not pass_config['axis']
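The reshape bug is easiest to see with k_blocks = 1: flattening N zero points and packing pairs across row boundaries yields ceil(N / 2) bytes, which cannot be reshaped to (N, 1). A pure-Python sketch contrasting the two approaches for a 2-D zero-point table; `pack_int4_rows` and `pack_int4_flat` are hypothetical illustrations, not Olive's `_pack_int4_along_axis`, and low-nibble-first ordering is assumed, as in ONNX's 4-bit types:

```python
def pack_int4_rows(rows):
    """Pack 4-bit values two per byte within each row (low nibble first).

    Each row is packed independently and zero-padded when its length is odd,
    so the result always has shape (n_rows, ceil(k / 2)).
    """
    packed = []
    for row in rows:
        vals = [v & 0x0F for v in row]
        if len(vals) % 2:
            vals.append(0)  # pad an odd-length row with a zero nibble
        packed.append([vals[i] | (vals[i + 1] << 4) for i in range(0, len(vals), 2)])
    return packed


def pack_int4_flat(rows):
    """Buggy variant: flatten everything, then pack pairs across row boundaries."""
    flat = [v & 0x0F for row in rows for v in row]
    if len(flat) % 2:
        flat.append(0)
    return [flat[i] | (flat[i + 1] << 4) for i in range(0, len(flat), 2)]


zero_points = [[3], [7], [12]]  # 3 rows, k_blocks = 1
print(pack_int4_rows(zero_points))  # [[3], [7], [12]]
print(len(pack_int4_flat(zero_points)))  # 2 bytes, so no reshape to (3, 1)
```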
The upstream tuning_strategies.md page no longer exists, causing the Sphinx linkcheck to fail with -W (warnings-as-errors).
Address PR review feedback from @devang-ml and @justinchuby: use onnxscript.optimizer.optimize() instead of an ORT InferenceSession with session.enable_cast_chain_elimination to eliminate redundant Cast chains.
- Remove the onnxruntime dependency from the cast_chain_elimination pass
- Use onnxscript.optimizer.optimize() with a TypeInferenceError fallback (same pattern as OnnxPeepholeOptimizer)
- Update test comment to reflect the onnxscript optimizer
- Verified: numerically identical outputs (0.00 max abs diff)
- Verified: no eval regression (69% on AI2D, 100 samples)
Resolve conflict in olive/passes/onnx/common.py: take upstream fix from PR #2355 (ByteSize EncodeError handling).
…n elimination
Use a custom CastCastRoundTrip rewrite rule instead of the full onnxscript.optimizer.optimize() call. The rewrite rule specifically targets round-trip Cast chains (e.g. fp32->fp16->fp32) by checking that the final cast type matches the original input type, and replaces them with Identity. This is simpler, faster, and avoids the TypeInferenceError fallback that was needed with the full optimizer. The onnxscript rewrite() function also runs RemoveUnusedNodesPass and RemoveUnusedOpsetsPass automatically. Validated: weights identical, 0.00 max abs diff, eval 69% unchanged.
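The round-trip condition can be illustrated on a toy graph: a Cast fed by another Cast becomes Identity only when its target type equals the element type of the first Cast's input. This is a sketch with hypothetical `Node` and `eliminate_cast_round_trips` names and string type labels; it is not the onnxscript rewrite-rule API, and its single-pass cleanup only approximates RemoveUnusedNodesPass:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op_type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)


def eliminate_cast_round_trips(nodes, value_types, graph_outputs):
    """Replace Cast(Cast(x, to=A), to=B) with Identity(x) when B == type(x)."""
    producers = {out: n for n in nodes for out in n.outputs}
    rewritten = []
    for n in nodes:
        inner = producers.get(n.inputs[0]) if n.op_type == "Cast" else None
        if inner is not None and inner.op_type == "Cast":
            src = inner.inputs[0]
            # Only a true round trip: the outer Cast restores the source type.
            if value_types.get(src) == n.attrs.get("to"):
                n = Node("Identity", [src], list(n.outputs))
        rewritten.append(n)
    # Single cleanup pass: drop nodes whose outputs nobody consumes anymore.
    used = {name for n in rewritten for name in n.inputs} | set(graph_outputs)
    return [n for n in rewritten if any(o in used for o in n.outputs)]


nodes = [
    Node("Cast", ["x"], ["x_fp16"], {"to": "float16"}),
    Node("Cast", ["x_fp16"], ["x_fp32"], {"to": "float32"}),
    Node("Relu", ["x_fp32"], ["y"]),
]
out = eliminate_cast_round_trips(nodes, {"x": "float32"}, ["y"])
print([n.op_type for n in out])  # ['Identity', 'Relu']
```

Removing the chain also removes the intermediate fp16 rounding the original graph performed, which is why the numerical check in the commit message (0.00 max abs diff) matters.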
Move _ensure_com_microsoft_opset and eliminate_cast_chains into the ModelOptimizer class. Add fix_com_microsoft_opset and cast_chain_elimination config flags to OnnxPeepholeOptimizer. Remove the standalone OnnxCastChainElimination pass, its olive_config entry, and its test file. Move the tests into test_peephole_optimizer.py. Per devang-ml's review: consolidate into the existing pass rather than introducing a new one.
Add onnxscript_optimize, onnxoptimizer_optimize, and fuse_reshape_operations config flags (default True for backward compatibility). This allows recipe configs to disable the default optimizations and run only the opset fixup and cast chain elimination, producing models byte-identical to those of the old standalone pass.
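With these flags, a recipe can request only the opset fixup and cast chain elimination. A sketch of such a pass entry built as a Python dict; the flag names come from the commit messages, while the `passes`/`type` layout is assumed to follow the usual Olive config shape, and the `peephole` key is an arbitrary, hypothetical pass name:

```python
import json

# Flags named in the commits; the surrounding layout is an assumption.
peephole_only_cast_fixup = {
    "type": "OnnxPeepholeOptimizer",
    # Turn off the default optimizations (all default to True).
    "onnxscript_optimize": False,
    "onnxoptimizer_optimize": False,
    "fuse_reshape_operations": False,
    # Keep only the two steps that replaced the standalone pass.
    "fix_com_microsoft_opset": True,
    "cast_chain_elimination": True,
}
config_fragment = {"passes": {"peephole": peephole_only_cast_fixup}}
print(json.dumps(config_fragment, indent=2))
```

Setting the two enabled flags explicitly avoids depending on their defaults, which the commits do not state.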
Describe your changes
Checklist before requesting a review
lintrunner -a
(Optional) Issue link