Lt/te reasoning #1
base: master
Pull Request Overview
This pull request introduces support for Large Reasoning Model (LRM) optimization by implementing advanced KV-cache token eviction strategies. The main enhancements include:
- Implementation of RKV algorithm for token importance scoring during both prefill and decode stages
- Addition of refined selection methods (KVCrush and DiverseKV) to improve token diversity and reduce redundancy
- Support for Qwen3 and Phi3 model architectures
- Mathematical reasoning benchmark infrastructure for MATH500 and GSM8K datasets
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| modules/genai_optimizations/setup.py | Adds dependencies for mathematical reasoning (latex2sympy2, word2number) |
| modules/genai_optimizations/genai_opt/token_eviction.py | Implements RKV algorithm, refined selection modes (KVCrush/DiverseKV), adaptive budget allocation, and key similarity computation |
| modules/genai_optimizations/genai_opt/sparse_attention.py | Adds forward functions for Qwen3 and Phi3 models, fixes dropout parameter naming |
| modules/genai_optimizations/genai_opt/__init__.py | Exports KVCacheRefinedSelection enum |
| modules/genai_optimizations/benchmarks/utils.py | Adds CLI arguments for refined selection algorithms |
| modules/genai_optimizations/benchmarks/reasoning_parser.py | New file implementing mathematical answer extraction and evaluation logic |
| modules/genai_optimizations/benchmarks/math500_gsm_bench.py | New benchmark script for MATH500 and GSM8K datasets |
| modules/genai_optimizations/benchmarks/README.md | Documents LRM optimization example |
| modules/genai_optimizations/README.md | Documents RKV mode, refined selection algorithms, and supported LRM models |
    string = string.replace("infinity", "\\infty")
    if "\\infty" not in string:
        string = string.replace("inf", "\\infty")
    string = string.replace("+\\inity", "\\infty")
Copilot AI, Oct 22, 2025
Corrected spelling of '\inity' to '\infty'. The code appears to be attempting to fix typos but includes a typo itself.
Suggested change:
    - string = string.replace("+\\inity", "\\infty")
    + string = string.replace("\\inity", "\\infty")
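The suggestion above can be tried in isolation with a small sketch. The helper name `normalize_infinity` is hypothetical and does not appear in the PR; it wraps the four reviewed lines with the suggested spelling fix applied. The final step, which drops a redundant `+` in front of `\infty`, is an assumption about what the original `"+\\inity"` pattern may have been meant to do, not something the review confirms.

```python
def normalize_infinity(string: str) -> str:
    """Hypothetical helper: normalize spellings of infinity to LaTeX \\infty."""
    # Spell out "infinity" as the LaTeX macro.
    string = string.replace("infinity", "\\infty")
    # Only expand the short form "inf" when no macro is present yet,
    # otherwise "\infty" itself would be rewritten to "\\inftyty".
    if "\\infty" not in string:
        string = string.replace("inf", "\\infty")
    # Reviewer's suggestion: repair the misspelled macro "\inity".
    string = string.replace("\\inity", "\\infty")
    # Assumption (not confirmed by the PR): also drop a redundant plus sign.
    string = string.replace("+\\infty", "\\infty")
    return string


if __name__ == "__main__":
    for raw in ["infinity", "+inf", "-inf", "+\\inity", "(-infinity, 3]"]:
        print(repr(raw), "->", repr(normalize_infinity(raw)))
```

Note that the last replacement removes every `+` directly preceding `\infty`, including one inside an expression such as `x+\infty`, so a real parser may want to restrict it to a leading sign.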
f5a3b72 to 00e29dd
00e29dd to 0d51c55
* Update usage of shared stream buffer
* Add onnxscript to custom operation req tests
* Fix FFT custom op mapping to use it instead ops in OV
* Fix FFT evaluate seg faults and update not supported configurations
* Skip calculate_grid test
* Number of inputs in custom FFT compatible with ONNX DFT
* Revert input check for FFT and set version for onnxscript

Signed-off-by: Raasz, Pawel <[email protected]>
d643b0c to 20f7e00
No description provided.