-
Notifications
You must be signed in to change notification settings - Fork 78
[XPU] Enable reduction optimization by default #2846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add reduction optimization pass to the pipeline by default.
|
Performance gains can be seen under @chengjunlu I'd say we go with this for now. You can disable it when doing your current work. This is giving performance now and having this would make testing the followups we have in mind easier. WDYT? |
|
Looks like the wide vector flag is giving some better performance, but this is noticeable enough. Just a minor bump from that flag. |
LGTM. |
| if os.getenv("TRITON_INTEL_OPTIMIZE_REDUCTION_LOCALITY", "0") == "1": | ||
| intel.passes.ttgpuir.add_optimize_reduction_locality(pm) | ||
| intel.passes.ttgpuir.add_optimize_elementwise_parallelism(pm) | ||
| intel.passes.ttgpuir.add_optimize_reduction_locality(pm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use if os.getenv("TRITON_INTEL_OPTIMIZE_REDUCTION_LOCALITY", "1") == "1": to allow the user to disable it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. I'll add that when I finish the investigation
|
From https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12061551469, it appears that the reduction optimization has a negative geomean impact to FA (causal=false, dhead=64) with the new IGC driver (improved instruction scheduling). Can we investigate it further before enabling it by default? |
Yes! I can look into that. |
|
Waiting till regressions are fixed. #2907 is the best attempt we have currently. |
|
@sommerlukas can we close this draft PR? |
I think as the performance results were a bit mixed, we agreed with @etiotto and @whitneywhtsang to not enable this by default for now. Closing the PR. |
Add reduction optimization pass to the pipeline by default.