Add one autotuning config to the flex attention benchmark #5303
Conversation
@chengjunlu @whitneywhtsang could you take a look?
@admitric I tried measuring performance with the additional config on PVC and BMG, but noticed no performance difference on the default runners. I suspect the improvement can only be observed with an updated driver; I will kick off a run to confirm once runners with updated drivers are available. Where do you expect the performance improvement? BMG?
PVC on shape [1, 128, 128, 1024, 1024, 192, 128]
@admitric No performance improvement is observed with agama 1188; do we also need vectorization enabled?
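For reference, a minimal repro sketch for this kind of measurement outside the benchmark harness. It assumes the shape tuple means (Z, H_q, H_kv, N_CTX_q, N_CTX_kv, D_HEAD_qk, D_HEAD_v) and that torch.compile'd flex_attention runs on the XPU backend; the device, dtype, and tensor layout are assumptions, not the benchmark's exact setup.

```python
# Hypothetical repro sketch; shape-tuple interpretation, device, and dtype
# are assumptions and may differ from the actual benchmark harness.
import torch
from torch.nn.attention.flex_attention import flex_attention
from triton.testing import do_bench

# Assumed meaning of [1, 128, 128, 1024, 1024, 192, 128]:
Z, H_q, H_kv, N_CTX_q, N_CTX_kv, D_qk, D_v = 1, 128, 128, 1024, 1024, 192, 128

device, dtype = "xpu", torch.float16
q = torch.randn(Z, H_q, N_CTX_q, D_qk, device=device, dtype=dtype)
k = torch.randn(Z, H_kv, N_CTX_kv, D_qk, device=device, dtype=dtype)
v = torch.randn(Z, H_kv, N_CTX_kv, D_v, device=device, dtype=dtype)

compiled = torch.compile(flex_attention)
ms = do_bench(lambda: compiled(q, k, v))  # median time in milliseconds
print(f"flex_attention forward: {ms:.3f} ms")
```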
Changes look good to me.
I have different results on PVC; they reproduce stably. I am using the Triton main branch (at 2908846), with [FlexConfig(64, 32, 2, 4)] hardcoded, agama-1188, and spill size 0.
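Hardcoding the config list is one way to pin the candidate; another lighter-weight way to compare it against the default is to pass a kernel_options dict to flex_attention, which can override the forward kernel's tiling. The option keys below are assumptions based on the inductor flex-attention template and may differ across PyTorch versions.

```python
# Hypothetical alternative to editing the autotune list: pin the forward
# kernel to the candidate config via kernel_options. Key names are assumed
# and may vary between PyTorch versions.
import torch
from torch.nn.attention.flex_attention import flex_attention

q = torch.randn(1, 128, 1024, 192, device="xpu", dtype=torch.float16)
k = torch.randn(1, 128, 1024, 192, device="xpu", dtype=torch.float16)
v = torch.randn(1, 128, 1024, 128, device="xpu", dtype=torch.float16)

pinned = torch.compile(flex_attention)
out = pinned(
    q, k, v,
    kernel_options={
        "BLOCK_M": 64,      # FlexConfig block_m
        "BLOCK_N": 32,      # FlexConfig block_n
        "num_stages": 2,
        "num_warps": 4,
    },
)
```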
OK, I think we need to wait for the new driver to be available in CI so that we can confirm the new config works as expected. Note that the BMG performance impact is the one we should prioritize.
This PR adds a FlexAttention autotuner config that can show better performance on one of the shapes.
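A minimal sketch of the kind of change described here, assuming the benchmark derives its autotune space from a list of FlexConfig(block_m, block_n, num_stages, num_warps) entries; the module layout and the pre-existing candidate values are illustrative, not the repository's actual file contents.

```python
# Illustrative only: the existing entries are assumptions; the concrete change
# in this PR is adding the FlexConfig(64, 32, 2, 4) candidate to the autotune space.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlexConfig:
    block_m: int      # BLOCK_M tile size
    block_n: int      # BLOCK_N tile size
    num_stages: int
    num_warps: int

flex_attn_fwd_autotune_configs = [
    FlexConfig(128, 64, 3, 4),   # existing candidates (illustrative values)
    FlexConfig(128, 32, 3, 4),
    FlexConfig(64, 32, 2, 4),    # new candidate added by this PR
]
```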