Describe the bug
Running FlexAttention with the Triton kernel on XPU produces NaN values. The same workload does not produce NaN on the CUDA backend. To reproduce, see https://jira.devtools.intel.com/browse/TRITONXPU-174
Environment details
Triton commit to reproduce:
commit 650428b (HEAD -> main, origin/main, origin/HEAD)
Author: Pavel Chekin
Date: Mon Dec 16 22:39:04 2024 -0800
Ignore cleanup errors in cache teardown. (#3020)
To avoid PermissionError on Windows where certain .pyd files are locked