[PyTorch Upstream] NaN for FlexAttention with Triton XPU #3094

### Describe the bug

FlexAttention produces NaN values when it runs through the Triton kernel on the XPU backend. The CUDA backend does not show this issue. To reproduce, see https://jira.devtools.intel.com/browse/TRITONXPU-174
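For readers without access to the linked ticket, a NaN check along these lines may help confirm the symptom. This is only a sketch and not the reproducer from the ticket: the tensor shapes, the float16 dtype, the absence of a `score_mod`, and the helper names `pick_device` and `flex_attention_has_nan` are all assumptions made for illustration, and it assumes a PyTorch build (2.5 or later) that ships `torch.nn.attention.flex_attention`.

```python
try:
    import torch
    from torch.nn.attention.flex_attention import flex_attention
except ImportError:  # PyTorch with FlexAttention (assumed >= 2.5) is not installed
    torch = None


def pick_device():
    """Return "xpu" or "cuda" if such a device is available, else None."""
    if torch is None:
        return None
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    if torch.cuda.is_available():
        return "cuda"
    return None


def flex_attention_has_nan(device):
    """Run compiled FlexAttention once and report whether the output has NaN."""
    torch.manual_seed(0)
    # Hypothetical shape: (batch, heads, sequence length, head dim).
    shape = (1, 4, 256, 64)
    q, k, v = (
        torch.randn(shape, device=device, dtype=torch.float16) for _ in range(3)
    )
    # torch.compile is what routes FlexAttention to the generated Triton kernel.
    compiled = torch.compile(flex_attention)
    out = compiled(q, k, v)
    return bool(torch.isnan(out).any())


if __name__ == "__main__":
    device = pick_device()
    if device is None:
        print("No XPU or CUDA device found; nothing to check.")
    else:
        print(f"{device}: NaN in output = {flex_attention_has_nan(device)}")
```

Running the same script on an XPU machine and on a CUDA machine gives a direct comparison of the two backends. If this plain configuration does not show NaN on XPU, the failure likely depends on details (shapes, masks, or `score_mod`) that are only in the original reproducer.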

### Environment details

Triton commit to reproduce:

commit 650428be73bd574e5b3a9a1494b543bd5f39c104 (HEAD -> main, origin/main, origin/HEAD)
Author: Pavel Chekin <pavel.chekin@intel.com>
Date:   Mon Dec 16 22:39:04 2024 -0800

    Ignore cleanup errors in cache teardown. (#3020)

    To avoid PermissionError on Windows where certain .pyd files are locked

