@whitneywhtsang
Contributor

No description provided.

kernel_kwargs["num_warps"] = 8
if is_xpu():
    # Since the block sizes are big, we use num_warps = 32 to avoid pressure problems.
    kernel_kwargs["num_warps"] = 32
Contributor

I'm just curious how you chose it. Why not 16, for example?

Contributor Author

32 is the max with double GRF.
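
For readers following along, the backend-dependent choice discussed above can be sketched in plain Python. This is only an illustration under assumptions: `build_kernel_kwargs` and its `on_xpu` parameter are hypothetical names standing in for the test's `is_xpu()` check, and the values 8 and 32 are taken from the quoted diff.

```python
def build_kernel_kwargs(on_xpu: bool) -> dict:
    """Hypothetical helper mirroring the quoted diff.

    `on_xpu` stands in for the result of the test suite's `is_xpu()` check.
    """
    # Default warp count used on other backends in the quoted snippet.
    kernel_kwargs = {"num_warps": 8}
    if on_xpu:
        # Per the thread, 32 is the maximum with double GRF, so the
        # large block sizes get the most warps available in that mode.
        kernel_kwargs["num_warps"] = 32
    return kernel_kwargs


print(build_kernel_kwargs(on_xpu=False))
print(build_kernel_kwargs(on_xpu=True))
```

Whether an intermediate value such as 16 would also work is not settled by the thread; the reply only states that 32 is the upper bound.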

stride_am, stride_ak, #
stride_bk, stride_bn, #
stride_cm, stride_cn, #
stride_am: tl.constexpr, stride_ak: tl.constexpr, #
Contributor

Doesn't this increase compilation time? So, each new value requires a recompile, right?

Contributor Author

Yes, but without constexpr, 2D block IO cannot be used, and the runtime is slow.
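
The compile-time cost raised in the question can be illustrated with a toy model of a JIT cache. This is a sketch, not Triton's actual caching code: `ToyJit` and its methods are hypothetical, and it assumes only that compile-time-constant arguments take part in the cache key while ordinary runtime arguments do not. Real Triton also specializes on some properties of runtime arguments, which this toy ignores.

```python
class ToyJit:
    """Toy JIT cache keyed on the values of compile-time-constant arguments."""

    def __init__(self, constexpr_names):
        self.constexpr_names = tuple(constexpr_names)
        self.cache = {}
        self.compile_count = 0

    def launch(self, **kwargs):
        # Only constexpr values enter the cache key; runtime values do not.
        key = tuple(kwargs[name] for name in self.constexpr_names)
        if key not in self.cache:
            # A new key means a new specialization has to be compiled.
            self.compile_count += 1
            self.cache[key] = f"binary#{self.compile_count}"
        return self.cache[key]


# Strides passed as ordinary runtime arguments.
runtime_strides = ToyJit(constexpr_names=[])
# Strides marked constexpr, as in the quoted diff.
constexpr_strides = ToyJit(constexpr_names=["stride_am", "stride_ak"])

for stride_am in (1024, 2048, 1024):
    runtime_strides.launch(stride_am=stride_am, stride_ak=1)
    constexpr_strides.launch(stride_am=stride_am, stride_ak=1)

print(runtime_strides.compile_count)    # 1
print(constexpr_strides.compile_count)  # 2
```

In this model each distinct stride value costs one extra compile, while a repeated value hits the cache. That is the trade-off described in the reply: more compilations in exchange for stride values the compiler can see.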

@etiotto etiotto merged commit b3ce5fb into main Dec 3, 2025
25 checks passed
@etiotto etiotto deleted the whitneywhtsang/mxfp branch December 3, 2025 20:10