
Conversation

@Yu-Zhewen
Contributor

Summary

This PR generalizes MLIR ukernel generation by introducing a Python preprocessing approach to reduce code duplication and improve maintainability.

Motivation

Before: There are currently 7 hand-written MLIR ukernels, totaling ~3k lines of repetitive, boilerplate-heavy code:

  • iree_uk_amdgpu_dt_matmul_f16.mlir
  • iree_uk_amdgpu_dt_matmul_f8E4M3FNUZ.mlir
  • iree_uk_amdgpu_dt_scaled_matmul_f4E2M1FN.mlir
  • iree_uk_amdgpu_matmul_bf16.mlir
  • iree_uk_amdgpu_matmul_f16.mlir
  • iree_uk_amdgpu_matmul_f8E4M3FN.mlir
  • iree_uk_amdgpu_matmul_f8E4M3FNUZ.mlir

Each new ukernel variant requires manually duplicating an existing file and changing a few parameters (element types, intrinsics, unroll configurations, etc.).

After: This PR introduces a template-based generation system in which only a small set of template files (.mlir.in, ~1k lines at present) needs to be maintained; the concrete ukernels are generated from them.

See TEMPLATE_GUIDE.md for all generation commands.
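
For illustration, a minimal sketch of what such a preprocessing step could look like: a .mlir.in template containing placeholders is expanded into a concrete .mlir ukernel by substituting per-variant parameters. The {{NAME}} placeholder syntax, file names, and parameter names below are hypothetical and only convey the idea; the actual template format and commands are documented in TEMPLATE_GUIDE.md.

```python
#!/usr/bin/env python3
# Hypothetical sketch of .mlir.in template expansion.
# The {{PLACEHOLDER}} syntax and parameter names are illustrative only,
# not the scheme used by this PR; see TEMPLATE_GUIDE.md for the real commands.
import re
from pathlib import Path


def expand_template(template_path: str, params: dict[str, str], out_path: str) -> None:
    """Replace {{KEY}} placeholders in a .mlir.in template and write the .mlir file."""
    text = Path(template_path).read_text()

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in params:
            raise KeyError(f"unresolved placeholder {{{{{key}}}}} in {template_path}")
        return params[key]

    Path(out_path).write_text(re.sub(r"\{\{(\w+)\}\}", substitute, text))


if __name__ == "__main__":
    # Example: generate an f16 matmul variant from a shared template
    # (file and parameter names are made up for this sketch).
    expand_template(
        "iree_uk_amdgpu_matmul.mlir.in",
        {"ELEM_TYPE": "f16", "INTRINSIC": "MFMA_F32_16x16x16_F16", "UNROLL_M": "8"},
        "iree_uk_amdgpu_matmul_f16.mlir",
    )
```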

Plan

Assisted-by: Cursor AI

Abhishek-Varma and others added 2 commits January 24, 2026 13:03
This commit ports the gfx942 ukernel to gfx950 for data tiling.

Signed-off-by: Abhishek Varma <[email protected]>
Signed-off-by: Yu-Zhewen <[email protected]>
