@Antonyvance Antonyvance commented Oct 23, 2025

Intel Xe Architecture Support for CUTLASS Library generation

Feature: Add Intel Xe12/Xe20 architecture support with operation generation and Python bindings.

Use Case: Enable kernel generation for PyTorch inductor path and ML frameworks on Intel Arc/PVC GPUs.

Key Changes:

  • Architecture Support: Added Xe12 (PVC) and Xe20 (BMG) with compute capability 12-50
  • Operations: FP16, BF16, FP8 (E4M3/E5M2), INT8 GEMM kernels with multiple tile sizes (256×256, 128×256, etc.)
  • Build Flags: New CMake option -DCUTLASS_LIBRARY_GENERATOR_ARCHS="20" for Intel GPU targets
  • Python Integration: CMake-based shared library (examples/11_xe20_cutlass_library/) + ctypes bindings
  • Generator: Extended python/cutlass_library/generator.py with GenerateIntelXe() functions
  • Examples: Python test scripts with performance benchmarking
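
To make the scope of the generated operations concrete, here is a minimal sketch of how one kernel configuration per (data type, tile shape) pair could be enumerated. It is not the code in python/cutlass_library/generator.py: the function name `enumerate_gemm_configs`, the short data-type tags, and the kernel naming scheme are all hypothetical, and only the two tile shapes named explicitly above are included.

```python
from itertools import product

# Short tags for the data types listed in this PR (the tags are hypothetical).
DTYPES = ["f16", "bf16", "e4m3", "e5m2", "s8"]

# Only the two tile shapes named explicitly in the description (M x N).
TILES = [(256, 256), (128, 256)]


def enumerate_gemm_configs(arch=20, dtypes=DTYPES, tiles=TILES):
    """Return one hypothetical kernel name per (data type, tile shape) pair."""
    return [
        f"xe{arch}_gemm_{dtype}_{m}x{n}"
        for dtype, (m, n) in product(dtypes, tiles)
    ]


if __name__ == "__main__":
    for name in enumerate_gemm_configs():
        print(name)
```

The kernel count is the product of the two lists, which is why adding tile shapes or data types grows the generated library quickly.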

Testing: ✅ Tested generated BF16 kernels, examples, and documentation

Note: These changes do not make use of the new APIs (or modified collectives). That should be a separate feature / refactoring effort.

ToDo:

  • Fix build failures
  • Benchmark tests for comprehensive performance analysis
  • Testing kernels beyond BF16 (FP16, FP8, INT8)
  • Optimizing generated kernels with tile sizes
  • Modify CMake to avoid explicitly linking with libsycl.so
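
For the benchmarking item, a minimal sketch of the usual GEMM throughput calculation may help when comparing tile sizes. It assumes the standard count of 2·M·N·K floating-point operations for an M×N×K GEMM; the helper name `gemm_tflops` is hypothetical and is not part of the example scripts in this PR.

```python
def gemm_tflops(m, n, k, seconds):
    """Return achieved TFLOP/s for an M x N x K GEMM that ran in `seconds`.

    Uses the conventional 2*M*N*K operation count (one multiply and one
    add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e12


if __name__ == "__main__":
    # Example: a 4096^3 GEMM measured at 1 ms per run.
    print(f"{gemm_tflops(4096, 4096, 4096, 1e-3):.2f} TFLOP/s")
```

For integer kernels such as INT8 the same formula gives TOP/s rather than TFLOP/s.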

Type: Feature | Tested On: Xe20 ✅

@Antonyvance Antonyvance added enhancement New feature or request release urgent PR requires a urgent attention (for release or blocking another PR) labels Oct 23, 2025
@Antonyvance Antonyvance added this to the 0.6 milestone Oct 23, 2025