SYCL CUTLASS Changelog

  • GEMM/StreamK/SplitK with support for FP16 data type
  • Flash attention prefill with Paged KV cache with support for FP16 data type
  • Performance improvements for flash attention prefill and decode
  • Support for Intel Data Center GPU Max Series (1100 and 1550)
  • Support for Intel Arc B580 (Battlemage)
  • GEMM/StreamK/SplitK with support for bfloat16 data type
  • Flash attention prefill and decode with KV cache with support for bfloat16 data type
  • Support for epilogue operations:
    • Element-wise, row-wise and column-wise bias
    • ReLU, SiLU and GELU activation functions
    • Softmax
  • Mixed precision GEMM (bfloat16/int8, half/int4) with dequantization support
  • Dual GEMM & Grouped GEMM
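
For readers unfamiliar with the term, an epilogue operation is applied to each GEMM output element before it is written back. The sketch below is a plain CPU reference in standard C++, not the SYCL CUTLASS API. The names `Activation`, `apply_activation` and `gemm_bias_act` are hypothetical, and the bias layout (one entry per output column, broadcast across rows) is an assumption chosen for illustration. It computes D = act(A * B + bias) for row-major matrices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; this is NOT the SYCL CUTLASS API.
enum class Activation { None, ReLU };

// Element-wise activation applied as the last step of the epilogue.
inline float apply_activation(float x, Activation act) {
    return act == Activation::ReLU ? std::max(x, 0.0f) : x;
}

// Reference GEMM with a fused epilogue: D = act(A * B + bias).
// A is M x K, B is K x N, D is M x N, all row-major.
// Assumption: bias holds one value per output column (length N)
// and is broadcast across all M rows.
std::vector<float> gemm_bias_act(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 const std::vector<float>& bias,
                                 std::size_t M, std::size_t N, std::size_t K,
                                 Activation act) {
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(bias.size() == N);

    std::vector<float> D(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            // Mainloop: accumulate the dot product of row m of A and column n of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            // Epilogue: add the bias, then apply the activation, then store.
            D[m * N + n] = apply_activation(acc + bias[n], act);
        }
    }
    return D;
}
```

A GPU kernel would typically apply the same per-element epilogue to the accumulator while it is still in registers, which avoids a second pass over D. A reference like this is mainly useful for checking kernel output on small inputs.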