Commit 53ee6c5

[CUDA] FpA IntB Gemm Weight Conversion in GPU (microsoft#24914)
### Description

Implement the fpA intB GEMM weight preprocessing as a CUDA kernel to speed up weight prepacking.

### Motivation and Context

The original preprocessing code (in microsoft#24854) runs on the CPU, which is slow and needs an extra memory copy between CPU and GPU.
1 parent 03b22ff commit 53ee6c5

6 files changed, +834 -781 lines changed
