Commit 53ee6c5

[CUDA] FpA IntB Gemm Weight Conversion in GPU (microsoft#24914)

### Description

Implement the fpA intB GEMM weight preprocessing as a CUDA kernel to speed up weight prepacking.

### Motivation and Context

The original preprocessing code (in microsoft#24854) runs on the CPU, which is slow and needs an extra memory copy between CPU and GPU.

1 parent 03b22ff commit 53ee6c5

6 files changed, +834 -781 lines changed

onnxruntime/contrib_ops/cuda
- llm
- quantization
  - matmul_nbits.cc

Comments (0)
