Commit 955ba64
Optimization for quantized gemm skinny sizes (ROCm#411)
* Optimization for quantized gemm skinny sizes
* lint fix
* Add support for bf16/fp16
* code cleanup
* code cleanup
* lint fix2
* cleanup
* Moved the logic into tuned gemm to preserve API compatibility
---------
Co-authored-by: Gregory Shtrasberg <[email protected]>
1 parent 17b26bd commit 955ba64
File tree
7 files changed, +559 -52 lines changed
- csrc/rocm
- vllm
- model_executor/layers
- quantization/utils