Commit 49804aa
authored
Cherry pick vLLM related bug fixes for 2.8 release branch (#5732)
1). FP8 GEMM: support input shape [M, B, K].
Keep the output shape as [M, B, N].
Keep the output stride consistent with the input stride. In some scenarios the input has shape [M, B, K] and stride [K, M*K, 1]; the output stride must then be [N, M*N, 1] to stay consistent.
2). Fix QWEN-32B int4 TP=8 bug.
When running the QWEN-32B int4 model with TP=8, the weight is [80, 5120]. The group_size of the int4 GEMM is 128, which is larger than the GEMM in_feature (80), so the group_size was changed to 80, which is erroneous.
1 parent 4d90735 commit 49804aa
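The two items in the commit message can be sketched in plain Python. This is only an illustration under stated assumptions, not the code changed by this commit: `output_strides_like` and `clamped_group_size` are hypothetical helper names, and the sizes M=4, B=2, K=8, N=16 are made up for the example. The first helper derives output strides that keep the same memory ordering of dimensions as the input, which reproduces the [N, M*N, 1] stride from item 1). The second reproduces the clamp described in item 2), assuming it behaves like a `min`; it does not show the fix itself.

```python
def output_strides_like(in_shape, in_strides, n):
    """Return (out_shape, out_strides) for a GEMM output whose memory
    ordering of dimensions matches the input's (hypothetical helper)."""
    # The last dimension K is replaced by N; leading dimensions are kept.
    out_shape = list(in_shape[:-1]) + [n]
    # Dimensions ordered from innermost (smallest stride) to outermost.
    order = sorted(range(len(in_shape)), key=lambda d: in_strides[d])
    out_strides = [0] * len(out_shape)
    running = 1
    for d in order:
        out_strides[d] = running
        running *= out_shape[d]
    return out_shape, out_strides


def clamped_group_size(group_size, in_features):
    """Assumed form of the clamp described in item 2) (hypothetical helper)."""
    return min(group_size, in_features)


# Illustrative sizes only; they do not come from the commit.
M, B, K, N = 4, 2, 8, 16

# Input shape [M, B, K] with stride [K, M*K, 1], as in item 1).
shape, strides = output_strides_like([M, B, K], [K, M * K, 1], N)
print(shape, strides)  # [4, 2, 16] [16, 64, 1], i.e. [N, M*N, 1]

# group_size 128 against in_feature 80, as in item 2).
print(clamped_group_size(128, 80))  # 80
```

With a contiguous input (stride [B*K, K, 1]) the same helper returns the ordinary contiguous output stride [B*N, N, 1], so the permuted case is handled without a special branch.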
File tree
4 files changed, +32 -12 lines changed
- csrc/gpu
- aten/operators/fp8
- oneDNN
- intel_extension_for_pytorch/nn/utils
- tests/gpu/examples