Commit c4f7acf

[Inference] FP8 dual gemm auto-tune and support compile parallelization (#9151)
* fp8
* check
* check
* check
* check
* cutlass fp8
* fp8 chech
* check
* ffn1 tune
* delete
* check
* change file path
* top_p_sampling_reject.cu
1 parent 25a5b4f commit c4f7acf

22 files changed: +1000 -1010 lines changed

csrc/README.md

Lines changed: 4 additions & 2 deletions
@@ -10,9 +10,11 @@ pip install -r requirements.txt
 
 ## Compile CUDA operators
 
-Generate the FP8 CUTLASS operators (compilation takes a long time)
+Generate the FP8 CUTLASS operators
 ```shell
-python generate_code_gemm_fused_kernels.py
+python utils/auto_gen_fp8_fp8_gemm_fused_kernels.py
+
+python utils/auto_gen_fp8_fp8_dual_gemm_fused_kernels.py
 ```
 
 Compile

csrc/gpu/cutlass_kernels/fp8_gemm_fused/dispatch_dual_gemm_scale_bias_swiglu.h

Lines changed: 0 additions & 31 deletions
This file was deleted.

csrc/gpu/cutlass_kernels/fp8_gemm_fused/dual_gemm_scale_bias_swiglu_16_32_64_stages3.h

Lines changed: 0 additions & 185 deletions
This file was deleted.
