Commit 903002a
Improve API for f4f4bf16 (#5163)
Summary:
Pull Request resolved: #5163
X-link: https://github.com/facebookresearch/FBGEMM/pull/2162
We make several improvements to the FP4 GEMM API:
- Remove the need to pass `use_mx`; it can now be inferred from `global_scale` (see the sketch after this list).
- As a follow-up, we should improve the assertions on the proper FP4 dtypes, similar to what we have for [FP4 group GEMM](https://www.internalfb.com/code/fbsource/[addad803d330]/fbcode/deeplearning/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16_grouped.cu?lines=388-409).
- Add an optional `output` argument to the API, in line with other torch APIs (also shown in the sketch below).
- Move the function declaration to `torch_ops.h`, which removes the need for the forward declaration in Blas.cpp.
- Small code cleanups.
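For illustration, a minimal sketch of how a call site might look after this change. The operator path `torch.ops.fbgemm.f4f4bf16`, the argument order, and the scale layouts below are assumptions based on the summary, not copied from the diff:

```python
import torch

M, N, K = 128, 256, 512
GROUP_SIZE = 32  # scaling group size assumed for illustration

# Packed FP4 operands: two 4-bit values per uint8 byte.
xq = torch.randint(0, 256, (M, K // 2), dtype=torch.uint8, device="cuda")
wq = torch.randint(0, 256, (N, K // 2), dtype=torch.uint8, device="cuda")

# Per-group scales (stored here as raw uint8 bytes for illustration).
x_scale = torch.randint(0, 256, (M, K // GROUP_SIZE), dtype=torch.uint8, device="cuda")
w_scale = torch.randint(0, 256, (N, K // GROUP_SIZE), dtype=torch.uint8, device="cuda")

# MX FP4: no global_scale is passed, so the op infers use_mx=True.
y_mx = torch.ops.fbgemm.f4f4bf16(xq, wq, x_scale, w_scale)

# NV FP4: providing global_scale implies use_mx=False. The optional
# `output` tensor is filled in place, like other out-style torch APIs.
global_scale = torch.tensor([1.0], device="cuda")
out = torch.empty(M, N, dtype=torch.bfloat16, device="cuda")
y_nv = torch.ops.fbgemm.f4f4bf16(xq, wq, x_scale, w_scale, global_scale, output=out)
```

Inferring `use_mx` from the presence of `global_scale` removes a redundant flag: NV FP4 is the variant that carries a global scale on top of its per-group scales, while MX FP4 uses per-group scales only.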
Misc
- Later, we should clean up and re-evaluate the kernel heuristic: right now it is almost identical (and duplicated) for MX and NV FP4, and we are likely instantiating more kernel instances than needed.
Reviewed By: slayton58
Differential Revision: D87655845
fbshipit-source-id: d3ddd1f7efe7683fb8615ab2f6febe438ce6b380
File tree (53 files changed: +313 −210 lines)
- fbgemm_gpu/experimental
  - gemm/triton_gemm
  - gen_ai
    - bench
    - src/quantize
      - cutlass_extensions
        - f4f4bf16
    - include/fbgemm_gpu
    - test/quantize