add fp8 quantization kernels #12
base: main
Conversation
Pull Request Overview
This PR adds FP8 quantization kernels to the vLLM XPU implementation, providing three different quantization methods for FP8 (float8) data types. The implementation includes comprehensive test coverage and follows the existing kernel architecture pattern.
- Adds static, dynamic per-tensor, and dynamic per-token FP8 quantization kernels
- Provides comprehensive test suite with reference implementations for validation
- Extends the dispatch utilities to support FP8_e5m2 data type in addition to existing FP8_e4m3fn
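The three schemes differ only in where the scale comes from: supplied by the caller, derived from the whole tensor, or derived per token (row). A minimal pure-Python sketch of that distinction (hypothetical helper names and list-based data; the real kernels operate on XPU tensors and cast to `torch.float8_e4m3fn`, whose finite maximum is 448.0):

```python
FP8_E4M3_MAX = 448.0  # finite maximum of float8_e4m3fn

def static_scaled_fp8_quant(x, scale):
    # Static: the scale is precomputed offline; each element is divided by
    # it and saturated to the fp8-representable range before the cast.
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in x]

def dynamic_scaled_fp8_quant(x):
    # Dynamic per-tensor: one scale for the whole tensor, derived from
    # its absolute maximum so the largest element maps to +/-448.
    scale = max(abs(v) for v in x) / FP8_E4M3_MAX
    return static_scaled_fp8_quant(x, scale), scale

def dynamic_per_token_scaled_fp8_quant(rows):
    # Dynamic per-token: an independent scale per row, which tolerates
    # per-token outliers better than a single tensor-wide scale.
    out, scales = [], []
    for row in rows:
        q, s = dynamic_scaled_fp8_quant(row)
        out.append(q)
        scales.append(s)
    return out, scales
```

This is a sketch of the scaling math only; the actual kernels also handle the narrowing cast, strides, and device dispatch.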
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
File | Description
---|---
tests/test_fp8_quant.py | Main test file with reference implementations and parametrized test cases
tests/register_ops.py | Python bindings for the three new FP8 quantization operations
tests/ops/fp8_quant_op.py | High-level Python wrapper function for FP8 quantization operations
csrc/xpu/torch_bindings.cpp | PyTorch operator registration for the three FP8 quantization kernels
csrc/xpu/quantization/fp8/utils.h | Common utilities and type definitions for FP8 quantization
csrc/xpu/quantization/fp8/fp8_quant.h | Core FP8 quantization kernel implementations and helper functions
csrc/xpu/quantization/fp8/fp8_quant.cpp | C++ implementation of the three FP8 quantization functions
csrc/xpu/ops.h | Function declarations for the new FP8 quantization operations
csrc/xpu/dispatch_utils.h | Extended dispatch macros to include FP8_e5m2 data type
CMakeLists.txt | Build configuration update to include the new FP8 quantization source file
Comments suppressed due to low confidence (1)

csrc/xpu/quantization/fp8/fp8_quant.h:73
- Commented-out debug printf statements should be removed to improve code cleanliness and maintainability.
  `static_cast<float>(input[i]), scale);`
```python
s_1 = as_float32_tensor(1.0)
s_512 = as_float32_tensor(512.0)

# For fp8, in order to match the cuda kernel output, we have to do exactly
```
The comment mentions 'cuda kernel output' but this is for XPU implementation. It should say 'xpu kernel output' to be consistent with the context.
Suggested change:
```diff
- # For fp8, in order to match the cuda kernel output, we have to do exactly
+ # For fp8, in order to match the xpu kernel output, we have to do exactly
```
```cpp
// quant_type_max_v<fp8_type>: %f, r: %f\n", token_id, tid, val, scale, x,
// static_cast<float>(quant_type_max_v<fp8_type>),
// r);
// sycl::ext::oneapi::experimental::printf("scaled_fp8_conversion: %f\n", r);
```
Commented-out debug printf statements should be removed to improve code cleanliness and maintainability.
Suggested change:
```diff
- // sycl::ext::oneapi::experimental::printf("scaled_fp8_conversion: %f\n", r);
```
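For context, the `scaled_fp8_conversion` helper these debug statements lived in scales the input value and saturates it to the fp8 range before the narrowing cast, so the cast cannot overflow. A hypothetical host-side sketch of that saturation step (variable names taken from the printf above; whether the scale is applied as a multiply by a reciprocal or a divide is an assumption here):

```python
FP8_E4M3_MAX = 448.0  # stands in for quant_type_max_v<fp8_type>

def scaled_fp8_conversion(val: float, scale: float) -> float:
    # Scale the input (assumed: multiply by a reciprocal scale), then
    # clamp to the representable fp8 range before the narrowing cast.
    x = val * scale
    r = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    return r  # the device kernel would cast r to fp8 at this point
```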
tests/ops/fp8_quant_op.py (outdated)
```python
# Add parent directory to Python path
# sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
```
Commented-out sys.path manipulation code should be removed if it's not needed, as it can be confusing and reduces code maintainability.
csrc/xpu/quantization/fp8/utils.h (outdated)
Hi @baodii, can we merge this file with quant_utils.h into one?
Signed-off-by: baodii <[email protected]>
…difference is too huge
Signed-off-by: baodii <[email protected]>
Force-pushed from 15d689b to 9ff5526
Signed-off-by: Zhu, Zufang <[email protected]>
Force-pushed from 2455acb to 059a372
Add a) static_scaled_fp8_quant, b) dynamic_scaled_fp8_quant, and c) dynamic_per_token_scaled_fp8_quant kernels.