
Commit fcb3e32

Author: George Ohashi
Parent: 98918b9

File tree: 1 file changed (+3 −3 lines)


README.md (3 additions & 3 deletions)
@@ -29,15 +29,15 @@
 PTQ is performed to reduce the precision of quantizable weights (e.g., linear layers) to a lower bit-width. Supported formats are:

 ##### [W4A16](./examples/quantization_w4a16/README.md)
-- Uses GPTQ to compress weights to 4 bits.
+- Uses GPTQ to compress weights to 4 bits. Requires calibration dataset.
 - Useful speed ups in low QPS regimes with more weight compression.
 - Recommended for any GPUs types.
 ##### [W8A8-INT8](./examples/quantization_w8a8_int8/README.md)
-- Uses channel-wise quantization to compress weights to 8 bits, and uses dynamic per-token quantization to compress activations to 8 bits.
+- Uses channel-wise quantization to compress weights to 8 bits using GPTQ, and uses dynamic per-token quantization to compress activations to 8 bits. Requires calibration dataset for weight quantization. Activation quantization is carried out during inference on vLLM.
 - Useful for speed ups in high QPS regimes or offline serving on vLLM.
 - Recommended for NVIDIA GPUs with compute capability <8.9 (Ampere, Turing, Volta, Pascal, or older).
 ##### [W8A8-FP8](./examples/quantization_w8a8_fp8/README.md)
-- Uses channel-wise quantization to compress weights to 8 bits, and uses dynamic per-token quantization to compress activations to 8 bits.
+- Uses channel-wise quantization to compress weights to 8 bits, and uses dynamic per-token quantization to compress activations to 8 bits. Does not require calibration dataset. Activation quantization is carried out during inference on vLLM.
 - Useful for speed ups in high QPS regimes or offline serving on vLLM.
 - Recommended for NVIDIA GPUs with compute capability >8.9 (Hopper and Ada Lovelace).
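The W8A8 bullets in the diff above distinguish channel-wise weight quantization (one scale per output channel, computed offline) from dynamic per-token activation quantization (one scale per token row, computed at inference time). Both apply the same symmetric scale-and-round arithmetic, just along different rows. The following is a minimal pure-Python sketch of that shared INT8 arithmetic on toy data; function names are hypothetical and it does not model GPTQ or the llm-compressor implementation:

```python
def quantize_rows(matrix):
    """Symmetric INT8 quantization with one scale per row.

    For a weight matrix, rows are output channels (channel-wise);
    for an activation matrix, rows are tokens (per-token dynamic).
    """
    q_rows, scales = [], []
    for row in matrix:
        # Map the largest-magnitude value in the row to +/-127.
        scale = (max(abs(v) for v in row) / 127) or 1.0  # avoid 0 for all-zero rows
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_rows(q_rows, scales):
    """Recover approximate floats; error is at most half a quantization step."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]
```

On a toy 2x3 matrix such as `[[0.5, -1.27, 0.0], [2.54, 1.0, -2.0]]`, a quantize/dequantize round trip recovers each value to within half of that row's scale, which is why per-row (rather than per-tensor) scales preserve accuracy when row magnitudes differ widely.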
