Commit eeab75a

Add Float8Tensor
Summary:
* Added Float8Tensor, which uses fbgemm kernels and torch._scaled_mm:
    * per-row activation + per-row weight linear, calling the torch._scaled_mm op (for compatibility with SM 8.9)
    * per-tensor activation + per-tensor weight quantized linear, calling the torch._scaled_mm op (for compatibility with SM 8.9)
    * per-row activation + per-row weight bmm, calling the torch.ops.fbgemm.f8f8bf16_rowwise_batched kernel (only works on SM 9.0+); this can use batched scaled mm from torch once it is supported: pytorch/pytorch#157950
* Dynamic quantization kwargs are added to Float8Tensor directly
* Added QuantizeTensorKwargs and QuantizeTensorToFloat8Kwargs to store keyword args for Float8Tensor.to_float8
* Updated Float8DynamicActivationFloat8WeightConfig and Float8WeightOnlyConfig to use Float8Tensor

Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py

Reviewers:

Subscribers:

Tasks:

Tags:

stack-info: PR: #2463, branch: jerryzh168/stack/9
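The summary distinguishes per-row from per-tensor quantization. As a rough illustration of the difference (not the code from this commit), the sketch below computes float8 scales in plain Python. The helper names are hypothetical, and the convention `scale = amax / 448` with `quantized = value / scale` is an assumption made for the example; 448 is the largest finite value of the e4m3fn float8 format. Rounding onto the float8 grid is deliberately left out.

```python
# Largest finite magnitude representable in float8 e4m3fn.
FP8_E4M3_MAX = 448.0


def per_tensor_scale(matrix, eps=1e-12):
    """Hypothetical helper: a single scale shared by the whole tensor."""
    amax = max(abs(x) for row in matrix for x in row)
    return max(amax, eps) / FP8_E4M3_MAX


def per_row_scales(matrix, eps=1e-12):
    """Hypothetical helper: one scale per row, so small rows keep resolution."""
    return [max(max(abs(x) for x in row), eps) / FP8_E4M3_MAX for row in matrix]


def scale_and_clamp(matrix, scales):
    """Divide each row by its scale and clamp into the float8 range.

    This does not round to representable float8 values; it only shows
    how the scales map each row onto [-448, 448].
    """
    return [
        [min(max(x / s, -FP8_E4M3_MAX), FP8_E4M3_MAX) for x in row]
        for row, s in zip(matrix, scales)
    ]


weights = [[0.5, -2.0, 1.0], [100.0, 25.0, -50.0]]

tensor_scale = per_tensor_scale(weights)
row_scales = per_row_scales(weights)

# Per-tensor: every row shares the scale set by the largest entry (100.0).
scaled_per_tensor = scale_and_clamp(weights, [tensor_scale] * len(weights))
# Per-row: each row is stretched to use the full float8 range.
scaled_per_row = scale_and_clamp(weights, row_scales)
```

With per-tensor scaling, the first row only reaches a magnitude of about 9 out of 448, while per-row scaling lets it span the full range. That is the usual motivation for the finer granularity, at the cost of storing one scale per row.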
1 parent c28ee7b commit eeab75a

13 files changed: +1375 -192 lines changed


.github/workflows/1xH100_tests.yml

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ jobs:
 include:
 - name: H100
 runs-on: linux.aws.h100
-torch-spec: '--pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126'
+torch-spec: '--pre torch torchvision torchaudio fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/nightly/cu126'
 gpu-arch-type: "cuda"
 gpu-arch-version: "12.4"
 permissions:
@@ -46,8 +46,8 @@ jobs:
 pip install uv
 pip install ${{ matrix.torch-spec }}
 uv pip install -r dev-requirements.txt
-uv pip install vllm
 pip install .
 pytest test/integration --verbose -s
 pytest test/dtypes/test_affine_quantized_float.py --verbose -s
+pytest test/quantization/quantize_/workflows/float8/test_float8_tensor.py --verbose -s
 ./test/float8/test_everything_single_gpu.sh

.github/workflows/1xL4_tests.yml

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ jobs:
 include:
 - name: SM-89
 runs-on: linux.g6.4xlarge.experimental.nvidia.gpu
-torch-spec: '--pre torch --index-url https://download.pytorch.org/whl/nightly/cu126'
+torch-spec: '--pre torch fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/nightly/cu126'
 gpu-arch-type: "cuda"
 gpu-arch-version: "12.6"
 permissions:
@@ -46,8 +46,8 @@ jobs:
 pip install uv
 pip install ${{ matrix.torch-spec }}
 uv pip install -r dev-requirements.txt
-uv pip install vllm
 pip install .
 pytest test/integration --verbose -s
 pytest test/dtypes/test_affine_quantized_float.py --verbose -s
 ./test/float8/test_everything_single_gpu.sh
+pytest test/quantization/quantize_/workflows/float8/test_float8_tensor.py --verbose -s

test/dtypes/test_fbgemm_fp8.py

Lines changed: 0 additions & 153 deletions
This file was deleted.
