Merged
Changes from 2 commits
Commits
63 commits
bdc5e5d
example p1
brian-dellabetta Jan 13, 2026
cde1c3a
p2
brian-dellabetta Jan 13, 2026
06695a5
p2
brian-dellabetta Jan 14, 2026
a9a567f
use targets
brian-dellabetta Jan 15, 2026
264636a
update quant config
brian-dellabetta Jan 15, 2026
255f803
comments
brian-dellabetta Jan 15, 2026
02bf5ee
script cleanup
brian-dellabetta Jan 15, 2026
22a4758
minor cleanup
brian-dellabetta Jan 16, 2026
bfe4e5c
ignore default values
brian-dellabetta Jan 16, 2026
e713d4b
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Jan 16, 2026
b6c9807
stylefixes
brian-dellabetta Jan 16, 2026
a4d4ad9
invert global input/weight scales
brian-dellabetta Jan 16, 2026
5ee4758
fix
brian-dellabetta Jan 18, 2026
64944e0
updates
brian-dellabetta Jan 21, 2026
9f89d29
missing format
brian-dellabetta Jan 22, 2026
d79e0b9
minor touchups
brian-dellabetta Jan 22, 2026
e0e8ccb
comment typo
brian-dellabetta Jan 23, 2026
302330e
merge main
brian-dellabetta Feb 23, 2026
8339433
Processor protocol
brian-dellabetta Feb 23, 2026
2c1f5d2
cleanup
brian-dellabetta Feb 23, 2026
f3e33a5
cleanup
brian-dellabetta Feb 23, 2026
c9c023a
cleanup
brian-dellabetta Feb 24, 2026
0adf115
helper cleanup
brian-dellabetta Feb 24, 2026
7f7663c
bugfix
brian-dellabetta Feb 25, 2026
a54d4cb
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Feb 25, 2026
c49f401
fix logic, match_quantizable_tensors
brian-dellabetta Feb 25, 2026
49683b6
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Feb 25, 2026
3b667fc
target regex update
brian-dellabetta Feb 27, 2026
5fc016f
refactor to CT entrypoint
brian-dellabetta Mar 2, 2026
179b70a
update create config
brian-dellabetta Mar 2, 2026
69e9a4a
minor cleanup
brian-dellabetta Mar 2, 2026
692bd13
fix overwrite qconfig
brian-dellabetta Mar 2, 2026
2f882ef
revert example
brian-dellabetta Mar 2, 2026
869d85d
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Mar 2, 2026
4b47725
refactor from CT changes
brian-dellabetta Mar 3, 2026
3cb89dd
cleanup
brian-dellabetta Mar 3, 2026
0ee7d9b
cleanup
brian-dellabetta Mar 3, 2026
6120b26
post-refactor cleanup
brian-dellabetta Mar 3, 2026
0663bd0
test cosmetics
brian-dellabetta Mar 3, 2026
39f9442
docstrings
brian-dellabetta Mar 3, 2026
be73088
docstring
brian-dellabetta Mar 3, 2026
7e241d0
minor refactor, exec_jobs
brian-dellabetta Mar 4, 2026
a5a1b43
prune find_safetensors_index_file
brian-dellabetta Mar 5, 2026
f4bb2d9
bugfix
brian-dellabetta Mar 5, 2026
6fb2fb6
typo
brian-dellabetta Mar 5, 2026
43b2c36
move simlar named helper to private
brian-dellabetta Mar 5, 2026
75e6478
prune helper
brian-dellabetta Mar 5, 2026
7f4cd5e
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Mar 5, 2026
c59baa3
move entrypoints tests to dedicated folder
brian-dellabetta Mar 5, 2026
2041946
move model free validate
brian-dellabetta Mar 5, 2026
a556514
entrypoints tests
brian-dellabetta Mar 5, 2026
b9ce613
cleanup
brian-dellabetta Mar 5, 2026
e025a5f
cleanup
brian-dellabetta Mar 5, 2026
5ae4c63
rename example
brian-dellabetta Mar 5, 2026
96daf7c
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Mar 5, 2026
2b1c26d
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Mar 6, 2026
d7cba48
reindex_fused_weights
brian-dellabetta Mar 6, 2026
07ccbbe
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
brian-dellabetta Mar 9, 2026
dd0be8e
test_calib_deepseekv3_module consistency fix
brian-dellabetta Mar 10, 2026
f520382
Merge branch 'main' into bdellabe/example-dsr1-nvfp4-fp8block
dsikka Mar 10, 2026
1c6874f
failing test fix
brian-dellabetta Mar 10, 2026
1ed8a9d
add not isnan assertion
brian-dellabetta Mar 10, 2026
9c0a8dc
cicd test fix
brian-dellabetta Mar 10, 2026
56 changes: 56 additions & 0 deletions examples/model_free_ptq/dsr1_nvfp4_fp8_block.py
@@ -0,0 +1,56 @@
from llmcompressor import model_free_ptq
from compressed_tensors.quantization import (
    QuantizationScheme,
    QuantizationArgs,
    QuantizationStrategy,
    QuantizationType,
)

MODEL_ID = "nvidia/DeepSeek-R1-NVFP4"
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-BLOCK"

# Apply FP8 block quantization to the model's compatible self_attn Linear layers.
# The quantized model is then saved to SAVE_DIR in compressed-tensors format.
model_free_ptq(
    model_stub=MODEL_ID,
    save_directory=SAVE_DIR,
    scheme=QuantizationScheme(
        weights=QuantizationArgs(
            num_bits=8,
            type=QuantizationType.FLOAT,
            strategy=QuantizationStrategy.BLOCK,
            symmetric=True,
            dynamic=False,
            block_structure=[128, 128],
        ),
        input_activations=QuantizationArgs(
            num_bits=8,
            type=QuantizationType.FLOAT,
            strategy=QuantizationStrategy.GROUP,
            symmetric=True,
            dynamic=True,
            observer=None,
            group_size=128,
        ),
        # TODO cannot set targets here, must be ["Linear"]
        # targets=[
        #     "re:.*self_attn.(o_proj|q_a_proj|q_b_proj).*"
        # ],
    ),
    ignore=[
        # NOTE: self_attn.kv_a_proj_with_mqa has incompatible shape 576x7168
        #   with block size 128x128
        # NOTE: self_attn.kv_b_proj is already dequantized by MLA
        # This regex matches all strings that don't contain one of the
        # following substrings:
        #   - self_attn.o_proj
        #   - self_attn.q_a_proj
        #   - self_attn.q_b_proj
        "re:^(?!.*self_attn.(o_proj|q_a_proj|q_b_proj)).*$"
    ],
    max_workers=8,
    device="cuda:0",
)

# TODO reverse modelopt NVFP4 tensor packing order

# TODO merge hf_quant_config.json with CT quantization_config in config.json
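
The negative-lookahead `ignore` regex in the example is easy to misread, so a standalone check may help reviewers. The sketch below uses only Python's `re` module. The module names are hypothetical examples of a DeepSeek-style layout, not names read from the checkpoint, and it assumes the `re:` prefix is stripped and the remaining pattern is applied with `re.match`, which may differ from how the library resolves `ignore` entries.

```python
import re

# Pattern copied from the example's ignore list, without the "re:" prefix.
IGNORE_PATTERN = r"^(?!.*self_attn.(o_proj|q_a_proj|q_b_proj)).*$"


def is_ignored(name: str) -> bool:
    """Return True when the pattern matches, i.e. the module would be skipped.

    Assumption: matching is done with re.match on the full module name.
    """
    return re.match(IGNORE_PATTERN, name) is not None


# Hypothetical module names, for illustration only.
candidate_names = [
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.self_attn.q_a_proj",
    "model.layers.0.self_attn.q_b_proj",
    "model.layers.0.self_attn.kv_a_proj_with_mqa",
    "model.layers.0.self_attn.kv_b_proj",
    "model.layers.3.mlp.experts.0.down_proj",
    "lm_head",
]

# Names that are NOT ignored are the ones left for FP8 block quantization.
quantized = [name for name in candidate_names if not is_ignored(name)]

for name in candidate_names:
    status = "ignored" if is_ignored(name) else "quantized"
    print(f"{name}: {status}")
```

Under these assumptions only the three `self_attn` projections named in the comment survive the filter; everything else, including `kv_a_proj_with_mqa` and `kv_b_proj`, is ignored.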
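
The NOTE about `kv_a_proj_with_mqa` can be checked with simple arithmetic. The sketch below assumes that "incompatible" means a weight dimension is not an exact multiple of the corresponding block dimension; the PR does not spell out the rule, so treat this as one plausible reading. The helper name `fits_block_structure` and the second shape are hypothetical and used for illustration only.

```python
def fits_block_structure(shape, block_structure):
    """Return True if every weight dimension divides evenly into blocks.

    Assumption: block quantization here requires exact divisibility,
    with no padding of partial blocks.
    """
    rows, cols = shape
    block_rows, block_cols = block_structure
    return rows % block_rows == 0 and cols % block_cols == 0


BLOCK = (128, 128)

# Shape quoted in the example's NOTE for self_attn.kv_a_proj_with_mqa.
kv_a_shape = (576, 7168)

# Leftover rows and columns after tiling with whole 128x128 blocks.
row_remainder = kv_a_shape[0] % BLOCK[0]
col_remainder = kv_a_shape[1] % BLOCK[1]

print("kv_a_proj_with_mqa fits:", fits_block_structure(kv_a_shape, BLOCK))
print("row remainder:", row_remainder, "col remainder:", col_remainder)

# A made-up shape whose dimensions are both multiples of 128, for contrast.
print("(256, 7168) fits:", fits_block_structure((256, 7168), BLOCK))
```

With 576 rows, four full 128-row blocks cover 512 rows and leave a partial block of 64, which is consistent with the layer being placed on the ignore list.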