[convert] Support for DeepSeek-V3.2 and Dequantizing #641

brian-dellabetta wants to merge 87 commits into main
Conversation
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Updated README to accurately document the convert_checkpoint entrypoint, including the Converter system, ModelOptNvfp4Converter, and usage examples. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
The quality checks have failed.
kylesayrs left a comment:

Outside of the error for single-shard models, looks good to me! Thanks for being open to suggestions.
```python
all_dependencies: set[str] = set()
for values in weight_deps_dict.values():
    for value in values:
        all_dependencies.add(value)
```

Suggested change:

```python
all_dependencies: set[str] = set().union(*weight_deps_dict.values())
```
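As a quick sanity check that the one-liner is equivalent to the loop, here is a self-contained sketch (the `weight_deps_dict` contents are invented for illustration):

```python
# Illustrative dependency map: weight name -> set of dependency weight names
weight_deps_dict = {
    "layers.0.weight": {"layers.0.weight_scale_inv"},
    "layers.1.weight": {"layers.1.weight_scale_inv", "shared.scale"},
}

# Loop form from the original diff
all_dependencies_loop: set[str] = set()
for values in weight_deps_dict.values():
    for value in values:
        all_dependencies_loop.add(value)

# Suggested one-liner: union of all value sets
all_dependencies: set[str] = set().union(*weight_deps_dict.values())

assert all_dependencies == all_dependencies_loop
```

Note that `set().union(*{}.values())` also handles the empty-dict case, returning an empty set.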
| """ | ||
| Given a weight name, return a dictionary of all dependency weight names, so that | ||
| weights can be processed correctly and in a parallelized fashion. | ||
| If a dependency is optional, the value associated with the key should be False. |
There was a problem hiding this comment.
Please give an example of an "optional dependency" to make the concept clear
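One possible illustration of an "optional dependency" under a `{dependency_name: is_required}` mapping. The helper and tensor names here are hypothetical, not the PR's actual code:

```python
# Hypothetical sketch: a weight's dependencies as {dependency_name: is_required}.
# An optional dependency maps to False, e.g. a secondary scale tensor that only
# some quantization schemes produce. All names here are illustrative.
def get_dependency_weights(weight_name: str) -> dict[str, bool]:
    if weight_name.endswith(".weight"):
        prefix = weight_name[: -len("weight")]
        return {
            prefix + "weight_scale_inv": True,  # required to dequantize
            prefix + "weight_scale_2": False,   # optional: may be absent
        }
    return {}

deps = get_dependency_weights("model.layers.0.gate_proj.weight")
required = {name for name, is_required in deps.items() if is_required}
```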
```python
return current_deps


# map of weight name -> ( map of dependency name -> is_required )
weight_deps_dict: dict[str, set[str]] = defaultdict(set)
```

Suggested change:

```python
weight_deps_dict: dict[str, dict[str, bool]] = dict()
```
```python
else:
    continue
weight_to_add_shard_name = weight_map[weight_to_add_name]
resolved_path = model_files.get(weight_to_add_shard_name)
```

Suggested change:

```python
resolved_path = model_files[weight_to_add_shard_name]
```
```python
resolved_path = model_files.get(weight_to_add_shard_name)
inverse_weight_map[resolved_path].append(weight_to_add_name)


# return dicts, not defaultdicts, to avoid silent errors
```
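A tiny illustration of the "silent errors" that comment refers to: a `defaultdict` happily returns an empty value for a mistyped key (and inserts it), while a plain dict raises immediately:

```python
from collections import defaultdict

shards = defaultdict(list)
shards["model-00001.safetensors"].append("layers.0.weight")

# Typo'd lookup: a defaultdict silently returns (and inserts!) an empty list
silent = shards["model-0001.safetensors"]

# A plain dict surfaces the same mistake right away
plain = {"model-00001.safetensors": ["layers.0.weight"]}
try:
    plain["model-0001.safetensors"]
    caught = False
except KeyError:
    caught = True
```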
```python
# NOTE: sometimes models split weights across different files
logger.warning(
```

Shouldn't this be an error?
```python
disallowed_names = ["weight_scale_inv"]
untargeted_names = [
    name for name in tensors.keys() if name not in targeted_names
]
for name in untargeted_names:
    param_name = name.rsplit(".", 1)[-1]

    if param_name in disallowed_names:
        raise ValueError(f"Found unexpected non-targeted tensor {name}")
```

This makes sense for now, but is inflexible if we want to support mixed recipes. For example, convert some weights to full precision, but convert other weights from fp8 to compressed-tensors.
```python
# Read weight map from safetensors.index.json
index_file = find_safetensors_index_file(model_files)
with open(index_file, "r") as f:
    weight_map: dict[str, str] = json.load(f)["weight_map"]
```

This will error if there is no index file. Use something like this instead:
```python
def get_weight_map(model_files):
    index_file = find_safetensors_index_file(model_files)
    if index_file is not None:
        with open(index_file, "r") as f:
            return json.load(f)["weight_map"]
    else:
        # single-shard model: every tensor lives in the lone weights file
        with safe_open(SAFE_WEIGHTS_NAME, framework="pt") as file:
            return {tensor: SAFE_WEIGHTS_NAME for tensor in file.keys()}
```

```python
config_data = json.load(file)
```
```python
config_data[QUANTIZATION_CONFIG_NAME] = quant_config_data
if quant_config_data is None:
```

The qconfig field is not guaranteed to exist.

Suggested change:

```python
if quant_config_data is None and QUANTIZATION_CONFIG_NAME in config_data:
```
```python
:return: absolute path to the safetensors index file, or None if not found
"""
for file_path, resolved_path in model_files.items():
    if file_path.endswith("safetensors.index.json"):
```

Suggested change:

```python
if file_path.endswith(SAFE_WEIGHTS_INDEX_NAME):
```
Corequisite: Merge in conjunction with vllm-project/llm-compressor#2491

This PR enhances the `convert_checkpoint` entrypoint to handle dequantization as well, and adds new functionality to be compatible with DeepSeek-V3.2:

- Adds a `get_dependency_weight` method to the Converter interface, so converters can define whether a weight has dependency weights that also need to be processed along with it.
- Adds `build_inverse_weight_maps` logic, following the pattern in llm-compressor, so that all weight dependencies can be loaded in with the given weight, even if they live in separate safetensors files. The DeepSeek model often splits a module's `weight` and `weight_scale_inv` tensors across different files, and they need to be processed together when dequantizing.
- Updates the `create_quant_config` signature to return an Optional field. This is needed when converting a checkpoint to bfloat16, when no accompanying `compressed-tensors` quantization config is needed.
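The inverse-weight-map idea described above can be sketched as follows (the map contents are invented for illustration, and the PR's actual implementation may differ):

```python
from collections import defaultdict

# weight name -> shard file, as read from model.safetensors.index.json
weight_map = {
    "layers.0.weight": "model-00001.safetensors",
    "layers.0.weight_scale_inv": "model-00002.safetensors",
    "layers.1.weight": "model-00002.safetensors",
}

# Invert to shard file -> weight names, so each shard can be opened once and
# every weight living in it (including dependencies like weight_scale_inv)
# loaded together, even when a module's tensors are split across files.
inverse_weight_map: dict[str, list[str]] = defaultdict(list)
for weight_name, shard_name in weight_map.items():
    inverse_weight_map[shard_name].append(weight_name)

# Return a plain dict, not a defaultdict, to avoid silent errors downstream
inverse_weight_map = dict(inverse_weight_map)
```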
TODOs:
Test Plan: