Fix `quantization_config` parsing for `--kv-cache-dtype=auto` by alvarobartt · Pull Request #32 · alvarobartt/hf-mem

alvarobartt · 2026-02-04T10:09:21Z

Description

Warning

This PR is a breaking change. Previously, a bug in the parsing order caused dtype (and torch_dtype) to be used as the default --kv-cache-dtype when --kv-cache-dtype=auto (or when the flag was simply not set), so the actual dtype from the quantization_config was never used as the default option. As a result, models that set dtype or torch_dtype alongside an invalid quantization_config will now raise a RuntimeError, whereas before those were working "fine", just estimating the KV cache requirements with a "wrong" dtype.

This PR fixes the parsing order so that quantization_config, when present, takes priority over dtype or torch_dtype. It also bumps the version to 0.4.4 with uv version 0.4.4.
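To make the new precedence concrete, here is a minimal sketch of the lookup order described above. It is not the code in hf-mem: `resolve_kv_cache_dtype` and `FP8_FORMATS` are hypothetical names, and treating any non-dict or non-fp8 quantization_config as "invalid" is an assumption based on the commit titles. The real fallback when fmt is missing looks at the most used dtype (see the last commit), which this sketch simplifies to F8_E4M3.

```python
from typing import Any

# Hypothetical lookup table: fp8 format string -> safetensors dtype name.
FP8_FORMATS = {"e4m3": "F8_E4M3", "e5m2": "F8_E5M2"}


def resolve_kv_cache_dtype(config: dict[str, Any], kv_cache_dtype: str = "auto") -> str:
    """Illustrative only: pick the dtype label used to size the KV cache."""
    # An explicit --kv-cache-dtype always wins over anything in config.json.
    if kv_cache_dtype != "auto":
        return kv_cache_dtype

    # 1. quantization_config is checked first (the fix in this PR).
    quantization_config = config.get("quantization_config")
    if quantization_config is not None:
        # Assumption: only dict-shaped configs with quant_method == "fp8" are valid.
        if (
            not isinstance(quantization_config, dict)
            or quantization_config.get("quant_method") != "fp8"
        ):
            raise RuntimeError(f"Invalid quantization_config: {quantization_config!r}")
        # fmt is optional, and format is accepted as an alternative key.
        fmt = quantization_config.get("fmt") or quantization_config.get("format")
        if fmt is None:
            return "F8_E4M3"  # simplification of the real fallback
        if fmt not in FP8_FORMATS:
            raise RuntimeError(f"Unsupported fp8 format: {fmt!r}")
        return FP8_FORMATS[fmt]

    # 2. Only then fall back to dtype, and finally torch_dtype.
    dtype = config.get("dtype") or config.get("torch_dtype")
    if dtype is None:
        raise RuntimeError("No quantization_config, dtype or torch_dtype found")
    return dtype
```

With this ordering, a config that carries both torch_dtype and a valid fp8 quantization_config resolves to the fp8 dtype, while the same config with a quantization_config the sketch cannot interpret raises instead of silently using torch_dtype.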

I have read and followed the guidelines in CONTRIBUTING.md.
This has been discussed over an issue or discussion.

alvarobartt added 10 commits February 4, 2026 10:40

Fix F8_E4M3 in torch_dtype_to_safetensors_dtype

9f98054

Add missing INT8 in torch_dtype_to_safetensors_dtype

6fe39e9

Add note on I8

fb64a65

Fix quantization_config parsing to make fmt optional

93f5937

Consider quantization_config before dtype or torch_dtype

dcb88b2

Add format alt for fmt and exclude quant_method != fp8

f0d60bc

Run uv version 0.4.4

9d86958

Fix --kv-cache-dtype={fp8,fp8_ds_mla,fp8_inc} to default to F8_E4M3

3bcb6b3

Fix RuntimeError messages when parsing quantization_config

ce5b9a4

Fix fallback to F8_* to default to most used dtype (if any)

e2b8980
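For readers skimming the commits above, the dtype-name fixes can be pictured roughly as follows. This is a sketch under assumptions, not the merged code: it reuses the `torch_dtype_to_safetensors_dtype` name from the commit titles, but the table contents, the `KV_CACHE_FP8_ALIASES` set, and the `kv_cache_dtype_to_safetensors_dtype` helper are hypothetical.

```python
# Hypothetical table: torch dtype name -> safetensors dtype name.
TORCH_TO_SAFETENSORS = {
    "float32": "F32",
    "float16": "F16",
    "bfloat16": "BF16",
    "int8": "I8",  # safetensors spells INT8 as I8
    "float8_e4m3fn": "F8_E4M3",
    "float8_e5m2": "F8_E5M2",
}

# The three --kv-cache-dtype values named in the commit title.
KV_CACHE_FP8_ALIASES = {"fp8", "fp8_ds_mla", "fp8_inc"}


def torch_dtype_to_safetensors_dtype(torch_dtype: str) -> str:
    """Illustrative only: map a torch dtype string to a safetensors dtype name."""
    name = torch_dtype.removeprefix("torch.")  # accept "torch.int8" and "int8"
    try:
        return TORCH_TO_SAFETENSORS[name]
    except KeyError:
        raise RuntimeError(f"Unsupported dtype: {torch_dtype!r}") from None


def kv_cache_dtype_to_safetensors_dtype(kv_cache_dtype: str) -> str:
    """Illustrative only: generic fp8 flag values default to F8_E4M3."""
    if kv_cache_dtype in KV_CACHE_FP8_ALIASES:
        return "F8_E4M3"
    return torch_dtype_to_safetensors_dtype(kv_cache_dtype)
```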

alvarobartt merged commit 6d6b46c into main Feb 4, 2026
2 checks passed

alvarobartt deleted the fix-quantization-config-for-kv-cache branch February 4, 2026 11:00
