
Fix quantization_config parsing for --kv-cache-dtype=auto#32

Merged
alvarobartt merged 10 commits into main from fix-quantization-config-for-kv-cache on Feb 4, 2026

@alvarobartt
Owner

Description

Warning

This PR is a breaking change. Previously, a bug in the parsing order meant that when --kv-cache-dtype=auto (or the flag was simply not set), dtype (or torch_dtype) was used as the default KV cache dtype and the dtype from the quantization_config was ignored. As a result, models that set dtype or torch_dtype but have an invalid quantization_config will now raise a RuntimeError, whereas before they appeared to work "fine" while estimating the KV cache requirements with the "wrong" dtype.

This PR fixes the parsing order so that quantization_config, when present, takes priority over dtype and torch_dtype. It also bumps the version to 0.4.4 via uv version 0.4.4.
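
For reviewers, the intended resolution order can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `resolve_kv_cache_dtype`, the `_QUANT_METHOD_TO_DTYPE` table, and the use of a `quant_method` key to derive the dtype are all assumptions made for the example.

```python
from typing import Any

# Hypothetical table: which quantization methods map to which KV cache dtype.
# The real project may recognise different methods and read different fields.
_QUANT_METHOD_TO_DTYPE = {"fp8": "fp8"}


def resolve_kv_cache_dtype(config: dict[str, Any], kv_cache_dtype: str = "auto") -> str:
    """Pick the dtype used to estimate KV cache requirements (illustrative)."""
    # An explicit --kv-cache-dtype value always wins.
    if kv_cache_dtype != "auto":
        return kv_cache_dtype

    # With "auto", quantization_config (if present) is consulted first.
    quantization_config = config.get("quantization_config")
    if quantization_config is not None:
        if not isinstance(quantization_config, dict):
            raise RuntimeError("quantization_config must be a mapping")
        method = quantization_config.get("quant_method")
        if method not in _QUANT_METHOD_TO_DTYPE:
            # An invalid quantization_config is an error; there is no silent
            # fallback to dtype / torch_dtype.
            raise RuntimeError(f"cannot infer KV cache dtype from quant_method={method!r}")
        return _QUANT_METHOD_TO_DTYPE[method]

    # Only without a quantization_config do dtype / torch_dtype apply.
    for key in ("dtype", "torch_dtype"):
        value = config.get(key)
        if value is not None:
            return value

    raise RuntimeError("no dtype, torch_dtype or quantization_config found in config")
```

Under these assumptions, a config with both `torch_dtype` and a valid `quantization_config` resolves to the quantized dtype, while a config with `torch_dtype` and an unrecognised `quantization_config` raises a `RuntimeError` instead of quietly using `torch_dtype`.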


  • I have read and followed the guidelines in CONTRIBUTING.md.
  • This has been discussed over an issue or discussion.

@alvarobartt alvarobartt merged commit 6d6b46c into main Feb 4, 2026
2 checks passed
@alvarobartt alvarobartt deleted the fix-quantization-config-for-kv-cache branch February 4, 2026 11:00
