# GPT-QModel v5.7.0

## Notable Changes
- Feature: MoE.Routing control (Bypass or Override) by @avtc in #2235
- Feature: Use FailSafe Naive Quantization when GPTQ fails due to MoE uneven routing by @ZX-ModelCloud in #2293
- Feature: ability to pause/resume quantization via 'p' key by @avtc in #2294
- Glm4v support by @LRL2-ModelCloud in #2303
- Failsafe smoothers by @Qubitium in #2304
- New median strategy and SmoothPercentileAsymmetric smoother by @Qubitium in
- Support for Qwen2.5-Omni calibration data, including audio, by @ChenShisen in #2309
- Add Smooth trigger based on group_size by @Qubitium in #2312
- Voxtral support by @LRL2-ModelCloud in #2315
- Better compat with triton-windows and other alternative triton packages by @Qubitium in #2395
- Dynamically map format/backend to kernel by @Qubitium in #2353
- Add EXAONE4 support by @namgyu-youn in #2405
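The failsafe feature above falls back to naive quantization when GPTQ's solver-based path fails (e.g. on unevenly routed MoE experts). As a rough illustration of what "naive" means here, a minimal group-wise round-to-nearest (RTN) sketch in plain NumPy — an independent sketch of the general technique, not GPT-QModel's actual implementation:

```python
import numpy as np

def rtn_quantize(w, bits=4, group_size=128):
    """Naive group-wise round-to-nearest (RTN) quantization, symmetric.

    Sketch of the kind of fallback a failsafe path can use when a
    solver-based method such as GPTQ fails; not GPT-QModel's code.
    """
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit symmetric
    groups = w.reshape(-1, group_size)            # one scale per group
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)           # dequantized weights

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)
w_q = rtn_quantize(w)
max_err = np.abs(w - w_q).max()  # bounded by half a scale step per group
```

RTN needs no calibration data or Hessian statistics, which is what makes it a safe fallback when the error-compensating solve cannot proceed.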
## What's Changed
- [FIX] unittest by @ZX-ModelCloud in #2291
- [FIX] marlin forward by @ZX-ModelCloud in #2296
- FIX fast_hadamard_transform import by @LRL2-ModelCloud in #2298
- do not log moe errors if `failsafe` enabled by @Qubitium in #2299
- [CI] allow cancel action by @CSY-ModelCloud in #2300
- Fix non-rtn packing by @Qubitium in #2302
- log q vs weight abs.mean for loss column by @Qubitium in #2306
- fix inverted failsafe log condition by @Qubitium in #2310
- Allow failsafe to be none by @Qubitium in #2313
- move non-inference affecting fields to meta on save by @Qubitium in #2314
- [FIX] GPTQModel.load() can now correctly load non-quantized models. by @ZX-ModelCloud in #2317
- FIX hf kernel by @jiqing-feng in #2319
- [CI] test_qwen3_moe add eval task: GSM8K_PLATINUM_COT and MMLU_STEM by @ZX-ModelCloud in #2320
- Release 5.7 Prep by @Qubitium in #2318
- [FIX] Exclude unrouted MoE experts on load by @ZX-ModelCloud in #2321
- [FIX] Skip empty subset by @ZX-ModelCloud in #2322
- [FIX] GLM-4.5-Air quantize fail by @ZX-ModelCloud in #2323
- fix: offload_to_disk=True uses more vram than offload_to_disk=False by @avtc in #2325
- Fix import no_init_weights from transformers by @jiqing-feng in #2329
- [FIX] qqq quantize by @ZX-ModelCloud in #2330
- cherry-pick: attempt to fix terminal state after pause/resume handlers by @avtc in #2327
- [FIX] quantization failure for non-MoE models by @ZX-ModelCloud in #2333
- Device check by @jiqing-feng in #2334
- FIX MoE flag not passed in nested CI test by @Qubitium in #2337
- Use safer checks for nullable properties where they may not exist at… by @Qubitium in #2338
- Fix unit test by @Qubitium in #2339
- Group module_tree/subsection parsing related tests to module_tree folder by @Qubitium in #2340
- Group kernel tests by @Qubitium in #2341
- Lifecycle: Move `awq.pack_module` to `submodule_finalize()` from `process()` by @ZX-ModelCloud in #2335
- Partial Revert 2235: temp remove moe bypass by @ZX-ModelCloud in #2343
- Re apply compute device filter by @Qubitium in #2345
- Re-apply moe routing bypass by @ZX-ModelCloud in #2347
- Fix: Zero point underflow in AWQ Exllama v2 kernel by @12345txy in #2351
- Remove unnecessary +1/-1 inference/packing zero-point offset for AWQ Exllama v2 kernel by @Qubitium in #2352
- Normalize `AWQ.qcfg` `zero_point` to `sym` property by @Qubitium in #2355
- FIX sym True with AWQ by @ZX-ModelCloud in #2357
- Prepare for 5.7 by @Qubitium in #2358
- [FIX] `self_attn.q_proj` was not quantized in the Moonlight model by @ZX-ModelCloud in #2360
- [FIX] torch_fused inference error by @ZX-ModelCloud in #2362
- [FIX] FORMAT.LLM_AWQ was incorrectly quantized as FORMAT.GEMM by @ZX-ModelCloud in #2364
- [CI] load all tests include sub dirs & merge some small tests into one file by @CSY-ModelCloud in #2363
- Fix evalplus output filename mismatch by @juraev in #2365
- [FIX] FORMAT.GEMV and FORMAT.GEMV_FAST could not be quantized by @ZX-ModelCloud in #2366
- [CI] add deps config for CI tests by @CSY-ModelCloud in #2368
- [FIX] unittest by @ZX-ModelCloud in #2370
- [FIX] In AWQProcessor, the failsafe threshold_value should be calculated based on the scale group, not the entire layer by @ZX-ModelCloud in #2369
- [CI] fix CI not reading the correct yaml by @CSY-ModelCloud in #2371
- [FIX] ci unittest by @ZX-ModelCloud in #2372
- [FIX] test_q4_bitblas and test_qqq by @ZX-ModelCloud in #2373
- [CI] add test_integration deps by @CSY-ModelCloud in #2374
- [CI] fix torch version was upgraded by deps by @CSY-ModelCloud in #2377
- `select_quant_linear` should always receive a non-null `device` by @Qubitium in #2376
- [CI] uninstall pynvml by @CSY-ModelCloud in #2378
- [FIX] failed ci test by @ZX-ModelCloud in #2380
- [FIX] test_gptq by @ZX-ModelCloud in #2382
- [FIX] correct `has_captured_input_ids()` logic by using `> 0` check by @ZX-ModelCloud in #2383
- [FIX] test_model by @ZX-ModelCloud in #2384
- [FIX] unit test by @ZX-ModelCloud in #2385
- [CI] use new docker by @CSY-ModelCloud in #2387
- [FIX] ci test by @ZX-ModelCloud in #2388
- [FIX] unittest by @ZX-ModelCloud in #2389
- [FIX] missing ExllamaV2 kernels initialization in AutoRound by @ZX-ModelCloud in #2390
- [CI] keep uv up to date by @CSY-ModelCloud in #2391
- [FIX] test_awq by @ZX-ModelCloud in #2392
- [FIX] Incorrectly selected device by @ZX-ModelCloud in #2394
- [FIX] quantization failure for Qwen2/2.5/3 VL models with FlashAttention-2 by @ZX-ModelCloud in #2396
- [FIX] test_ovis2 and test_ovis_1_6_llama by @ZX-ModelCloud in #2397
- [FIX] test_stage_modules by @ZX-ModelCloud in #2398
- [CI] list test files with py file & fix duplicated test names by @CSY-ModelCloud in #2399
- [FIX] test_pause_resume by @ZX-ModelCloud in #2400
- [CI] update sort, root test files first by @CSY-ModelCloud in #2401
- [FIX] exllama_v1 kernel crash by @ZX-ModelCloud in #2402
- [FIX] test_chatglm by @ZX-ModelCloud in #2406
- set tokenicer>=0.0.6 by @CSY-ModelCloud in #2407
- Fix tokenizer_class incompatibility with transformers 5.0 by @juraev in #2403
- [FIX] model_test by @ZX-ModelCloud in #2410
- fixed ValueError: invalid pyproject.toml config: project.license. con… by @CSY-ModelCloud in #2412
- [FIX] module_tree tests by @ZX-ModelCloud in #2411
- fix license warning by @CSY-ModelCloud in #2413
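The AWQ Exllama v2 zero-point fixes (#2351, #2352) concern unsigned low-bit storage: applying a -1 offset before packing underflows when a group's zero-point is already 0 and wraps to the top of the 4-bit range. A minimal sketch of that wraparound, illustrative only and unrelated to the kernel's actual packing code:

```python
import numpy as np

BITS = 4
QMAX = (1 << BITS) - 1  # 15, the largest unsigned 4-bit value

def store_zero_points(zeros, offset=0):
    """Store per-group zero-points in unsigned 4-bit fields after an offset.

    The mask mimics hardware wraparound: with offset=-1, a zero-point of
    0 becomes -1 and wraps to 15 -- the underflow class of bug fixed in
    the AWQ Exllama v2 kernel.
    """
    return (zeros + offset) & QMAX

zeros = np.array([0, 1, 8, 15])
wrapped = store_zero_points(zeros, offset=-1)  # first entry underflows to 15
safe = store_zero_points(zeros, offset=0)      # no offset: stored verbatim
```

Dropping the legacy +1/-1 offset, as #2352 does, sidesteps the wraparound entirely rather than special-casing a zero-point of zero.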
## New Contributors
- @ChenShisen made their first contribution in #2309
- @12345txy made their first contribution in #2351
- @juraev made their first contribution in #2365
- @namgyu-youn made their first contribution in #2405
Full Changelog: v5.6.12...v5.7.0