# GPT-QModel v5.7.0

## Notable Changes
- Feature: MoE.Routing control (Bypass or Override) by @avtc in #2235
- Feature: Use FailSafe Naive Quantization when GPTQ fails due to MoE uneven routing by @ZX-ModelCloud in #2293
- Feature: ability to pause/resume quantization via 'p' key by @avtc in #2294
- Glm4v support by @LRL2-ModelCloud in #2303
- Failsafe smoothers by @Qubitium in #2304
- New median strategy and SmoothPercentileAsymmetric smoother by @Qubitium in
- Support for Qwen2.5-Omni calibration data, including audio, by @ChenShisen in #2309
- Add Smooth trigger based on group_size by @Qubitium in #2312
- Voxtral support by @LRL2-ModelCloud in #2315
- Better compat with triton-windows and other alternative triton packages by @Qubitium in #2395
- Dynamically map format/backend to kernel by @Qubitium in #2353
- Add EXAONE4 support by @namgyu-youn in #2405
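The failsafe feature above falls back to naive quantization when GPTQ's solver-based path fails (e.g. on unevenly routed MoE experts). As a rough illustration of what "naive" means here, a minimal group-wise round-to-nearest (RTN) sketch in plain NumPy — an independent sketch of the general technique, not GPT-QModel's actual implementation:

```python
import numpy as np

def rtn_quantize(w, bits=4, group_size=128):
    """Naive group-wise round-to-nearest (RTN) quantization, symmetric.

    Sketch of the kind of fallback a failsafe path can use when a
    solver-based method such as GPTQ fails; not GPT-QModel's code.
    """
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit symmetric
    groups = w.reshape(-1, group_size)            # one scale per group
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero groups
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)           # dequantized weights

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)
w_q = rtn_quantize(w)
max_err = np.abs(w - w_q).max()  # bounded by half a scale step per group
```

RTN needs no calibration data or Hessian statistics, which is what makes it a safe fallback when the error-compensating solve cannot proceed.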
## What's Changed
- [FIX] unittest by @ZX-ModelCloud in #2291
- [FIX] marlin forward by @ZX-ModelCloud in #2296
- FIX fast_hadamard_transform import by @LRL2-ModelCloud in #2298
- do not log moe errors if `failsafe` enabled by @Qubitium in #2299
- [CI] allow cancel action by @CSY-ModelCloud in #2300
- Fix non-rtn packing by @Qubitium in #2302
- log q vs weight abs.mean for loss column by @Qubitium in #2306
- fix inverted failsafe log condition by @Qubitium in #2310
- Allow failsafe to be none by @Qubitium in #2313
- move non-inference affecting fields to meta on save by @Qubitium in #2314
- [FIX] GPTQModel.load() can now correctly load non-quantized models. by @ZX-ModelCloud in #2317
- FIX hf kernel by @jiqing-feng in #2319
- [CI] test_qwen3_moe add eval task: GSM8K_PLATINUM_COT and MMLU_STEM by @ZX-ModelCloud in #2320
- Release 5.7 Prep by @Qubitium in #2318
- [FIX] Exclude unrouted MoE experts on load by @ZX-ModelCloud in #2321
- [FIX] Skip empty subset by @ZX-ModelCloud in #2322
- [FIX] GLM-4.5-Air quantize fail by @ZX-ModelCloud in #2323
- fix: offload_to_disk=True uses more vram than offload_to_disk=False by @avtc in #2325
- Fix import no_init_weights from transformers by @jiqing-feng in #2329
- [FIX] qqq quantize by @ZX-ModelCloud in #2330
- cherry-pick: attempt to fix terminal state after pause/resume handlers by @avtc in #2327
- [FIX] quantization failure for non-MoE models by @ZX-ModelCloud in #2333
- Device check by @jiqing-feng in #2334
- FIX MoE flag not passed in nested CI test by @Qubitium in #2337
- Use safer checks for nullable properties where they may not exist at… by @Qubitium in #2338
- Fix unit test by @Qubitium in #2339
- Group module_tree/subsection parsing related tests to module_tree folder by @Qubitium in #2340
- Group kernel tests by @Qubitium in #2341
- Lifecycle: Move `awq.pack_module` to `submodule_finalize()` from `process()` by @ZX-ModelCloud in #2335
- Partial Revert 2235: temp remove moe bypass by @ZX-ModelCloud in #2343
- Re apply compute device filter by @Qubitium in #2345
- Re-apply moe routing bypass by @ZX-ModelCloud in #2347
- Fix: Zero point underflow in AWQ Exllama v2 kernel by @12345txy in #2351
- Remove unnecessary +1/-1 inference/packing zero-point offset for AWQ Exllama v2 kernel by @Qubitium in #2352
- Normalize `AWQ.qcfg` `zero_point` to `sym` property by @Qubitium in #2355
- FIX sym True with AWQ by @ZX-ModelCloud in #2357
- Prepare for 5.7 by @Qubitium in #2358
- [FIX] `self_attn.q_proj` was not quantized in the Moonlight model by @ZX-ModelCloud in #2360
- [FIX] torch_fused inference error by @ZX-ModelCloud in #2362
- [FIX] FORMAT.LLM_AWQ was incorrectly quantized as FORMAT.GEMM by @ZX-ModelCloud in #2364
- [CI] load all tests include sub dirs & merge some small tests into one file by @CSY-ModelCloud in #2363
- Fix evalplus output filename mismatch by @juraev in #2365
- [FIX] FORMAT.GEMV and FORMAT.GEMV_FAST could not be quantized by @ZX-ModelCloud in #2366
- [CI] add deps config for CI tests by @CSY-ModelCloud in #2368
- [FIX] unittest by @ZX-ModelCloud in #2370
- [FIX] In AWQProcessor, the failsafe threshold_value should be calculated based on the scale group, not the entire layer by @ZX-ModelCloud in #2369
- [CI] fix CI not reading the correct yaml by @CSY-ModelCloud in #2371
- [FIX] ci unittest by @ZX-ModelCloud in #2372
- [FIX] test_q4_bitblas and test_qqq by @ZX-ModelCloud in #2373
- [CI] add test_integration deps by @CSY-ModelCloud in #2374
- [CI] fix torch version was upgraded by deps by @CSY-ModelCloud in #2377
- `select_quant_linear` should always receive a non-null `device` by @Qubitium in #2376
- [CI] uninstall pynvml by @CSY-ModelCloud in #2378
- [FIX] failed ci test by @ZX-ModelCloud in #2380
- [FIX] test_gptq by @ZX-ModelCloud in #2382
- [FIX] correct `has_captured_input_ids()` logic by using `> 0` check by @ZX-ModelCloud in #2383
- [FIX] test_model by @ZX-ModelCloud in #2384
- [FIX] unit test by @ZX-ModelCloud in #2385
- [CI] use new docker by @CSY-ModelCloud in #2387
- [FIX] ci test by @ZX-ModelCloud in #2388
- [FIX] unittest by @ZX-ModelCloud in #2389
- [FIX] missing ExllamaV2 kernels initialization in AutoRound by @ZX-ModelCloud in #2390
- [CI] keep uv up to date by @CSY-ModelCloud in #2391
- [FIX] test_awq by @ZX-ModelCloud in #2392
- [FIX] Incorrectly selected device by @ZX-ModelCloud in #2394
- [FIX] quantization failure for Qwen2/2.5/3 VL models with FlashAttention-2 by @ZX-ModelCloud in #2396
- [FIX] test_ovis2 and test_ovis_1_6_llama by @ZX-ModelCloud in #2397
- [FIX] test_stage_modules by @ZX-ModelCloud in #2398
- [CI] list test files with py file & fix duplicated test names by @CSY-ModelCloud in #2399
- [FIX] test_pause_resume by @ZX-ModelCloud in #2400
- [CI] update sort, root test files first by @CSY-ModelCloud in #2401
- [FIX] exllama_v1 kernel crash by @ZX-ModelCloud in #2402
- [FIX] test_chatglm by @ZX-ModelCloud in #2406
- set tokenicer>=0.0.6 by @CSY-ModelCloud in #2407
- Fix tokenizer_class incompatibility with transformers 5.0 by @juraev in #2403
- [FIX] model_test by @ZX-ModelCloud in #2410
- fixed ValueError: invalid pyproject.toml config: project.license. con… by @CSY-ModelCloud in #2412
- [FIX] module_tree tests by @ZX-ModelCloud in #2411
- fix license warning by @CSY-ModelCloud in #2413
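The AWQ Exllama v2 zero-point fixes (#2351, #2352) concern unsigned low-bit storage: applying a -1 offset before packing underflows when a group's zero-point is already 0 and wraps to the top of the 4-bit range. A minimal sketch of that wraparound, illustrative only and unrelated to the kernel's actual packing code:

```python
import numpy as np

BITS = 4
QMAX = (1 << BITS) - 1  # 15, the largest unsigned 4-bit value

def store_zero_points(zeros, offset=0):
    """Store per-group zero-points in unsigned 4-bit fields after an offset.

    The mask mimics hardware wraparound: with offset=-1, a zero-point of
    0 becomes -1 and wraps to 15 -- the underflow class of bug fixed in
    the AWQ Exllama v2 kernel.
    """
    return (zeros + offset) & QMAX

zeros = np.array([0, 1, 8, 15])
wrapped = store_zero_points(zeros, offset=-1)  # first entry underflows to 15
safe = store_zero_points(zeros, offset=0)      # no offset: stored verbatim
```

Dropping the legacy +1/-1 offset, as #2352 does, sidesteps the wraparound entirely rather than special-casing a zero-point of zero.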
## New Contributors
- @ChenShisen made their first contribution in #2309
- @12345txy made their first contribution in #2351
- @juraev made their first contribution in #2365
- @namgyu-youn made their first contribution in #2405
Full Changelog: v5.6.12...v5.7.0