Releases · ModelCloud/GPTQModel

02 Apr 23:56

Qubitium

v6.0.3

6a65d69

GPT-QModel v6.0.3 Latest

Latest

Notable Changes:

Quantization and inference

Major ParoQuant improvements across speed, inference, and accuracy.
Added Paro inference support and a new layer optimizer.
Auto-enables AMP for the fast Paro implementation to better match reference behavior.
Added Paro rotation autotuning and fixed BF16 rotation support for the fused CUDA kernel.
Improved Paro stability with seeding fixes, cleanup, learned channel scale clamping, and contiguous tensor handling fixes.
Fixed a layer output replay/re-capture regression.
Added FOEM (First-Order Error Matters) for more accurate quantized LLM compensation, plus follow-up fixes to its data processing pipeline.
Replaced the old marlin_fp16 backend behavior with environment-flag control for FP32 reduction.

Model and backend support

Added support for Gemma4, MiniCPMO, MiniCPMV, and GLM4-MoE-Lite.
Added PrismML/Bonsai model support for inference.
Fixed Qwen3_5QModel definition issues.
Fixed Qwen 3.5 rotary embedding behavior.
Fixed AWQ layer grouping for qwen3_5_moe, llama4, qwen2_moe, and qwen3_next.
Fixed awq_processor.dynamic so skipped layers are handled correctly.
Improved dtype compatibility.
Hugging Face kernels are now gated off on Python no-GIL builds until upstream wheel support is fixed.

Evaluation, calibration, and usability

Integrated Evalution into the workflow.
Added evalution.VLLM and evalution.SGLang backends.
Fixed SGLang evaluation engine initialization.
Automatically determines MODEL_COMPAT_FAST_LAYER_COUNT.
Improved calibration data device handling.
Updated tokenizer handling, and collation now respects tokenizer padding_size.
Improved import performance by lazy-loading _DEVICE_THREAD_POOL.
Cleaned up warning behavior and added an option to suppress warnings.
Removed forced random seed overrides.

Dependency and compatibility updates

Updated pypcre to 0.2.14.
Pinned logbar to >=0.4.1.
Updated transformers and defuser package versions.
Fixed SAVE_PATH handling and import path resolution issues.

Breaking and removed

Removed GPTQModel.upload_to_hub().
Removed MLX export support.

What's Changed

[CI] fix pkgs' order & fix flashinfer version was overridden by @CSY-ModelCloud in #2575
allow to disable warning by @CSY-ModelCloud in #2576
lazy load _DEVICE_THREAD_POOL, to speed up import by @CSY-ModelCloud in #2577
remove disable env check by @CSY-ModelCloud in #2578
[CI] no need to set MAX_JOBS by @CSY-ModelCloud in #2579
Update pypcre version to 0.2.14 by @Qubitium in #2581
Nothing to see here... by @Qubitium in #2456
dtype compat by @Qubitium in #2582
fix test_moe_config by @ZX-ModelCloud in #2583
fix new format test by @ZX-ModelCloud in #2586
[CI] add test config by @CSY-ModelCloud in #2587
fix Qwen3_5QModel definition by @ZX-ModelCloud in #2588
speed up paroquant quant speed and resolve accuracy issues by @Qubitium in #2590
append last commit to version by @CSY-ModelCloud in #2591
speedup paroquant test by @ZX-ModelCloud in #2592
[CI] generate release matrix from torch registry by @CSY-ModelCloud in #2593
Evalution integration by @Qubitium in #2585
move eval.sh to tests by @Qubitium in #2594
remove warning by @Qubitium in #2595
[CI] use new docker image by @CSY-ModelCloud in #2596
[CI] install required pkg by @CSY-ModelCloud in #2597
Automatically Determine MODEL_COMPAT_FAST_LAYER_COUNT by @ZX-ModelCloud in #2598
[CI] no need to set MAX_JOBS by @CSY-ModelCloud in #2599
Fix: Paroquant impl accuracy by @Qubitium in #2601
remove forced random seed override in cls proper by @Qubitium in #2603
Paro test by @Qubitium in #2604
[FIX] incorrect SAVE_PATH by @ZX-ModelCloud in #2605
pin logbar to >= 0.4.1 by @Qubitium in #2606
Update the evalution scores by @ZX-ModelCloud in #2600
Paro: auto enable amp for fast impl to sync with reference by @Qubitium in #2607
paro: fix seeding and cleanup by @Qubitium in #2609
gate hf kernel to non-nogil builds of python until upsteram fix wheels by @Qubitium in #2610
[CI] use Ubuntu 24.04 docker image by @CSY-ModelCloud in #2612
Fix layer output re-capture (replay) regression by @Qubitium in #2611
remove legacy ppl codes by @Qubitium in #2613
replace marlin_fp16 backend with env flag control for fp32 reduction … by @Qubitium in #2614
[CI] default py 3.14t & install latest Evalution by @CSY-ModelCloud in #2616
[CI] fix Evalution is private by @CSY-ModelCloud in #2617
updat tokenicer by @Qubitium in #2618
make collate respect tokenier padding_size by @Qubitium in #2620
paro: clamp learned channel scales to avoid collapse by @Qubitium in #2622
Calibration data device by @avtc in #2608
[FIX] qwen3_5 rotary_embedding by @ZX-ModelCloud in #2624
Temporarily disable gptqmodel spit_by feature by @ZX-ModelCloud in #2625
use evalution.VLLM by @CSY-ModelCloud in #2615
use evalution.SGLang by @ZX-ModelCloud in #2626
paro: enter the dragon by @Qubitium in #2623
[CI] use torch 2.11 by @CSY-ModelCloud in #2627
[FIX] sglang evaluation engine initialization error. by @ZX-ModelCloud in #2629
[MODEL] Add minicpmo support by @ZX-ModelCloud in #2630
[CI] update CI path by @CSY-ModelCloud in #2633
[FIX] qwen3_5_moe / llama4 / qwen2_moe / qwen3_next awq layer grouping by @ZX-ModelCloud in #2634
Remove GPTQModel.upload_to_hub() api by @ZX-ModelCloud in #2635
remove export to mlx option by @ZX-ModelCloud in #2636
[MODEL] supports minicpmv by @ZX-ModelCloud in #2637
Paro: layer optimizer by @Qubitium in #2628
Paro inference by @Qubitium in #2638
PrismAI/Bonsai Model Support (inference only) by @Qubitium in #2640
Update README.md by @Qubitium in #2641
Update transformers and defuser package versions by @Qubitium in #2642
[CI] install gguf for test_local_model_paths by @CSY-ModelCloud in #2645
fix imported path not found by @CSY-ModelCloud in #2646
[MODEL] support glm4_moe_lite by @ZX-ModelCloud in #2644
[FEATURE] Add FOEM: First-Order Error Matters; Accurate Compensation for Quantized LLM by @Xingyu-Zheng in #2639
Revise README with latest news and article references by @Qubitium in #2647
FIX paroquant bf16 rotation support for fused cuda kernel by @Qubitium in #2648
paroquant rotation autotune by @Qubitium in #2649
[FIX] In awq_processor, dynamic did not correctly skip layers. by @ZX-ModelCloud in #2650
ruff fix by @Qubitium in #2651
Ruff fix by @Qubitium in #2652
update readme by @Qubitium in #2653
fix: ensure contagious tensors by @Qubitium in #2655
fix failed test by @ZX-ModelCloud in https://github.com/ModelCl...

Contributors

Qubitium, avtc, and 3 other contributors

Assets 66

gptqmodel-6.0.3+cu128torch2.10-cp310-cp310-linux_x86_64.whl

sha256:78a6a1e202ba6ed55d45a29805d844b36a723bbd406b6954f76f95f9ee65c1b7

154 MB 2026-04-03T04:46:31Z
gptqmodel-6.0.3+cu128torch2.10-cp311-cp311-linux_x86_64.whl

sha256:46caaf4b70a3ff1b0875ad7db7d315bb7d6733b6dcbd95ed11e47a7ff3d0f2ab

154 MB 2026-04-03T04:45:23Z
gptqmodel-6.0.3+cu128torch2.10-cp312-cp312-linux_x86_64.whl

sha256:14e656f51fae3b4f81baad14fe8872fd8d3edbb202ea21f4706d4489888aed43

154 MB 2026-04-03T04:45:51Z
gptqmodel-6.0.3+cu128torch2.10-cp313-cp313-linux_x86_64.whl

sha256:6be95a94bdd6a75ffe14b99a5d665bb9cb434dc9ca1ec8e03d4301406da744c7

154 MB 2026-04-03T04:26:34Z
gptqmodel-6.0.3+cu128torch2.10-cp313-cp313t-linux_x86_64.whl

sha256:cafe47ecab7c1d1ffb325c199073a5534caf86f5026785b22044f9d0c19feefb

154 MB 2026-04-03T04:36:24Z
gptqmodel-6.0.3+cu128torch2.10-cp314-cp314-linux_x86_64.whl

sha256:a139f4ba7792ef6eee240b74a7248bedb2b84979b3cc33956e8e07690ecb12fe

154 MB 2026-04-03T04:13:07Z
gptqmodel-6.0.3+cu128torch2.10-cp314-cp314t-linux_x86_64.whl

sha256:e38822e4376c50b907652c514487a7ade0368870fcc177b19f5a0ce0804e781f

154 MB 2026-04-03T04:08:54Z
gptqmodel-6.0.3+cu128torch2.11-cp310-cp310-linux_x86_64.whl

sha256:01e510f74647130616987c4f16aee372b7cc892a2040656eb341d66139ed14f0

154 MB 2026-04-03T06:46:02Z
gptqmodel-6.0.3+cu128torch2.11-cp311-cp311-linux_x86_64.whl

sha256:aa5bcb7a1a778ec64aa103c1145571daf4655b860596c7a85d3895e658ef80aa

154 MB 2026-04-03T03:26:57Z
gptqmodel-6.0.3+cu128torch2.11-cp312-cp312-linux_x86_64.whl

sha256:fc6f4e1aee1937d32b5b5e7fdc55a1dcbe55e7a7691c1cbf9879f0ac5e586293

154 MB 2026-04-03T03:19:38Z
Source code (zip)

2026-04-02T23:55:57Z
Source code (tar.gz)

2026-04-02T23:55:57Z

19 Mar 16:35

Qubitium

v5.8.0

9980f01

GPT-QModel v5.8.0

Notable Changes

Transformers 5.3.0 compatibility.
Video Quantization Support
- Added support for video input during quantization.
MoE & Model Support
- Added support for Qwen 3.5 and Qwen 3.5 MoE.
- Expanded compatibility for Qwen 3 variants including MoE / VL / Omni / Next.
- Added support for LLada2 block diffusion LLM models.
- Improved compatibility for Mixtral, Phi-4, Nemotron Ultra, BaiChuan, ChatGLM, Yi, and GLM4V.
- Fixed multiple MoE-specific AWQ and multi-GPU issues, including routing, module tree, position embeddings, and device mismatches.
AWQ / GPTQ Kernels
- Added CPU fused AWQ kernels for torch_fused and hf_kernel.
- Added torch_int8 AWQ kernel.
- Added BitBLAS AWQ kernel.
- Ported Intel int8 GPTQ/AWQ kernels.
- Updated kernel selection to prefer HF kernels where they provide the best performance and compatibility.
- Added BitBLAS fallback protection and fixed BitBLAS accuracy and qzero remap regressions.
Quantization Improvements
- Replaced greedy search with ternary search in SmoothBSE.
- Fixed SmoothMAD overly aggressive clipping.
- Added layer-level dynamic skip for fast quantization.
- Added early stop when all remaining layers are skipped during quantization.
- Fixed AWQ OOM and dequantization-related issues.
Runtime & Dequantization
- Added optional CPU int64 g_idx cache for TorchQuantLinear dequantization.
- Improved TorchFused dequantization and fp32 dtype support.
- Removed unnecessary symmetric handling in dequantize_gemm.
- Fixed rotary embedding device mismatch by storing per-device rotary copies.
- Added warmup protection for threaded timing.
Defuser Integration
- Integrated defuser.convert_hf_model().
- Integrated defuser.materialize_model().
- Integrated defuser.replace_fused_blocks().
- Improved defuser meta/offload compatibility and fused block handling.
Compatibility Fixes
- Improved compatibility with older and newer Hugging Face Transformers / Optimum versions.
- Fixed import compatibility issues in models/utils.
- Fixed rotary / embedding config compatibility with older HF and model variants.
- Improved tokenizer and model compatibility updates related to tokenicer.
- Fixed OSS compatibility issues.
Kernel / Backend Changes
- Hard deprecated ExLLaMA v1 kernel.
- Exposed the Triton patcher as an externally callable API.

What's Changed

support video input for quantization by @techshoww in #2386
feat: moe-router-bypass-batch-size by @avtc in #2349
[CI] use UV as python manager by @CSY-ModelCloud in #2415
[CI] fix deps installation & gpu service api path by @CSY-ModelCloud in #2416
[CI] auto release GPU if job has sth wrong or unrecoverable by @CSY-ModelCloud in #2417
[CI] save log to disk & fix deps installation by @CSY-ModelCloud in #2418
Replace Greedy with Tenary Search for SmoothBSE by @namgyu-youn in #2419
Feature/LLada2 support: Block Diffusion LLM by @blazingbhavneek in #2422
Bump the github-actions group with 2 updates by @dependabot[bot] in #2426
[MODEL] supports qwen3_5 by @ZX-ModelCloud in #2427
[FIX] eval bug for qwn3_5 quantized model by @ZX-ModelCloud in #2428
[MODEL] supports qwen3_5_moe by @ZX-ModelCloud in #2433
Update tokenicer dependency version to 0.0.7 by @Qubitium in #2434
Optional CPU g_idx int64 cache for TorchQuantLinear dequant path by @Qubitium in #2431
fix import compat issues for models/utils that is locked to higher ve… by @Qubitium in #2436
call defuser.convert_hf_model() by @ZX-ModelCloud in #2437
Update defuser dependency version to 0.0.3 by @Qubitium in #2439
quantize mlp experts module for qwen3_5_moe by @ZX-ModelCloud in #2443
Fix typo in setup.py causing wheel build failure (sys.abiflag -> sys.abiflags) by @beomchan0 in #2444
call defuser's materialize_model() by @ZX-ModelCloud in #2446
Update defuser dependency version to 0.0.4 by @Qubitium in #2447
port intel's int8 gptq/awq kernel over by @Qubitium in #2438
expose triton patcher as externally callable by @Qubitium in #2448
docs by @Qubitium in #2449
Add AWQ support for CPU fused kernels (torch_fused & hf_kernel) by @jiqing-feng in #2445
Cleanupx by @Qubitium in #2450
Make HF kernels for gptq/awq highest priority as they are the highest… by @Qubitium in #2451
rm sym in dequantize_gemm by @jiqing-feng in #2452
fix awq rotary device mismatch. store per-device copy of rotary by @Qubitium in #2453
add torch_int8 awq kernel by @Qubitium in #2454
[CI] move check log to a new step by @CSY-ModelCloud in #2455
cleanup hf kernel gptq/awq post_init loading by @Qubitium in #2457
fix SmoothMAD overly-aggressive clipping by @Qubitium in #2459
upgrade defuser version to 0.0.5 by @ZX-ModelCloud in #2460
[FIX] test_qwen3_5_moe by @ZX-ModelCloud in #2461
Update defuser dependency version to 0.0.6 by @Qubitium in #2462
fix awq oom by @CSY-ModelCloud in #2458
[CI] CUDA 131 + Torch 2.10.0 + Python 3.13 by @CSY-ModelCloud in #2463
Fix the module_tree in Qwen3_5_Moe to correctly support AWQ by @ZX-ModelCloud in #2464
[CI] fix git link cannot be installed by uv by @CSY-ModelCloud in #2465
[FIX] GEMM can't pack by @ZX-ModelCloud in #2466
[CI] add peft for test_asym_gptq_v1 & check log after test by @CSY-ModelCloud in #2467
[CI] get path error from log & install pre-compiled bitblas by @CSY-ModelCloud in #2468
[CI] fix log files were saved with wrong runid by @CSY-ModelCloud in #2469
[FIX] where qwen3_5_moe got incorrect position_embeddings during AWQ quantization by @ZX-ModelCloud in #2470
Update pypcre version to 0.2.13 by @CSY-ModelCloud in #2471
read dependencies from requirements.txt by @CSY-ModelCloud in #2472
add setuptools to requirements.txt by @CSY-ModelCloud in #2474
set minimum setuptools version to 78.1.1 by @CSY-ModelCloud in #2475
[FIX] device mismatch issue that occurred during multi-GPU AWQ quantization in moe Model by @ZX-ModelCloud in #2476
[CI] auto uninstall unneeded pkgs by @CSY-ModelCloud in #2478
fix ci failed tests by @ZX-ModelCloud in #2477
update mixtral's module_tree by @ZX-ModelCloud in #2480
Fix CI by @Qubitium in #2481
[CI] add pypi as backup by @CSY-ModelCloud in #2482
Ci fixes 2 by @Qubitium in #2483
CI Tests Fix 3 by @Qubitium in #2484
[CI] fix old models need old transformers by @CSY-ModelCloud in #2485
fix failed test by @ZX-ModelCloud in #2486
[CI] install latest bitblas & fix missing pkgs by @CSY-ModelCloud in #2487
BaiChuan fix by @Qubitium in #2488
Ci fix 5 by @Qubitium in #2489
Shelll/Src module buffer registratio mismatch + Qwen 2.5 VL patch by @Qubitium in #2490
[CI] install latest evalplus wheel by @CSY-ModelCloud in #2492
[CI] throw error for fast check by @CSY-ModelCloud in #2493
[FIX] test_post_quant_eora by @ZX-ModelCloud in https://github.com/ModelCloud/GPTQ...

Contributors

Qubitium, avtc, and 8 other contributors

Assets 46

10 Feb 10:09

Qubitium

v5.7.0

ed96f2e

GPT-QModel v5.7.0

Notable Changes:

Feature: MoE.Routing control (Bypass or Override) by @avtc in #2235
Feature: Use FailSafe Naive Quantization when GPTQ fails due to MoE uneven routing by @ZX-ModelCloud in #2293
Feature: ability to pause/resume quantization via 'p' key by @avtc in #2294
Glm4v support by @LRL2-ModelCloud in #2303
Failsafe smoothers by @Qubitium in #2304
New median strategy and SmoothPercentileAsymmetric smoother by @Qubitium in
Support for Qwen2.5-Omni calibration data includes audio. by @ChenShisen in #2309
Add Smooth trigger based on group_size by @Qubitium in #2312
Voxtral support by @LRL2-ModelCloud in #2315
Better compat with triton-windows and other alternative triton packages by @Qubitium in #2395
Dynamically map format/backend to kernel by @Qubitium in #2353
Add EXAONE4 support by @namgyu-youn in #2405

What's Changed

[FIX] unittest by @ZX-ModelCloud in #2291
[FIX] marlin forward by @ZX-ModelCloud in #2296
FIX fast_hadamard_transform import by @LRL2-ModelCloud in #2298
do not log moe errors if failesafe enabled by @Qubitium in #2299
[CI] allow cancel action by @CSY-ModelCloud in #2300
Fix non-rtn packing by @Qubitium in #2302
log q vs weight abs.mean for loss column by @Qubitium in #2306
fix inverted failsafe log condition by @Qubitium in #2310
#2311
Allow failsafe to be none by @Qubitium in #2313
move non-inference affecting fields to meta on save by @Qubitium in #2314
[FIX] GPTQModel.load() can now correctly load non-quantized models. by @ZX-ModelCloud in #2317
FIX hf kernel by @jiqing-feng in #2319
[CI] test_qwen3_moe add eval task: GSM8K_PLATINUM_COT and MMLU_STEM by @ZX-ModelCloud in #2320
Release 5.7 Prep by @Qubitium in #2318
[FIX] Exclude unrouted MoE experts on load by @ZX-ModelCloud in #2321
[FIX] Skip empty subset by @ZX-ModelCloud in #2322
[FIX] GLM-4.5-Air quantize fail by @ZX-ModelCloud in #2323
fix: offload_to_disk=True uses more vram than offload_to_disk=False by @avtc in #2325
Fix import no_init_weights from transformers by @jiqing-feng in #2329
[FIX] qqq quantize by @ZX-ModelCloud in #2330
chery pick: attempt to fix terminal state after pause/resume handlers by @avtc in #2327
[FIX] quantization to fail for non-MoE models by @ZX-ModelCloud in #2333
Device check by @jiqing-feng in #2334
FIX moe flag passing not passing nested ci test by @Qubitium in #2337
Use safer checks for nullable properties where they may not exists at… by @Qubitium in #2338
Fix unit test by @Qubitium in #2339
Group module_tree/subsection parsing related tests to module_tree folder by @Qubitium in #2340
Group kernel tests by @Qubitium in #2341
Lifecycle: Move awq.pack_module to submodule_finalize() from process() by @ZX-ModelCloud in #2335
Partial Revert 2235: temp remove moe bypass by @ZX-ModelCloud in #2343
Re apply compute device filter by @Qubitium in #2345
Re-apply moe routing bypass by @ZX-ModelCloud in #2347
Fix: Zero point underflow in AWQ Exllama v2 kernel by @12345txy in #2351
Remove unnecessary +1/-1 inference/packing zerpoint offset for AWQ Exllama v2 kernel by @Qubitium in #2352
Normalize AWQ.qcfg zero_point to sym property by @Qubitium in #2355
FIX sym True with AWQ by @ZX-ModelCloud in #2357
Prepare for 5.7 by @Qubitium in #2358
[FIX] self_attn.q_proj was not quantized in the Moonlight Model by @ZX-ModelCloud in #2360
[FIX] torch_fused inference error by @ZX-ModelCloud in #2362
[FIX] FORMAT.LLM_AWQ was incorrectly quantized as FORMAT.GEMM by @ZX-ModelCloud in #2364
[CI] load all tests include sub dirs & merge some small tests in to one file by @CSY-ModelCloud in #2363
Fix evalplus output filename mismatch by @juraev in #2365
[FIX] FORMAT.GEMV and FORMAT.GEMV_FAST could not be quantized by @ZX-ModelCloud in #2366
[CI] add deps config for CI tests by @CSY-ModelCloud in #2368
[FIX] unittest by @ZX-ModelCloud in #2370
[FIX] In AWQProcessor, the failsafe threshold_value should be calculated based on the scale group, not the entire layer by @ZX-ModelCloud in #2369
[CI] fix ci didn't read correct yaml by @CSY-ModelCloud in #2371
[FIX] ci unittest by @ZX-ModelCloud in #2372
[FIX] test_q4_bitblas and test_qqq by @ZX-ModelCloud in #2373
[CI] add test_integration deps by @CSY-ModelCloud in #2374
[CI] fix torch version was upgraded by deps by @CSY-ModelCloud in #2377
select_quant_linear should always receive a non-null device by @Qubitium in #2376
[CI] uninstall pynvml by @CSY-ModelCloud in #2378
[FIX] failed ci test by @ZX-ModelCloud in #2380
[FIX] test_gptq by @ZX-ModelCloud in #2382
[FIX] correct has_captured_input_ids() logic by using > 0 check by @ZX-ModelCloud in #2383
[FIX] test_model by @ZX-ModelCloud in #2384
[FIX] unit test by @ZX-ModelCloud in #2385
[CI] use new docker by @CSY-ModelCloud in #2387
[FIX] ci test by @ZX-ModelCloud in #2388
[FIX] unittest by @ZX-ModelCloud in #2389
[FIX] missing ExllamaV2 kernels initialization in AutoRound by @ZX-ModelCloud in #2390
[CI] keep uv up to date by @CSY-ModelCloud in #2391
[FIX] test_awq by @ZX-ModelCloud in #2392
[FIX] Incorrectly selected device by @ZX-ModelCloud in #2394
[FIX] quantization failure for Qwen2/2.5/3 VL models with FlashAttention-2 by @ZX-ModelCloud in #2396
[FIX] test_ovis2 and test_ovis_1_6_llama by @ZX-ModelCloud in #2397
[FIX] test_stage_modules by @ZX-ModelCloud in #2398
[CI] list test files with py file & fix duplicated test names by @CSY-ModelCloud in #2399
[FIX] test_pause_resume by @ZX-ModelCloud in #2400
[CI] update sort, root test files first by @CSY-ModelCloud in #2401
[FIX] exllama_v1 kernel crash by @ZX-ModelCloud in #2402
[FIX] test_chatglm by @ZX-ModelCloud in #2406
set tokenicer>=0.0.6 by @CSY-ModelCloud in #2407
Fix tokenizer_class incompatibility with transformers 5.0 by @juraev in #2403
[FIX] model_test by @ZX-ModelCloud in #2410
fixed ValueError: invalid pyproject.toml config: project.license. con… by @CSY-ModelCloud in htt...

Contributors

Qubitium, avtc, and 8 other contributors

Assets 32

17 Dec 11:28

Qubitium

v5.6.12

1a19cd0

GPT-QModel v5.6.12

Notable Changes:

uv compat
Both uv and pip install will now display ui progress for external wheel/depend downloads.

What's Changed

[FIX] failed unittest by @ZX-ModelCloud in #2286
fix wheel name mistaches with version name by @CSY-ModelCloud in #2288
Setup download progress by @Qubitium in #2289
Update latest news section in README.md by @Qubitium in #2290

Full Changelog: v5.6.10...v5.6.12

Contributors

Qubitium, ZX-ModelCloud, and CSY-ModelCloud

Assets 34

16 Dec 10:13

Qubitium

v5.6.10

70a507d

GPT-QModel v5.6.10

Notable Changes:

Triton check by @Qubitium in #2274
Fix bitblas support for gptq_v2 format by @xxxxyu in #2281
Fix awq triton kernel has invalid properties by @Qubitium in #2279

What's Changed

Add kernel selection log by @ZX-ModelCloud in #2275
Update README.md by @Qubitium in #2276
Update pypcre depend by @Qubitium in #2277
Update version.py by @Qubitium in #2278
Add macos unit tests by @CSY-ModelCloud in #2282
Update README.md by @Qubitium in #2283

New Contributors

@xxxxyu made their first contribution in #2281

Full Changelog: v5.6.6...v5.6.10

Contributors

Qubitium, xxxxyu, and 2 other contributors

Assets 34

16 Dec 04:11

Qubitium

v5.6.8

711b214

GPT-QModel v5.6.8

Notable Changes:

Fix Triton check/import by @Qubitium in #2274

What's Changed

Add kernel selection log by @ZX-ModelCloud in #2275
Update README.md by @Qubitium in #2276

Full Changelog: v5.6.6...v5.6.8

Contributors

Qubitium and ZX-ModelCloud

Assets 34

15 Dec 10:35

Qubitium

v5.6.6

9a79b62

v5.6.6

Notable Changes:

Use static cuda ctx for triton kernel launch by @Qubitium in #2269
Remove random-word depend by @LRL2-ModelCloud in #2266
Update PyPcre depend from 0.2.7 to 0.2.8 by @Qubitium in #2267

What's Changed

Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
Update version.py by @Qubitium in #2268
Ready 5.6.6 by @Qubitium in #2270

Full Changelog: v5.6.2...v5.6.6

Contributors

Qubitium, dependabot, and LRL2-ModelCloud

Assets 34

15 Dec 08:27

Qubitium

v5.6.4

61e5e7f

GPT-QModel v5.6.4

What's Changed

Bump the github-actions group with 2 updates by @dependabot[bot] in #2265
remove random-word depend by @LRL2-ModelCloud in #2266
Update pypcre version from 0.2.7 to 0.2.8 by @Qubitium in #2267
Update version.py by @Qubitium in #2268

Full Changelog: v5.6.2...v5.6.4

Contributors

Qubitium, dependabot, and LRL2-ModelCloud

Assets 34

12 Dec 10:04

Qubitium

v5.6.2

d97478f

GPT-QModel v5.6.2

Notable Changes

FIX JIT Pytorch extension pack_cpu_ext stall by @ZX-ModelCloud in #2248
Refractor Kernel External Dependency Validation by @LRL2-ModelCloud in #2249
FIX some models not honoring model.config.use_cache by force pass use_cache=false by @LRL2-ModelCloud in #2246
FIX Incorrect Triton dequant_kernel for 3-bit GPTQ (INT3) leads to Triton compile error / wrong dequantization #2251 by
Support llm-awq by @ZX-ModelCloud in #2252

What's Changed

Update version.py by @Qubitium in #2247
Update README.md by @davedgd in #2250
[CI] add torch 2.9.1 by @CSY-ModelCloud in #2254
@KingdalfGoodman in #2258
Update license declaration in pyproject.toml by @CSY-ModelCloud in #2259
Modify setup by @Qubitium in #2260
Add release notes for version 5.6.2 by @Qubitium in #2261
fix test_quant_formats.py by @LRL2-ModelCloud in #2262
[CI] mount dateset dir to /monster/data/model/dataset by @CSY-ModelCloud in #2263
fix parsing args by @CSY-ModelCloud in #2264

New Contributors

@KingdalfGoodman made their first contribution in #2258

Full Changelog: v5.6.0...v5.6.2

Contributors

Qubitium, davedgd, and 4 other contributors

Assets 34

09 Dec 11:53

Qubitium

v5.6.0

b63b373

GPT-QModel v5.6.0

Notable Changes:

HF Kernel for CPU: AMX, AVX2, AVX512 optimized by @jiqing-feng in #2232
Fix: Resolve performance regression during initial forward pass with offload_to_disk by @avtc in #2239
Auto module tree by @LRL2-ModelCloud in #2204
Afmoe support by @LRL2-ModelCloud in #2243
Add dots1 by @Qubitium in #2231

What's Changed

Update description and code about GPTAQ in README.md by @wayneguow in #2202
Update test cases for qwen2.5-vl and qwen3-vl by @wayneguow in #2203
Optimize minimax m2 modelling forward pass by @avtc in #2176
remove gemm ipex by @LRL2-ModelCloud in #2206
Bump actions/checkout from 5 to 6 in the github-actions group by @dependabot[bot] in #2207
Update device-smi dependency version to 0.5.2 by @Qubitium in #2208
Fix loading an AWQ-quantized model with GPTQModel when it is not actu… by @LRL2-ModelCloud in #2209
fix exllama v2 post init by @LRL2-ModelCloud in #2211
[FIX] Add fallback for "module_dir" and "entry key" lookup by @ZX-ModelCloud in #2210
Update unit_tests.yml by @Qubitium in #2213
fix mps backend does not implement float64 by @Qubitium in #2216
[FIX] _apply_quant() not being called with awq by @ZX-ModelCloud in #2218
Fix AWQ Extension by @LRL2-ModelCloud in #2217
Auto AWQ kernel selection for Transformers compat by @Qubitium in #2214
Fix add bias for torch_fuse by @jiqing-feng in #2223
[CI] Add torch_fused test with Bias by @ZX-ModelCloud in #2222
[FIX] device_map with cpu only causing CpuOffload hooks to be injected by @ZX-ModelCloud in #2225
fix awq apply_scale and apply_clip multi thread issue by @LRL2-ModelCloud in #2224
Fix CI test not pasing by @Qubitium in #2226
Monkeypatch lm-eval latest broken imports by @Qubitium in #2227
make file can be pytest called by @CSY-ModelCloud in #2228
CI Fix awq weight mean by @LRL2-ModelCloud in #2229
fix pycharm auto imported wrong path by @CSY-ModelCloud in #2230
[FIX] TorchFusedAwqQuantLinear selection by @ZX-ModelCloud in #2233
[CI] update CI path by @CSY-ModelCloud in #2236
[Model] Mistral3 support by @LRL2-ModelCloud in #2238
Update setup.py by @Qubitium in #2240
Increase MAX_JOBS from 4 to 8 in release.yml by @Qubitium in #2241
[FIX] non-peristent buffer was saved incorrectly by @ZX-ModelCloud in #2242

New Contributors

@wayneguow made their first contribution in #2202

Contributors

Qubitium, avtc, and 6 other contributors

Assets 34

Releases: ModelCloud/GPTQModel

GPT-QModel v6.0.3

Notable Changes:

Quantization and inference

Model and backend support

Evaluation, calibration, and usability

Dependency and compatibility updates

Breaking and removed

What's Changed

Contributors

Uh oh!

GPT-QModel v5.8.0

Notable Changes

What's Changed

Contributors

Uh oh!

GPT-QModel v5.7.0

Notable Changes:

What's Changed

Contributors

Uh oh!

GPT-QModel v5.6.12

Notable Changes:

What's Changed

Contributors

Uh oh!

GPT-QModel v5.6.10

Notable Changes:

What's Changed

New Contributors

Contributors

Uh oh!

GPT-QModel v5.6.8

Notable Changes:

What's Changed

Contributors

Uh oh!

v5.6.6

Notable Changes:

What's Changed

Contributors

Uh oh!

GPT-QModel v5.6.4

What's Changed

Contributors

Uh oh!

GPT-QModel v5.6.2

Notable Changes

What's Changed

New Contributors

Contributors

Uh oh!

GPT-QModel v5.6.0

Notable Changes:

What's Changed

New Contributors

Contributors

Uh oh!