Releases: ml-explore/mlx-lm
v0.30.7
What's Changed
- Fix Kimi Linear by @kernelpool in #853
- Bump version for next release by @awni in #865
- Pythonic tool calling for LFM2 models by @viktike in #864
- Fix DeepSeek V3.2 indexer and weight loading by @kernelpool in #866
- Make validation set optional in training process by @Goekdeniz-Guelmez in #857
- Mistral tool parser by @awni in #874
- LongCat MLA by @kernelpool in #868
- [MODEL] support qwen3.5 series w/o vision by @JJJYmmm in #869
- Faster DSV32 generation by @kernelpool in #885
- Add GLM5 by @Goekdeniz-Guelmez in #867
Full Changelog: v0.30.6...v0.30.7
v0.30.6
What's Changed
- Transformers v5 by @awni in #811
- Add LongCat Flash tool parser by @kernelpool in #810
- Add Kimi-K2.5 by @kernelpool in #813
- Bump mlx version and version by @awni in #816
- Fix NemotronH config compatibility with HuggingFace format by @LuqDaMan in #820
- Fix for Exception - MultiLinear.to_quantized() missing 'mode' by @inferencers in #809
- Fix Kimi K2.5 tool call handling by @kernelpool in #821
- Actually add cli by @awni in #823
- Add LongCat Flash Lite by @kernelpool in #819
- Fix mixed quant by @awni in #825
- Support distributed inference in the server by @angeloskath in #741
- fix cli by @solarpunkin in #827
- Enable loading custom models by @awni in #830
- Allow default creation of BatchRotatingKVCache instead of BatchKVCache in batch mode by @christian-lms in #834
- Add Step 3.5 Flash by @kernelpool in #836
- server: support chat_template_kwargs and top_logprobs by @percontation in #829
- fix: handle GLM 4.7 tool call fallbacks by @jalehman in #792
- Deepseek V3.2 implementation fixes by @sjug in #838
- Fix Step 3.5 Flash model conversion by @kernelpool in #840
- Fix batch mamba by @awni in #842
- Fix sliding window mask during generation by @kernelpool in #843
- DSV3 MLA by @awni in #839
Full Changelog: v0.30.5...v0.30.6
v0.30.5
What's Changed
- import logging as it throws no logging error in place of actual error by @Maanas-Verma in #778
- server: use OpenAI compatible finish_reason by @percontation in #782
- move Xielu Activation in Apertus to activations.py by @Goekdeniz-Guelmez in #772
- bump transformers by @awni in #746
- Update glm4_moe_lite to store KV latent in cache by @N8python in #780
- Adding TeleChat3 by @Goekdeniz-Guelmez in #773
- add kimi tool parser by @Evanev7 in #791
- Allow qq ops with activation quantization by @awni in #749
- fix: use correct variable for logprobs in batch generation by @LuqDaMan in #800
- Sync random seed across ranks in distributed chat by @kernelpool in #801
- Fix ArraysCache.from_state not initializing left_padding and lengths by @lpalbou in #807
New Contributors
- @Maanas-Verma made their first contribution in #778
- @percontation made their first contribution in #782
- @LuqDaMan made their first contribution in #800
- @lpalbou made their first contribution in #807
Full Changelog: v0.30.4...v0.30.5
v0.30.4
What's Changed
- Add AWQ/GPTQ weight transformation utilities by @ericcurtin in #730
- Add IQuest Coder V1 Loop variant by @kernelpool in #716
- Fix sliding window batching by @awni in #738
- Fix Batch Generation: Add extract method to ArraysCache for item retrieval by @Goekdeniz-Guelmez in #740
- Make MambaCache compatible with batch generation for nemotron-h by @nikhilmitrax in #690
- Add a server benchmark for continuous batching by @awni in #728
- Fix tools parameter in apply_chat_template call by @kernelpool in #747
- Refactor tokenizer error handling to use warnings instead of exceptio… by @cubist38 in #744
- Make cache list batchable by @awni in #743
- Fix batch generation for IQuestLoopCoder model by @kernelpool in #748
- Fix type hint and pydoc for batch_generate by @tibbes in #745
- Handle empty caches during batch merge by @ivanfioravanti in #755
- Update for latest mlx by @awni in #759
- Use compiled Swiglu by @awni in #753
- Adds support for Nemotron Super 49b v1.5 by @lazarust in #756
- fix(falcon_h1): support tied embeddings and correct muP scaling by @solarpunkin in #764
- Fix swiglu parameter order by @kernelpool in #767
- Fix CacheList batching by @kernelpool in #769
- fix: unused batch_size parameter for mlx_lm.evaluate by @AndrewTan517 in #762
- Add gpt-oss sharding by @Evanev7 in #761
- Fix LongCat Flash extended context support by @kernelpool in #768
- Add minimax tensor sharding by @Evanev7 in #760
- Shard LongCat Flash by @kernelpool in #771
- Add glm4 moe lite model by @ivanfioravanti in #776
New Contributors
- @ericcurtin made their first contribution in #730
- @nikhilmitrax made their first contribution in #690
- @tibbes made their first contribution in #745
- @solarpunkin made their first contribution in #764
- @AndrewTan517 made their first contribution in #762
- @Evanev7 made their first contribution in #761
Full Changelog: v0.30.2...v0.30.4
v0.30.2
v0.30.1
What's Changed
- custom dsv32 chat template by @awni in #693
- shard glm by @awni in #698
- support minimax m2 by @awni in #700
- Enhance load_config function to check for config file existence and i… by @cubist38 in #701
- batch_generate fails with Phi3 (LongRoPE) when prompts have different lengths by @vyaivanove in #707
- Fix GIL starvation in _generate thread when batch is idle by @sjug in #706
- Ignore generation_config decode errors by @will-lms in #708
- Allow mxfp8 and nvfp4 by @awni in #709
- Fix chat template detection for models with custom tokenizers by @kernelpool in #712
- chore: add model-path param flag for convert API for better clarity by @jaycoolslm in #702
- Add RWKV7 by @MollySophia in #580
- Fix empty /v1/models response for locally loaded models by @cxl-git-hub in #713
- Add IQuest Coder V1 by @kernelpool in #714
- Add YoutuLLM by @johnmai-dev in #720
- Add logits_processors support to batch_generate by @lazarust in #635
- Add Solar Open by @kernelpool in #721
- Add K-EXAONE MoE by @kernelpool in #719
- Improve reasoning and tool call parsing in server by @awni in #711
- Patch bump by @awni in #731
New Contributors
- @cubist38 made their first contribution in #701
- @vyaivanove made their first contribution in #707
- @sjug made their first contribution in #706
- @jaycoolslm made their first contribution in #702
- @MollySophia made their first contribution in #580
- @cxl-git-hub made their first contribution in #713
- @lazarust made their first contribution in #635
Full Changelog: v0.30.0...v0.30.1
v0.30.0
What's Changed
- fix: server busy-waiting during idle request polling by @zenyr in #674
- Fixes for transformers v5 by @awni in #684
- Add mimo v2 flash by @awni in #685
- More useful error message for unsupported batching by @awni in #687
- Model parallel generation by @angeloskath in #676
- Bump to transformer v5 by @awni in #689
- Revert return dict and wrap apply_chat_template by @awni in #691
- Bump the version by @angeloskath in #692
Full Changelog: v0.29.0...v0.30.0
v0.29.0
What's Changed
- version bump by @angeloskath in #651
- Fix slow batch generation in server by setting wired_limit by @otarkhan in #652
- Fix RoPE for rnj-1 by @awni in #657
- fix: calling correct dequantize function by @devnamrits in #666
- Use test data zipfile in CI by @awni in #662
- Default repetition penalty to 0.0 in the server by @awni in #658
- fix dsv32 and gemma3 by @awni in #664
- Fix fusion and test by @awni in #668
- Fix server batching condition for SSMs by @angeloskath in #655
- Fix SuScaledRoPE by @DePasqualeOrg in #660
- Fix DSV32 by @awni in #669
- Fix for Devstral-2 by @inferencers in #671
- support nemotron 3 by @awni in #678
New Contributors
- @otarkhan made their first contribution in #652
- @devnamrits made their first contribution in #666
- @DePasqualeOrg made their first contribution in #660
- @inferencers made their first contribution in #671
Full Changelog: v0.28.4...v0.29.0
test_data
Commands (assuming the script below is saved as dl_test_data.py):
HF_HOME="." python dl_test_data.py
zip -r test_data.zip datasets hub
gh release upload test_data test_data.zip
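To restore the archive on the consumer side (for example in CI), download the release asset, unzip it, and point HF_HOME at the extracted cache before running the tests. A minimal sketch; the tests/ path and the pytest invocation are assumptions, not part of this release:
gh release download test_data --repo ml-explore/mlx-lm --pattern test_data.zip
unzip -q test_data.zip                    # restores the datasets/ and hub/ directories
HF_HOME="$PWD" python -m pytest tests/    # tests/ path is an assumption
The dl_test_data.py script itself: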
import datasets
from huggingface_hub import snapshot_download

# Model repos to mirror (small files only: configs, tokenizers, templates).
repos = [
    "mlx-community/Qwen1.5-0.5B-Chat-4bit",
    "mlx-community/Mistral-7B-v0.2-4bit",
    "mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx",
    "mlx-community/Mistral-7B-Instruct-v0.3",
    "mlx-community/Phi-3.5-mini-instruct-4bit",
    "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "mlx-community/Falcon3-7B-Instruct-4bit",
    "mlx-community/Qwen3-4B-4bit",
]
allow_patterns = [
    "*.md",
    "*.json",
    "*.py",
    "tokenizer.model",
    "*.tiktoken",
    "tiktoken.model",
    "*.txt",
    "*.jsonl",
    "*.jinja",
]
for repo in repos:
    snapshot_download(
        repo,
        allow_patterns=allow_patterns,
    )

# Download full weights only for the smallest model.
snapshot_download(
    "mlx-community/Qwen1.5-0.5B-Chat-4bit",
    allow_patterns=["model*.safetensors"],
)

# Cache the billsum dataset.
datasets.load_dataset("billsum")
v0.28.4
What's Changed
- version by @awni in #559
- Add Minimax-M2 by @Blaizzy in #568
- Align checkpoint loading with Jamba Mini and Large by @Goekdeniz-Guelmez in #555
- Fix dequant + minor refactor by @awni in #572
- fix eval thinking by @awni in #578
- Fixed typo in `load_adapters` that broke adapter loading by @jyork03 in #583
- Fix AttributeError when loading custom draft models by @kernelpool in #590
- Add gen options and CoT removal by @awni in #587
- Add parallel_residual setting to gptneox by @spotbot2k in #586
- Fixed/improved behavior of the mask_prompt feature. by @jyork03 in #584
- add MiniMax-M2 in supported models by @sriting in #575
- Fix: Remove call to deleted method by @jyork03 in #591
- Make mlx-lm more type-checker friendly by @tnadav in #573
- Fix: JSON parse error handling: avoid referencing stream before init by @jyork03 in #592
- Adding ring mini linear by @Goekdeniz-Guelmez in #513
- Add Kimi Linear by @Blaizzy in #577
- DWQ for very large models by @awni in #536
- Fix Byte Decoder Lookup for Esoteric Single-Characters by @N8python in #600
- Fix input_embeddings prefill bug in generate_step by @Blaizzy in #606
- ACKNOWLEDGMENTS.md House keeping by @Goekdeniz-Guelmez in #594
- FIX: Add missing sentencepiece dependency for tokenizers by @Deekshith-Dade in #611
- switch to github actions by @awni in #618
- Fix for kimi k2 by @awni in #593
- Allow providing prompt caches in batched generation by @angeloskath in #602
- Fix olmo3 by @awni in #628
- add support for Trinity/AfMoE model by @ivanfioravanti in #640
- Ministral3 by @awni in #642
- Add a prompt cache that can hold multiple prompts by @angeloskath in #625
- Fix flaky losses test by @awni in #643
- Fix lora fusion for non affine quantization by @awni in #647
- Batching in the server by @angeloskath in #626
- Add deepseek v32 by @awni in #512
- version bump by @awni in #649
- Fix the release action by @angeloskath in #650
New Contributors
- @jyork03 made their first contribution in #583
- @spotbot2k made their first contribution in #586
- @sriting made their first contribution in #575
- @tnadav made their first contribution in #573
- @Deekshith-Dade made their first contribution in #611
Full Changelog: v0.28.3...v0.28.4