Releases: ml-explore/mlx-lm
v0.30.7
What's Changed
- Fix Kimi Linear by @kernelpool in #853
- Bump version for next release by @awni in #865
- Pythonic tool calling for LFM2 models by @viktike in #864
- Fix DeepSeek V3.2 indexer and weight loading by @kernelpool in #866
- Make validation set optional in training process by @Goekdeniz-Guelmez in #857
- Mistral tool parser by @awni in #874
- LongCat MLA by @kernelpool in #868
- [MODEL] support qwen3.5 series w/o vision by @JJJYmmm in #869
- Faster DSV32 generation by @kernelpool in #885
- Add GLM5 by @Goekdeniz-Guelmez in #867
Full Changelog: v0.30.6...v0.30.7
v0.30.6
What's Changed
- Transformers v5 by @awni in #811
- Add LongCat Flash tool parser by @kernelpool in #810
- Add Kimi-K2.5 by @kernelpool in #813
- Bump mlx version and version by @awni in #816
- Fix NemotronH config compatibility with HuggingFace format by @LuqDaMan in #820
- Fix for Exception - MultiLinear.to_quantized() missing 'mode' by @inferencers in #809
- Fix Kimi K2.5 tool call handling by @kernelpool in #821
- Actually add cli by @awni in #823
- Add LongCat Flash Lite by @kernelpool in #819
- Fix mixed quant by @awni in #825
- Support distributed inference in the server by @angeloskath in #741
- fix cli by @solarpunkin in #827
- Enable loading custom models by @awni in #830
- Allow default creation of BatchRotatingKVCache instead of BatchKVCache in batch mode by @christian-lms in #834
- Add Step 3.5 Flash by @kernelpool in #836
- server: support chat_template_kwargs and top_logprobs by @percontation in #829
- fix: handle GLM 4.7 tool call fallbacks by @jalehman in #792
- Deepseek V3.2 implementation fixes by @sjug in #838
- Fix Step 3.5 Flash model conversion by @kernelpool in #840
- Fix batch mamba by @awni in #842
- Fix sliding window mask during generation by @kernelpool in #843
- DSV3 MLA by @awni in #839
Full Changelog: v0.30.5...v0.30.6
v0.30.5
What's Changed
- import logging as it throws no logging error in place of actual error by @Maanas-Verma in #778
- server: use OpenAI compatible finish_reason by @percontation in #782
- move Xielu Activation in Apertus to activations.py by @Goekdeniz-Guelmez in #772
- bump transformers by @awni in #746
- Update glm4_moe_lite to store KV latent in cache by @N8python in #780
- Adding TeleChat3 by @Goekdeniz-Guelmez in #773
- add kimi tool parser by @Evanev7 in #791
- Allow qq ops with activation quantization by @awni in #749
- fix: use correct variable for logprobs in batch generation by @LuqDaMan in #800
- Sync random seed across ranks in distributed chat by @kernelpool in #801
- Fix ArraysCache.from_state not initializing left_padding and lengths by @lpalbou in #807
New Contributors
- @Maanas-Verma made their first contribution in #778
- @percontation made their first contribution in #782
- @LuqDaMan made their first contribution in #800
- @lpalbou made their first contribution in #807
Full Changelog: v0.30.4...v0.30.5
v0.30.4
What's Changed
- Add AWQ/GPTQ weight transformation utilities by @ericcurtin in #730
- Add IQuest Coder V1 Loop variant by @kernelpool in #716
- Fix sliding window batching by @awni in #738
- Fix Batch Generation: Add extract method to ArraysCache for item retrieval by @Goekdeniz-Guelmez in #740
- Make MambaCache compatible with batch generation for nemotron-h by @nikhilmitrax in #690
- Add a server benchmark for continuous batching by @awni in #728
- Fix tools parameter in apply_chat_template call by @kernelpool in #747
- Refactor tokenizer error handling to use warnings instead of exceptio… by @cubist38 in #744
- Make cache list batchable by @awni in #743
- Fix batch generation for IQuestLoopCoder model by @kernelpool in #748
- Fix type hint and pydoc for batch_generate by @tibbes in #745
- Handle empty caches during batch merge by @ivanfioravanti in #755
- Update for latest mlx by @awni in #759
- Use compiled Swiglu by @awni in #753
- Adds support for Nemotron Super 49b v1.5 by @lazarust in #756
- fix(falcon_h1): support tied embeddings and correct muP scaling by @solarpunkin in #764
- Fix swiglu parameter order by @kernelpool in #767
- Fix CacheList batching by @kernelpool in #769
- fix: unused batch_size parameter for mlx_lm.evaluate by @AndrewTan517 in #762
- Add gpt-oss sharding by @Evanev7 in #761
- Fix LongCat Flash extended context support by @kernelpool in #768
- Add minimax tensor sharding by @Evanev7 in #760
- Shard LongCat Flash by @kernelpool in #771
- Add glm4 moe lite model by @ivanfioravanti in #776
New Contributors
- @ericcurtin made their first contribution in #730
- @nikhilmitrax made their first contribution in #690
- @tibbes made their first contribution in #745
- @solarpunkin made their first contribution in #764
- @AndrewTan517 made their first contribution in #762
- @Evanev7 made their first contribution in #761
Full Changelog: v0.30.2...v0.30.4
v0.30.2
v0.30.1
What's Changed
- custom dsv32 chat template by @awni in #693
- shard glm by @awni in #698
- support minimax m2 by @awni in #700
- Enhance load_config function to check for config file existence and i… by @cubist38 in #701
- batch_generate fails with Phi3 (LongRoPE) when prompts have different lengths by @vyaivanove in #707
- Fix GIL starvation in _generate thread when batch is idle by @sjug in #706
- Ignore generation_config decode errors by @will-lms in #708
- Allow mxfp8 and nvfp4 by @awni in #709
- Fix chat template detection for models with custom tokenizers by @kernelpool in #712
- chore: add model-path param flag for convert API for better clarity by @jaycoolslm in #702
- Add RWKV7 by @MollySophia in #580
- Fix empty /v1/models response for locally loaded models by @cxl-git-hub in #713
- Add IQuest Coder V1 by @kernelpool in #714
- Add YoutuLLM by @johnmai-dev in #720
- Add logits_processors support to batch_generate by @lazarust in #635
- Add Solar Open by @kernelpool in #721
- Add K-EXAONE MoE by @kernelpool in #719
- Improve reasoning and tool call parsing in server by @awni in #711
- Patch bump by @awni in #731
New Contributors
- @cubist38 made their first contribution in #701
- @vyaivanove made their first contribution in #707
- @sjug made their first contribution in #706
- @jaycoolslm made their first contribution in #702
- @MollySophia made their first contribution in #580
- @cxl-git-hub made their first contribution in #713
- @lazarust made their first contribution in #635
Full Changelog: v0.30.0...v0.30.1
v0.30.0
What's Changed
- fix: server busy-waiting during idle request polling by @zenyr in #674
- Fixes for transformers v5 by @awni in #684
- Add mimo v2 flash by @awni in #685
- More useful error message for unsupported batching by @awni in #687
- Model parallel generation by @angeloskath in #676
- Bump to transformer v5 by @awni in #689
- Revert return dict and wrap apply_chat_template by @awni in #691
- Bump the version by @angeloskath in #692
Full Changelog: v0.29.0...v0.30.0
v0.29.0
What's Changed
- version bump by @angeloskath in #651
- Fix slow batch generation in server by setting wired_limit by @otarkhan in #652
- Fix RoPE for rnj-1 by @awni in #657
- fix: calling correct dequantize function by @devnamrits in #666
- Use test data zipfile in CI by @awni in #662
- Default repetition penalty to 0.0 in the server by @awni in #658
- fix dsv32 and gemma3 by @awni in #664
- Fix fusion and test by @awni in #668
- Fix server batching condition for SSMs by @angeloskath in #655
- Fix SuScaledRoPE by @DePasqualeOrg in #660
- Fix DSV32 by @awni in #669
- Fix for Devstral-2 by @inferencers in #671
- support nemotron 3 by @awni in #678
New Contributors
- @otarkhan made their first contribution in #652
- @devnamrits made their first contribution in #666
- @DePasqualeOrg made their first contribution in #660
- @inferencers made their first contribution in #671
Full Changelog: v0.28.4...v0.29.0
test_data
Commands (assuming the script below is saved as dl_test_data.py):
HF_HOME="." python dl_test_data.py
zip -r test_data.zip datasets hub
gh release upload test_data test_data.zip
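To restore the archive on the consumer side (for example in CI), download the release asset, unzip it, and point HF_HOME at the extracted cache before running the tests. A minimal sketch; the tests/ path and the pytest invocation are assumptions, not part of this release:
gh release download test_data --repo ml-explore/mlx-lm --pattern test_data.zip
unzip -q test_data.zip                    # restores the datasets/ and hub/ directories
HF_HOME="$PWD" python -m pytest tests/    # tests/ path is an assumption
The dl_test_data.py script itself: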
import datasets
from huggingface_hub import snapshot_download

# Model repos to mirror (small files only: configs, tokenizers, templates).
repos = [
    "mlx-community/Qwen1.5-0.5B-Chat-4bit",
    "mlx-community/Mistral-7B-v0.2-4bit",
    "mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx",
    "mlx-community/Mistral-7B-Instruct-v0.3",
    "mlx-community/Phi-3.5-mini-instruct-4bit",
    "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "mlx-community/Falcon3-7B-Instruct-4bit",
    "mlx-community/Qwen3-4B-4bit",
]
allow_patterns = [
    "*.md",
    "*.json",
    "*.py",
    "tokenizer.model",
    "*.tiktoken",
    "tiktoken.model",
    "*.txt",
    "*.jsonl",
    "*.jinja",
]
for repo in repos:
    snapshot_download(
        repo,
        allow_patterns=allow_patterns,
    )

# Download full weights only for the smallest model.
snapshot_download(
    "mlx-community/Qwen1.5-0.5B-Chat-4bit",
    allow_patterns=["model*.safetensors"],
)

# Cache the billsum dataset.
datasets.load_dataset("billsum")
v0.28.4
What's Changed
- version by @awni in #559
- Add Minimax-M2 by @Blaizzy in #568
- Align checkpoint loading with Jamba Mini and Large by @Goekdeniz-Guelmez in #555
- Fix dequant + minor refactor by @awni in #572
- fix eval thinking by @awni in #578
- Fixed typo in `load_adapters` that broke adapter loading by @jyork03 in #583
- Fix AttributeError when loading custom draft models by @kernelpool in #590
- Add gen options and CoT removal by @awni in #587
- Add parallel_residual setting to gptneox by @spotbot2k in #586
- Fixed/improved behavior of the mask_prompt feature. by @jyork03 in #584
- add MiniMax-M2 in supported models by @sriting in #575
- Fix: Remove call to deleted method by @jyork03 in #591
- Make mlx-lm more type-checker friendly by @tnadav in #573
- Fix: JSON parse error handling: avoid referencing stream before init by @jyork03 in #592
- Adding ring mini linear by @Goekdeniz-Guelmez in #513
- Add Kimi Linear by @Blaizzy in #577
- DWQ for very large models by @awni in #536
- Fix Byte Decoder Lookup for Esoteric Single-Characters by @N8python in #600
- Fix input_embeddings prefill bug in generate_step by @Blaizzy in #606
- ACKNOWLEDGMENTS.md House keeping by @Goekdeniz-Guelmez in #594
- FIX: Add missing sentencepiece dependency for tokenizers by @Deekshith-Dade in #611
- switch to github actions by @awni in #618
- Fix for kimi k2 by @awni in #593
- Allow providing prompt caches in batched generation by @angeloskath in #602
- Fix olmo3 by @awni in #628
- add support for Trinity/AfMoE model by @ivanfioravanti in #640
- Ministral3 by @awni in #642
- Add a prompt cache that can hold multiple prompts by @angeloskath in #625
- Fix flaky losses test by @awni in #643
- Fix lora fusion for non affine quantization by @awni in #647
- Batching in the server by @angeloskath in #626
- Add deepseek v32 by @awni in #512
- version bump by @awni in #649
- Fix the release action by @angeloskath in #650
New Contributors
- @jyork03 made their first contribution in #583
- @spotbot2k made their first contribution in #586
- @sriting made their first contribution in #575
- @tnadav made their first contribution in #573
- @Deekshith-Dade made their first contribution in #611
Full Changelog: v0.28.3...v0.28.4