Releases: ml-explore/mlx

v0.30.6

06 Feb 17:05
185b06d

Highlights

  • Much higher bandwidth with JACCL on macOS >= 26.3

Full Changelog: v0.30.5...v0.30.6

v0.30.5

03 Feb 02:56
adcbb91

What's Changed

  • patch by @awni in #3074
  • [CUDA] Fallback Event impl when there is no hardware cpu/gpu coherency by @zcbenz in #3070
  • Tune CUDA graph sizes on B200 and H100 by @awni in #3077
  • [Docs] Simple example of using MLX distributed by @stefpi in #2973
  • Use lower-right causal mask alignment consistently by @Anri-Lombard in #2967
  • Fix ALiBi slopes for non-power-of-2 num_heads by @vovw in #3071
  • More useful error for large indices by @awni in #3079
  • Fix NAX condition for iPhone by @awni in #3083
  • Fallback to pinned host memory when managed memory is not supported by @zcbenz in #3075
  • Fix failing python tests on Windows by @zcbenz in #3076
  • [Metal] Tune splitk gemm dispatch conditions and partition sizes by @awni in #3087
  • Fix for NAX overflow by @awni in #3092
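
As a rough illustration of the "lower-right causal mask alignment" item above: when the number of queries (T_q) differs from the number of keys (T_kv), a lower-right aligned mask anchors the causal diagonal at the bottom-right corner, so the last query can attend to every key. The sketch below is plain Python and is not taken from the MLX source; the function name `lower_right_causal_mask` and its parameters are hypothetical.

```python
def lower_right_causal_mask(t_q, t_kv):
    """Boolean mask of shape (t_q, t_kv); True means "query may attend to key".

    Hypothetical helper for illustration only, not an MLX API.
    The diagonal is shifted by (t_kv - t_q) so that it ends in the
    bottom-right corner of the mask.
    """
    offset = t_kv - t_q
    return [[k <= q + offset for k in range(t_kv)] for q in range(t_q)]


# Two new queries decoded against four cached keys:
for row in lower_right_causal_mask(2, 4):
    print(["x" if allowed else "." for allowed in row])
```

With T_q == T_kv the offset is zero and the mask reduces to the usual lower-triangular causal mask. With T_q > T_kv the offset is negative, so under this convention the earliest queries see no keys at all.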

Full Changelog: v0.30.4...v0.30.5

v0.30.4

27 Jan 22:27
2f324cc

Highlights

  • Metal: Much faster vector fused grouped-query attention for long context
  • CUDA: Several improvements to speed up LLM inference
  • CUDA: Support for dense MoEs
  • CUDA: Better support for consumer GPUs (4090, 5090, RTX 6000, ...)

Full Changelog: v0.30.3...v0.30.4

v0.30.3

13 Jan 23:52
ac26a4c

Highlights

  • Support nvfp4 and mxfp8 quantized ops on Metal
  • Support nvfp4 and mxfp8 quantized-quantized matrix-matrix multiplication on CUDA
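
For readers unfamiliar with the formats named above, the following is a simplified sketch of block-scaled 4-bit float quantization in the spirit of nvfp4. It is plain Python and makes no claim about how MLX implements these ops. Assumptions: elements use the E2M1 value set (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), each block shares one scale, and the scale is kept in full precision here, whereas nvfp4 as commonly specified stores a low-precision (FP8) scale per group of 16 elements. The names `quantize_block` and `dequantize_block` are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block):
    """Map a block of floats to (scale, E2M1 values). Illustrative only."""
    amax = max(abs(x) for x in block)
    # Choose the scale so the largest magnitude maps to 6.0, the E2M1 maximum.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        # Round the scaled magnitude to the nearest representable value.
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate floats from the shared scale and E2M1 values."""
    return [scale * c for c in codes]


scale, codes = quantize_block([6.0, -3.0, 0.4, 0.0])
print(scale, codes, dequantize_block(scale, codes))
```

mxfp8 follows the same block-scaled idea with 8-bit float elements and a shared power-of-two scale per block.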

Full Changelog: v0.30.1...v0.30.3

v0.30.1

18 Dec 00:32
c215b6f

Highlights

  • RDMA over Thunderbolt with the JACCL backend (macOS >= 26.2)
  • NAX available through JIT so that it can be used in MLX Swift
  • CUDA improvements
    • Many improvements to SDPA (masking, T_q != T_kv)
    • Faster quantize/dequantize
    • QQMM to make use of faster tensor cores
    • A fix in column reduction speeds up training

Full Changelog: v0.30.0...v0.30.1

v0.30.0

19 Nov 23:09
54f1cc6

Highlights

  • Support for Neural Accelerators on M5 (macOS >= 26.2)

v0.29.4

11 Nov 15:45
60d80a3

🚀

v0.29.3

17 Oct 19:11
4bce5f9

⏭️

v0.29.2

26 Sep 22:51
7a6adda

⬆️

v0.29.1

12 Sep 00:12
ee18e1c

🚀