Releases: ml-explore/mlx
v0.14.0
Highlights
- Small-size build that JIT compiles kernels and omits the CPU backend, resulting in a binary under 4 MB
- `mx.gather_qmm`: quantized equivalent of `mx.gather_mm`, which speeds up MoE inference by ~2x
- Grouped 2D convolutions
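Conceptually, the gathered matmul routes each token through the weight matrix of its assigned expert. A minimal pure-Python sketch of the idea (illustrative only; the real `mx.gather_mm`/`mx.gather_qmm` operate on MLX arrays with fused kernels):

```python
def matmul(a, b):
    """Naive matrix multiply for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gather_mm(tokens, expert_weights, expert_ids):
    """For each token (a 1 x d row), multiply by the weight matrix of its
    routed expert -- the per-token operation inside a MoE layer."""
    out = []
    for row, e in zip(tokens, expert_ids):
        out.append(matmul([row], expert_weights[e])[0])
    return out

# Two experts with 2x2 weight matrices; two tokens routed to different experts.
experts = [
    [[1, 0], [0, 1]],   # expert 0: identity
    [[2, 0], [0, 2]],   # expert 1: doubles the input
]
tokens = [[3, 4], [3, 4]]
print(gather_mm(tokens, experts, [0, 1]))  # [[3, 4], [6, 8]]
```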
Core
- `mx.conjugate`
- `mx.conv3d` and `nn.Conv3d`
- List-based indexing
- Started `mx.distributed`, which uses MPI (if installed) for communication across machines:
  - `mx.distributed.init`
  - `mx.distributed.all_gather`
  - `mx.distributed.all_reduce_sum`
- Support conversion to and from DLPack
- `mx.linalg.cholesky` on CPU
- `mx.quantized_matmul` sped up for vector-matrix products
- `mx.trace`
- `mx.block_masked_mm` now supports floating-point masks!
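Floating-point masks let a block-masked matmul scale whole output tiles rather than only keep or drop them. A toy pure-Python sketch of the idea (not MLX's kernel; skipping zero tiles is where the sparsity speedup comes from):

```python
def block_masked_mm(a, b, mask, block=2):
    """Matmul where each (block x block) tile of the output is scaled by a
    mask entry; a 0 mask skips the tile entirely, and a float mask scales it."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for bi in range(0, m, block):
        for bj in range(0, n, block):
            w = mask[bi // block][bj // block]
            if w == 0:
                continue  # masked-out tile: never computed
            for i in range(bi, min(bi + block, m)):
                for j in range(bj, min(bj + block, n)):
                    c[i][j] = w * sum(a[i][t] * b[t][j] for t in range(k))
    return c

a = [[1, 2], [3, 4]]
b = [[1, 0], [0, 1]]            # identity
mask = [[1.0, 0.0],
        [0.5, 1.0]]             # per-tile scale factors
print(block_masked_mm(a, b, mask, block=1))  # [[1.0, 0.0], [1.5, 4.0]]
```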
Fixes
- Error messaging in eval
- Add some missing docs
- Scatter index bug
- The extensions example now compiles and runs
- CPU copy bug with many dimensions
v0.13.1
v0.13.0
Highlights
- Block sparse matrix multiply speeds up MoEs by >2x
- Improved quantization algorithm should work well for all networks
- Improved gpu command submission speeds up training and inference
Core
- Bitwise ops added: `mx.bitwise_[or|and|xor]`, `mx.[left|right]_shift`, and operator overloads
- Groups added to Conv1d
- Added `mx.metal.device_info` to get better informed memory limits
- Added resettable memory stats
- `mlx.optimizers.clip_grad_norm` and `mlx.utils.tree_reduce` added
- Added `mx.arctan2`
- Unary ops now accept array-like inputs, i.e. one can do `mx.sqrt(2)`
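The idea behind `mlx.optimizers.clip_grad_norm` is global-norm clipping: compute one L2 norm over all gradients and rescale everything if it exceeds the threshold. A flat-dict pure-Python sketch (the real function works on MLX arrays and nested parameter trees):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale a dict of gradient vectors so their global L2 norm is at
    most max_norm; returns (clipped_grads, total_norm)."""
    total = math.sqrt(sum(g * g for vec in grads.values() for g in vec))
    scale = max_norm / total if total > max_norm else 1.0
    clipped = {k: [g * scale for g in vec] for k, vec in grads.items()}
    return clipped, total

grads = {"w": [3.0, 4.0], "b": [0.0]}              # global norm = 5.0
clipped, norm = clip_grad_norm(grads, max_norm=2.5)
print(norm, clipped["w"])  # 5.0 [1.5, 2.0]
```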
Bugfixes
- Fixed shape for slice update
- Bugfix in quantize that used slightly wrong scales/biases
- Fixed memory leak for multi-output primitives encountered with gradient checkpointing
- Fixed conversion from other frameworks for all datatypes
- Fixed index overflow for matmul with large batch size
- Fixed initialization ordering that occasionally caused segfaults
v0.12.2
v0.12.0
Highlights
- Faster quantized matmul
- Up to 40% faster QLoRA or prompt processing, some numbers
Core
- `mx.synchronize` to wait for computation dispatched with `mx.async_eval`
- `mx.radians` and `mx.degrees`
- `mx.metal.clear_cache` to return to the OS the memory held by MLX as a cache for future allocations
- Changed quantization to always represent 0 exactly (relevant issue)
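Representing 0 exactly means the integer grid contains a code that dequantizes back to exactly 0.0, which matters for padding and zero weights. A hypothetical affine-quantization sketch in plain Python (not MLX's group-wise scheme; the names here are illustrative) showing the zero point snapped to an integer code:

```python
def quantize(xs, bits=8):
    """Affine quantization that forces 0.0 to map to an exact integer code,
    so dequantize(quantize(0.0)) == 0.0."""
    lo, hi = min(xs + [0.0]), max(xs + [0.0])      # keep 0 inside the range
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0
    zero_point = round(-lo / scale)                # integer code for 0.0
    q = [min(levels, max(0, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(code - zero_point) * scale for code in q]

q, s, z = quantize([-1.0, 0.0, 2.0])
x = dequantize(q, s, z)
print(x[1])  # 0.0 -- zero is reconstructed exactly
```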
Bugfixes
- Fixed quantization of a block with all 0s that produced NaNs
- Fixed the `len` field in the buffer protocol implementation
v0.11.0
v0.10.0
Highlights
- Improvements for LLM generation
- Reshapeless quantized matmul/matvec
- `mx.async_eval`
- Async command encoding
Core
- Slightly faster reshapeless quantized gemms
- Option for precise softmax
- `mx.metal.start_capture` and `mx.metal.stop_capture` for GPU debug/profile
- `mx.expm1`
- `mx.std`
- `mx.meshgrid`
- CPU-only `mx.random.multivariate_normal`
- `mx.cumsum` (and other scans) for `bfloat`
- Async command encoder with explicit barriers / dependency management
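Softmax is only usable on large logits with the standard max-subtraction trick; the new precise option additionally accumulates the reduction in higher precision. The stability trick itself, in a plain-Python sketch:

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtracting the max keeps exp() from
    overflowing on large inputs without changing the result."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1000.0, 1000.0])   # a naive exp(1000) would overflow
print(probs)                         # [0.5, 0.5]
```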
NN
- `nn.Upsample` supports bicubic interpolation
Misc
- Updated MLX Extension to work with nanobind
Bugfixes
- Fix buffer donation in softmax and fast ops
- Bug in layer norm vjp
- Bug initializing from lists with scalar
- Bug in indexing
- CPU compilation bug
- Multi-output compilation bug
- Fix stack overflow issues in eval and array destruction
v0.9.0
Highlights
- Fast partial RoPE (used by Phi-2)
- Fast gradients for RoPE, RMSNorm, and LayerNorm
- Up to 7x faster, benchmarks
Core
- More overhead reductions
- Partial fast RoPE (fast Phi-2)
- Better buffer donation for copy
- Type hierarchy and `issubdtype`
- Fast VJPs for RoPE, RMSNorm, and LayerNorm
NN
- `Module.set_dtype`
- Chaining in `nn.Module` (`model.freeze().update(…)`)
Bugfixes
- Fix set item bugs
- Fix scatter vjp
- Check shape integer overflow on array construction
- Fix bug with module attributes
- Fix two bugs for odd shaped QMV
- Fix GPU sort for large sizes
- Fix bug in negative padding for convolutions
- Fix bug in multi-stream race condition for graph evaluation
- Fix random normal generation for half precision
v0.8.0
Highlights
- More perf!
- `mx.fast.rms_norm` and `mx.fast.layer_norm`
- Switch to nanobind substantially reduces overhead
- Up to 4x faster `__setitem__` (e.g. `a[...] = b`)
Core
- `mx.inverse`, CPU only
- vmap over `mx.matmul` and `mx.addmm`
- Switch to nanobind from pybind11
- Faster setitem indexing
- `mx.fast.rms_norm`, token generation benchmark
- `mx.fast.layer_norm`, token generation benchmark
- vmap for inverse and svd
- Faster non-overlapping pooling
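The computation that `mx.fast.rms_norm` fuses into a single kernel is the RMSNorm formula; a plain-Python sketch for reference (the real op runs as a fused Metal kernel over MLX arrays):

```python
import math

def rms_norm(xs, weight, eps=1e-5):
    """RMSNorm: divide by the root-mean-square of the vector, then apply a
    learned per-element weight."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [w * x / rms for w, x in zip(weight, xs)]

print(rms_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0))
# with unit weights, the output vector has root-mean-square 1
```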
Optimizers
- Set minimum value in cosine decay scheduler
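With the new minimum value, the cosine schedule anneals to a floor instead of zero. A plain-Python sketch of the schedule (illustrative; parameter names here are not MLX's API):

```python
import math

def cosine_decay(step, init_lr, decay_steps, min_lr=0.0):
    """Cosine learning-rate decay with a floor: anneal from init_lr down to
    min_lr over decay_steps, then hold at min_lr."""
    t = min(step, decay_steps) / decay_steps
    return min_lr + 0.5 * (init_lr - min_lr) * (1 + math.cos(math.pi * t))

print(cosine_decay(0, 1.0, 100, min_lr=0.5))    # 1.0
print(cosine_decay(100, 1.0, 100, min_lr=0.5))  # 0.5
```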
Bugfixes
- Fix bug in multi-dimensional reduction
v0.7.0
Highlights
- Perf improvements for attention ops:
- No-copy broadcast matmul (benchmarks)
- Fewer copies in reshape
Core
- Faster broadcast + gemm
- `mx.linalg.svd` (CPU only)
- Fewer copies in reshape
- Faster small reductions
NN
- `nn.RNN`, `nn.LSTM`, `nn.GRU`
Bugfixes
- Fix bug in depth traversal ordering
- Fix two edge case bugs in compilation
- Fix bug with modules with dictionaries of weights
- Fix bug with scatter which broke MoE training
- Fix bug with compilation kernel collision