Releases: ml-explore/mlx
v0.5.0
Highlights:
- Faster convolutions.
- Up to 14x faster for some common sizes.
- See benchmarks
Core
- mx.where properly handles inf
- Faster and more general convolutions
- Input and kernel dilation
- Asymmetric padding
- Support for cross-correlation and convolution
- atleast_{1,2,3}d accept any number of arrays
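The convolution options listed above (kernel dilation, asymmetric padding, cross-correlation versus convolution) can be illustrated with a naive 1D loop. This is a plain-Python sketch of the concepts only, not MLX's implementation; the function name `correlate1d` and its parameters are hypothetical, and input dilation is not shown.

```python
def correlate1d(x, w, padding=(0, 0), dilation=1, flip=False):
    """Naive 1D sliding-window product over a zero-padded input (illustrative only)."""
    if flip:
        # A true convolution flips the kernel; cross-correlation does not.
        w = w[::-1]
    left, right = padding  # asymmetric padding: a different amount on each side
    xp = [0.0] * left + list(x) + [0.0] * right
    span = dilation * (len(w) - 1) + 1  # receptive field of the dilated kernel
    out = []
    for i in range(len(xp) - span + 1):
        out.append(sum(w[k] * xp[i + k * dilation] for k in range(len(w))))
    return out


x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 0.0, -1.0]

cross_corr = correlate1d(x, w)                      # kernel applied as-is
conv = correlate1d(x, w, flip=True)                 # kernel flipped
padded = correlate1d(x, w, padding=(2, 0))          # pad only the left side
dilated = correlate1d(x, [1.0, -1.0], dilation=2)   # taps two samples apart
print(cross_corr, conv, padded, dilated)
```

With an antisymmetric kernel such as the one above, the convolution result is the negation of the cross-correlation result, which makes the flip easy to see.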
NN
- nn.Upsample layer
- Supports nearest neighbor and linear interpolation
- Any number of dimensions
Optimizers
- Linear schedule and schedule joiner:
- Use for e.g. linear warmup + cosine decay
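A linear warmup followed by cosine decay can be sketched as two schedules joined at a boundary step. The code below is a plain-Python illustration of the idea under stated assumptions: the helper names and signatures are stand-ins written for this example, each schedule is a function of the step count, and the second schedule restarts its own step count at the boundary.

```python
import math


def linear_schedule(init, end, steps):
    """Move linearly from init to end over `steps` steps, then hold at end."""
    def schedule(step):
        t = min(step, steps) / steps
        return init + t * (end - init)
    return schedule


def cosine_decay(init, decay_steps):
    """Decay from init to 0 along half a cosine period over `decay_steps` steps."""
    def schedule(step):
        t = min(step, decay_steps) / decay_steps
        return init * 0.5 * (1.0 + math.cos(math.pi * t))
    return schedule


def join_schedules(schedules, boundaries):
    """Use schedules[i] until boundaries[i]; later schedules count steps from their boundary."""
    def schedule(step):
        idx = sum(step >= b for b in boundaries)  # index of the active segment
        offset = boundaries[idx - 1] if idx > 0 else 0
        return schedules[idx](step - offset)
    return schedule


# 100 warmup steps up to 1e-3, then 900 steps of cosine decay.
lr = join_schedules([linear_schedule(0.0, 1e-3, 100), cosine_decay(1e-3, 900)], [100])
for step in (0, 50, 100, 550, 1000):
    print(step, lr(step))
```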
Bugfixes
- arange throws on inf inputs
- Fix CMake build with MLX
- Fix logsumexp inf edge case
- Fix grad of power w.r.t. exponent edge case
- Fix compile with inf constants
- Fix temporary bug in convolution
v0.4.0
Highlights:
- Partial shapeless compilation
- Default shapeless compilation for all activations
- Can be more than 5x faster than uncompiled versions
- CPU kernel fusion
- Some functions can be up to 10x faster
Core
- CPU compilation
- Shapeless compilation for some cases: mx.compile(function, shapeless=True)
- Up to 10x faster scatter: benchmarks
- mx.atleast_1d, mx.atleast_2d, mx.atleast_3d
Bugfixes
- Bug with tolist with bfloat16 and float16
- Bug with argmax on M3
v0.3.0
Highlights:
- mx.fast subpackage
- Custom mx.fast.rope up to 20x faster
Core
- Support metadata with safetensors
- Up to 5x faster scatter and 30% faster gather
- 40% faster bfloat16 quantized matrix-vector multiplies
- mx.fast subpackage with a fast RoPE
- Context manager mx.stream to set the default device
NN
- Average and Max pooling layers for 1D and 2D inputs
Optimizers
- Support schedulers for e.g. learning rates
- A few basic schedulers:
- optimizers.step_decay
- optimizers.cosine_decay
- optimizers.exponential_decay
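To make the step and exponential decay schedules concrete, here is a plain-Python sketch of the usual formulas. These are stand-ins written for illustration, not the library code; the parameter names (`init`, `decay_rate`, `step_size`) are assumptions.

```python
def step_decay(init, decay_rate, step_size):
    """Multiply the value by decay_rate once every step_size steps."""
    def schedule(step):
        return init * decay_rate ** (step // step_size)
    return schedule


def exponential_decay(init, decay_rate):
    """Multiply the value by decay_rate at every step."""
    def schedule(step):
        return init * decay_rate ** step
    return schedule


stepped = step_decay(0.1, 0.5, 10)
smooth = exponential_decay(0.1, 0.9)
for step in (0, 9, 10, 25):
    print(step, stepped(step), smooth(step))
```

The step schedule stays flat inside each window of `step_size` steps, while the exponential schedule shrinks on every step.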
Bugfixes
- Fix bug in remainder with negative numerators and integers
- Fix bug with slicing into softmax
- Fix quantized matmuls with sizes that are not multiples of 32
v0.2.0
Highlights:
- mx.compile makes stuff go fast
- Some functions are up to 10x faster (benchmarks)
- Training models anywhere from 10% to twice as fast (benchmarks)
- Simple syntax for compiling full training steps
Core
- mx.compile function transformation
- Find devices properly for iOS
- Up to 10x faster GPU gather
- __abs__ overload for abs on arrays
- loc and scale parameters for mx.random.normal
NN
- Margin ranking loss
- BCE loss with weights
Bugfixes
- Fix for broken eval during function transformations
- Fix mx.var to give inf with ddof >= nelem
- Fix loading empty modules in nn.Sequential
v0.1.0
Highlights
- Memory use improvements:
- Gradient checkpointing for training with mx.checkpoint
- Better graph execution order
- Buffer donation
Core
- Gradient checkpointing with mx.checkpoint
- CPU only QR factorization mx.linalg.qr
- Release Python GIL during mx.eval
- Depth-based graph execution order
- Lazy loading arrays from files
- Buffer donation for reduced memory use
- mx.diag, mx.diagonal
- Breaking: array.shape is a Python tuple
- GPU support for int64 and uint64 reductions
- vmap over reductions and arg reduction:
- sum, prod, max, min, all, any
- argmax, argmin
NN
- Softshrink activation
Bugfixes
- Comparisons with inf work, and fix mx.isinf
- Bug fix with RoPE cache
- Handle empty Matmul on the CPU
- Negative shape checking for mx.full
- Correctly propagate NaN in some binary ops
- mx.logaddexp, mx.maximum, mx.minimum
- Fix > 4D non-contiguous binary ops
- Fix mx.log1p with inf input
- Fix SGD to apply weight decay even with 0 momentum
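The NaN propagation fix above matters because a comparison-based maximum can silently drop a NaN depending on argument order. The plain-Python sketch below shows the pitfall with the builtin max and a NaN-propagating alternative; `nan_propagating_max` is a hypothetical helper for illustration, not an MLX function.

```python
import math

nan = float("nan")

# Every comparison with NaN is False, so the builtin result depends on argument order.
first = max(1.0, nan)   # keeps 1.0 because "nan > 1.0" is False
second = max(nan, 1.0)  # keeps nan because "1.0 > nan" is False


def nan_propagating_max(a, b):
    """Return NaN if either input is NaN, otherwise the larger value."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


print(first, second, nan_propagating_max(1.0, nan), nan_propagating_max(nan, 1.0))
```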
v0.0.11
Highlights:
- GGUF improvements:
- Native quantizations Q4_0, Q4_1, and Q8_0
- Metadata
Core
- Support for reading and writing GGUF metadata
- Native GGUF quantization (Q4_0, Q4_1, and Q8_0)
- Quantize with group size of 32 (2x32, 4x32, and 8x32)
NN
- Module.save_weights supports safetensors
- nn.init package with several commonly used neural network initializers
- Binary cross entropy and cross entropy losses can take probabilities as targets
- Adafactor in nn.optimizers
Bugfixes
- Fix isinf and friends for integer types
- Fix array creation from list Python ints to int64, uint, and float32
- Fix power VJP for 0 inputs
- Fix out of bounds inf reads in gemv
- mx.arange crashes on NaN inputs
v0.0.10
Highlights:
- Faster matmul: up to 2.5x faster for certain sizes, benchmarks
- Fused matmul + addition (for faster linear layers)
Core
- Quantization supports sizes other than multiples of 32
- Faster GEMM (matmul)
- AddMM primitive (fused addition and matmul)
- mx.isnan, mx.isinf, isposinf, isneginf
- mx.tile
- VJPs for scatter_min and scatter_max
- Multi output split primitive
NN
- Losses: Gaussian negative log-likelihood
Misc
- Performance enhancements for graph evaluation with lots of outputs
- Default PRNG seed is based on current time instead of 0
- Primitive VJP takes output as input. Reduces redundant work without need for simplification
- Format boolean printing in Python style when in Python
Bugfixes
- Scatter < 32 bit precision and integer overflow fix
- Overflow with mx.eye
- Report Metal out of memory issues instead of silent failure
- Change mx.round to follow NumPy which rounds to even
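Rounding to even (also called banker's rounding) sends exact halves to the nearest even integer instead of always away from zero. Python's builtin round follows the same rule, so the behavior can be shown without any array library; `round_half_away` below is a hypothetical helper added only for contrast.

```python
import math


def round_half_away(v):
    """Round exact halves away from zero (the rule that round-to-even replaces)."""
    return math.copysign(math.floor(abs(v) + 0.5), v)


samples = [0.5, 1.5, 2.5, 3.5, -0.5]
half_even = [round(v) for v in samples]            # builtin round: ties go to even
half_away = [round_half_away(v) for v in samples]  # ties go away from zero
print(half_even)
print(half_away)
```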
v0.0.9
Highlights:
- Initial (and experimental) GGUF support
- Support Python buffer protocol (easy interoperability with NumPy, JAX, TensorFlow, PyTorch, etc.)
- at[] syntax for scatter style operations: x.at[idx].add(y), (min, max, prod, etc.)
Core
- Array creation from other mx.array instances (mx.array([x, y]))
- Complete support for Python buffer protocol
- mx.inner, mx.outer
- mx.logical_and, mx.logical_or, and operator overloads
- Array at syntax for scatter ops
- Better support for in-place operations (+=, *=, -=, ...)
- VJP for scatter and scatter add
- Constants (mx.pi, mx.inf, mx.newaxis, …)
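The at[] syntax exists because a gather-then-assign update and a true scatter-add differ when an index appears more than once. The sketch below mimics both semantics with Python lists; it assumes that x.at[idx].add(y) accumulates every update at repeated indices, and both helper names are hypothetical.

```python
def gather_then_assign(x, idx, y):
    """Mimics x[idx] = x[idx] + y: values are read first, so duplicate indices overwrite."""
    gathered = [x[i] + v for i, v in zip(idx, y)]
    out = list(x)
    for i, g in zip(idx, gathered):
        out[i] = g
    return out


def scatter_add(x, idx, y):
    """Mimics a scatter-add: every update is accumulated, including duplicates."""
    out = list(x)
    for i, v in zip(idx, y):
        out[i] += v
    return out


x = [0, 0, 0]
idx = [0, 0, 2]
y = [1, 1, 1]
overwritten = gather_then_assign(x, idx, y)
accumulated = scatter_add(x, idx, y)
print(overwritten, accumulated)
```

Index 0 appears twice, so the first variant ends with 1 at that position while the scatter-add ends with 2.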
NN
- GLU activation
- cosine_similarity loss
- Cache for RoPE and ALiBi
Bugfixes / Misc
- Fix data type with tri
- Fix saving non-contiguous arrays
- Fix graph retention for in-place state, and remove retain_graph
- Multi-output primitives
- Better support for loading devices
v0.0.7
Core
- Support for loading and saving Hugging Face's safetensors format
- Transposed quantization matmul kernels
- mlx.core.linalg sub-package with mx.linalg.norm (Frobenius, infinity, p-norms)
- tensordot and repeat
NN
- Layers
- Bilinear, Identity, InstanceNorm
- Dropout2D, Dropout3D
- More customizable Transformer (pre/post norm, dropout)
- More activations: SoftSign, Softmax, HardSwish, LogSoftmax
- Configurable scale in RoPE positional encodings
- Losses: hinge, huber, log_cosh
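For reference, the three losses named above have short textbook definitions. The scalar, plain-Python versions below are a sketch under stated assumptions: the function signatures are hypothetical, the hinge loss expects targets of -1 or +1, and library versions operate on arrays with reduction options that are omitted here.

```python
import math


def hinge(pred, target):
    """Hinge loss for a target in {-1, +1}: zero once the margin reaches 1."""
    return max(0.0, 1.0 - pred * target)


def huber(pred, target, delta=1.0):
    """Quadratic for small errors, linear beyond delta."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def log_cosh(pred, target):
    """Smooth loss that behaves like err**2 / 2 near zero and like |err| for large errors."""
    return math.log(math.cosh(pred - target))


print(hinge(0.3, 1), huber(0.5, 0.0), huber(3.0, 0.0), log_cosh(0.0, 0.0))
```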
Misc
- Faster GPU reductions for certain cases
- Change to memory allocation to allow swapping
v0.0.6
Core
- quantize, dequantize, quantized_matmul
- moveaxis, swapaxes, flatten
- stack
- floor, ceil, clip
- tril, triu, tri
- linspace
Optimizers
- RMSProp, Adamax, Adadelta, Lion
NN
- Layers: QuantizedLinear, ALiBi positional encodings
- Losses: Label smoothing, Smooth L1 loss, Triplet loss
Misc
- Bug fixes