
Releases: peter277/llama.cpp

b6121 (09 Aug 04:24, commit e54d41b)


gguf-py : add Numpy MXFP4 de/quantization support (#15111)

* gguf-py : add MXFP4 de/quantization support

* ggml-quants : handle zero amax for MXFP4
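For context, here is a minimal NumPy sketch of what MXFP4 dequantization involves, assuming the OCP MX block layout used by ggml: 32 elements per block, one shared E8M0 scale byte followed by 16 bytes of packed E2M1 (FP4) nibbles, with low nibbles holding the first 16 elements. The function name `dequantize_mxfp4_blocks` and the input array layout are illustrative assumptions, not the actual gguf-py API.

```python
import numpy as np

# E2M1 magnitudes indexed by the low 3 bits of each 4-bit code
# (the high bit of the nibble is the sign).
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def dequantize_mxfp4_blocks(blocks: np.ndarray) -> np.ndarray:
    """blocks: uint8 array of shape (n_blocks, 17): [E8M0 scale, 16 packed bytes]."""
    scales_e8m0 = blocks[:, 0].astype(np.int32)   # shared per-block exponent
    packed = blocks[:, 1:]                        # (n_blocks, 16) packed nibbles

    # Unpack to (n_blocks, 32) 4-bit codes: low nibbles are elements 0..15,
    # high nibbles are elements 16..31 (illustrative ordering).
    codes = np.concatenate([packed & 0x0F, packed >> 4], axis=1)

    # Decode E2M1: bit 3 is the sign, bits 0..2 index the magnitude table.
    sign = np.where(codes & 0x8, np.float32(-1.0), np.float32(1.0))
    mags = E2M1_VALUES[codes & 0x7]

    # E8M0 is a pure power-of-two scale with bias 127.
    scale = np.ldexp(np.float32(1.0), scales_e8m0 - 127)
    return sign * mags * scale[:, None]
```

The second bullet concerns the quantization direction: when a block is all zeros, its amax is zero and the shared exponent cannot be derived from it (e.g. via a logarithm), so the quantizer has to fall back to a safe scale for that block instead.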