Releases: bitsandbytes-foundation/bitsandbytes
Latest `main` wheel
Latest main pre-release wheel
This pre-release contains the latest development wheels for all supported platforms, rebuilt automatically on every commit to the main branch.
How to install:
Pick the correct command for your platform and run it in your terminal:
Linux (ARM/aarch64)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl

Linux (x86_64)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl

Windows (x86_64)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl

Note:
These wheels are updated automatically with every commit to main and become available as soon as the python-package.yml workflow finishes.
The version number is replaced with 1.33.7-preview to keep the download link stable. This does not affect the version that actually gets installed:
> pip install https://.../bitsandbytes-1.33.7-preview-py3-none-manylinux_2_24_x86_64.whl
Collecting bitsandbytes==1.33.7rc0
...
Successfully installed bitsandbytes-0.46.0.dev0
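If in doubt, the installed version can be verified from Python (a minimal check; the reported dev version will vary with the current state of main):

```python
import bitsandbytes

# Reports the real development version, e.g. 0.46.0.dev0,
# not the 1.33.7-preview placeholder from the wheel filename.
print(bitsandbytes.__version__)
```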
0.48.2
What's Changed
- Fix indexing overflow issue for blockwise quantization by @matthewdouglas in #1784
- Fix regression with CPU/disk offloading for accelerate + int8 by @matthewdouglas in #1786
- XPU: Add Windows build for SYCL kernels by @matthewdouglas in #1787
Full Changelog: 0.48.1...0.48.2
0.48.1
This release fixes a regression introduced in 0.48.0 related to LLM.int8(). This issue caused poor inference results with pre-quantized checkpoints in HF transformers.
What's Changed
- Add trove-classifiers requirement to pyproject.toml by @ccoulombe in #1774
- Fix regression in 8bit parameter device movement by @matthewdouglas in #1776
Full Changelog: 0.48.0...0.48.1
0.48.0: Intel GPU & Gaudi support, CUDA 13, performance improvements, and more!
Highlights
🎉 Intel GPU Support
We now officially support Intel GPUs on Linux and Windows! Support is included for all major features (LLM.int8(), QLoRA, 8bit optimizers) with the exception of the paged optimizer feature.
This support includes the following hardware:
- Intel® Arc™ B-Series Graphics
- Intel® Arc™ A-Series Graphics
- Intel® Data Center GPU Max Series
A compatible PyTorch version with Intel XPU support is required. The current minimum is PyTorch 2.6.0. It is recommended to use the latest stable release. See Getting Started on Intel GPU for guidance.
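As a quick smoke test, an LLM.int8() layer can be exercised on an XPU device roughly like this (a minimal sketch, assuming bitsandbytes >= 0.48.0 and a PyTorch 2.6+ build with XPU support; the shapes and outlier threshold are illustrative):

```python
import torch
import bitsandbytes as bnb

# Requires a PyTorch build with Intel XPU support; falls back to CPU otherwise.
device = "xpu" if torch.xpu.is_available() else "cpu"

# LLM.int8() inference-style layer: the weight is quantized to int8 when the
# module is moved to the device; activation outliers above the threshold are
# handled in fp16.
layer = bnb.nn.Linear8bitLt(
    4096, 4096, bias=False, has_fp16_weights=False, threshold=6.0
).to(device)

x = torch.randn(8, 4096, dtype=torch.float16, device=device)
with torch.no_grad():
    out = layer(x)
print(out.shape)  # torch.Size([8, 4096])
```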
🎉 Intel Gaudi Support
We now officially support Intel Gaudi2 and Gaudi3 accelerators. This support includes LLM.int8() and QLoRA with the NF4 data type. At this time optimizers are not implemented.
A compatible PyTorch version with Intel Gaudi support is required. The current minimum is Gaudi v1.21 with PyTorch 2.6.0. It is recommended to use the latest stable release. See the Gaudi software installation guide for guidance.
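A comparable QLoRA-style NF4 sketch for Gaudi, assuming the Gaudi software stack (v1.21+) and its PyTorch bridge are installed; the habana_frameworks import registering the "hpu" device is an assumption of that environment:

```python
import torch
import habana_frameworks.torch.core  # noqa: F401  (registers the "hpu" device)
import bitsandbytes as bnb

# NF4 4-bit layer: the weight is quantized when the module is moved to the
# Gaudi device; matmuls compute in bfloat16.
layer = bnb.nn.Linear4bit(
    4096, 4096, bias=False, quant_type="nf4", compute_dtype=torch.bfloat16
).to("hpu")

x = torch.randn(8, 4096, dtype=torch.bfloat16, device="hpu")
with torch.no_grad():
    out = layer(x)
print(out.shape)
```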
NVIDIA CUDA
- The 4bit dequantization kernel was improved by @Mhmd-Hisham in #1746. This change brings noticeable speed improvements for prefill, batch token generation, and training. The improvement is particularly prominent on A100, H100, and B200.
- We've added CUDA 13.0 compatibility across Linux x86-64, Linux aarch64, and Windows x86-64 platforms.
- Hardware support for CUDA 13.0 is limited to Turing generation and newer.
- Support for Thor (SM110) is available in the Linux aarch64 build.
🚨 Breaking Changes
- Dropped support for PyTorch 2.2. The new minimum requirement is 2.3.0.
- Removed Maxwell GPU support for all CUDA builds.
What's Changed
- add py.typed by @cyyever in #1726
- Enable F841 by @cyyever in #1727
- add int mm for xpu after torch 2.9 by @jiqing-feng in #1736
- for intel xpu case, use MatMul8bitFp even not use ipex by @kaixuanliu in #1728
- 4bit quantization for arbitrary nn.Parameter by @matthewdouglas in #1720
- Adjust 4bit test tolerance on CPU for larger blocksizes by @matthewdouglas in #1749
- Test improvements by @matthewdouglas in #1750
- [XPU] Implemented 32bit optimizers in triton by @YangKai0616 in #1710
- Add SYCL Kernels for XPU backend by @xiaolil1 in #1679
- [XPU] Implemented 8bit optimizers in triton by @Egor-Krivov in #1692
- Drop Maxwell (sm50) build from distribution by @matthewdouglas in #1755
- Bump minimum PyTorch to 2.3 by @matthewdouglas in #1754
- [CUDA] Branchless NF4/FP4 kDequantizeBlockwise kernel for faster dequantization by @Mhmd-Hisham in #1746
- Update log by @YangKai0616 in #1758
- Add function to reverse 4bit weights for HPU by @vivekgoe in #1757
- Add CUDA 13.0 Support by @matthewdouglas in #1761
- Fix for warpSize deprecation in ROCm 7.0 by @pnunna93 in #1762
- Build/Package Intel XPU binary for Linux by @matthewdouglas in #1763
- Update workflow for packaging by @matthewdouglas in #1766
- Add Thor support by @jasl in #1764
- ROCm: Add 6.4 and 7.0 builds by @matthewdouglas in #1767
- Linear8bitLt: support device movement after forward() by @matthewdouglas in #1769
New Contributors
- @cyyever made their first contribution in #1726
- @kaixuanliu made their first contribution in #1728
- @YangKai0616 made their first contribution in #1710
- @xiaolil1 made their first contribution in #1679
- @vivekgoe made their first contribution in #1757
- @jasl made their first contribution in #1764
Full Changelog: 0.47.0...0.48.0