
Update dependency bitsandbytes to ^0.46.0 #133


Open

red-hat-konflux[bot] wants to merge 1 commit into konflux-poc from konflux/mintmaker/konflux-poc/bitsandbytes-0.x

Conversation

red-hat-konflux[bot]

@red-hat-konflux red-hat-konflux bot commented May 24, 2025

This PR contains the following updates:

| Package | Change | Age | Confidence |
| --- | --- | --- | --- |
| bitsandbytes (changelog) | `^0.42.0` -> `^0.46.0` | age | confidence |

Release Notes

bitsandbytes-foundation/bitsandbytes (bitsandbytes)

v0.46.1

Compare Source

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.46.0...0.46.1

v0.46.0: torch.compile() support; custom ops refactor; Linux aarch64 wheels

Compare Source

Highlights

  • Support for torch.compile without graph breaks for LLM.int8() (see the sketch after this list).
    • Compatible with PyTorch 2.4+, but PyTorch 2.6+ is recommended.
    • Experimental CPU support is included.
  • Support torch.compile without graph breaks for 4bit.
    • Compatible with PyTorch 2.4+ for fullgraph=False.
    • Requires PyTorch 2.8 nightly for fullgraph=True.
  • We are now publishing wheels for CUDA Linux aarch64 (sbsa)!
    • Targets are Turing generation and newer: sm75, sm80, sm90, and sm100.
  • PyTorch Custom Operators refactoring and integration:
    • We have refactored most of the library code to integrate better with PyTorch via the torch.library and custom ops APIs. This helps enable our torch.compile and additional hardware compatibility efforts.
    • End-users do not need to change the way they are using bitsandbytes.
  • Unit tests have been cleaned up for increased determinism and most are now device-agnostic.
    • A new nightly CI runs unit tests for CPU (Windows x86-64, Linux x86-64/aarch64) and CUDA (Linux/Windows x86-64).
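As a hedged illustration of the torch.compile highlight above, the sketch below compiles a model loaded with LLM.int8() through transformers; the model id, device settings, and generation parameters are placeholders, and PyTorch 2.6+ is assumed per the recommendation in the notes.

```python
# Minimal sketch, assuming transformers + bitsandbytes on a CUDA machine with
# PyTorch 2.6+; the model id and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # LLM.int8()
    device_map="cuda",
)

# With 0.46.0, LLM.int8() inference can compile without graph breaks.
model = torch.compile(model)

inputs = tokenizer("Hello, bitsandbytes!", return_tensors="pt").to("cuda")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```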

Compatibility Changes

  • Support for Python 3.8 is dropped.
  • Support for PyTorch < 2.2.0 is dropped.
  • CUDA 12.6 and 12.8 builds are now compatible for manylinux_2_24 (previously manylinux_2_34).
  • Many APIs that were previously marked as deprecated have now been removed.
  • New deprecations:
    • bnb.autograd.get_inverse_transform_indices()
    • bnb.autograd.undo_layout()
    • bnb.functional.create_quantile_map()
    • bnb.functional.estimate_quantiles()
    • bnb.functional.get_colrow_absmax()
    • bnb.functional.get_row_absmax()
    • bnb.functional.histogram_scatter_add_2d()

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.4...0.46.0

v0.45.5

Compare Source

This is a minor release that affects CPU-only usage of bitsandbytes. The CPU build of the library was inadvertently omitted from the v0.45.4 wheels.

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.4...0.45.5

v0.45.4

Compare Source

This is a minor release that affects CPU-only usage of bitsandbytes. There is one bugfix and improved system compatibility on Linux.

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.3...0.45.4

v0.45.3

Compare Source

Overview

This is a small patch release containing a few bug fixes.

Additionally, this release contains a CUDA 12.8 build which adds the sm100 and sm120 targets for NVIDIA Blackwell GPUs.

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.2...0.45.3

v0.45.2

Compare Source

This patch release fixes a compatibility issue with Triton 3.2 in PyTorch 2.6. When importing bitsandbytes without any GPUs visible in an environment with Triton installed, a RuntimeError may be raised:

RuntimeError: 0 active drivers ([]). There should only be one.

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.1...0.45.2

v0.45.1

Compare Source

Improvements:
  • Compatibility for triton>=3.2.0
  • Moved package configuration to pyproject.toml
  • Build system: initial support for NVIDIA Blackwell B100 GPUs, RTX 50 Blackwell series GPUs and Jetson Thor Blackwell.
    • Note: Binaries built for these platforms are not included in this release. They will be included in future releases upon the availability of the upcoming CUDA Toolkit 12.7 and 12.8.
Bug Fixes:
  • Packaging: wheels will no longer include unit tests. (#​1478)
Dependencies:
  • Sets the minimum PyTorch version to 2.0.0.

v0.45.0

Compare Source

This is a significant release, bringing support for LLM.int8() to NVIDIA Hopper GPUs such as the H100.

As part of the compatibility enhancements, we've rebuilt much of the LLM.int8() code to simplify future compatibility and maintenance. We no longer use the col32 or other architecture-specific tensor layout formats, while maintaining backwards compatibility. We additionally bring performance improvements targeted at inference scenarios.

Performance Improvements

This release includes broad performance improvements for a wide variety of inference scenarios. See this X thread for a detailed explanation.

Breaking Changes

🤗PEFT users wishing to merge adapters with 8-bit weights will need to upgrade to peft>=0.14.0.

Packaging Improvements
  • The size of our wheel has been reduced by ~43.5% from 122.4 MB to 69.1 MB! This results in an on-disk size decrease from ~396MB to ~224MB.
  • Binaries built with CUDA Toolkit 12.6.2 are now included in the PyPI distribution.
  • The CUDA 12.5.0 build has been updated to CUDA Toolkit 12.5.1.
Deprecations
  • A number of public API functions have been marked for deprecation and will emit FutureWarning when used. These functions will become unavailable in future releases. This should have minimal impact on most end-users.
  • The k-bit quantization features are deprecated in favor of blockwise quantization. For all optimizers, using block_wise=False is not recommended and support will be removed in a future release (a blockwise quantization sketch follows this list).
  • As part of the refactoring process, we've implemented many new 8bit operations. These operations no longer use specialized data layouts.
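A hedged sketch of the blockwise path that replaces the deprecated quantile-based helpers, using bitsandbytes.functional; the tensor shape and block size below are illustrative and a CUDA device is assumed.

```python
import torch
import bitsandbytes.functional as F

weights = torch.randn(4096, 4096, device="cuda")

# Quantize in blocks of 256 values; returns the packed tensor plus its quantization state.
quantized, quant_state = F.quantize_blockwise(weights, blocksize=256)

# Dequantize back to float for a quick round-trip error check.
restored = F.dequantize_blockwise(quantized, quant_state)
print((weights - restored).abs().max())
```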
Full Changelog

v0.44.1

Compare Source

Bug fixes:

v0.44.0

Compare Source

New: AdEMAMix Optimizer

The AdEMAMix optimizer is a modification to AdamW which proposes tracking two EMAs to better leverage past gradients. This allows for faster convergence with less training data and improved resistance to forgetting.

We've implemented 8bit and paged variations: AdEMAMix, AdEMAMix8bit, PagedAdEMAMix, and PagedAdEMAMix8bit. These can be used with a similar API to existing optimizers.
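A hedged usage sketch, assuming bitsandbytes >= 0.44.0 and a CUDA device; the model and hyperparameters are placeholders chosen only to show that the API mirrors the existing optimizers.

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()

# 8-bit AdEMAMix; AdEMAMix, PagedAdEMAMix, and PagedAdEMAMix8bit follow the same pattern.
optimizer = bnb.optim.AdEMAMix8bit(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```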

Improvements:
  • 8-bit Optimizers: The block size for all 8-bit optimizers has been reduced from 2048 to 256 in this release. This is a change from the original implementation proposed in the paper, and it improves accuracy.
  • CUDA Graphs support: A fix to enable CUDA Graphs capture of kernel functions was made in #​1330. This allows for performance improvements with inference frameworks like vLLM. Thanks @​jeejeelee!
Full Changelog:

v0.43.3

Compare Source

Improvements:
  • FSDP: Enable loading prequantized weights with bf16/fp16/fp32 quant_storage
    • Background: This update, linked to Transformers PR #​32276, allows loading prequantized weights with alternative storage formats. Metadata is tracked similarly to Params4bit.__new__ post PR #​970. It supports models exported with non-default quant_storage, such as this NF4 model with BF16 storage (a loading sketch follows this list).
    • Special thanks to @​winglian and @​matthewdouglas for enabling FSDP+QLoRA finetuning of Llama 3.1 405B on a single 8xH100 or 8xA100 node with as little as 256GB system RAM.
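A hedged loading sketch, assuming a transformers version that includes PR #​32276; the checkpoint name is hypothetical and stands in for a prequantized NF4 model exported with BF16 quant_storage.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16,  # non-default storage dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/llama-nf4-bf16-storage",  # hypothetical prequantized checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```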

v0.43.2

Compare Source

This release is quite significant, as the QLoRA bug fix has big implications for higher seqlen and batch sizes.

For each sequence (i.e. batch size increase of one) we expect memory savings of:

  • 405B: 39GB for seqlen=1024, and 4888GB for seqlen=128,000
  • 70B: 10.1GB for seqlen=1024 and 1258GB for seqlen=128,000

These savings come from activations for frozen parameters: such activations are unnecessary, yet memory for them was still erroneously allocated due to the now-fixed bug.

Improvements:
Bug Fixes

v0.43.1

Compare Source

Improvements:
  • Improved the serialization format for 8-bit weights; this change is fully backwards compatible (a round-trip sketch follows this list). (#​1164, thanks to @​younesbelkada for the contributions and @​akx for the review).
  • Added CUDA 12.4 support to the Linux x86-64 build workflow, expanding the library's compatibility with the latest CUDA versions. (#​1171, kudos to @​matthewdouglas for this addition).
  • Docs enhancement: Improved the instructions for installing the library from source. (#​1149, special thanks to @​stevhliu for the enhancements).
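A hedged round-trip sketch of the improved 8-bit serialization, assuming transformers with bitsandbytes >= 0.43.1 on a CUDA machine; the model id and output path are placeholders.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # hypothetical base model
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# The quantized state dict can be saved and reloaded; the format change in #1164
# is fully backwards compatible with checkpoints from older releases.
model.save_pretrained("opt-350m-8bit")
reloaded = AutoModelForCausalLM.from_pretrained("opt-350m-8bit", device_map="auto")
```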
Bug Fixes
  • Fix 4bit quantization with blocksize = 4096, where an illegal memory access was encountered. (#​1160, thanks @​matthewdouglas for fixing and @​YLGH for reporting)
Internal Improvements:

v0.43.0

Compare Source

Improvements and New Features:
Bug Fixes:
  • Addressed a race condition in kEstimateQuantiles, enhancing the reliability of quantile estimation in concurrent environments (@​pnunna93, #​1061).
  • Fixed various minor issues, including typos in code comments and documentation, to improve code clarity and prevent potential confusion (@​Brian Vaughan, #​1063).
Backwards Compatibility
  • After upgrading from v0.42 to v0.43, when using 4bit quantization, models may generate slightly different outputs (approximately up to the 2nd decimal place) due to a fix in the code. For anyone interested in the details, see this comment.
Internal and Build System Enhancements:
  • Implemented several enhancements to the internal and build systems, including adjustments to the CI workflows, portability improvements, and build artifact management. These changes contribute to a more robust and flexible development process, ensuring the library's ongoing quality and maintainability (@​rickardp, @​akx, @​wkpark, @​matthewdouglas; #​949, #​1053, #​1045, #​1037).
Contributors:

This release is made possible thanks to the many active contributors that submitted PRs and many others who contributed to discussions, reviews, and testing. Your efforts greatly enhance the library's quality and user experience. It's truly inspiring to work with such a dedicated and competent group of volunteers and professionals!

We give a special thanks to @​TimDettmers for managing to find a little bit of time for valuable consultations on critical topics, despite preparing for and touring the states applying for professor positions. We wish him the utmost success!

We also extend our gratitude to the broader community for your continued support, feedback, and engagement, which play a crucial role in driving the library's development forward.


Configuration

📅 Schedule: Branch creation - "after 5am on saturday" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever the PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

To execute skipped test pipelines, write the comment /ok-to-test.

This PR has been generated by MintMaker (powered by Renovate Bot).

@openshift-ci openshift-ci bot requested review from dtrifiro and tarukumar May 24, 2025 18:31

openshift-ci bot commented May 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: red-hat-konflux[bot]
Once this PR has been reviewed and has the lgtm label, please assign danielezonca for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented May 24, 2025

Hi @red-hat-konflux[bot]. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
@red-hat-konflux red-hat-konflux bot force-pushed the konflux/mintmaker/konflux-poc/bitsandbytes-0.x branch from 3686a0a to 69440dc Compare May 31, 2025 15:53
@red-hat-konflux red-hat-konflux bot changed the title fix(deps): update dependency bitsandbytes to ^0.45.0 fix(deps): update dependency bitsandbytes to ^0.46.0 May 31, 2025
@red-hat-konflux red-hat-konflux bot changed the title fix(deps): update dependency bitsandbytes to ^0.46.0 Update dependency bitsandbytes to ^0.46.0 Jun 7, 2025