
Update dependency bitsandbytes to ^0.46.0 #133


Open

red-hat-konflux[bot] wants to merge 1 commit into konflux-poc from konflux/mintmaker/konflux-poc/bitsandbytes-0.x

Conversation

red-hat-konflux[bot]

@red-hat-konflux red-hat-konflux bot commented May 24, 2025

This PR contains the following updates:

| Package | Change | Age | Confidence |
| --- | --- | --- | --- |
| bitsandbytes (changelog) | `^0.42.0` -> `^0.46.0` | age | confidence |

Release Notes

bitsandbytes-foundation/bitsandbytes (bitsandbytes)

v0.46.1

Compare Source

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.46.0...0.46.1

v0.46.0: torch.compile() support; custom ops refactor; Linux aarch64 wheels

Compare Source

Highlights

  • Support for torch.compile without graph breaks for LLM.int8() (see the sketch after this list).
    • Compatible with PyTorch 2.4+, but PyTorch 2.6+ is recommended.
    • Experimental CPU support is included.
  • Support torch.compile without graph breaks for 4bit.
    • Compatible with PyTorch 2.4+ for fullgraph=False.
    • Requires PyTorch 2.8 nightly for fullgraph=True.
  • We are now publishing wheels for CUDA Linux aarch64 (sbsa)!
    • Targets are Turing generation and newer: sm75, sm80, sm90, and sm100.
  • PyTorch Custom Operators refactoring and integration:
    • We have refactored most of the library code to integrate better with PyTorch via the torch.library and custom ops APIs. This helps enable our torch.compile and additional hardware compatibility efforts.
    • End-users do not need to change the way they are using bitsandbytes.
  • Unit tests have been cleaned up for increased determinism and most are now device-agnostic.
    • A new nightly CI runs unit tests for CPU (Windows x86-64, Linux x86-64/aarch64) and CUDA (Linux/Windows x86-64).
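As a hedged illustration of the torch.compile highlight above, the sketch below compiles a model loaded with LLM.int8() through transformers; the model id, device settings, and generation parameters are placeholders, and PyTorch 2.6+ is assumed per the recommendation in the notes.

```python
# Minimal sketch, assuming transformers + bitsandbytes on a CUDA machine with
# PyTorch 2.6+; the model id and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # LLM.int8()
    device_map="cuda",
)

# With 0.46.0, LLM.int8() inference can compile without graph breaks.
model = torch.compile(model)

inputs = tokenizer("Hello, bitsandbytes!", return_tensors="pt").to("cuda")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```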

Compatibility Changes

  • Support for Python 3.8 is dropped.
  • Support for PyTorch < 2.2.0 is dropped.
  • CUDA 12.6 and 12.8 builds are now compatible for manylinux_2_24 (previously manylinux_2_34).
  • Many APIs that were previously marked as deprecated have now been removed.
  • New deprecations:
    • bnb.autograd.get_inverse_transform_indices()
    • bnb.autograd.undo_layout()
    • bnb.functional.create_quantile_map()
    • bnb.functional.estimate_quantiles()
    • bnb.functional.get_colrow_absmax()
    • bnb.functional.get_row_absmax()
    • bnb.functional.histogram_scatter_add_2d()

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.4...0.46.0

v0.45.5

Compare Source

This is a minor release that affects CPU-only usage of bitsandbytes. The CPU build of the library was inadvertently omitted from the v0.45.4 wheels.

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.4...0.45.5

v0.45.4

Compare Source

This is a minor release that affects CPU-only usage of bitsandbytes. There is one bugfix and improved system compatibility on Linux.

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.3...0.45.4

v0.45.3

Compare Source

Overview

This is a small patch release containing a few bug fixes.

Additionally, this release contains a CUDA 12.8 build which adds the sm100 and sm120 targets for NVIDIA Blackwell GPUs.

What's Changed

New Contributors

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.2...0.45.3

v0.45.2

Compare Source

This patch release fixes a compatibility issue with Triton 3.2 in PyTorch 2.6. When importing bitsandbytes without any GPUs visible in an environment with Triton installed, a RuntimeError may be raised:

RuntimeError: 0 active drivers ([]). There should only be one.

Full Changelog: bitsandbytes-foundation/bitsandbytes@0.45.1...0.45.2

v0.45.1

Compare Source

Improvements:
  • Compatibility for triton>=3.2.0
  • Moved package configuration to pyproject.toml
  • Build system: initial support for NVIDIA Blackwell B100 GPUs, RTX 50 Blackwell series GPUs and Jetson Thor Blackwell.
    • Note: Binaries built for these platforms are not included in this release. They will be included in future releases upon the availability of the upcoming CUDA Toolkit 12.7 and 12.8.
Bug Fixes:
  • Packaging: wheels will no longer include unit tests. (#​1478)
Dependencies:
  • Sets the minimum PyTorch version to 2.0.0.

v0.45.0

Compare Source

This is a significant release, bringing support for LLM.int8() to NVIDIA Hopper GPUs such as the H100.

As part of the compatibility enhancements, we've rebuilt much of the LLM.int8() code to simplify future compatibility and maintenance. We no longer use the col32 or other architecture-specific tensor layout formats, while maintaining backwards compatibility. We additionally bring performance improvements targeted at inference scenarios.

Performance Improvements

This release includes broad performance improvements for a wide variety of inference scenarios. See this X thread for a detailed explanation.

Breaking Changes

🤗PEFT users wishing to merge adapters with 8-bit weights will need to upgrade to peft>=0.14.0.

Packaging Improvements
  • The size of our wheel has been reduced by ~43.5% from 122.4 MB to 69.1 MB! This results in an on-disk size decrease from ~396MB to ~224MB.
  • Binaries built with CUDA Toolkit 12.6.2 are now included in the PyPI distribution.
  • The CUDA 12.5.0 build has been updated to CUDA Toolkit 12.5.1.
Deprecations
  • A number of public API functions have been marked for deprecation and will emit FutureWarning when used. These functions will become unavailable in future releases. This should have minimal impact on most end-users.
  • The k-bit quantization features are deprecated in favor of blockwise quantization. For all optimizers, using block_wise=False is not recommended and support will be removed in a future release (a blockwise quantization sketch follows this list).
  • As part of the refactoring process, we've implemented many new 8bit operations. These operations no longer use specialized data layouts.
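A hedged sketch of the blockwise path that replaces the deprecated quantile-based helpers, using bitsandbytes.functional; the tensor shape and block size below are illustrative and a CUDA device is assumed.

```python
import torch
import bitsandbytes.functional as F

weights = torch.randn(4096, 4096, device="cuda")

# Quantize in blocks of 256 values; returns the packed tensor plus its quantization state.
quantized, quant_state = F.quantize_blockwise(weights, blocksize=256)

# Dequantize back to float for a quick round-trip error check.
restored = F.dequantize_blockwise(quantized, quant_state)
print((weights - restored).abs().max())
```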
Full Changelog

v0.44.1

Compare Source

Bug fixes:

v0.44.0

Compare Source

New: AdEMAMix Optimizer

The AdEMAMix optimizer is a modification to AdamW which proposes tracking two EMAs to better leverage past gradients. This allows for faster convergence with less training data and improved resistance to forgetting.

We've implemented 8bit and paged variations: AdEMAMix, AdEMAMix8bit, PagedAdEMAMix, and PagedAdEMAMix8bit. These can be used with a similar API to existing optimizers.
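A hedged usage sketch, assuming bitsandbytes >= 0.44.0 and a CUDA device; the model and hyperparameters are placeholders chosen only to show that the API mirrors the existing optimizers.

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()

# 8-bit AdEMAMix; AdEMAMix, PagedAdEMAMix, and PagedAdEMAMix8bit follow the same pattern.
optimizer = bnb.optim.AdEMAMix8bit(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```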

Improvements:
  • 8-bit Optimizers: The block size for all 8-bit optimizers has been reduced from 2048 to 256 in this release. This is a change from the original implementation proposed in the paper, and it improves accuracy.
  • CUDA Graphs support: A fix to enable CUDA Graphs capture of kernel functions was made in #​1330. This allows for performance improvements with inference frameworks like vLLM. Thanks @​jeejeelee!
Full Changelog:

v0.43.3

Compare Source

Improvements:
  • FSDP: Enable loading prequantized weights with bf16/fp16/fp32 quant_storage
    • Background: This update, linked to Transformers PR #​32276, allows loading prequantized weights with alternative storage formats. Metadata is tracked similarly to Params4bit.__new__ post PR #​970. It supports models exported with non-default quant_storage, such as this NF4 model with BF16 storage (a loading sketch follows this list).
    • Special thanks to @​winglian and @​matthewdouglas for enabling FSDP+QLoRA finetuning of Llama 3.1 405B on a single 8xH100 or 8xA100 node with as little as 256GB system RAM.
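A hedged loading sketch, assuming a transformers version that includes PR #​32276; the checkpoint name is hypothetical and stands in for a prequantized NF4 model exported with BF16 quant_storage.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_quant_storage=torch.bfloat16,  # non-default storage dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/llama-nf4-bf16-storage",  # hypothetical prequantized checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```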

v0.43.2

Compare Source

This release is quite significant, as the QLoRA bug fix has big implications for higher seqlen and batch sizes.

For each sequence (i.e. batch size increase of one) we expect memory savings of:

  • 405B: 39GB for seqlen=1024, and 4888GB for seqlen=128,000
  • 70B: 10.1GB for seqlen=1024 and 1258GB for seqlen=128,000

These savings come from activations for frozen parameters: such activations are unnecessary, yet memory for them was still erroneously allocated due to the now-fixed bug.

Improvements:
Bug Fixes

v0.43.1

Compare Source

Improvements:
  • Improved the serialization format for 8-bit weights; this change is fully backwards compatible (a round-trip sketch follows this list). (#​1164, thanks to @​younesbelkada for the contributions and @​akx for the review).
  • Added CUDA 12.4 support to the Linux x86-64 build workflow, expanding the library's compatibility with the latest CUDA versions. (#​1171, kudos to @​matthewdouglas for this addition).
  • Docs enhancement: Improved the instructions for installing the library from source. (#​1149, special thanks to @​stevhliu for the enhancements).
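A hedged round-trip sketch of the improved 8-bit serialization, assuming transformers with bitsandbytes >= 0.43.1 on a CUDA machine; the model id and output path are placeholders.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # hypothetical base model
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# The quantized state dict can be saved and reloaded; the format change in #1164
# is fully backwards compatible with checkpoints from older releases.
model.save_pretrained("opt-350m-8bit")
reloaded = AutoModelForCausalLM.from_pretrained("opt-350m-8bit", device_map="auto")
```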
Bug Fixes
  • Fix 4bit quantization with blocksize = 4096, where an illegal memory access was encountered. (#​1160, thanks @​matthewdouglas for fixing and @​YLGH for reporting)
Internal Improvements:

v0.43.0

Compare Source

Improvements and New Features:
Bug Fixes:
  • Addressed a race condition in kEstimateQuantiles, enhancing the reliability of quantile estimation in concurrent environments (@​pnunna93, #​1061).
  • Fixed various minor issues, including typos in code comments and documentation, to improve code clarity and prevent potential confusion (@​Brian Vaughan, #​1063).
Backwards Compatibility
  • After upgrading from v0.42 to v0.43, when using 4bit quantization, models may generate slightly different outputs (approximately up to the 2nd decimal place) due to a fix in the code. For anyone interested in the details, see this comment.
Internal and Build System Enhancements:
  • Implemented several enhancements to the internal and build systems, including adjustments to the CI workflows, portability improvements, and build artifact management. These changes contribute to a more robust and flexible development process, ensuring the library's ongoing quality and maintainability (@​rickardp, @​akx, @​wkpark, @​matthewdouglas; #​949, #​1053, #​1045, #​1037).
Contributors:

This release is made possible thanks to the many active contributors that submitted PRs and many others who contributed to discussions, reviews, and testing. Your efforts greatly enhance the library's quality and user experience. It's truly inspiring to work with such a dedicated and competent group of volunteers and professionals!

We give a special thanks to @​TimDettmers for managing to find a little bit of time for valuable consultations on critical topics, despite preparing for and touring the states applying for professor positions. We wish him the utmost success!

We also extend our gratitude to the broader community for your continued support, feedback, and engagement, which play a crucial role in driving the library's development forward.


Configuration

📅 Schedule: Branch creation - "after 5am on saturday" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever the PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

To execute skipped test pipelines, write the comment /ok-to-test.

This PR has been generated by MintMaker (powered by Renovate Bot).

@openshift-ci openshift-ci bot requested review from dtrifiro and tarukumar May 24, 2025 18:31

openshift-ci bot commented May 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: red-hat-konflux[bot]
Once this PR has been reviewed and has the lgtm label, please assign danielezonca for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented May 24, 2025

Hi @red-hat-konflux[bot]. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
@red-hat-konflux red-hat-konflux bot force-pushed the konflux/mintmaker/konflux-poc/bitsandbytes-0.x branch from 3686a0a to 69440dc Compare May 31, 2025 15:53
@red-hat-konflux red-hat-konflux bot changed the title fix(deps): update dependency bitsandbytes to ^0.45.0 fix(deps): update dependency bitsandbytes to ^0.46.0 May 31, 2025
@red-hat-konflux red-hat-konflux bot changed the title fix(deps): update dependency bitsandbytes to ^0.46.0 Update dependency bitsandbytes to ^0.46.0 Jun 7, 2025