Commit c75eecd

Merge branch 'main' into int8
2 parents 521da0c + 9568735 commit c75eecd

9 files changed: +220 -86 lines changed

.github/workflows/python-package.yml

Lines changed: 16 additions & 25 deletions
@@ -163,45 +163,36 @@ jobs:
     needs:
       - build-wheels
     steps:
-      - name: Download artifacts to tmp directory
+      - name: Download and rename artifacts
         uses: actions/download-artifact@v4
         with:
           path: tmp/
           pattern: "bdist_wheel_*"
           merge-multiple: true
       - name: Inspect tmp directory after downloading artifacts
         run: ls -alFR tmp/
-      - name: Move and rename wheel files
+      - name: Move and rename wheel files with pattern replacement
         run: |
           mkdir -p wheels/
-          find tmp/ -type f -name '*.whl' -print0 | while IFS= read -r -d '' wheel; do
+          # exclude macos wheels for now
+          find tmp/ -type f -name '*.whl' ! -name '*macos*' -print0 | while IFS= read -r -d '' wheel; do
             wheel_filename=$(basename "$wheel")
-            if [[ $wheel_filename == *linux*x86_64* ]]; then
-              mv "$wheel" wheels/bnb-linux-x86_64.whl
-            elif [[ $wheel_filename == *linux*aarch64* ]]; then
-              mv "$wheel" wheels/bnb-linux-aarch64.whl
-            elif [[ $wheel_filename == *macosx*x86_64* ]]; then
-              mv "$wheel" wheels/bnb-macos-x86_64.whl
-            elif [[ $wheel_filename == *macosx*arm64* ]]; then
-              mv "$wheel" wheels/bnb-macos-arm64.whl
-            elif [[ $wheel_filename == *win*amd64* ]]; then
-              mv "$wheel" wheels/bnb-windows-x86_64.whl
-            else
-              echo "Unknown wheel format: $wheel_filename"
-              exit 1
-            fi
+            # Remove the gith hash, e.g. `+1234567`, for a stable download link on the multi-backend pre-release
+            cleaned_filename=$(echo "$wheel_filename" | sed -E 's/\+[0-9a-f]{7}-/-/g')
+            mv "$wheel" "wheels/$cleaned_filename"
           done
       - name: Inspect wheels directory after renaming files
         run: ls -alFR wheels/
       - name: Create release and upload artifacts
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          GITHUB_CONTINUOUS_RELEASE_TYPE: prerelease
-          GITHUB_CONTINUOUS_RELEASE_TAG: continuous-release_main
-        run: |
-          wget -q https://github.com/TheAssassin/pyuploadtool/releases/download/continuous/pyuploadtool-x86_64.AppImage
-          chmod +x pyuploadtool-x86_64.AppImage
-          ./pyuploadtool-x86_64.AppImage --appimage-extract-and-run wheels/*.whl
+        uses: softprops/[email protected]
+        with:
+          files: wheels/*.whl
+          prerelease: true
+          name: Latest `main` wheel
+          tag_name: continuous-release_main
+          make_latest: false
+          draft: false
+          target_commitish: ${{ github.sha }}
 
   audit-wheels:
     needs: build-wheels
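
The new renaming step skips macOS wheels and strips the short git hash from each remaining wheel name, so the pre-release download link stays stable. As a rough illustration only, the same filtering and substitution can be sketched in Python. The function name `clean_wheel_names` and the sample filenames in the usage note are hypothetical; the regular expression mirrors the `sed -E` expression in the workflow.

```python
import fnmatch
import re

# Mirrors the workflow's sed expression: drop a "+<7 lowercase hex chars>"
# segment that is directly followed by a hyphen, keeping the hyphen.
_HASH_SEGMENT = re.compile(r"\+[0-9a-f]{7}-")


def clean_wheel_names(filenames):
    """Return cleaned wheel names, skipping non-wheels and macOS wheels.

    Hypothetical helper for illustration; the workflow itself uses find + sed.
    """
    cleaned = []
    for name in filenames:
        # Same filters as `find ... -name '*.whl' ! -name '*macos*'`
        # (case-sensitive, like find's -name).
        if not fnmatch.fnmatchcase(name, "*.whl"):
            continue
        if fnmatch.fnmatchcase(name, "*macos*"):
            continue
        cleaned.append(_HASH_SEGMENT.sub("-", name))
    return cleaned
```

Called with a made-up name such as `bitsandbytes-0.44.1.dev0+1a2b3c4-py3-none-manylinux_2_24_x86_64.whl`, the sketch drops the `+1a2b3c4` segment and leaves the rest of the name untouched; a name containing `macos` is skipped entirely.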

.pre-commit-config.yaml

Lines changed: 3 additions & 3 deletions
@@ -1,13 +1,13 @@
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.3.2
+    rev: v0.6.9
     hooks:
       - id: ruff
         args:
           - --fix
       - id: ruff-format
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.5.0
+    rev: v5.0.0
     hooks:
       - id: check-merge-conflict
       - id: check-yaml
@@ -18,6 +18,6 @@ repos:
         args:
           - --fix=lf
   - repo: https://github.com/crate-ci/typos
-    rev: v1.18.2
+    rev: v1.26.0
     hooks:
       - id: typos

README.md

Lines changed: 9 additions & 5 deletions
@@ -12,15 +12,19 @@ There are ongoing efforts to support further hardware backends, i.e. Intel CPU +
 
 **[https://huggingface.co/docs/bitsandbytes/main](https://huggingface.co/docs/bitsandbytes/main)**
 
-## ALPHA TESTERS WANTED: `multi-backend-refactor` AMD GPU + Intel CPU/GPU specific BNB backend implementations
+## `bitsandbytes` multi-backend _alpha_ release is out!
 
-We're in the process of a complex refactor in order to allow the support of additional hardware backends, other than CUDA, in BNB. The efforts around this are already quite far along and there's plenty of functionality already in place that is in need for users to take a hands-on approach! Mac support will likely soon also see progress. However, I recommend waiting 2 weeks until the device abstraction has further consolidated (**breaking changes upcoming**).
+🚀 Big news! After months of hard work and incredible community contributions, we're thrilled to announce the **bitsandbytes multi-backend _alpha_ release**! 💥
 
-Currently, you still need to compile from source, after checking out the `multi-backend-refactor` branch (instructions WIP, but [the current docs on the compilation from source](https://huggingface.co/docs/bitsandbytes/main/en/installation#compile-from-source) are a good starting point; [feel free to share tips / input in this Github discussion](https://github.com/TimDettmers/bitsandbytes/discussions/1219). We'll soon enable nightly releases to make this much easier for you!
+Now supporting:
+- 🔥 **AMD GPUs** (ROCm)
+- **Intel CPUs** & **GPUs**
 
-Please give feedback to us in [this dedicated Github Discussion space](https://github.com/TimDettmers/bitsandbytes/discussions/categories/catch-all-alpha-testing-the-multi-backend-refactor)!
+We’d love your early feedback! 🙏
 
-We're super excited about these recent developments and grateful for any constructive input or support that you can give to help us make this a reality. BNB is a community project and we're excited for your collaboration 🤗
+👉 [Instructions for your `pip install` here](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend)
+
+We're super excited about these recent developments and grateful for any constructive input or support that you can give to help us make this a reality (e.g. helping us with the upcoming Apple Silicon backend or reporting bugs). BNB is a community project and we're excited for your collaboration 🤗
 
 ## License

_typos.toml

Lines changed: 4 additions & 2 deletions
@@ -4,8 +4,10 @@
 extend-ignore-re = [
     "@Ther-nul", # valid Github user
 ]
-
-[default.extend-identifiers]
+extend-ignore-identifiers-re = [
+    ".*arange.*",
+    ".*ARANGE.*",
+]
 
 [type.py.extend-words]
 "BA" = "BA" # used as a commented-out variable in tests
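
The two `extend-ignore-identifiers-re` patterns tell the typos checker to leave alone any identifier containing `arange` or `ARANGE`, presumably so that names like `torch.arange` are not reported as a misspelling of "arrange". The sketch below approximates that matching with Python's `re` module. This is an assumption for illustration: typos uses Rust regular expressions, and the helper `is_ignored_identifier` is hypothetical rather than part of any tool.

```python
import re

# The two patterns added to _typos.toml in this commit.
IGNORE_IDENTIFIER_PATTERNS = [re.compile(p) for p in (".*arange.*", ".*ARANGE.*")]


def is_ignored_identifier(identifier: str) -> bool:
    """Return True if any ignore pattern matches the whole identifier.

    Both patterns start and end with `.*`, so whole-string and substring
    matching give the same answer here.
    """
    return any(p.fullmatch(identifier) for p in IGNORE_IDENTIFIER_PATTERNS)
```

Under this approximation, identifiers such as `arange` or `K_ARANGE` are ignored, while `arrange` is still checked because it does not contain the substring `arange`. The patterns are case-sensitive, which is why the commit lists both spellings.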

bitsandbytes/functional.py

Lines changed: 1 addition & 1 deletion
@@ -1860,7 +1860,7 @@ def percentile_clipping(grad: Tensor, gnorm_vec: Tensor, step: int, percentile:
     gnorm_vec: torch.Tensor
         Vector of gradient norms. 100 elements expected.
     step: int
-        The current optimiation steps (number of past gradient norms).
+        The current optimization steps (number of past gradient norms).
 
     """
     prev_device = pre_call(grad.device)

csrc/kernels.cu

Lines changed: 3 additions & 3 deletions
@@ -2703,7 +2703,7 @@ template <int THREADS, int ITEMS_PER_THREAD, int TILE_ROWS, int TILE_COLS, int T
 //const int global_col = base_row; // block offset for col
 if((base_col + subrow_loop_row + jrow + warp_id < outRows) && (base_row+warp_lane < rows))
 {
-  // each row hae 32 columns and is offset by 1 to prevent bank conflict during storage into smem
+  // each row has 32 columns and is offset by 1 to prevent bank conflict during storage into smem
   char data = smem_data[(subrow_loop_row + jrow + warp_id)*33 + warp_lane];
 
   // each 32 columns we have new tile
@@ -2742,7 +2742,7 @@ template <int THREADS, int ITEMS_PER_THREAD, int TILE_ROWS, int TILE_COLS, int T
 //const int global_col = base_row; // block offset for col
 if((base_col + subrow_loop_row + jrow + warp_id < outRows) && (base_row+warp_lane < rows))
 {
-  // each row hae 32 columns and is offset by 1 to prevent bank conflict during storage into smem
+  // each row has 32 columns and is offset by 1 to prevent bank conflict during storage into smem
   char data = smem_data[(subrow_loop_row + jrow + warp_id)*33 + warp_lane];
 
   // each 32 columns we have new tile
@@ -2819,7 +2819,7 @@ template <int THREADS, int ITEMS_PER_THREAD, int TILE_ROWS, int TILE_COLS, int T
 //const int global_col = base_row; // block offset for col
 if((base_col + subrow_loop_row + jrow + warp_id < outRows) && (base_row+warp_lane < rows))
 {
-  // each row hae 32 columns and is offset by 1 to prevent bank conflict during storage into smem
+  // each row has 32 columns and is offset by 1 to prevent bank conflict during storage into smem
   char data = smem_data[(subrow_loop_row + jrow + warp_id)*33 + warp_lane];
 
   // each 32 columns we have new tile
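
The corrected comment refers to a common shared-memory padding trick: a 32-column tile is stored with a row stride of 33 so that elements of the same column in different rows land in different banks. The sketch below models this in Python. It assumes 32 banks of 4 bytes each and a column-wise access by 32 threads; the kernel's full access pattern is not visible in this diff, and the helper `banks_hit_by_column` is hypothetical.

```python
NUM_BANKS = 32   # assumption: 32 shared-memory banks
BANK_WIDTH = 4   # assumption: each bank is 4 bytes wide


def banks_hit_by_column(row_stride, column=0, rows=32, elem_size=1):
    """Return the set of banks touched when `rows` threads each access
    one element of the same column, one thread per row.

    `row_stride` is the row pitch in elements; `elem_size` is the element
    size in bytes (1 for `char`, as in the kernel's smem_data).
    """
    banks = set()
    for row in range(rows):
        byte_addr = (row * row_stride + column) * elem_size
        banks.add((byte_addr // BANK_WIDTH) % NUM_BANKS)
    return banks


# Compare the unpadded stride (32) with the padded stride (33).
unpadded = banks_hit_by_column(row_stride=32)
padded = banks_hit_by_column(row_stride=33)
print(len(unpadded), len(padded))
```

In this model the unpadded stride funnels all 32 accesses into only 4 banks, while the padded stride of 33 spreads them over all 32 banks. That spreading is the effect the `*33` index in `smem_data[...]` is meant to achieve.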
