
Commit c85733c

Merge branch 'main' into patch-2

2 parents d76b6ca + 9c63202
27 files changed (+928, -248 lines)
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+name: "\U0001F41B Bug Report"
+description: Submit a bug report to help us improve bitsandbytes
+body:
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System Info
+      description: Please share your relevant system information with us
+      placeholder: platform, python version, hardware, ...
+    validations:
+      required: true
+
+  - type: textarea
+    id: reproduction
+    validations:
+      required: true
+    attributes:
+      label: Reproduction
+      description: |
+        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
+        Please provide the simplest reproducer possible so that we can quickly fix the issue.
+
+      placeholder: |
+        Reproducer:
+
+  - type: textarea
+    id: expected-behavior
+    validations:
+      required: true
+    attributes:
+      label: Expected behavior
+      description: "A clear and concise description of what you would expect to happen."
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+name: "\U0001F680 Feature request"
+description: Submit a proposal/request for a new feature
+labels: [ "feature" ]
+body:
+  - type: textarea
+    id: feature-request
+    validations:
+      required: true
+    attributes:
+      label: Feature request
+      description: |
+        A clear and concise description of the feature proposal.
+
+  - type: textarea
+    id: motivation
+    validations:
+      required: true
+    attributes:
+      label: Motivation
+      description: |
+        Please outline the motivation for the proposal. Is your feature request related to a problem?
+
+  - type: textarea
+    id: contribution
+    validations:
+      required: true
+    attributes:
+      label: Your contribution
+      description: |
+        Is there any way that you could help, e.g. by submitting a PR?

.github/workflows/stale.yml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: Stale Bot
+
+on:
+  schedule:
+    - cron: "0 15 * * *"
+
+jobs:
+  close_stale_issues:
+    name: Close Stale Issues
+    if: github.repository == 'TimDettmers/bitsandbytes'
+    runs-on: ubuntu-latest
+    env:
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+
+      - name: Install requirements
+        run: |
+          pip install PyGithub
+      - name: Close stale issues
+        run: |
+          python scripts/stale.py
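The workflow installs PyGithub and then runs scripts/stale.py, which is not part of this diff. For orientation, below is a minimal sketch of what such a stale-issue script could look like; the inactivity threshold, the `close_stale_issues` helper name, and the comment text are assumptions, not taken from the repository's actual script.

```python
# Hypothetical sketch of a stale-issue script (scripts/stale.py itself is not shown in this diff).
import os
from datetime import datetime, timedelta, timezone

from github import Github  # provided by the PyGithub package installed above


def close_stale_issues(repo_name: str = "TimDettmers/bitsandbytes", days: int = 30) -> None:
    gh = Github(os.environ["GITHUB_TOKEN"])  # the workflow exposes this token via env
    repo = gh.get_repo(repo_name)
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    for issue in repo.get_issues(state="open"):
        if issue.pull_request is not None:
            continue  # the Issues API also returns pull requests; skip them
        updated = issue.updated_at
        if updated.tzinfo is None:
            updated = updated.replace(tzinfo=timezone.utc)
        if updated < cutoff:
            issue.create_comment(
                "This issue has been automatically marked as stale because it has not had "
                "recent activity. It will be closed if no further activity occurs."
            )
            issue.edit(state="closed")


if __name__ == "__main__":
    close_stale_issues()
```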

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -133,3 +133,4 @@ dmypy.json
 
 dependencies
 cuda_build
+.vscode/*

.style.yapf

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+[style]
+ALIGN_CLOSING_BRACKET_WITH_VISUAL_INDENT = True
+ALLOW_MULTILINE_LAMBDAS = True
+BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = True
+COLUMN_LIMIT = 88
+COALESCE_BRACKETS = True
+SPACE_BETWEEN_ENDING_COMMA_AND_CLOSING_BRACKET = True
+SPACES_BEFORE_COMMENT = 2
+SPLIT_BEFORE_BITWISE_OPERATOR = True
+SPLIT_BEFORE_FIRST_ARGUMENT = True
+SPLIT_BEFORE_LOGICAL_OPERATOR = True
+SPLIT_BEFORE_NAMED_ASSIGNS = True
+SPLIT_COMPLEX_COMPREHENSION = True
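With this file in the repository root, yapf picks the style up automatically; you can also point at it explicitly. A small sketch using yapf's public FormatCode API follows (the snippet being formatted is only an example, not code from this commit):

```python
# Sketch: reformat a snippet against the new .style.yapf (requires `pip install yapf`).
from yapf.yapflib.yapf_api import FormatCode

snippet = (
    "adam = bnb.optim.Adam8bit(model.parameters(), min_8bit_size=16384, "
    "lr=0.001, betas=(0.9, 0.995))\n"
)
formatted, changed = FormatCode(snippet, style_config=".style.yapf")
print(formatted)  # the call reflowed to the 88-column limit and the split rules above
print(changed)    # True if yapf had to modify the input
```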

CHANGELOG.md

Lines changed: 44 additions & 0 deletions
@@ -283,3 +283,47 @@ Bug fixes:
 - Removed outdated get_cuda_lib_handle calls that lead to errors. #595 Thank you @ihsanturk
 - Fixed bug where read-permission was assumed for a file. #497
 - Fixed a bug where prefetchAsync lead to errors on GPUs that do not support unified memory but not prefetching (Maxwell, SM52). #470 #451 #453 #477 Thank you @jllllll and @stoperro
+
+
+### 0.41.0
+
+Features:
+- Added precompiled CUDA 11.8 binaries to support H100 GPUs without compilation #571
+- CUDA SETUP now no longer looks for libcuda and libcudart and relies on PyTorch's CUDA libraries. To manually override this behavior see: how_to_use_nonpytorch_cuda.md. Thank you @rapsealk
+
+Bug fixes:
+- Fixed a bug where the default type of absmax was undefined, which leads to errors if the default type is different from torch.float32. #553
+- Fixed a missing scipy dependency in requirements.txt. #544
+- Fixed a bug where a view operation could cause an error in 8-bit layers.
+- Fixed a bug where CPU bitsandbytes would fail during the import. #593 Thank you @bilelomrani
+- Fixed a bug where a non-existent LD_LIBRARY_PATH variable led to a failure in python -m bitsandbytes #588
+- Removed outdated get_cuda_lib_handle calls that lead to errors. #595 Thank you @ihsanturk
+- Fixed bug where read-permission was assumed for a file. #497
+- Fixed a bug where prefetchAsync led to errors on GPUs that do not support unified memory but not prefetching (Maxwell, SM52). #470 #451 #453 #477 Thank you @jllllll and @stoperro
+
+Documentation:
+- Improved documentation for GPUs that do not support 8-bit matmul. #529
+- Added description and pointers for the NF4 data type. #543
+
+User experience:
+- Improved handling of default compute_dtype for Linear4bit layers, so that compute_dtype = input_dtype if the input data type is stable enough (float32, bfloat16, but not float16).
+
+Performance:
+- Improved 4-bit inference performance for A100 GPUs. This slightly degraded performance for A40/RTX 3090 and RTX 4090 GPUs.
+
+### 0.41.1
+
+Bug fixes:
+- Fixed bugs in dynamic exponent data type creation. Thank you @RossM, @KohakuBlueleaf, @ArrowM #659 #227 #262 #152
+
+### 0.41.2
+
+Feature:
+- 4-bit serialization now supported. This enables 4-bit load/store. Thank you @poedator #753
+
+### 0.41.3
+
+Bug fixes:
+- Fixed an issue where 4-bit serialization would fail for layers without double quantization #868. Thank you, @poedator
+- Fixed an issue where calling .to() or .cuda() on a 4-bit layer twice would result in an error #867. Thank you, @jph00
+

README.md

Lines changed: 6 additions & 6 deletions
@@ -38,7 +38,7 @@ python setup.py install
 ```python
 from transformers import AutoModelForCausalLM
 model = AutoModelForCausalLM.from_pretrained(
-  'decapoda-research/llama-7b-hf,
+  'decapoda-research/llama-7b-hf',
   device_map='auto',
   load_in_8bit=True,
   max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB')
@@ -119,7 +119,7 @@ torch.nn.Embedding(...) -> bnb.nn.StableEmbedding(...) # recommended for NLP mo
 ```
 
 Note that by default all parameter tensors with less than 4096 elements are kept at 32-bit even if you initialize those parameters with 8-bit optimizers. This is done since such small tensors do not save much memory and often contain highly variable parameters (biases) or parameters that require high precision (batch norm, layer norm). You can change this behavior like so:
-```
+```python
 # parameter tensors with less than 16384 values are optimized in 32-bit
 # it is recommended to use multiplies of 4096
 adam = bnb.optim.Adam8bit(model.parameters(), min_8bit_size=16384)
@@ -146,13 +146,13 @@ For upcoming features and changes and full history see [Patch Notes](CHANGELOG.m
 To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with nvcc through the following commands.
 
 ```bash
-wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
+wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
 # Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
-# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121}
+# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121, 122}
 # EXPORT_TO_BASH in {0, 1} with 0=False and 1=True
 
-# For example, the following installs CUDA 11.8 to ~/local/cuda-11.8 and exports the path to your .bashrc
-bash cuda install 118 ~/local 1
+# For example, the following installs CUDA 11.7 to ~/local/cuda-11.7 and exports the path to your .bashrc
+bash install_cuda.sh 117 ~/local 1
 ```
 
 To use a specific CUDA version just for a single compile run, you can set the variable `CUDA_HOME`, for example the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for cuda11x with the cuda version at `~/local/cuda-11.7`:

bitsandbytes/autograd/_functions.py

Lines changed: 7 additions & 9 deletions
@@ -496,15 +496,15 @@ class MatMul4Bit(torch.autograd.Function):
     # backward is mostly the same, but adds one extra clause (see "elif state.CxB is not None")
 
     @staticmethod
-    def forward(ctx, A, B, out=None, bias=None, state=None):
+    def forward(ctx, A, B, out=None, bias=None, quant_state: F.QuantState = None):
         # default of pytorch behavior if inputs are empty
         ctx.is_empty = False
         if prod(A.shape) == 0:
            ctx.is_empty = True
            ctx.A = A
            ctx.B = B
            ctx.bias = bias
-           B_shape = state[1]
+           B_shape = quant_state.shape
            if A.shape[-1] == B_shape[0]:
                return torch.empty(A.shape[:-1] + B_shape[1:], dtype=A.dtype, device=A.device)
            else:
@@ -513,10 +513,10 @@ def forward(ctx, A, B, out=None, bias=None, state=None):
 
        # 1. Dequantize
        # 2. MatmulnN
-       output = torch.nn.functional.linear(A, F.dequantize_4bit(B, state).to(A.dtype).t(), bias)
+       output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
 
        # 3. Save state
-       ctx.state = state
+       ctx.state = quant_state
        ctx.dtype_A, ctx.dtype_B, ctx.dtype_bias = A.dtype, B.dtype, None if bias is None else bias.dtype
 
        if any(ctx.needs_input_grad[:2]):
@@ -534,7 +534,6 @@ def backward(ctx, grad_output):
 
        req_gradA, _, _, req_gradBias, _= ctx.needs_input_grad
        A, B = ctx.tensors
-       state = ctx.state
 
        grad_A, grad_B, grad_bias = None, None, None
 
@@ -563,12 +562,11 @@ def matmul(
    return MatMul8bitLt.apply(A, B, out, bias, state)
 
 
-def matmul_4bit(A: tensor, B: tensor, quant_state: List, out: tensor = None, bias=None):
+def matmul_4bit(A: tensor, B: tensor, quant_state: F.QuantState, out: tensor = None, bias=None):
    assert quant_state is not None
    if A.numel() == A.shape[-1] and A.requires_grad == False:
-       absmax, shape, dtype, blocksize, compressed_stats, quant_type, data_type = quant_state
-       if A.shape[-1] % blocksize != 0:
-           warn(f'Some matrices hidden dimension is not a multiple of {blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}')
+       if A.shape[-1] % quant_state.blocksize != 0:
+           warn(f'Some matrices hidden dimension is not a multiple of {quant_state.blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}')
        return MatMul4Bit.apply(A, B, out, bias, quant_state)
    else:
        out = F.gemv_4bit(A, B.t(), out, state=quant_state)
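The change here is that the 4-bit quantization metadata is no longer passed around as a positional list but as an F.QuantState object with named attributes such as shape and blocksize. A minimal stand-in to illustrate the call-site difference follows; QuantStateSketch and its field values are hypothetical, and the real QuantState carries more fields than shown.

```python
# Hypothetical stand-in illustrating the move from a positional quant_state
# sequence to an object with named attributes (as F.QuantState is used above).
from dataclasses import dataclass


@dataclass
class QuantStateSketch:
    absmax: object
    shape: tuple
    dtype: str
    blocksize: int
    quant_type: str = "nf4"


# Old style: callers unpacked the state positionally and had to know the field order.
legacy_state = ("absmax-tensor", (4096, 4096), "float16", 64, None, "nf4", "data-type")
_, B_shape, _, blocksize, *_ = legacy_state

# New style: callers read named attributes, as MatMul4Bit.forward and matmul_4bit now do.
state = QuantStateSketch(absmax="absmax-tensor", shape=(4096, 4096), dtype="float16", blocksize=64)
assert state.shape == B_shape and state.blocksize == blocksize
```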

bitsandbytes/cuda_setup/env_vars.py

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,7 @@ def to_be_ignored(env_var: str, value: str) -> bool:
         "OLDPWD",
         "SSH_AUTH_SOCK", # SSH stuff, therefore unrelated
         "SSH_TTY",
+        "GOOGLE_VM_CONFIG_LOCK_FILE", # on GCP setups, requires elevated permissions, causing problems in Jupyter notebooks
         "HOME", # Linux shell default
         "TMUX", # Terminal Multiplexer
         "XDG_DATA_DIRS", # XDG: Desktop environment stuff
@@ -19,6 +20,7 @@ def to_be_ignored(env_var: str, value: str) -> bool:
         "PATH", # this is for finding binaries, not libraries
         "LESSOPEN", # related to the `less` command
         "LESSCLOSE",
+        "GOOGLE_VM_CONFIG_LOCK_FILE", # Google Cloud stuff, contains root only paths
         "_", # current Python interpreter
     }
     return env_var in ignorable
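to_be_ignored() is consulted when bitsandbytes scans environment variables for directories that might contain the CUDA runtime; variables like GOOGLE_VM_CONFIG_LOCK_FILE point at root-only GCP paths and would otherwise trip up the search. A small sketch of how such a filter is typically applied (the surrounding loop is illustrative, not the library's exact code):

```python
# Sketch: filter the environment with to_be_ignored() before searching for CUDA libraries.
import os

from bitsandbytes.cuda_setup.env_vars import to_be_ignored

candidate_vars = {
    name: value
    for name, value in os.environ.items()
    if not to_be_ignored(name, value)
}
print(sorted(candidate_vars))  # e.g. LD_LIBRARY_PATH and other path-like variables survive
```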

bitsandbytes/cuda_setup/main.py

Lines changed: 8 additions & 4 deletions
@@ -64,9 +64,10 @@ def generate_instructions(self):
        self.add_log_entry('CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a')
        self.add_log_entry('CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc')
        self.add_log_entry('CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.')
-       self.add_log_entry('CUDA SETUP: Solution 2a): Download CUDA install script: wget https://github.com/TimDettmers/bitsandbytes/blob/main/cuda_install.sh')
+       self.add_log_entry('CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh')
        self.add_log_entry('CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.')
        self.add_log_entry('CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local')
+
        return
 
        make_cmd = f'CUDA_VERSION={self.cuda_version_string}'
@@ -214,8 +215,11 @@ def get_cuda_runtime_lib_paths(candidate_paths: Set[Path]) -> Set[Path]:
    paths = set()
    for libname in CUDA_RUNTIME_LIBS:
        for path in candidate_paths:
-           if (path / libname).is_file():
-               paths.add(path / libname)
+           try:
+               if (path / libname).is_file():
+                   paths.add(path / libname)
+           except PermissionError:
+               pass
    return paths
 
 
@@ -361,4 +365,4 @@ def evaluate_cuda_setup():
        "if not has_cublaslt (CC < 7.5), then we have to choose _nocublaslt.so"
        binary_name = f"libbitsandbytes_cuda{cuda_version_string}_nocublaslt.so"
 
-   return binary_name, cudart_path, cc, cuda_version_string
+   return binary_name, cudart_path, cc, cuda_version_string
