Commit 41faf4e

Merge branch 'main' into main

2 parents: fea5bc7 + 095f7a5

24 files changed: +755 −167 lines
Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
+name: "\U0001F41B Bug Report"
+description: Submit a bug report to help us improve bitsandbytes
+body:
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System Info
+      description: Please share your relevant system information with us
+      placeholder: platform, python version, hardware, ...
+    validations:
+      required: true
+
+  - type: textarea
+    id: reproduction
+    validations:
+      required: true
+    attributes:
+      label: Reproduction
+      description: |
+        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
+        Please provide the simplest possible reproducer so that we can quickly fix the issue.
+
+      placeholder: |
+        Reproducer:
+
+  - type: textarea
+    id: expected-behavior
+    validations:
+      required: true
+    attributes:
+      label: Expected behavior
+      description: "A clear and concise description of what you would expect to happen."
Lines changed: 30 additions & 0 deletions

@@ -0,0 +1,30 @@
+name: "\U0001F680 Feature request"
+description: Submit a proposal/request for a new feature
+labels: [ "feature" ]
+body:
+  - type: textarea
+    id: feature-request
+    validations:
+      required: true
+    attributes:
+      label: Feature request
+      description: |
+        A clear and concise description of the feature proposal.
+
+  - type: textarea
+    id: motivation
+    validations:
+      required: true
+    attributes:
+      label: Motivation
+      description: |
+        Please outline the motivation for the proposal. Is your feature request related to a problem?
+
+  - type: textarea
+    id: contribution
+    validations:
+      required: true
+    attributes:
+      label: Your contribution
+      description: |
+        Is there any way that you could help, e.g. by submitting a PR?

.github/workflows/stale.yml

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
+name: Stale Bot
+
+on:
+  schedule:
+    - cron: "0 15 * * *"
+
+jobs:
+  close_stale_issues:
+    name: Close Stale Issues
+    if: github.repository == 'TimDettmers/bitsandbytes'
+    runs-on: ubuntu-latest
+    env:
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+
+      - name: Install requirements
+        run: |
+          pip install PyGithub
+      - name: Close stale issues
+        run: |
+          python scripts/stale.py
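
The workflow's last step runs `scripts/stale.py`, which is not part of this diff. For context, a minimal sketch of what a PyGithub-based stale script might look like; the 30-day threshold, the comment text, and the overall logic are illustrative assumptions, not the repository's actual script:

```python
import os
from datetime import datetime, timedelta, timezone

from github import Github  # installed above via `pip install PyGithub`

# Hypothetical threshold: issues untouched for 30 days count as stale.
STALE_AFTER = timedelta(days=30)

def main():
    g = Github(os.environ["GITHUB_TOKEN"])
    repo = g.get_repo("TimDettmers/bitsandbytes")
    cutoff = datetime.now(timezone.utc) - STALE_AFTER

    for issue in repo.get_issues(state="open"):
        if issue.pull_request is not None:
            continue  # get_issues also yields pull requests; skip them
        updated = issue.updated_at
        if updated.tzinfo is None:
            updated = updated.replace(tzinfo=timezone.utc)
        if updated < cutoff:
            issue.create_comment(
                "This issue has been marked stale due to inactivity."
            )
            issue.edit(state="closed")

if __name__ == "__main__":
    main()
```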

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -133,3 +133,4 @@ dmypy.json
 
 dependencies
 cuda_build
+.vscode/*

.style.yapf

Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
+[style]
+ALIGN_CLOSING_BRACKET_WITH_VISUAL_INDENT = True
+ALLOW_MULTILINE_LAMBDAS = True
+BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = True
+COLUMN_LIMIT = 88
+COALESCE_BRACKETS = True
+SPACE_BETWEEN_ENDING_COMMA_AND_CLOSING_BRACKET = True
+SPACES_BEFORE_COMMENT = 2
+SPLIT_BEFORE_BITWISE_OPERATOR = True
+SPLIT_BEFORE_FIRST_ARGUMENT = True
+SPLIT_BEFORE_LOGICAL_OPERATOR = True
+SPLIT_BEFORE_NAMED_ASSIGNS = True
+SPLIT_COMPLEX_COMPREHENSION = True
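
For illustration, this is roughly the layout those settings produce on an over-long call; the function and argument names below are hypothetical, and the exact wrapping depends on the yapf version:

```python
# yapf --in-place --recursive bitsandbytes/
# SPLIT_BEFORE_FIRST_ARGUMENT and SPLIT_BEFORE_NAMED_ASSIGNS wrap a call
# that exceeds COLUMN_LIMIT = 88 with each named argument on its own line.
out = quantize_tensor(
    weight_matrix,
    blocksize=4096,
    compress_statistics=True,
    quant_type='nf4',
)
```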

CHANGELOG.md

Lines changed: 13 additions & 1 deletion

@@ -311,7 +311,19 @@ User experience:
 Performance:
 - improved 4-bit inference performance for A100 GPUs. This degraded performance for A40/RTX3090 and RTX 4090 GPUs slightly.
 
-### 0.41.0
+### 0.41.1
 
 Bug fixes:
 - Fixed bugs in dynamic exponent data type creation. Thank you @RossM, @KohakuBlueleaf, @ArrowM #659 #227 #262 #152
+
+### 0.41.2
+
+Feature:
+- 4-bit serialization now supported. This enables 4-bit load/store. Thank you @poedator #753
+
+### 0.41.3
+
+Bug fixes:
+- Fixed an issue where 4-bit serialization would fail for layers without double quantization #868. Thank you, @poedator
+- Fixed an issue where calling .to() or .cuda() on a 4-bit layer twice would result in an error #867. Thank you, @jph00
+
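
The 0.41.2 feature means a quantized module's weights can round-trip through a state dict. A minimal sketch of what that enables, assuming bitsandbytes' `Linear4bit` module (the exact API surface may differ between versions):

```python
import torch
import bitsandbytes as bnb

# Quantization happens when the layer is moved to the GPU.
layer = bnb.nn.Linear4bit(64, 64, bias=False, quant_type="nf4").cuda()

# With 4-bit serialization (#753), the packed weights and their
# quantization state are captured directly in the state dict ...
torch.save(layer.state_dict(), "linear4bit.pt")

# ... and can be restored without re-quantizing from full precision.
restored = bnb.nn.Linear4bit(64, 64, bias=False, quant_type="nf4")
restored.load_state_dict(torch.load("linear4bit.pt"))
restored = restored.cuda()
```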

README.md

Lines changed: 5 additions & 5 deletions

@@ -38,7 +38,7 @@ python setup.py install
 ```python
 from transformers import AutoModelForCausalLM
 model = AutoModelForCausalLM.from_pretrained(
-  'decapoda-research/llama-7b-hf,
+  'decapoda-research/llama-7b-hf',
   device_map='auto',
   load_in_8bit=True,
   max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB')

@@ -146,13 +146,13 @@ For upcoming features and changes and full history see [Patch Notes](CHANGELOG.md)
 To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with nvcc through the following commands.
 
 ```bash
-wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
+wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
 # Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
-# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121}
+# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121, 122}
 # EXPORT_TO_BASH in {0, 1} with 0=False and 1=True
 
-# For example, the following installs CUDA 11.8 to ~/local/cuda-11.8 and exports the path to your .bashrc
-bash cuda install 118 ~/local 1
+# For example, the following installs CUDA 11.7 to ~/local/cuda-11.7 and exports the path to your .bashrc
+bash install_cuda.sh 117 ~/local 1
 ```
 
 To use a specific CUDA version just for a single compile run, you can set the variable `CUDA_HOME`, for example the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for cuda11x with the cuda version at `~/local/cuda-11.7`:

bitsandbytes/autograd/_functions.py

Lines changed: 7 additions & 9 deletions

@@ -496,15 +496,15 @@ class MatMul4Bit(torch.autograd.Function):
     # backward is mostly the same, but adds one extra clause (see "elif state.CxB is not None")
 
     @staticmethod
-    def forward(ctx, A, B, out=None, bias=None, state=None):
+    def forward(ctx, A, B, out=None, bias=None, quant_state: F.QuantState = None):
         # default of pytorch behavior if inputs are empty
         ctx.is_empty = False
         if prod(A.shape) == 0:
             ctx.is_empty = True
             ctx.A = A
             ctx.B = B
             ctx.bias = bias
-            B_shape = state[1]
+            B_shape = quant_state.shape
             if A.shape[-1] == B_shape[0]:
                 return torch.empty(A.shape[:-1] + B_shape[1:], dtype=A.dtype, device=A.device)
             else:

@@ -513,10 +513,10 @@ def forward(ctx, A, B, out=None, bias=None, state=None):
 
         # 1. Dequantize
         # 2. MatmulnN
-        output = torch.nn.functional.linear(A, F.dequantize_4bit(B, state).to(A.dtype).t(), bias)
+        output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
 
         # 3. Save state
-        ctx.state = state
+        ctx.state = quant_state
         ctx.dtype_A, ctx.dtype_B, ctx.dtype_bias = A.dtype, B.dtype, None if bias is None else bias.dtype
 
         if any(ctx.needs_input_grad[:2]):

@@ -534,7 +534,6 @@ def backward(ctx, grad_output):
 
         req_gradA, _, _, req_gradBias, _ = ctx.needs_input_grad
         A, B = ctx.tensors
-        state = ctx.state
 
         grad_A, grad_B, grad_bias = None, None, None
 

@@ -563,12 +562,11 @@ def matmul(
     return MatMul8bitLt.apply(A, B, out, bias, state)
 
 
-def matmul_4bit(A: tensor, B: tensor, quant_state: List, out: tensor = None, bias=None):
+def matmul_4bit(A: tensor, B: tensor, quant_state: F.QuantState, out: tensor = None, bias=None):
     assert quant_state is not None
     if A.numel() == A.shape[-1] and A.requires_grad == False:
-        absmax, shape, dtype, blocksize, compressed_stats, quant_type, data_type = quant_state
-        if A.shape[-1] % blocksize != 0:
-            warn(f'Some matrices hidden dimension is not a multiple of {blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}')
+        if A.shape[-1] % quant_state.blocksize != 0:
+            warn(f'Some matrices hidden dimension is not a multiple of {quant_state.blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}')
         return MatMul4Bit.apply(A, B, out, bias, quant_state)
     else:
         out = F.gemv_4bit(A, B.t(), out, state=quant_state)
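
The refactor replaces the positional quant_state tuple (absmax, shape, dtype, blocksize, ...) with an attribute-based object. A minimal sketch of the idea; the real `F.QuantState` in `bitsandbytes.functional` carries additional fields plus (de)serialization helpers:

```python
from dataclasses import dataclass
import torch

@dataclass
class QuantState:
    """Illustrative stand-in for bitsandbytes' quantization state."""
    absmax: torch.Tensor   # per-block scaling constants
    shape: torch.Size      # original (unquantized) weight shape
    dtype: torch.dtype     # dtype to dequantize back to
    blocksize: int         # quantization block size
    quant_type: str        # e.g. "fp4" or "nf4"

# Before: positional unpacking, brittle whenever a field is added or moved
#   absmax, shape, dtype, blocksize, *rest = quant_state
# After: named attribute access, stable across field changes
#   if A.shape[-1] % quant_state.blocksize != 0: ...
```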

bitsandbytes/cuda_setup/env_vars.py

Lines changed: 1 addition & 0 deletions

@@ -8,6 +8,7 @@ def to_be_ignored(env_var: str, value: str) -> bool:
         "OLDPWD",
         "SSH_AUTH_SOCK", # SSH stuff, therefore unrelated
         "SSH_TTY",
+        "GOOGLE_VM_CONFIG_LOCK_FILE", # on GCP setups, requires elevated permissions, causing problems in Jupyter notebooks
         "HOME", # Linux shell default
         "TMUX", # Terminal Multiplexer
         "XDG_DATA_DIRS", # XDG: Desktop environment stuff

bitsandbytes/cuda_setup/main.py

Lines changed: 1 addition & 0 deletions

@@ -67,6 +67,7 @@ def generate_instructions(self):
             self.add_log_entry('CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh')
             self.add_log_entry('CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.')
             self.add_log_entry('CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local')
+
             return
 
         make_cmd = f'CUDA_VERSION={self.cuda_version_string}'
