
Commit 675baa7

committed: Merge remote-tracking branch 'origin/main' into merge
2 parents: f64cfe6 + 9e7cdc9

53 files changed: +2938, -350 lines changed

CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -201,3 +201,30 @@ Features:
Improvements:
- Improved logging for the CUDA detection mechanism.

### 0.38.0

#### 8-bit Lion, Load/Store 8-bit Models directly from/to HF Hub

Features:
- Support for 32-bit and 8-bit Lion has been added. Thank you @lucidrains
- Support for serialization of Linear8bitLt layers (LLM.int8()). This allows storing and loading 8-bit weights directly from the HuggingFace Hub. Thank you @myrab
- New bug report feature: `python -m bitsandbytes` now gives extensive debugging details to debug CUDA setup failures.

Bug fixes:
- Fixed a bug where some bitsandbytes methods failed in a model-parallel setup on multiple GPUs. Thank you @tonylins
- Fixed a bug where cudart.so libraries could not be found in newer PyTorch releases.

Improvements:
- Improved the CUDA Setup procedure by doing a more extensive search for CUDA libraries.

Deprecated:
- Devices with compute capability 3.0 (GTX 700s, K10) and 3.2 (Tegra K1, Jetson TK1) are now deprecated, and support will be removed in 0.39.0.
- Support for CUDA 10.0 and 10.2 will be removed in bitsandbytes 0.39.0.

### 0.38.1

Features:
- Added Int8 SwitchBack layers
- Added Fake FP8 layers for research purposes (available under `bnb.research.nn. ...`)
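As a rough, hypothetical illustration of the 8-bit Lion entry in 0.38.0 above: the class name `bnb.optim.Lion8bit` and the hyperparameters below are assumptions (mirroring the `bnb.optim.Adam8bit` naming used in the README), not taken from this changelog.

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model

# 8-bit Lion as a drop-in replacement for a 32-bit optimizer
# (assumed name: bnb.optim.Lion8bit; lr and weight_decay are illustrative)
optimizer = bnb.optim.Lion8bit(model.parameters(), lr=1e-4, weight_decay=1e-2)
```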

README.md

Lines changed: 48 additions & 3 deletions
@@ -11,11 +11,41 @@ Resources:
## TL;DR
**Requirements**
Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.

(Deprecated: CUDA 10.0 is deprecated; only CUDA >= 11.0 will be supported with release 0.39.0.)

**Installation**:

``pip install bitsandbytes``

In some cases you may need to compile from source. If that happens, please consider submitting a bug report with the output of `python -m bitsandbytes`. The short instructions below might work out of the box if `nvcc` is installed; if they do not, see the more detailed instructions further below.

Compilation quickstart:
```bash
git clone https://github.com/timdettmers/bitsandbytes.git
cd bitsandbytes

# CUDA_VERSIONS in {110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120}
# make argument in {cuda110, cuda11x, cuda12x}
# if you do not know which CUDA version you have, try looking at the output of: python -m bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
```

**Using Int8 inference with HuggingFace Transformers**

```python
import torch
from transformers import AutoModelForCausalLM

# leave ~2 GB of headroom per GPU for activations (free memory measured on device 0)
free_in_gb = int(torch.cuda.mem_get_info()[0] / 1024**3)
max_memory = {i: f'{free_in_gb - 2}GB' for i in range(torch.cuda.device_count())}

model = AutoModelForCausalLM.from_pretrained(
    'decapoda-research/llama-7b-hf',
    device_map='auto',
    load_in_8bit=True,
    max_memory=max_memory)
```

A more detailed example can be found in [examples/int8_inference_huggingface.py](examples/int8_inference_huggingface.py).

**Using 8-bit optimizer**:
1. Comment out optimizer: ``#torch.optim.Adam(....)``
2. Add 8-bit optimizer of your choice ``bnb.optim.Adam8bit(....)`` (arguments stay the same)
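Put together, these two steps amount to a one-line swap. Here is a minimal sketch, where the model and learning rate are illustrative placeholders rather than values from the README:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model

# 1. comment out the 32-bit optimizer:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# 2. add the 8-bit optimizer of your choice (arguments stay the same):
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```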
@@ -40,7 +70,7 @@ out = linear(x.to(torch.float16))
## Features
- 8-bit Matrix multiplication with mixed precision decomposition
- LLM.int8() inference
- 8-bit Optimizers: Adam, AdamW, RMSProp, LARS, LAMB, Lion (saves 75% memory)
- Stable Embedding Layer: Improved stability through better initialization and normalization
- 8-bit quantization: Quantile, Linear, and Dynamic quantization
- Fast quantile estimation: Up to 100x faster than other algorithms
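As a small, hedged illustration of the stable embedding and LLM.int8() layers from this list (the sizes, vocabulary, and outlier threshold below are placeholders, not values taken from this README):

```python
import torch
import bitsandbytes as bnb

# Stable Embedding layer: a drop-in for torch.nn.Embedding with more stable initialization and layer norm
emb = bnb.nn.StableEmbedding(num_embeddings=32000, embedding_dim=1024).cuda()

# LLM.int8() linear layer: 8-bit matmul with mixed-precision decomposition for outlier features
linear = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False, threshold=6.0).cuda()

tokens = torch.randint(0, 32000, (1, 16), device='cuda')
out = linear(emb(tokens).to(torch.float16))
```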
@@ -113,8 +143,23 @@ For upcoming features and changes and full history see [Patch Notes](CHANGELOG.md)
2. __fatbinwrap_.. [Solution](errors_and_solutions.md#fatbinwrap_)

## Compile from source
To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with nvcc through the following commands.

```bash
wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
# Syntax: bash cuda_install.sh CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121}
# EXPORT_TO_BASH in {0, 1} with 0=False and 1=True

# For example, the following installs CUDA 11.8 to ~/local/cuda-11.8 and exports the path to your .bashrc
bash cuda_install.sh 118 ~/local 1
```

To use a specific CUDA version just for a single compile run, you can set the `CUDA_HOME` variable. For example, the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for cuda11x with the CUDA version at `~/local/cuda-11.7`:

``CUDA_HOME=~/local/cuda-11.7 CUDA_VERSION=117 make cuda11x``
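After compiling, a reasonable follow-up is to install the package and reuse the bug-report command from the Installation section as a quick check of the CUDA setup of the freshly built library; both commands already appear earlier in this README:

```bash
python setup.py install
python -m bitsandbytes   # prints extensive CUDA setup debugging details
```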

For more detailed instructions, see [compile_from_source.md](compile_from_source.md).

## License

benchmarking/switchback/README.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
Steps:

1. Run `python speed_benchmark/speed_benchmark.py`, which times operations and writes their timings to `speed_benchmark/info_a100_py2.jsonl` (change the jsonl filename for your own profiling runs).
2. Run `python speed_benchmark/make_plot_with_jsonl.py`, which produces `speed_benchmark/plot_with_info.pdf`. Again, make sure you change which jsonl file is being processed.
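For reference, the two steps as a plain command sequence, with the file names exactly as listed above (rename the jsonl for your own runs):

```bash
python speed_benchmark/speed_benchmark.py        # writes timings to speed_benchmark/info_a100_py2.jsonl
python speed_benchmark/make_plot_with_jsonl.py   # reads the jsonl and produces speed_benchmark/plot_with_info.pdf
```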
