Features:
- Support for 32 and 8-bit Lion has been added (see the usage sketch below). Thank you @lucidrains
- Support for serialization of Linear8bitLt layers (LLM.int8()). This allows storing and loading 8-bit weights directly from the HuggingFace Hub (see the sketch below). Thank you @mryab
- New bug report feature: `python -m bitsandbytes` now gives extensive debugging details to debug CUDA setup failures.
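
A minimal usage sketch for the new Lion optimizers, assuming a CUDA device; the model and hyperparameters here are illustrative:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(128, 128).cuda()

# 32-bit Lion; swap in bnb.optim.Lion8bit to keep optimizer state in 8 bits.
optimizer = bnb.optim.Lion(model.parameters(), lr=1e-4, betas=(0.9, 0.99))

loss = model(torch.randn(16, 128, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```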
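
And a minimal sketch of the Linear8bitLt serialization round-trip, assuming a CUDA device; quantization happens when the layer is moved to the GPU, after which the int8 state dict (weights plus scaling constants) can be saved and restored directly:

```python
import torch
import bitsandbytes as bnb

fp16_linear = torch.nn.Linear(64, 64).half()

# Wrap the fp16 weights; they are quantized to int8 on the move to the GPU.
int8_linear = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False, threshold=6.0)
int8_linear.weight = bnb.nn.Int8Params(fp16_linear.weight.data.clone(),
                                       requires_grad=False,
                                       has_fp16_weights=False)
int8_linear.bias = fp16_linear.bias
int8_linear = int8_linear.cuda()

# Save the quantized weights and load them into a fresh layer.
torch.save(int8_linear.state_dict(), "int8_linear.pt")

restored = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False, threshold=6.0)
restored.load_state_dict(torch.load("int8_linear.pt"))
restored = restored.cuda()
```
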
Bug fixes:
- Fixed a bug where some bitsandbytes methods failed in a model-parallel setup on multiple GPUs. Thank you @tonylins
- Fixed a bug where cudart.so libraries could not be found in newer PyTorch releases.

Improvements:
- Improved the CUDA Setup procedure by doing a more extensive search for CUDA libraries

Deprecated:
- Devices with compute capability 3.0 (GTX 700s, K10) and 3.2 (Tegra K1, Jetson TK1) are now deprecated and support will be removed in 0.39.0.
- Support for CUDA 10.0 and 10.2 will be removed in bitsandbytes 0.39.0

### 0.38.1

Features:
- Added Int8 SwitchBack layers (see the sketch below)
- Added Fake FP8 layers for research purposes (available under `bnb.research.nn. ...`)
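
A minimal sketch of how these layers might be used, assuming a CUDA device; the class names `SwitchBackLinear` and `LinearFP8Mixed` are assumptions, so check `bnb.nn` and `bnb.research.nn` for the exact exports:

```python
import torch
import bitsandbytes as bnb

x = torch.randn(4, 128, dtype=torch.float16, device="cuda")

# Int8 SwitchBack layer: int8 forward matmul that switches back to higher
# precision for the backward pass (name and signature assumed).
switchback = bnb.nn.SwitchBackLinear(128, 128).cuda().half()
y = switchback(x)

# Fake FP8 layer that simulates FP8 matmul precision for research
# (name and signature assumed).
fp8_linear = bnb.research.nn.LinearFP8Mixed(128, 128).cuda().half()
y = fp8_linear(x)
```
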
## TL;DR
**Requirements**
Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.

(Deprecated: CUDA 10.0 is deprecated and only CUDA >= 11.0 will be supported with release 0.39.0.)

**Installation**:

``pip install bitsandbytes``

In some cases it can happen that you need to compile from source. If this happens, please consider submitting a bug report with the output of `python -m bitsandbytes`. What follows are short instructions which might work out of the box if `nvcc` is installed. If these do not work, see further below.

To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with `nvcc` through the following commands:

```bash
# EXPORT_TO_BASH in {0, 1} with 0=False and 1=True

# For example, the following installs CUDA 11.8 to ~/local/cuda-11.8 and exports the path to your .bashrc
# (cuda_install.sh here is assumed to be the repository's CUDA install script)
bash cuda_install.sh 118 ~/local 1
```

To use a specific CUDA version just for a single compile run, you can set the variable `CUDA_HOME`. For example, the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for cuda11x with the CUDA version at `~/local/cuda-11.7`:

``CUDA_HOME=~/local/cuda-11.7 CUDA_VERSION=117 make cuda11x``

For more detailed instructions, please follow the [compile_from_source.md](compile_from_source.md) instructions.

1. Run `python speed_benchmark/speed_benchmark.py`, which times operations and writes their timings to `speed_benchmark/info_a100_py2.jsonl` (change the jsonl filename for your own profiling run).
2. Run `python speed_benchmark/make_plot_with_jsonl.py`, which produces `speed_benchmark/plot_with_info.pdf`. Again, make sure you change which jsonl is being processed.
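
Put together, a profiling pass might look like this (assuming you have already edited both scripts to point at your own jsonl, e.g. a hypothetical `speed_benchmark/info_mygpu.jsonl`):

```bash
# Step 1: time the operations and write the results to the jsonl.
python speed_benchmark/speed_benchmark.py

# Step 2: render the recorded timings to speed_benchmark/plot_with_info.pdf.
python speed_benchmark/make_plot_with_jsonl.py
```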
0 commit comments