CHANGELOG.md: 44 additions & 0 deletions
@@ -283,3 +283,47 @@ Bug fixes:
- Removed outdated get_cuda_lib_handle calls that led to errors. #595 Thank you @ihsanturk
- Fixed bug where read-permission was assumed for a file. #497
- Fixed a bug where prefetchAsync led to errors on GPUs that support unified memory but not prefetching (Maxwell, SM52). #470 #451 #453 #477 Thank you @jllllll and @stoperro
+
+
+### 0.41.0
+
+Features:
+- Added precompiled CUDA 11.8 binaries to support H100 GPUs without compilation. #571
+- CUDA SETUP now no longer looks for libcuda and libcudart and relies on PyTorch's CUDA libraries. To manually override this behavior see how_to_use_nonpytorch_cuda.md. Thank you @rapsealk
+
+Bug fixes:
+- Fixed a bug where the default type of absmax was undefined, which led to errors if the default type is different from torch.float32. #553
+- Fixed a missing scipy dependency in requirements.txt. #544
+- Fixed a bug where a view operation could cause an error in 8-bit layers.
+- Fixed a bug where CPU bitsandbytes would fail during the import. #593 Thank you @bilelomrani
+- Fixed a bug where a non-existent LD_LIBRARY_PATH variable led to a failure in python -m bitsandbytes. #588
+- Removed outdated get_cuda_lib_handle calls that led to errors. #595 Thank you @ihsanturk
+- Fixed bug where read-permission was assumed for a file. #497
+- Fixed a bug where prefetchAsync led to errors on GPUs that support unified memory but not prefetching (Maxwell, SM52). #470 #451 #453 #477 Thank you @jllllll and @stoperro
+
+Documentation:
+- Improved documentation for GPUs that do not support 8-bit matmul. #529
+- Added description and pointers for the NF4 data type. #543
+
+User experience:
+- Improved handling of the default compute_dtype for Linear4bit layers, so that compute_dtype = input_dtype if the input data type is stable enough (float32, bfloat16, but not float16).
+
+Performance:
+- Improved 4-bit inference performance for A100 GPUs. This slightly degraded performance for A40/RTX 3090 and RTX 4090 GPUs.
+
+### 0.41.1
+
+Bug fixes:
+- Fixed bugs in dynamic exponent data type creation. Thank you @RossM, @KohakuBlueleaf, @ArrowM. #659 #227 #262 #152
+
+### 0.41.2
+
+Feature:
+- 4-bit serialization is now supported. This enables 4-bit load/store. Thank you @poedator. #753
+
+### 0.41.3
+
+Bug fixes:
+- Fixed an issue where 4-bit serialization would fail for layers without double quantization. #868 Thank you, @poedator
+- Fixed an issue where calling .to() or .cuda() on a 4-bit layer twice would result in an error. #867 Thank you, @jph00
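
The 0.41.0 "User experience" entry above describes the default compute_dtype rule for Linear4bit layers. A minimal sketch of how that looks from the caller's side, assuming a CUDA device; the layer sizes are illustrative and the rule is taken from the changelog wording rather than verified against the implementation:

```python
import torch
import bitsandbytes as bnb

# No explicit compute_dtype: the default rule from the changelog applies
# (compute_dtype follows the input dtype for "stable" dtypes).
layer = bnb.nn.Linear4bit(64, 64).cuda()      # weights are quantized to 4-bit on .cuda()

x_bf16 = torch.randn(4, 64, dtype=torch.bfloat16, device="cuda")
y_bf16 = layer(x_bf16)                        # bfloat16 input -> compute in bfloat16

x_fp16 = torch.randn(4, 64, dtype=torch.float16, device="cuda")
y_fp16 = layer(x_fp16)                        # float16 is not considered stable enough,
                                              # so the default compute dtype is kept
```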
@@ -119,7 +119,7 @@ torch.nn.Embedding(...) -> bnb.nn.StableEmbedding(...) # recommended for NLP mo
```

Note that by default all parameter tensors with less than 4096 elements are kept at 32-bit even if you initialize those parameters with 8-bit optimizers. This is done since such small tensors do not save much memory and often contain highly variable parameters (biases) or parameters that require high precision (batch norm, layer norm). You can change this behavior like so:
-```
+```python
# parameter tensors with less than 16384 values are optimized in 32-bit
# it is recommended to use multiples of 4096
adam = bnb.optim.Adam8bit(model.parameters(), min_8bit_size=16384)
@@ -146,13 +146,13 @@ For upcoming features and changes and full history see [Patch Notes](CHANGELOG.m
To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with nvcc through the following commands.
# EXPORT_TO_BASH in {0, 1} with 0=False and 1=True

-# For example, the following installs CUDA 11.8 to ~/local/cuda-11.8 and exports the path to your .bashrc
-bash cuda install 118 ~/local 1
+# For example, the following installs CUDA 11.7 to ~/local/cuda-11.7 and exports the path to your .bashrc
+bash install_cuda.sh 117 ~/local 1
```

To use a specific CUDA version just for a single compile run, you can set the variable `CUDA_HOME`, for example the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for cuda11x with the cuda version at `~/local/cuda-11.7`:
warn(f'Some matrices hidden dimension is not a multiple of {blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}')
+if A.shape[-1] % quant_state.blocksize != 0:
+    warn(f'Some matrices hidden dimension is not a multiple of {quant_state.blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}')
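
The added lines gate the efficient 4-bit inference kernels on the hidden dimension being a multiple of the quantization blocksize. A standalone sketch of that condition; the blocksize value and tensor shape below are illustrative assumptions, not taken from the diff:

```python
import torch

blocksize = 64                      # assumed 4-bit quantization blocksize for illustration
A = torch.randn(1, 4097)            # hidden dimension 4097 is not a multiple of 64

# Mirrors the check added above: shapes that are not a multiple of the blocksize
# cannot use the fast inference kernels and fall back to a slower path.
if A.shape[-1] % blocksize != 0:
    print(f"slow path: hidden dim {A.shape[-1]} is not a multiple of {blocksize}")
```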
self.add_log_entry('CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a')
self.add_log_entry('CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc')
self.add_log_entry('CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.')
-self.add_log_entry('CUDA SETUP: Solution 2a): Download CUDA install script: wget https://github.com/TimDettmers/bitsandbytes/blob/main/cuda_install.sh')
+self.add_log_entry('CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh')
self.add_log_entry('CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.')
self.add_log_entry('CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local')
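
These CUDA SETUP hints, together with the 0.41.0 note that bitsandbytes now relies on PyTorch's CUDA libraries, point to a manual override for non-PyTorch CUDA installs. A minimal sketch of that override, assuming the BNB_CUDA_VERSION variable described in how_to_use_nonpytorch_cuda.md; the version number is illustrative:

```python
import os

# Assumption: the override must be visible before bitsandbytes is imported for the
# first time. The CUDA libraries themselves are typically exposed by exporting
# LD_LIBRARY_PATH in the shell before starting Python, as in the hints above.
os.environ["BNB_CUDA_VERSION"] = "117"   # illustrative: pick the CUDA 11.7 binary

import bitsandbytes as bnb               # loads libbitsandbytes_cuda117.so instead of the default
```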