Commit aa06fa6

load_diff bugfix

1 parent 964ead3 commit aa06fa6

File tree

2 files changed (+13, -13 lines)


README.md

Lines changed: 8 additions & 10 deletions
@@ -1,5 +1,3 @@
-[# Compressing Model Diffs for High-Througput Multi-Model Serving]: #
-
 # BitDelta: Your Fine-Tune May Only Be Worth One Bit
 
 [[Paper](https://arxiv.org/abs/2402.10193)][[Blog](https://fasterdecoding.github.io/BitDelta/)]
@@ -12,13 +10,8 @@ BitDelta compresses the weight delta between a fine-tuned and base model LLM to
 </a>
 </div>
 
-
 The current release supports:
 
-
-
-
-
 - Llama-2 and Mistral based models.
 - Memory efficient 16-bit + 1-bit Δ Linear in PyTorch
 - Triton kernel for fast inference
@@ -63,7 +56,6 @@ See [`demo/README.md`](https://github.com/FasterDecoding/BitDelta/blob/main/demo
 
 [BitDelta Demo.webm](https://github.com/FasterDecoding/BitDelta/assets/51351043/b56747df-1108-42f2-ae6f-05e1c460080c)
 
-
 ## Usage
 
 We provide some scripts in (`./scripts`) so you can compress your own models! As an example, we will compress `lmsys/vicuna-7b-v1.5` with base model `meta-llama/Llama-2-7b-hf`.
@@ -92,7 +84,7 @@ If `--save_full_model` is specified, the compressed model will also be saved in
 Double check the perplexity of the compressed model:
 
 ```
-CUDA_VISIBLE_DEVICES=0 python \
+CUDA_VISIBLE_DEVICES=0 python \
     bitdelta/eval_ppl.py \
     --base_model meta-llama/Llama-2-7b-hf \
     --dataset_name wikitext \
@@ -103,17 +95,23 @@ CUDA_VISIBLE_DEVICES=0 python \
 ```
 
+### Perplexity Check
+
+To replicate our other results, please use `--save_full_model` to run the model in Llama format for compatibility with eval harnesses.
+
 ## Citation
 
 If you find BitDelta useful, please consider citing:
 
 ```
 @misc{liu2024bitdelta,
-      title={BitDelta: Your Fine-Tune May Only Be Worth One Bit},
+      title={BitDelta: Your Fine-Tune May Only Be Worth One Bit},
       author={James Liu and Guangxuan Xiao and Kai Li and Jason D. Lee and Song Han and Tri Dao and Tianle Cai},
       year={2024},
       eprint={2402.10193},
       archivePrefix={arXiv},
       primaryClass={cs.LG}
 }
 ```
+
+[# Compressing Model Diffs for High-Througput Multi-Model Serving]: #
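For context on the tensors these files deal in: BitDelta compresses the weight delta between a fine-tuned model and its base to a per-matrix scale (`coeff`) plus a sign matrix (`mask`, bit-packed on disk). Below is a minimal sketch of that binarization, assuming the scale is the mean absolute delta as in the paper; `compress_delta` is a hypothetical helper for illustration, not the repo's API:

```python
import torch

def compress_delta(base_w: torch.Tensor, finetuned_w: torch.Tensor):
    # Sketch of BitDelta's 1-bit compression for one weight matrix:
    # keep only the sign of the delta plus a single scale factor.
    delta = finetuned_w.float() - base_w.float()
    coeff = delta.abs().mean()  # scale minimizing ||delta - coeff * sign(delta)||^2
    signs = torch.where(delta >= 0, 1.0, -1.0)  # one bit per weight once packed
    return coeff, signs

# Toy check: the dequantized weight tracks the fine-tune.
base = torch.randn(256, 256)
finetuned = base + 0.01 * torch.randn(256, 256)
coeff, signs = compress_delta(base, finetuned)
approx = base + coeff * signs
print((approx - finetuned).abs().mean())
```

The `coeff` and `mask` tensors that `load_diff` reads back in the next file are the saved form of this `coeff` and a bit-packed `signs`.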

bitdelta/diff.py

Lines changed: 5 additions & 3 deletions
@@ -88,9 +88,11 @@ def load_diff(model, diff_dir):
             coeff = diff_dict[name + ".coeff"].to(device)
             mask = diff_dict[name + ".mask"].to(device)
 
-            setattr(module, "mask", mask)
-            setattr(module, "coeff", coeff)
-            # module.weight.add_((mask * coeff).to(module.weight.dtype))
+            # setattr(module, "mask", mask)
+            # setattr(module, "coeff", coeff)
+            weight = (unpack(mask)*2-1) * coeff
+
+            module.weight.add_(weight.T.to(module.weight.dtype))
         elif name + ".weight" in diff_dict:
             module.weight = nn.Parameter(diff_dict[name + ".weight"].to(device).to(module.weight.dtype))
 
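The fix itself: instead of stashing `mask` and `coeff` as module attributes (which only helps if a custom forward consumes them), the 1-bit delta is now materialized into the dense weight at load time: the packed sign mask is unpacked to ±1, scaled by `coeff`, transposed, and added in place. A rough sketch of that dequantization step, assuming LSB-first uint8 bit packing and an input-major mask layout (hence the transpose); the repo's real `unpack` may differ:

```python
import torch

def unpack(packed: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the repo's unpack: expand each uint8 byte
    # (8 sign bits, LSB first) into eight {0., 1.} float values.
    bits = torch.arange(8, device=packed.device, dtype=torch.uint8)
    return ((packed.unsqueeze(-1) >> bits) & 1).flatten(-2).float()

# Dequantize and fold the delta into a base weight, as the fixed load_diff does.
# Shapes are illustrative; the transpose assumes the mask is stored input-major.
out_features, in_features = 1024, 512
base_weight = torch.zeros(out_features, in_features, dtype=torch.float16)
coeff = torch.tensor(0.01)  # per-matrix scale loaded from the diff
mask = torch.randint(0, 256, (in_features, out_features // 8), dtype=torch.uint8)

delta = (unpack(mask) * 2 - 1) * coeff            # {0,1} -> {-1,+1}, then scale
base_weight.add_(delta.T.to(base_weight.dtype))   # same transpose-and-add as the fix
```

With the delta folded in once up front, the loaded model behaves like a plain dense checkpoint, so standard forward passes and eval harnesses see the fine-tuned weights without any custom Linear module.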
