Commit 3348585

Update README.md: Add Huggingface repo for 7B and 13B quantization (#142)
* Update README.md: Add Huggingface repo for 7B and 13B quantization
* Update requirements.txt to pin PEFT and BNB versions

Reason:
- For BNB: tloen/alpaca-lora#350
- For PEFT: huggingface/peft@c21afbe#diff-b3b90f453dea37bf90203fd395e9dedc21b21c9a38464c6b1572368c049ef8b2L116-L128
1 parent aca32f6 commit 3348585

2 files changed, +4 −3 lines changed

large_language_models/alpaca-qlora/requirements.txt

Lines changed: 3 additions & 3 deletions
@@ -3,6 +3,6 @@ loralib
 sentencepiece
 git+https://github.com/huggingface/transformers.git
 accelerate
-bitsandbytes
-git+https://github.com/huggingface/peft.git
-gradio
+bitsandbytes==0.37.2
+peft==0.2.0
+gradio
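These pins hold bitsandbytes at 0.37.2 and peft at 0.2.0 to avoid the breakage reported in the issues linked in the commit message. Below is a minimal sketch, assuming the usual 8-bit LoRA inference path those versions support; the base model and adapter paths are placeholders, not taken from this commit.

```python
# Sketch only: 8-bit LoRA inference with the pinned bitsandbytes/peft versions.
# "base_model" and "lora_weights" are placeholder paths, not from this commit.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel  # peft==0.2.0

base_model = "decapoda-research/llama-7b-hf"  # assumed base checkpoint
lora_weights = "path/to/lora-adapter"         # assumed adapter directory

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,            # int8 weights via bitsandbytes==0.37.2
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, lora_weights, torch_dtype=torch.float16)
model.eval()

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```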

large_language_models/llama/quantization/README.md

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 ### Update News
+- LLaMA-7B and 13B quantization are also available [here](https://huggingface.co/cnbeining/sparsebit-llama-quantization-7b-13b).
 - We have updated a llama-13b checkpoint with 3-bit 128-group quantization [here](https://drive.google.com/file/d/1LjZmOU8tr2VT6HdAP_WbuX8cqmrs5DrR). For config_cache and tokenizer_cache, the files can be found [here in huggingface](https://huggingface.co/decapoda-research/llama-13b-hf).
 - We implemented a cuda kernel for groupsize=128(int3/int4) & groupsize=64(int2). In our experiments, setting groupsize=128(int3) can make all quantization models achieve a significant increase in ppl compared to groupsize=-1. All results are updated in Table A.
 - We add `--single_device_mode` to support all quant models run in a single GPU(i.e. 2080ti). Please refer to the inference section for details.
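The added README line points to a Hugging Face repo hosting the quantized 7B and 13B artifacts. A small sketch of fetching them with huggingface_hub follows; the repo id comes from the diff above, but its internal file layout is not described in this commit, so the snapshot is downloaded whole.

```python
# Sketch only: download the quantized LLaMA-7B/13B snapshot referenced above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="cnbeining/sparsebit-llama-quantization-7b-13b",
)
print("Quantized checkpoints downloaded to:", local_dir)
```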
