
Commit ee8943a

Fix grammar/typos (#1520)
1 parent 53dd26f commit ee8943a

1 file changed: +4 −4 lines changed

overview-quantization-transformers.md

Lines changed: 4 additions & 4 deletions
@@ -86,7 +86,7 @@ We will use the following setup:
### Inference speed (forward pass only)

-This benchmark measures only the prefill step, which corresponds to the foward pass during training. It was run on a single NVIDIA A100-SXM4-80GB GPU with a prompt length of 512. The model we used was `meta-llama/Llama-2-13b-hf`.
+This benchmark measures only the prefill step, which corresponds to the forward pass during training. It was run on a single NVIDIA A100-SXM4-80GB GPU with a prompt length of 512. The model we used was `meta-llama/Llama-2-13b-hf`.

with batch size = 1:
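
As a side note, the prefill-only measurement described in the changed line above could be reproduced with a sketch along these lines; the model ID and prompt length come from the text, while the prompt string and the timing loop are illustrative assumptions rather than the blog's actual benchmark script.

```python
# Minimal sketch of a prefill-only (single forward pass) timing run.
# Assumes a CUDA device; the prompt content and warmup/timing logic are
# illustrative, not the post's benchmark code.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Batch size 1, prompt truncated to a fixed length of 512 tokens.
inputs = tokenizer(
    "Hello " * 512, return_tensors="pt", truncation=True, max_length=512
).to(model.device)

with torch.no_grad():
    model(**inputs)                 # warmup
    torch.cuda.synchronize()
    start = time.perf_counter()
    model(**inputs)                 # prefill = one forward pass over the prompt
    torch.cuda.synchronize()
    print(f"prefill latency: {time.perf_counter() - start:.3f}s")
```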

@@ -113,7 +113,7 @@ The following benchmarks measure the generation speed of the model during infere
#### use_cache
Let's test `use_cache` to better understand the impact of caching the hidden state during the generation.

-The benchmark was run on a A100 with a prompt length of 30 and we generated exactly 30 tokens. The model we used was `meta-llama/Llama-2-7b-hf`.
+The benchmark was run on an A100 with a prompt length of 30 and we generated exactly 30 tokens. The model we used was `meta-llama/Llama-2-7b-hf`.

with `use_cache=True`
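
For reference, a minimal sketch of the `use_cache` comparison might look like this; the model ID and the 30-token generation length come from the text, while the prompt string is a placeholder and the loop is an assumption, not the blog's benchmark code.

```python
# Generate exactly 30 new tokens with and without the key/value cache.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The theory of relativity states that", return_tensors="pt").to(model.device)

for use_cache in (True, False):
    # With use_cache=True, past key/values are reused at each decoding step;
    # with use_cache=False, the full sequence is recomputed at every step.
    out = model.generate(
        **inputs, max_new_tokens=30, min_new_tokens=30, use_cache=use_cache
    )
    print(use_cache, tokenizer.decode(out[0], skip_special_tokens=True))
```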

@@ -169,7 +169,7 @@ From the result, we conclude that bitsandbytes is faster than GPTQ for fine-tuni
### Performance degradation

-Quantization is great for reducing memory comsumption. However, it does come with performance degradation. Let's compare the performance using the [Open-LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) !
+Quantization is great for reducing memory consumption. However, it does come with performance degradation. Let's compare the performance using the [Open-LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) !

with 7b model:
@@ -192,7 +192,7 @@ From the results above, we conclude that there is less degradation in bigger mod
## Conclusion and final words

-In this blogpost, we compared bitsandbytes and GPTQ quantization across mutliple setups. We saw that bitsandbytes is better suited for fine-tuning while GPTQ is better for generation. From this observation, one way to get better merged models would be to:
+In this blogpost, we compared bitsandbytes and GPTQ quantization across multiple setups. We saw that bitsandbytes is better suited for fine-tuning while GPTQ is better for generation. From this observation, one way to get better merged models would be to:

- (1) quantize the base model using bitsandbytes (zero-shot quantization)
- (2) add and fine-tune the adapters
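
A hedged sketch of steps (1) and (2) using `transformers`, `bitsandbytes`, and `peft` could look like the following; the base model ID and all LoRA hyperparameters here are illustrative assumptions, not values taken from the post.

```python
# (1) load the base model with zero-shot 4-bit bitsandbytes quantization,
# (2) attach LoRA adapters with peft and train only the adapter weights.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"

# (1) zero-shot 4-bit quantization of the base model with bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# (2) add LoRA adapters; the quantized base weights stay frozen
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then run a standard fine-tuning loop (or Trainer) on `model`.
```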

0 commit comments
