
Commit e08713e

Small fixes to non_cuda_backends.mdx
1 parent 474f5c7 commit e08713e


docs/source/non_cuda_backends.mdx

Lines changed: 5 additions & 3 deletions
@@ -27,9 +27,11 @@ Thank you for your support!
 
 ### Intel
 
-The following performance data is collected from Intel 4th Gen Xeon (SPR) platform. The tables show speed-up and memory compared with different data types of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
-You can run `benchmarking/generation_benchmark.py` to reproduce the following model memory and inference results, please note that you need to binding cores if you are using CPU to benchmark. For example, run `numactl -C 0-55 -m 0 python generation_benchmark.py --quant_type nf4` on Intel 4th Gen Xeon with single socket.
-The finetune results are selected from [peft](https://github.com/huggingface/peft/blob/main/examples/olora_finetuning/olora_finetuning.py)
+The below performance data is collected from the Intel 4th Gen Xeon (SPR) platform. The tables show speed-up and memory compared with different data types of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
+
+You may run `benchmarking/generation_benchmark.py` to reproduce the below model memory and inference results. Please note that you need to bind cores if you are using the CPU to benchmark. For example, run `numactl -C 0-55 -m 0 python generation_benchmark.py --quant_type nf4` on Intel 4th Gen Xeon with single socket.
+
+The finetune results are selected from [peft](https://github.com/huggingface/peft/blob/main/examples/olora_finetuning/olora_finetuning.py).
 
 #### Model memory (CPU)
 | Data Type | BF16 | INT8 | NF4 | FP4 |
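For context on the core-binding advice in the updated text: the range `0-55` corresponds to one 56-core SPR socket, so on other machines the range must be adapted. A minimal sketch of how one might check the local topology before running the documented command (the `lscpu` step is illustrative and not part of the commit):

```shell
# Inspect the NUMA topology to find the core list of one socket,
# e.g. "NUMA node0 CPU(s): 0-55" on a 56-core Intel 4th Gen Xeon.
lscpu | grep -i 'NUMA node0'

# Then pin the benchmark's threads (-C) and memory allocations (-m)
# to that socket, as the updated docs describe:
numactl -C 0-55 -m 0 python generation_benchmark.py --quant_type nf4
```

Binding to a single socket avoids cross-socket memory traffic, which would otherwise skew CPU benchmark numbers.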
