
Commit 41d927c

yiliu30 committed: update docs

Signed-off-by: yiliu30 <[email protected]>
1 parent ee443d0

File tree: 1 file changed (+2, -1 lines)

README.md

Lines changed: 2 additions & 1 deletion

````diff
@@ -79,7 +79,8 @@ model = GenericLoraKbitModel('tiiuae/falcon-7b')
 # Run the fine-tuning
 model.finetune(dataset)
 ```
-4. __CPU inference__ - Now you can use just your CPU for inference of any LLM. For the CPU-only devices, we integrated [Itrex](https://github.com/intel/intel-extension-for-transformers) to conserve memory by compressing the model with [weight-only quantization algorithms](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and accelerate the inference by leveraging its highly optimized kernel on Intel platforms.
+
+4. __CPU inference__ - The CPU, including notebook CPUs, is now fully equipped to handle LLM inference. We integrated [Itrex](https://github.com/intel/intel-extension-for-transformers) to conserve memory by compressing the model with [weight-only quantization algorithms](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and accelerate the inference by leveraging its highly optimized kernel on Intel platforms.
 
 5. __Batch integration__ - By tweaking the 'batch_size' in the .generate() and .evaluate() functions, you can expedite results. Using a 'batch_size' greater than 1 typically enhances processing efficiency.
 ```python
````
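For context on point 4 of the diff: the sketch below is not xTuring's internal integration path, but a minimal, direct use of Itrex's documented loader, assuming `intel_extension_for_transformers` exposes `AutoModelForCausalLM` with a `load_in_4bit` flag (as Itrex's own README shows) and reusing the `tiiuae/falcon-7b` checkpoint from this README. The prompt is a placeholder.

```python
# Hedged sketch: direct Itrex usage, not xTuring's internal wiring.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "tiiuae/falcon-7b"  # checkpoint used elsewhere in this README
prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# load_in_4bit=True asks Itrex to apply weight-only quantization, shrinking
# the weights so inference fits in memory on a CPU-only machine.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
output_ids = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```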
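Point 5's `batch_size` knob could be exercised as follows; a minimal sketch assuming the xTuring-style API from the surrounding README (`GenericLoraKbitModel`, with `.generate()` taking a `texts` list and a `batch_size` keyword, and the `xturing.models` import path). The prompts are placeholder examples.

```python
# Hedged sketch: batched generation with the README's xTuring-style API.
from xturing.models import GenericLoraKbitModel  # import path is an assumption

model = GenericLoraKbitModel('tiiuae/falcon-7b')

prompts = [
    "What is weight-only quantization?",
    "Why fine-tune with LoRA?",
    "Name one benefit of batching.",
    "What does Itrex optimize?",
]

# batch_size > 1 groups prompts per forward pass; larger batches typically
# raise throughput until memory becomes the bottleneck.
outputs = model.generate(texts=prompts, batch_size=2)

for prompt, output in zip(prompts, outputs):
    print(f"{prompt}\n-> {output}\n")
```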
