
Commit 41d927c

yiliu30 committed: update docs

Signed-off-by: yiliu30 <[email protected]>
1 parent ee443d0

File tree: 1 file changed (+2, -1 lines)

README.md

Lines changed: 2 additions & 1 deletion

````diff
@@ -79,7 +79,8 @@ model = GenericLoraKbitModel('tiiuae/falcon-7b')
 # Run the fine-tuning
 model.finetune(dataset)
 ```
-4. __CPU inference__ - Now you can use just your CPU for inference of any LLM. For the CPU-only devices, we integrated [Itrex](https://github.com/intel/intel-extension-for-transformers) to conserve memory by compressing the model with [weight-only quantization algorithms](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and accelerate the inference by leveraging its highly optimized kernel on Intel platforms.
+
+4. __CPU inference__ - The CPU, including notebook CPUs, is now fully equipped to handle LLM inference. We integrated [Itrex](https://github.com/intel/intel-extension-for-transformers) to conserve memory by compressing the model with [weight-only quantization algorithms](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and accelerate the inference by leveraging its highly optimized kernel on Intel platforms.
 
 5. __Batch integration__ - By tweaking the 'batch_size' in the .generate() and .evaluate() functions, you can expedite results. Using a 'batch_size' greater than 1 typically enhances processing efficiency.
 ```python
````
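For context on point 4 of the diff: the sketch below is not xTuring's internal integration path, but a minimal, direct use of Itrex's documented loader, assuming `intel_extension_for_transformers` exposes `AutoModelForCausalLM` with a `load_in_4bit` flag (as Itrex's own README shows) and reusing the `tiiuae/falcon-7b` checkpoint from this README. The prompt is a placeholder.

```python
# Hedged sketch: direct Itrex usage, not xTuring's internal wiring.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "tiiuae/falcon-7b"  # checkpoint used elsewhere in this README
prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# load_in_4bit=True asks Itrex to apply weight-only quantization, shrinking
# the weights so inference fits in memory on a CPU-only machine.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
output_ids = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```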
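Point 5's `batch_size` knob could be exercised as follows; a minimal sketch assuming the xTuring-style API from the surrounding README (`GenericLoraKbitModel`, with `.generate()` taking a `texts` list and a `batch_size` keyword, and the `xturing.models` import path). The prompts are placeholder examples.

```python
# Hedged sketch: batched generation with the README's xTuring-style API.
from xturing.models import GenericLoraKbitModel  # import path is an assumption

model = GenericLoraKbitModel('tiiuae/falcon-7b')

prompts = [
    "What is weight-only quantization?",
    "Why fine-tune with LoRA?",
    "Name one benefit of batching.",
    "What does Itrex optimize?",
]

# batch_size > 1 groups prompts per forward pass; larger batches typically
# raise throughput until memory becomes the bottleneck.
outputs = model.generate(texts=prompts, batch_size=2)

for prompt, output in zip(prompts, outputs):
    print(f"{prompt}\n-> {output}\n")
```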
