
Commit 2f59bc7

update llm readme and scripts (#4656)
* update llm table in readme
* update llm inference readme
* update llm finetune/inference readme
* update llm readme & docs/tutorials/llm.rst
* remove installation.rst & blog publication
* update vision and audio to 0.18.1 and 2.3.1
* remove dependency_version.yml
* update the installation link
* update llm inference README for accuracy & phi3-mini beam
* add token-latency for phi-3
* change client gpu to MTL-H
* remove comments in the script
* use specific commit for itrex
* add wandb
* remove useless scripts
* set inc to v3.0
* update llm dependencies version
* update torch-ccl tag
* update link to release rather than xpu-main
1 parent 7da1676 commit 2f59bc7

25 files changed (+242, -1005 lines)

README.md

Lines changed: 45 additions & 12 deletions
@@ -21,19 +21,52 @@ The extension can be loaded as a Python module for Python programs or linked as

 ## Large Language Models (LLMs) Optimization

-In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/inference/python/llm) for details.
+In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/llm) for details.

 ### Optimized Model List

-| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub) | FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Arc™ A-Series Graphics (A770) |
+#### LLM Inference
+
+| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub) | FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Intel® Core™ Ultra Processors with Intel® Arc™ Graphics |
 |---|:---:|:---:|:---:|:---:|:---:|
-|Llama 2| "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf" |||||
-|GPT-J| "EleutherAI/gpt-j-6b" |||||
-|Qwen|"Qwen/Qwen-7B"|||||
-|OPT|"facebook/opt-6.7b", "facebook/opt-30b"|||||
-|Bloom|"bigscience/bloom-7b1", "bigscience/bloom"|||||
-|ChatGLM3-6B|"THUDM/chatglm3-6b"|||||
-|Baichuan2-13B|"baichuan-inc/Baichuan2-13B-Chat"|||||
+|Llama 2| "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf" | 🟩 | 🟩 | 🟩 | 🟩 |
+|Llama 3| "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-70B" | 🟩 | 🟩 | 🟩 | 🟩 |
+|Phi-3 mini| "microsoft/Phi-3-mini-128k-instruct" | 🟩 | 🟩 | 🟩 | 🟩 |
+|GPT-J| "EleutherAI/gpt-j-6b" | 🟩 | 🟩 | 🟩 | 🟩 |
+|Qwen|"Qwen/Qwen-7B"| 🟩 | 🟩 | 🟩 | 🟩 |
+|OPT|"facebook/opt-6.7b", "facebook/opt-30b"| 🟩 | 🟥 | 🟩 | 🟥 |
+|Bloom|"bigscience/bloom-7b1", "bigscience/bloom"| 🟩 | 🟥 | 🟩 | 🟥 |
+|ChatGLM3-6B|"THUDM/chatglm3-6b"| 🟩 | 🟥 | 🟩 | 🟥 |
+|Baichuan2-13B|"baichuan-inc/Baichuan2-13B-Chat"| 🟩 | 🟥 | 🟩 | 🟥 |
+
+| Benchmark mode | FP16 | Weight only quantization INT4 |
+|---|:---:|:---:|
+| Single instance | 🟩 | 🟩 |
+| Distributed (autotp) | 🟩 | 🟥 |
+
+#### LLM fine-tuning
+
+**Note**: Intel® Data Center Max 1550 GPU supports all the models in the list above. Intel® Core™ Ultra Processors with Intel® Arc™ Graphics support Llama 2 7B, Llama 3 8B, and Phi-3-Mini 3.8B.
+
+| MODEL FAMILY | Verified < MODEL ID > (Hugging Face hub) | Mixed Precision (BF16+FP32) | Full fine-tuning | LoRA | Intel® Data Center Max 1550 GPU | Intel® Core™ Ultra Processors with Intel® Arc™ Graphics |
+|---|:---:|:---:|:---:|:---:|:---:|:---:|
+|Llama 2 7B| "meta-llama/Llama-2-7b-hf" | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
+|Llama 2 70B| "meta-llama/Llama-2-70b-hf" | 🟩 | 🟥 | 🟩 | 🟩 | 🟥 |
+|Llama 3 8B| "meta-llama/Meta-Llama-3-8B" | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
+|Qwen 7B|"Qwen/Qwen-7B"| 🟩 | 🟩 | 🟩 | 🟩 | 🟥 |
+|Phi-3-mini 3.8B|"Phi-3-mini-4k-instruct"| 🟩 | 🟩 | 🟩 | 🟥 | 🟩 |
+
+| Benchmark mode | Full fine-tuning | LoRA |
+|---|:---:|:---:|
+|Single-GPU | 🟥 | 🟩 |
+|Multi-GPU (FSDP) | 🟩 | 🟩 |
+
+- 🟩 signifies that it is supported.
+- 🟥 signifies that it is not supported yet.


 ## Installation
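The "Distributed (autotp)" row added above refers to DeepSpeed's automatic tensor parallelism. As a rough, hedged illustration only (this is not code from the commit; the repo's run scripts are authoritative, and the 2-way `tp_size` and model ID are assumed values), an AutoTP launch typically looks like:

```python
# Illustrative sketch of the "Distributed (autotp)" benchmark mode: DeepSpeed
# AutoTP shards the FP16 model's linear layers across ranks. Launched with
# e.g. `deepspeed --num_gpus 2 run.py`. Values here are assumptions.
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},    # assumed 2-way split across GPUs
    dtype=torch.float16,
    replace_with_kernel_inject=False,  # False selects the AutoTP path
)

# Move inputs to wherever the sharded module landed, then generate.
device = next(model.module.parameters()).device
inputs = tokenizer("What is tensor parallelism?", return_tensors="pt").to(device)
output = model.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```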
@@ -60,10 +93,9 @@ Compilation instruction of the latest CPU code base `main` branch can be found i

 You can install Intel® Extension for PyTorch\* for GPU via command below.

 ```bash
-python -m pip install torch==2.1.0.post2 torchvision==0.16.0.post2 torchaudio==2.1.0.post2 intel-extension-for-pytorch==2.1.30+xpu oneccl_bind_pt==2.1.300+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu oneccl_bind_pt==2.3.100+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 # for PRC user, you can check with the following link
-python -m pip install torch==2.1.0.post2 torchvision==0.16.0.post2 torchaudio==2.1.0.post2 intel-extension-for-pytorch==2.1.30+xpu oneccl_bind_pt==2.1.300+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
-
+python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu oneccl_bind_pt==2.3.100+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
 ```

 **Note:** The patched PyTorch 2.1.0 is required to work with Intel® Extension for PyTorch\* on Intel® graphics card for now.
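As a quick sanity check after installing the updated wheels (a sketch, not part of this commit; the expected version strings simply mirror the pip commands above):

```python
# Verify the 2.3.110+xpu stack: importing intel_extension_for_pytorch
# registers the torch.xpu backend on Intel GPUs.
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)         # expected: 2.3.1+cxx11.abi
print(ipex.__version__)          # expected: 2.3.110+xpu
print(torch.xpu.is_available())  # True if an Intel GPU and drivers are present
print(torch.xpu.device_count())
```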
@@ -126,3 +158,4 @@ for information on how to report a potential security issue or vulnerability.

 See also: [Security Policy](SECURITY.md)

+

dependency_version.json

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
   },
   "torch-ccl": {
     "version": "2.3.100+xpu",
-    "commit": "master"
+    "commit": "v2.3.100+xpu"
   },
   "basekit": {
     "dpcpp-cpp-rt": {

docs/index.rst

Lines changed: 3 additions & 3 deletions
@@ -58,20 +58,20 @@ The team tracks bugs and enhancement requests using `GitHub issues <https://gith

    tutorials/introduction
    tutorials/features
-   Large Language Models (LLM)<tutorials/llm>
+   Large Language Models (LLM) <tutorials/llm>
    tutorials/performance
    tutorials/technical_details
    tutorials/releases
    tutorials/known_issues
-   tutorials/blogs_publications
+   Blogs & Publications <https://intel.github.io/intel-extension-for-pytorch/blogs.html>
    tutorials/license

 .. toctree::
    :maxdepth: 3
    :caption: GET STARTED
    :hidden:

-   tutorials/installation
+   Installation <https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.3.110%2bxpu>
    tutorials/getting_started
    tutorials/examples

docs/tutorials/blogs_publications.md

Lines changed: 0 additions & 40 deletions
This file was deleted.

docs/tutorials/installation.rst

Lines changed: 0 additions & 7 deletions
This file was deleted.

docs/tutorials/llm.rst

Lines changed: 64 additions & 3 deletions
@@ -13,8 +13,11 @@ These LLM-specific optimizations can be automatically applied with a single fron

    llm/llm_optimize_transformers

-Optimized Models
-----------------
+Optimized Models List
+---------------------
+
+LLM Inference
+~~~~~~~~~~~~~

 .. list-table::
    :widths: auto
@@ -28,6 +31,14 @@ Optimized Models
      - "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf"
      - ✅
      - ✅
+   * - Llama3
+     - "meta-llama/Meta-Llama-3-8B"
+     - ✅
+     - ✅
+   * - Phi-3 mini
+     - "microsoft/Phi-3-mini-128k-instruct"
+     - ✅
+     - ✅
    * - GPT-J
      - "EleutherAI/gpt-j-6b"
      - ✅
@@ -56,7 +67,57 @@ Optimized Models

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp16). For other LLM families, we are working to cover those optimizations, which will expand the model list above.

-Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.1.30/examples/gpu/inference/python/llm>`_ for instructions to install/setup environment and example scripts.
+LLM fine-tuning
+~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: auto
+   :header-rows: 1
+
+   * - Model Family
+     - Verified < MODEL ID > (Huggingface hub)
+     - Mixed Precision (BF16+FP32)
+     - Full fine-tuning
+     - LoRA
+     - Intel® Data Center Max 1550 GPU
+     - Intel® Core™ Ultra Processors with Intel® Arc™ Graphics
+   * - Llama2
+     - "meta-llama/Llama-2-7b-hf"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+   * - Llama2
+     - "meta-llama/Llama-2-70b-hf"
+     - ✅
+     - ❎
+     - ✅
+     - ✅
+     - ❎
+   * - Llama3
+     - "meta-llama/Meta-Llama-3-8B"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+   * - Qwen
+     - "Qwen/Qwen-7B"
+     - ✅
+     - ✅
+     - ✅
+     - ✅
+     - ❎
+   * - Phi-3-mini 3.8B
+     - "Phi-3-mini-4k-instruct"
+     - ✅
+     - ✅
+     - ✅
+     - ❎
+     - ✅
+
+Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.3.110/examples/gpu/llm>`_ for instructions to install/setup environment and example scripts.

 Optimization Methodologies
 --------------------------
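The "single frontend API" referenced at the top of the updated llm.rst is `ipex.llm.optimize`. A minimal sketch, assuming the 2.3.110+xpu stack and a Hugging Face checkpoint; the example scripts under `examples/gpu/llm` remain the authoritative usage:

```python
# Sketch of applying the LLM frontend API for FP16 inference on an Intel GPU.
# Model choice and generation settings are illustrative assumptions.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).eval().to("xpu")

# Applies LLM-specific optimizations (e.g. fused ROPE, indirect-access
# KV cache) where the model family is supported.
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu", inplace=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to("xpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```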

examples/gpu/llm/README.md

Lines changed: 4 additions & 2 deletions
@@ -100,15 +100,17 @@ conda activate llm
 # Setup the environment with the provided script
 cd examples/gpu/llm
 # If you want to install Intel® Extension for PyTorch\* from source, use the commands below:
-bash ./tools/env_setup.sh 3 <DPCPP_ROOT> <ONEMKL_ROOT> <ONECCL_ROOT> <MPI_ROOT> <PTI_ROOT> <AOT>
 # e.g. bash ./tools/env_setup.sh 0x03 /opt/intel/oneapi/compiler/latest /opt/intel/oneapi/mkl/latest /opt/intel/oneapi/ccl/latest /opt/intel/oneapi/mpi/latest /opt/intel/oneapi/pti/latest pvc
+bash ./tools/env_setup.sh 3 <DPCPP_ROOT> <ONEMKL_ROOT> <ONECCL_ROOT> <MPI_ROOT> <PTI_ROOT> <AOT>
+
 conda deactivate
 conda activate llm
 source ./tools/env_activate.sh [inference|fine-tuning]
 ```

 where <br />
-- `AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models. Check [tutorial](../../../../../docs/tutorials/technical_details/AOT.md) for details.<br />
+- `AOT` is a text string to enable `Ahead-Of-Time` compilation for specific GPU models, for example 'pvc,ats-m150' for Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Arc™ A-Series Graphics (A770). Check [tutorial](../../../docs/tutorials/technical_details/AOT.md) for details.<br />
+

 <br />

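For the LoRA column in the fine-tuning tables above, here is a minimal sketch of what LoRA fine-tuning typically looks like with the Hugging Face PEFT library; the rank, target modules, and dtype are illustrative assumptions, not values taken from this repo's fine-tuning scripts:

```python
# Sketch of the LoRA fine-tuning mode: freeze the base weights and train
# small low-rank adapter matrices, which is why single-GPU LoRA is feasible
# where full fine-tuning is not.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank adapter dimension (assumed)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapters are trainable
```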
examples/gpu/llm/fine-tuning/Llama2/deepspeed_confg.json

Lines changed: 0 additions & 49 deletions
This file was deleted.
