Commit 165c488

[Doc] update llm table (#5572)

* update llm table in llm.rst
* save log_e2e for Llava model in run_benchmark
* update order of model list

1 parent d1fb1b7

File tree: 3 files changed (+16, -15)

* docs/tutorials/llm.rst
* examples/gpu/llm/inference/README.md
* examples/gpu/llm/inference/run_benchmark.sh

docs/tutorials/llm.rst (11 additions, 10 deletions)

```diff
@@ -36,13 +36,13 @@ LLM Inference
      - ✅
      - ✅
    * - Llama3
-     - meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B-Instruct
+     - meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-3B,meta-llama/Llama-3.3-70B-Instruct
      - ✅
      - ✅
      - ✅
      - ✅
    * - Phi-3 mini
-     - microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct
+     - microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3.5-mini-instruct
      - ✅
      - ✅
      - ✅
@@ -54,7 +54,7 @@ LLM Inference
      - ✅
      - ✅
    * - Qwen
-     - Qwen/Qwen2-VL-7B-Instruct
+     - Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2.5-7B-Instruct
      - ✅
      - ✅
      - ✅
@@ -77,18 +77,18 @@ LLM Inference
      - ✅
      - ✅
      -
-   * - Falcon
-     - tiiuae/falcon-40b-instruct
-     - ✅
-     -
-     - ✅
-     -
    * - OPT
      - facebook/opt-6.7b, facebook/opt-30b
      - ✅
      -
      - ✅
      -
+   * - Mixtral
+     - mistralai/Mistral-7B-Instruct-v0.2
+     - ✅
+     - ✅
+     - ✅
+     - ✅
 
 Platforms
 ~~~~~~~~~~~~~
@@ -135,7 +135,8 @@ LLM fine-tuning on Intel® Data Center Max 1550 GPU
      - ✅
      - ✅
 
-Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.5.10/examples/gpu/llm>`_ for instructions to install/setup environment and example scripts..
+
+Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.7.10/examples/gpu/llm>`_ for instructions to install/setup environment and example scripts..
 
 Optimization Methodologies
 --------------------------
```
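The models added to the table above can be exercised with the repository's run_generation.py in the same way as the existing entries. Below is a minimal sketch, assuming the script accepts the same flags for text-only models that run_benchmark.sh passes for Llava further down this commit; the Qwen2.5 model ID comes from the new table row, while the batch size and iteration count are illustrative.

```bash
# Minimal sketch: benchmark one of the newly documented models on XPU.
# Assumes run_generation.py accepts the same flags that run_benchmark.sh
# uses; --sub-model-name and --vision-text-model are Llava-specific and
# therefore omitted here.
model=Qwen/Qwen2.5-7B-Instruct   # one of the model IDs added in this commit
bs=1                             # illustrative batch size
iter=10                          # illustrative iteration count

python -u run_generation.py --benchmark -m ${model} \
    --num-beams 1 --num-iter ${iter} \
    --device xpu --ipex --dtype float16 --batch-size ${bs}
```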

examples/gpu/llm/inference/README.md (4 additions, 4 deletions)

```diff
@@ -14,11 +14,11 @@ Currently, only support Transformers 4.48.3. Support for newer versions of Trans
 | MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Core™ Ultra Processors with Intel® Arc™ Graphics | Optimized on Intel® Arc™ B-Series Graphics (B580) |
 |---|:---:|:---:|:---:|:---:|:---:|:---:|
 |Llama 2| "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf" |||||$✅^1$|
-|Llama 3| "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-70B" |||||$✅^2$|
-|Phi-3 mini| "microsoft/Phi-3-mini-128k-instruct", "microsoft/Phi-3-mini-4k-instruct" |||||$✅^3$|
+|Llama 3| "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-70B", "meta-llama/Llama-3.2-1B", "meta-llama/Llama-3.2-3B", "meta-llama/Llama-3.3-70B-Instruct" |||||$✅^2$|
+|Phi-3 mini| "microsoft/Phi-3-mini-128k-instruct", "microsoft/Phi-3-mini-4k-instruct", "microsoft/Phi-3.5-mini-instruct" |||||$✅^3$|
+|Mistral | "mistralai/Mistral-7B-Instruct-v0.2" ||||| |
 |GPT-J| "EleutherAI/gpt-j-6b" ||||| |
-|Qwen|"Qwen/Qwen2-7B"||||| |
-|Qwen|"Qwen/Qwen2-7B-Instruct"| | | | ||
+|Qwen|"Qwen/Qwen2-7B", "Qwen/Qwen2-7B-Instruct", "Qwen/Qwen2.5-7B-Instruct" ||||| |
 |OPT|"facebook/opt-6.7b", "facebook/opt-30b"|| || |
 |Bloom|"bigscience/bloom-7b1", "bigscience/bloom"|| || |
 |GLM4-9B|"THUDM/glm-4-9b"|| || |
```
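The hunk header notes that only Transformers 4.48.3 is currently supported, so it is worth pinning that version and confirming the XPU device is visible before running any model from this table. A minimal sketch follows; the heredoc check is illustrative and assumes intel-extension-for-pytorch is already installed for XPU.

```bash
# Minimal sketch: pin the Transformers version stated as supported and
# confirm the XPU backend is usable before benchmarking.
pip install transformers==4.48.3

python - <<'EOF'
import torch
import transformers
import intel_extension_for_pytorch as ipex  # registers the XPU backend

print("transformers:", transformers.__version__)   # expect 4.48.3
print("xpu available:", torch.xpu.is_available())  # expect True on a supported GPU
EOF
```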

examples/gpu/llm/inference/run_benchmark.sh (1 addition, 1 deletion)

```diff
@@ -259,7 +259,7 @@ Run_benchmark_Llava1.5-7b(){
     model=llava-hf/llava-1.5-7b-hf
     sub_model_name=llava
     dir=perf/${model}/beam${beam}_bs${bs}
-    python -u run_generation.py --benchmark -m ${model} --sub-model-name ${sub_model_name} --num-beams 1 --num-iter ${iter} --device xpu --ipex --dtype float16 --batch-size ${bs} --vision-text-model
+    python -u run_generation.py --benchmark -m ${model} --sub-model-name ${sub_model_name} --num-beams 1 --num-iter ${iter} --device xpu --ipex --dtype float16 --batch-size ${bs} --vision-text-model 2>&1 | tee log_e2e
     mv log_e2e ${dir}
     PROFILE=1 python -u run_generation.py --benchmark -m ${model} --sub-model-name ${sub_model_name} --num-beams 1 --num-iter ${iter} --device xpu --ipex --dtype float16 --batch-size ${bs} --vision-text-model
     mv profile*pt ${dir}
```
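The functional change here is the trailing `2>&1 | tee log_e2e`: stderr is merged into stdout, and the combined stream is both shown live and written to log_e2e, which the existing `mv log_e2e ${dir}` then files with the run's other artifacts. A standalone sketch of the same pattern, with an illustrative command and directory:

```bash
# Minimal sketch of the capture pattern added above: stream output to the
# console while also persisting it, then archive the log with the run.
dir=perf/example_run                   # illustrative output directory
mkdir -p ${dir}

./my_benchmark.sh 2>&1 | tee log_e2e   # hypothetical command; stdout+stderr go to console and file
mv log_e2e ${dir}
```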
