Commit 165c488

[Doc] update llm table (#5572)

* update llm table in llm.rst
* save log_e2e for Llava model in run_benchmark
* update order of model list

1 parent d1fb1b7

File tree: 3 files changed (+16, -15)

* docs/tutorials/llm.rst
* examples/gpu/llm/inference/README.md
* examples/gpu/llm/inference/run_benchmark.sh

docs/tutorials/llm.rst (11 additions, 10 deletions)

```diff
@@ -36,13 +36,13 @@ LLM Inference
      - ✅
      - ✅
    * - Llama3
-     - meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B-Instruct
+     - meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-3B,meta-llama/Llama-3.3-70B-Instruct
      - ✅
      - ✅
      - ✅
      - ✅
    * - Phi-3 mini
-     - microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct
+     - microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3.5-mini-instruct
      - ✅
      - ✅
      - ✅
@@ -54,7 +54,7 @@ LLM Inference
      - ✅
      - ✅
    * - Qwen
-     - Qwen/Qwen2-VL-7B-Instruct
+     - Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2.5-7B-Instruct
      - ✅
      - ✅
      - ✅
@@ -77,18 +77,18 @@ LLM Inference
      - ✅
      - ✅
      -
-   * - Falcon
-     - tiiuae/falcon-40b-instruct
-     - ✅
-     -
-     - ✅
-     -
    * - OPT
      - facebook/opt-6.7b, facebook/opt-30b
      - ✅
      -
      - ✅
      -
+   * - Mixtral
+     - mistralai/Mistral-7B-Instruct-v0.2
+     - ✅
+     - ✅
+     - ✅
+     - ✅
 
 Platforms
 ~~~~~~~~~~~~~
@@ -135,7 +135,8 @@ LLM fine-tuning on Intel® Data Center Max 1550 GPU
      - ✅
      - ✅
 
-Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.5.10/examples/gpu/llm>`_ for instructions to install/setup environment and example scripts..
+
+Check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.7.10/examples/gpu/llm>`_ for instructions to install/setup environment and example scripts..
 
 Optimization Methodologies
 --------------------------
```
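The models added to the table above can be exercised with the repository's run_generation.py in the same way as the existing entries. Below is a minimal sketch, assuming the script accepts the same flags for text-only models that run_benchmark.sh passes for Llava further down this commit; the Qwen2.5 model ID comes from the new table row, while the batch size and iteration count are illustrative.

```bash
# Minimal sketch: benchmark one of the newly documented models on XPU.
# Assumes run_generation.py accepts the same flags that run_benchmark.sh
# uses; --sub-model-name and --vision-text-model are Llava-specific and
# therefore omitted here.
model=Qwen/Qwen2.5-7B-Instruct   # one of the model IDs added in this commit
bs=1                             # illustrative batch size
iter=10                          # illustrative iteration count

python -u run_generation.py --benchmark -m ${model} \
    --num-beams 1 --num-iter ${iter} \
    --device xpu --ipex --dtype float16 --batch-size ${bs}
```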

examples/gpu/llm/inference/README.md (4 additions, 4 deletions)

```diff
@@ -14,11 +14,11 @@ Currently, only support Transformers 4.48.3. Support for newer versions of Trans
 | MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Core™ Ultra Processors with Intel® Arc™ Graphics | Optimized on Intel® Arc™ B-Series Graphics (B580) |
 |---|:---:|:---:|:---:|:---:|:---:|:---:|
 |Llama 2| "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf" |||||$✅^1$|
-|Llama 3| "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-70B" |||||$✅^2$|
-|Phi-3 mini| "microsoft/Phi-3-mini-128k-instruct", "microsoft/Phi-3-mini-4k-instruct" |||||$✅^3$|
+|Llama 3| "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-70B", "meta-llama/Llama-3.2-1B", "meta-llama/Llama-3.2-3B", "meta-llama/Llama-3.3-70B-Instruct" |||||$✅^2$|
+|Phi-3 mini| "microsoft/Phi-3-mini-128k-instruct", "microsoft/Phi-3-mini-4k-instruct", "microsoft/Phi-3.5-mini-instruct" |||||$✅^3$|
+|Mistral | "mistralai/Mistral-7B-Instruct-v0.2" ||||| |
 |GPT-J| "EleutherAI/gpt-j-6b" ||||| |
-|Qwen|"Qwen/Qwen2-7B"||||| |
-|Qwen|"Qwen/Qwen2-7B-Instruct"| | | | ||
+|Qwen|"Qwen/Qwen2-7B", "Qwen/Qwen2-7B-Instruct", "Qwen/Qwen2.5-7B-Instruct" ||||| |
 |OPT|"facebook/opt-6.7b", "facebook/opt-30b"|| || |
 |Bloom|"bigscience/bloom-7b1", "bigscience/bloom"|| || |
 |GLM4-9B|"THUDM/glm-4-9b"|| || |
```
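The hunk header notes that only Transformers 4.48.3 is currently supported, so it is worth pinning that version and confirming the XPU device is visible before running any model from this table. A minimal sketch follows; the heredoc check is illustrative and assumes intel-extension-for-pytorch is already installed for XPU.

```bash
# Minimal sketch: pin the Transformers version stated as supported and
# confirm the XPU backend is usable before benchmarking.
pip install transformers==4.48.3

python - <<'EOF'
import torch
import transformers
import intel_extension_for_pytorch as ipex  # registers the XPU backend

print("transformers:", transformers.__version__)   # expect 4.48.3
print("xpu available:", torch.xpu.is_available())  # expect True on a supported GPU
EOF
```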

examples/gpu/llm/inference/run_benchmark.sh (1 addition, 1 deletion)

```diff
@@ -259,7 +259,7 @@ Run_benchmark_Llava1.5-7b(){
     model=llava-hf/llava-1.5-7b-hf
     sub_model_name=llava
     dir=perf/${model}/beam${beam}_bs${bs}
-    python -u run_generation.py --benchmark -m ${model} --sub-model-name ${sub_model_name} --num-beams 1 --num-iter ${iter} --device xpu --ipex --dtype float16 --batch-size ${bs} --vision-text-model
+    python -u run_generation.py --benchmark -m ${model} --sub-model-name ${sub_model_name} --num-beams 1 --num-iter ${iter} --device xpu --ipex --dtype float16 --batch-size ${bs} --vision-text-model 2>&1 | tee log_e2e
     mv log_e2e ${dir}
     PROFILE=1 python -u run_generation.py --benchmark -m ${model} --sub-model-name ${sub_model_name} --num-beams 1 --num-iter ${iter} --device xpu --ipex --dtype float16 --batch-size ${bs} --vision-text-model
     mv profile*pt ${dir}
```
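The functional change here is the trailing `2>&1 | tee log_e2e`: stderr is merged into stdout, and the combined stream is both shown live and written to log_e2e, which the existing `mv log_e2e ${dir}` then files with the run's other artifacts. A standalone sketch of the same pattern, with an illustrative command and directory:

```bash
# Minimal sketch of the capture pattern added above: stream output to the
# console while also persisting it, then archive the log with the run.
dir=perf/example_run                   # illustrative output directory
mkdir -p ${dir}

./my_benchmark.sh 2>&1 | tee log_e2e   # hypothetical command; stdout+stderr go to console and file
mv log_e2e ${dir}
```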
