
Commit 710080f

Authored by ZailiWang, jingxu10, and jianan-gu
[21010] update LLM inf example doc (#2320)
* update LLM inf example doc
* update LLM model names (removing model sizes); update "model zoo" to new name; update to new reference model release branch
* setting KMP_BLOCKTIME to 1
* revert KMP_BLOCKTIME setting
* update readme
* Adding LLM performance results
* update docker readme
* adding Xeon CPU Max Series (HBM) instructions
* update benchmarking conclusions
* adding performance section link in LLM README.md
* update version in dockerfile.prebuilt
* Update README.md
* fine tune usage of llm examples
* adding explanations for numactl params and model-id
* adding misc. tips section
* misc corrections

---------

Co-authored-by: Jing Xu <[email protected]>
Co-authored-by: jianan-gu <[email protected]>
1 parent cdce912 · commit 710080f

File tree

10 files changed: +258 −41 lines changed


README.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -22,7 +22,7 @@ python -m pip install intel_extension_for_pytorch
 ```
 
 ```python
-python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-stable-cpu
+python -m pip install intel_extension_for_pytorch --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 
 **Note:** Intel® Extension for PyTorch\* has PyTorch version requirement. Please check more detailed information via the URL below.
@@ -36,7 +36,7 @@ Compilation instruction of the latest CPU code base `main` branch can be found a
 You can install Intel® Extension for PyTorch\* for GPU via command below.
 
 ```python
-python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
+python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
 **Note:** The patched PyTorch 2.0.1a0 is required to work with Intel® Extension for PyTorch\* on Intel® graphics card for now.
@@ -89,9 +89,9 @@ with torch.no_grad():
     model(data)
 ```
 
-## Model Zoo
+## Intel® AI Reference Models
 
-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.1-models). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r2.1-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that had already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models) (former Model Zoo). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-the-box by simply running scripts in the Model Zoo.
 
 ## License
````
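The `model(data)` context lines in the last hunk come from the README's BF16 inference example. For quick reference, a minimal self-contained sketch of that flow follows; the ResNet-50 model and input shape are illustrative assumptions, not part of this commit.

```python
# Minimal sketch of the CPU inference flow the README excerpt belongs to.
# The model choice and input shape are illustrative assumptions.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=None).eval()
data = torch.rand(1, 3, 224, 224)

# ipex.optimize applies operator fusion, memory-layout, and weight-prepacking
# optimizations for Intel CPUs; dtype=torch.bfloat16 enables the BF16/AMX path.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    model(data)
```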

docker/Dockerfile.prebuilt

Lines changed: 3 additions & 3 deletions
````diff
@@ -28,9 +28,9 @@ RUN ${PYTHON} -m pip --no-cache-dir install --upgrade \
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
 
 ARG IPEX_VERSION=2.1.100
-ARG PYTORCH_VERSION=2.1.0
-ARG TORCHAUDIO_VERSION=2.1.0
-ARG TORCHVISION_VERSION=0.16.0
+ARG PYTORCH_VERSION=2.1.1
+ARG TORCHAUDIO_VERSION=2.1.1
+ARG TORCHVISION_VERSION=0.16.1
 ARG TORCH_CPU_URL=https://download.pytorch.org/whl/cpu/torch_stable.html
 
 RUN \
````

docker/README.md

Lines changed: 19 additions & 5 deletions
````diff
@@ -10,14 +10,28 @@
 
 ```console
 $ cd $DOCKERFILE_DIR
-$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.prebuilt -t intel-extension-for-pytorch:prebuilt .
-$ docker run --rm intel-extension-for-pytorch:prebuilt python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.prebuilt -t intel-extension-for-pytorch:2.1.100 .
 ```
 
 Run the following commands to build a `conda` based container with Intel® Extension for PyTorch\* compiled from source:
 
 ```console
-$ cd $DOCKERFILE_DIR
-$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.compile -t intel-extension-for-pytorch:compile .
-$ docker run --rm intel-extension-for-pytorch:compile python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+$ git clone https://github.com/intel/intel-extension-for-pytorch.git
+$ cd intel-extension-for-pytorch
+$ git submodule sync
+$ git submodule update --init --recursive
+$ cd ..
+$ DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.compile -t intel-extension-for-pytorch:2.1.100 .
+```
+
+* Sanity Test
+
+Once a docker image is built, run the command below to launch into a container:
+```console
+$ docker run --rm -it intel-extension-for-pytorch:2.1.100 bash
+```
+
+Then run the command below inside the container to verify correct installation:
+```console
+# python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex: ',ipex.__version__)"
 ```
````
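Beyond the one-line version check added above, a slightly fuller sanity script can confirm that the extension's CPU kernels actually load. This is an illustrative sketch, not part of the commit; the `check.py` file name is hypothetical.

```python
# check.py -- illustrative sanity check (hypothetical, not part of the commit).
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)
print("ipex:", ipex.__version__)

# Run one small optimized op end-to-end to confirm the kernels load.
layer = ipex.optimize(torch.nn.Linear(3, 4).eval())
with torch.no_grad():
    print("linear output shape:", layer(torch.randn(2, 3)).shape)
```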

docs/tutorials/examples.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -348,5 +348,5 @@ $ ldd example-app
 
 ## Intel® AI Reference Models
 
-Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.1-models) (former Model Zoo). A number of PyTorch use cases for benchmarking are also available in the [benchmarks](https://github.com/IntelAI/models/tree/pytorch-r2.1-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-the-box by simply running scripts in the Intel® AI Reference Models.
+Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models) (former Model Zoo). A number of PyTorch use cases for benchmarking are also available in the [benchmarks](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-the-box by simply running scripts in the Intel® AI Reference Models.
 
````

docs/tutorials/performance.md

Lines changed: 39 additions & 0 deletions
````diff
@@ -9,6 +9,45 @@ This page shows performance boost with Intel® Extension for PyTorch\* on severa
 
 Find the latest performance data for 4th gen Intel® Xeon® Scalable processors and 3rd gen Intel® Xeon® processors, including detailed hardware and software configurations, at [Intel® Developer Zone article](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/performance.html).
 
+## LLM Performance
+
+We benchmarked LLaMA2 7B, LLaMA2 13B, and GPT-J 6B with test input token lengths of 256 and 1024. The tests were carried out on AWS M7i and M6i instances. The CPUs of M6i instances are 3rd Gen Intel® Xeon® processors, which lack the AMX instructions that accelerate BF16 computation, so on M6i we benchmarked with FP32 precision instead of BF16.
+
+![LLaMA2 7B Results](../../images/performance/m7i_m6i_comp_llama7b.png)
+
+![LLaMA2 13B Results](../../images/performance/m7i_m6i_comp_llama13b.png)
+
+![GPT-J 6B Results](../../images/performance/m7i_m6i_comp_gptj6b.png)
+
+Comparing LLM inference performance on M7i and M6i instances based on the results above, M7i, with 4th Gen Intel® Xeon® processors, holds a remarkable performance advantage over M6i, with 3rd Gen Intel® Xeon® processors.
+
+M7i performance boost ratio over M6i for non-quantized (BF16 or FP32) models:
+
+|            | Speedup | Throughput |
+|:----------:|:-------:|:----------:|
+| LLaMA2 7B  |  2.47x  |   2.62x    |
+| LLaMA2 13B |  2.57x  |   2.62x    |
+| GPT-J 6B   |  2.58x  |   2.85x    |
+
+M7i performance boost ratio over M6i for INT8 quantized models:
+
+|            | Speedup | Throughput |
+|:----------:|:-------:|:----------:|
+| LLaMA2 7B  |  1.27x  |   1.38x    |
+| LLaMA2 13B |  1.27x  |   1.27x    |
+| GPT-J 6B   |  1.29x  |   1.36x    |
+
+We can also conclude that **with a larger batch size, the capacity of the model service can be improved at the cost of longer response latency for individual sessions**. The following table shows that, for the INT8 quantized LLaMA2 7B model on M7i instances, a batch size of 8 increases total throughput by 6.47x compared with a batch size of 1, while P90 token latency increases by a factor of 1.26x.
+
+| Batch size | Decoder latency (ms) | Total tokens per sec |
+|:----------:|:--------------------:|:--------------------:|
+|     1      |          39          |        26.32         |
+|     8      |          49          |        170.21        |
+|            |                      |                      |
+|***Ratio*** |        1.26x         |        6.47x         |
+
+*Note:* Measured by Intel on 17th Aug 2023; M7i.16xLarge and M6i.16xLarge instances in us-west-2; OS: Ubuntu 22.04 LTS, kernel 6.2.0-1009-aws; SW: PyTorch* 2.1 and Intel® Extension for PyTorch* 2.1/llm_feature_branch.
+
 ## INT8 with v1.11
 
 ### Performance Numbers
````
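The commit does not include the benchmark harness itself. For orientation only, a minimal sketch of timing BF16 greedy generation with Intel® Extension for PyTorch\* might look like the following; the model id, prompt, and generation length are placeholder assumptions, and the published numbers above came from a dedicated setup on the llm_feature_branch.

```python
# Illustrative sketch only -- not the harness behind the numbers above.
# Assumes `transformers` and `intel_extension_for_pytorch` are installed;
# the model id, prompt, and generation settings are placeholders.
import time
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

# Apply CPU optimizations; the BF16 path uses AMX on 4th Gen Xeon processors.
model = ipex.optimize(model, dtype=torch.bfloat16)

prompt = "Explain the difference between latency and throughput."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=32)
    elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} new tokens/sec")
```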
