
Commit 710080f

Authored by ZailiWang, jingxu10, and jianan-gu
[21010] update LLM inf example doc (#2320)
* update LLM inf example doc
* update LLM model names (removing model sizes); update "model zoo" to new name; update to new reference model release branch
* setting KMP_BLOCKTIME to 1
* revert KMP_BLOCKTIME setting
* update readme
* Adding LLM performance results
* update docker readme
* adding Xeon CPU Max Series (HBM) instructions
* update benchmarking conclusions
* adding performance section link in LLM README.md
* update version in dockerfile.prebuilt
* Update README.md
* fine tune usage of llm examples
* adding explanations for numactl params and model-id
* adding misc. tips section
* misc corrections

---------

Co-authored-by: Jing Xu <[email protected]>
Co-authored-by: jianan-gu <[email protected]>
1 parent cdce912 · commit 710080f

File tree

10 files changed: +258 −41 lines changed


README.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -22,7 +22,7 @@ python -m pip install intel_extension_for_pytorch
 ```
 
 ```python
-python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-stable-cpu
+python -m pip install intel_extension_for_pytorch --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 
 **Note:** Intel® Extension for PyTorch\* has PyTorch version requirement. Please check more detailed information via the URL below.
@@ -36,7 +36,7 @@ Compilation instruction of the latest CPU code base `main` branch can be found a
 You can install Intel® Extension for PyTorch\* for GPU via command below.
 
 ```python
-python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
+python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
 **Note:** The patched PyTorch 2.0.1a0 is required to work with Intel® Extension for PyTorch\* on Intel® graphics card for now.
@@ -89,9 +89,9 @@ with torch.no_grad():
     model(data)
 ```
 
-## Model Zoo
+## Intel® AI Reference Models
 
-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r2.1-models). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r2.1-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that had already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models) (former Model Zoo). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-the-box by simply running scripts in the Model Zoo.
 
 ## License
````
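The `model(data)` context lines in the last hunk come from the README's BF16 inference example. For quick reference, a minimal self-contained sketch of that flow follows; the ResNet-50 model and input shape are illustrative assumptions, not part of this commit.

```python
# Minimal sketch of the CPU inference flow the README excerpt belongs to.
# The model choice and input shape are illustrative assumptions.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=None).eval()
data = torch.rand(1, 3, 224, 224)

# ipex.optimize applies operator fusion, memory-layout, and weight-prepacking
# optimizations for Intel CPUs; dtype=torch.bfloat16 enables the BF16/AMX path.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    model(data)
```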

docker/Dockerfile.prebuilt

Lines changed: 3 additions & 3 deletions
````diff
@@ -28,9 +28,9 @@ RUN ${PYTHON} -m pip --no-cache-dir install --upgrade \
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
 
 ARG IPEX_VERSION=2.1.100
-ARG PYTORCH_VERSION=2.1.0
-ARG TORCHAUDIO_VERSION=2.1.0
-ARG TORCHVISION_VERSION=0.16.0
+ARG PYTORCH_VERSION=2.1.1
+ARG TORCHAUDIO_VERSION=2.1.1
+ARG TORCHVISION_VERSION=0.16.1
 ARG TORCH_CPU_URL=https://download.pytorch.org/whl/cpu/torch_stable.html
 
 RUN \
````

docker/README.md

Lines changed: 19 additions & 5 deletions
````diff
@@ -10,14 +10,28 @@
 
 ```console
 $ cd $DOCKERFILE_DIR
-$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.prebuilt -t intel-extension-for-pytorch:prebuilt .
-$ docker run --rm intel-extension-for-pytorch:prebuilt python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.prebuilt -t intel-extension-for-pytorch:2.1.100 .
 ```
 
 Run the following commands to build a `conda` based container with Intel® Extension for PyTorch\* compiled from source:
 
 ```console
-$ cd $DOCKERFILE_DIR
-$ DOCKER_BUILDKIT=1 docker build -f Dockerfile.compile -t intel-extension-for-pytorch:compile .
-$ docker run --rm intel-extension-for-pytorch:compile python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex:',ipex.__version__)"
+$ git clone https://github.com/intel/intel-extension-for-pytorch.git
+$ cd intel-extension-for-pytorch
+$ git submodule sync
+$ git submodule update --init --recursive
+$ cd ..
+$ DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.compile -t intel-extension-for-pytorch:2.1.100 .
+```
+
+* Sanity Test
+
+Once a docker image is built, run the command below to launch into a container:
+```console
+$ docker run --rm -it intel-extension-for-pytorch:2.1.100 bash
+```
+
+Then run the command below inside the container to verify correct installation:
+```console
+# python -c "import torch; import intel_extension_for_pytorch as ipex; print('torch:', torch.__version__,' ipex: ',ipex.__version__)"
 ```
````
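Beyond the one-line version check added above, a slightly fuller sanity script can confirm that the extension's CPU kernels actually load. This is an illustrative sketch, not part of the commit; the `check.py` file name is hypothetical.

```python
# check.py -- illustrative sanity check (hypothetical, not part of the commit).
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)
print("ipex:", ipex.__version__)

# Run one small optimized op end-to-end to confirm the kernels load.
layer = ipex.optimize(torch.nn.Linear(3, 4).eval())
with torch.no_grad():
    print("linear output shape:", layer(torch.randn(2, 3)).shape)
```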

docs/tutorials/examples.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -348,5 +348,5 @@ $ ldd example-app
 
 ## Intel® AI Reference Models
 
-Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.1-models) (former Model Zoo). A number of PyTorch use cases for benchmarking are also available in the [benchmarks](https://github.com/IntelAI/models/tree/pytorch-r2.1-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-the-box by simply running scripts in the Intel® AI Reference Models.
+Use cases that have already been optimized by Intel engineers are available at [Intel® AI Reference Models](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models) (former Model Zoo). A number of PyTorch use cases for benchmarking are also available in the [benchmarks](https://github.com/IntelAI/models/tree/pytorch-r2.1.100-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-the-box by simply running scripts in the Intel® AI Reference Models.
 
````

docs/tutorials/performance.md

Lines changed: 39 additions & 0 deletions
````diff
@@ -9,6 +9,45 @@ This page shows performance boost with Intel® Extension for PyTorch\* on severa
 
 Find the latest performance data for 4th gen Intel® Xeon® Scalable processors and 3rd gen Intel® Xeon® processors, including detailed hardware and software configurations, at [Intel® Developer Zone article](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/performance.html).
 
+## LLM Performance
+
+We benchmarked LLaMA2 7B, LLaMA2 13B, and GPT-J 6B with test input token lengths of 256 and 1024. The tests were carried out on AWS M7i and M6i instances. The CPUs of M6i instances are 3rd Gen Intel® Xeon® processors, which lack the AMX instructions that accelerate BF16 computation, so on M6i we benchmarked with FP32 precision instead of BF16.
+
+![LLaMA2 7B Results](../../images/performance/m7i_m6i_comp_llama7b.png)
+
+![LLaMA2 13B Results](../../images/performance/m7i_m6i_comp_llama13b.png)
+
+![GPT-J 6B Results](../../images/performance/m7i_m6i_comp_gptj6b.png)
+
+Comparing LLM inference performance on M7i and M6i instances based on the results above, M7i, with 4th Gen Intel® Xeon® processors, holds a remarkable performance advantage over M6i, with 3rd Gen Intel® Xeon® processors.
+
+M7i performance boost ratio over M6i for non-quantized (BF16 or FP32) models:
+
+|            | Speedup | Throughput |
+|:----------:|:-------:|:----------:|
+| LLaMA2 7B  |  2.47x  |   2.62x    |
+| LLaMA2 13B |  2.57x  |   2.62x    |
+| GPT-J 6B   |  2.58x  |   2.85x    |
+
+M7i performance boost ratio over M6i for INT8 quantized models:
+
+|            | Speedup | Throughput |
+|:----------:|:-------:|:----------:|
+| LLaMA2 7B  |  1.27x  |   1.38x    |
+| LLaMA2 13B |  1.27x  |   1.27x    |
+| GPT-J 6B   |  1.29x  |   1.36x    |
+
+We can also conclude that **with a larger batch size, the capacity of the model service can be improved at the cost of longer response latency for individual sessions**. The following table shows that, for the INT8 quantized LLaMA2 7B model on M7i instances, a batch size of 8 increases total throughput by 6.47x compared with a batch size of 1, while P90 token latency increases by a factor of 1.26x.
+
+| Batch size | Decoder latency (ms) | Total tokens per sec |
+|:----------:|:--------------------:|:--------------------:|
+|     1      |          39          |        26.32         |
+|     8      |          49          |        170.21        |
+|            |                      |                      |
+|***Ratio*** |        1.26x         |        6.47x         |
+
+*Note:* Measured by Intel on 17th Aug 2023; M7i.16xLarge and M6i.16xLarge instances in us-west-2; OS: Ubuntu 22.04 LTS, kernel 6.2.0-1009-aws; SW: PyTorch* 2.1 and Intel® Extension for PyTorch* 2.1/llm_feature_branch.
+
 ## INT8 with v1.11
 
 ### Performance Numbers
````
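The commit does not include the benchmark harness itself. For orientation only, a minimal sketch of timing BF16 greedy generation with Intel® Extension for PyTorch\* might look like the following; the model id, prompt, and generation length are placeholder assumptions, and the published numbers above came from a dedicated setup on the llm_feature_branch.

```python
# Illustrative sketch only -- not the harness behind the numbers above.
# Assumes `transformers` and `intel_extension_for_pytorch` are installed;
# the model id, prompt, and generation settings are placeholders.
import time
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

# Apply CPU optimizations; the BF16 path uses AMX on 4th Gen Xeon processors.
model = ipex.optimize(model, dtype=torch.bfloat16)

prompt = "Explain the difference between latency and throughput."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=32)
    elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} new tokens/sec")
```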
