diff --git a/README.md b/README.md
Lines changed: 11 additions & 3 deletions
diff --git a/dependency_version.yml b/dependency_version.yml
Lines changed: 8 additions & 3 deletions
diff --git a/docker/Dockerfile.compile b/docker/Dockerfile.compile
Lines changed: 1 addition & 2 deletions
diff --git a/docker/build.sh b/docker/build.sh
Lines changed: 3 additions & 3 deletions
diff --git a/docs/tutorials/api_doc.rst b/docs/tutorials/api_doc.rst
Lines changed: 3 additions & 0 deletions
diff --git a/docs/tutorials/features.rst b/docs/tutorials/features.rst
Lines changed: 7 additions & 3 deletions
diff --git a/docs/tutorials/features/hypertune.md b/docs/tutorials/features/hypertune.md
Lines changed: 3 additions & 3 deletions
diff --git a/docs/tutorials/installation.rst b/docs/tutorials/installation.rst
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,12 @@
-# Intel® Extension for PyTorch\*
+<div align="center">
+  
+Intel® Extension for Pytorch*
+===========================
+
+[💻Examples](./docs/tutorials/examples.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖CPU Documentations](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖GPU Documentations](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)
+</div>
+
+
 
 Intel® Extension for PyTorch\* extends PyTorch\* with up-to-date features optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel X<sup>e</sup> Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, through PyTorch\* `xpu` device, Intel® Extension for PyTorch\* provides easy GPU acceleration for Intel discrete GPUs with PyTorch\*.
 
@@ -31,10 +39,10 @@ Compilation instruction of the latest CPU code base `main` branch can be found a
 You can install Intel® Extension for PyTorch\* for GPU via command below.
 
 ```bash
-python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
+python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 intel_extension_for_pytorch==2.1.10+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
 ```
 
-**Note:** The patched PyTorch 2.0.1 is required to work with Intel® Extension for PyTorch\* on Intel® graphics card for now.
+**Note:** The patched PyTorch 2.1.0 is required to work with Intel® Extension for PyTorch\* on Intel® graphics card for now.
 
 More installation methods can be found at [GPU Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html).
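
Reviewer note: after running the updated install command, one way to confirm that the environment picked up the new pins is a small stdlib-only check like the sketch below. The helper names (`pin_matches`, `installed_version`, `check_pins`) are hypothetical and not part of the extension. The sketch also assumes that the installed distributions report versions such as `2.1.0a0+cxx11.abi`, as the `docker/build.sh` arguments in this PR suggest, so a pin without a local segment is compared on the public part only.

```python
from importlib import metadata

# Pins copied from the updated install command in README.md.
EXPECTED_PINS = {
    "torch": "2.1.0a0",
    "torchvision": "0.16.0a0",
    "intel_extension_for_pytorch": "2.1.10+xpu",
}


def pin_matches(expected, installed):
    """Return True when an installed version string satisfies a pin."""
    if installed is None:
        return False
    exp_public, _, exp_local = expected.partition("+")
    inst_public, _, inst_local = installed.partition("+")
    if exp_public != inst_public:
        return False
    # Compare the local segment (after '+') only when the pin specifies one.
    return not exp_local or exp_local == inst_local


def installed_version(name):
    """Look up a distribution's version, or None when it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package name to a tuple (expected, installed, ok)."""
    report = {}
    for name, expected in pins.items():
        installed = lookup(name)
        report[name] = (expected, installed, pin_matches(expected, installed))
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins(EXPECTED_PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

The `lookup` parameter exists only so the check can be exercised without the packages installed; in normal use the default reads the metadata of the active environment.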
 
 
@@ -4,17 +4,17 @@ gcc:
 llvm:
   version: 16.0.6
 pytorch:
-  version: 2.0.1a0
+  version: 2.1.0a0
   commit: v2.1.0
 torchaudio:
-  version: 2.0.1a0
+  version: 2.1.0a0
   commit: v2.1.0
 torchvision:
   version: 0.16.0a0
   commit: v0.16.0
 torch-ccl:
   repo: https://github.com/intel/torch-ccl.git
-  commit: c8f89db1639558c1149c4d0eecf90c980064f609
+  commit: 5f20135ccf8f828738cb3bc5a5ae7816df8100ae
   version: 2.1.100+xpu
 deepspeed:
   repo: https://github.com/microsoft/DeepSpeed.git
@@ -28,3 +28,8 @@ transformers:
   commit: v4.31.0
 protobuf:
   version: 3.20.3
+basekit:
+  dpcpp-cpp-rt:
+    version: 2024.0.0
+  mkl-dpcpp:
+    version: 2024.0.0
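
Reviewer note: the `pytorch` and `torchaudio` hunks above bring `version` in line with a `commit` tag that was already `v2.1.0`. A check along the following lines could flag that kind of drift earlier. This is only a sketch: the function names are hypothetical, the input is a plain dict mirroring the YAML layout (loading the file is left out), and entries pinned by SHA are skipped because a hash carries no comparable release number.

```python
import re

TAG_PATTERN = re.compile(r"v\d+(?:\.\d+)*")


def base_release(version):
    """Extract the leading numeric release, e.g. '2.1.0' from '2.1.0a0' or 'v2.1.0'."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version)
    return match.group(1) if match else None


def mismatched_pins(deps):
    """Return the names whose `version` disagrees with a tag-style `commit`."""
    bad = []
    for name, fields in deps.items():
        version = fields.get("version")
        commit = fields.get("commit", "")
        # Skip entries without a version or with a non-tag commit (e.g. a SHA).
        if version is None or not TAG_PATTERN.fullmatch(commit):
            continue
        if base_release(version) != base_release(commit):
            bad.append(name)
    return bad


# Values taken from the removed (-) and added (+) lines of this hunk.
before = {
    "pytorch": {"version": "2.0.1a0", "commit": "v2.1.0"},
    "torchaudio": {"version": "2.0.1a0", "commit": "v2.1.0"},
    "torchvision": {"version": "0.16.0a0", "commit": "v0.16.0"},
    "torch-ccl": {
        "commit": "5f20135ccf8f828738cb3bc5a5ae7816df8100ae",
        "version": "2.1.100+xpu",
    },
}
after = dict(before)
after["pytorch"] = {"version": "2.1.0a0", "commit": "v2.1.0"}
after["torchaudio"] = {"version": "2.1.0a0", "commit": "v2.1.0"}

if __name__ == "__main__":
    print("before:", mismatched_pins(before))
    print("after:", mismatched_pins(after))
```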
@@ -27,8 +27,7 @@ RUN apt update && \
     gnupg \
     gpg-agent
 COPY ./tools/basekit_driver_install_helper.sh .
-RUN bash ./basekit_driver_install_helper.sh add-apt-repo && \
-    bash ./basekit_driver_install_helper.sh driver
+RUN bash ./basekit_driver_install_helper.sh driver
 
 ARG GID_RENDER=109
 RUN useradd -m -s /bin/bash ubuntu && \
 
@@ -15,9 +15,9 @@ if [[ ${IMAGE_NAME} != "" ]]; then
                  --build-arg LEVEL_ZERO_DEV_VER=1.13.1-719~22.04 \
                  --build-arg DPCPP_VER=2024.0.0-49819 \
                  --build-arg MKL_VER=2024.0.0-49656 \
-                 --build-arg TORCH_VERSION=2.0.1a0+cxx11.abi \
-                 --build-arg IPEX_VERSION=2.0.110+xpu \
-                 --build-arg TORCHVISION_VERSION=0.15.2a0+cxx11.abi \
+                 --build-arg TORCH_VERSION=2.1.0a0+cxx11.abi \
+                 --build-arg IPEX_VERSION=2.1.10+xpu \
+                 --build-arg TORCHVISION_VERSION=0.16.0a0+cxx11.abi \
                  --build-arg TORCH_WHL_URL=https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ \
                  --build-arg IPEX_WHL_URL=https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ \
                  --build-arg TORCHVISION_WHL_URL=https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ \
 
@@ -6,6 +6,7 @@ Device-Agnostic
 
 .. currentmodule:: intel_extension_for_pytorch
 .. autofunction:: optimize
+.. autofunction:: optimize_transformers
 .. autofunction:: get_fp32_math_mode
 .. autofunction:: set_fp32_math_mode
 .. autoclass:: verbose
@@ -39,6 +40,8 @@ Miscellaneous
 .. set_stream
 .. autofunction:: stream
 .. autofunction:: synchronize
+.. autofunction:: quantization._gptq
+.. autofunction:: fp8_autocast
 
 Random Number Generator
 =======================
 
@@ -43,21 +43,25 @@ Detailed information of AMP for GPU and CPU are available at `Auto Mixed Precisi
    features/amp_gpu
 
 
-INT8 Quantization
------------------
+Quantization
+------------
 
-Intel® Extension for PyTorch* provides built-in quantization recipes to deliver good statistical accuracy for most popular DL workloads including CNN, NLP and recommendation models on CPU side. On top of that, if users would like to tune for a higher accuracy than what the default recipe provides, a recipe tuning API powered by Intel® Neural Compressor is provided for users to try.
+Intel® Extension for PyTorch* provides built-in INT8 quantization recipes to deliver good statistical accuracy for most popular DL workloads including CNN, NLP and recommendation models on CPU side. On top of that, if users would like to tune for a higher accuracy than what the default recipe provides, a recipe tuning API powered by Intel® Neural Compressor is provided for users to try.
 
 Check more detailed information for `INT8 Quantization [CPU] <features/int8_overview.md>`_ and `INT8 recipe tuning API guide (Experimental, *NEW feature in 1.13.0* on CPU) <features/int8_recipe_tuning_api.md>`_ on CPU side.
 
 On Intel® GPUs, quantization usages follow PyTorch default quantization APIs. Check sample codes at `Examples <./examples.html#int8>`_ page.
 
+Intel® Extension for PyTorch* also provides INT4 and FP8 Quantization.  Check more detailed information for `FP8 Quantization <./features/float8.md>`_ and `INT4 Quantization <./features/int4.md>`_ 
+
 .. toctree::
    :hidden:
    :maxdepth: 1
 
    features/int8_overview
    features/int8_recipe_tuning_api
+   features/int4
+   features/float8
 
 
 Distributed Training
 
@@ -95,15 +95,15 @@ This is the script as an optimization function.
 'target_val'                               # optional. Target value of the objective function. Default is -float('inf')
 ```
 
-Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
+Have a look at the [example script](https://github.com/intel/intel-extension-for-pytorch/tree/v2.1.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py).
 
 ## Usage Examples
 
 **Tuning `ncores_per_instance` for minimum `latency`**
 
 Suppose we want to tune `ncores_per_instance` for a single instance to minimize latency for resnet50 on a machine with two Intel(R) Xeon(R) Platinum 8180M CPUs. Each socket has 28 physical cores and another 28 logical cores.
 
-Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
+Run the following command with [example.yaml](https://github.com/intel/intel-extension-for-pytorch/tree/v2.1.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/example.yaml) and [resnet50.py](https://github.com/intel/intel-extension-for-pytorch/tree/v2.1.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/resnet50.py):
 ```
 python -m intel_extension_for_pytorch.cpu.hypertune --conf_file <hypertune_directory>/example/example.yaml <hypertune_directory>/example/resnet50.py
 ```
@@ -115,6 +115,6 @@ latency: 12.339081764221191
 ```
 15 `ncores_per_instance` gave the minimum latency.
 
-You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.0.100+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
+You will also find the tuning history in `<output_dir>/record.csv`. You can take [a sample csv file](https://github.com/intel/intel-extension-for-pytorch/tree/v2.1.0+cpu/intel_extension_for_pytorch/cpu/hypertune/example/record.csv) as a reference.
 
 Hypertune can also optimize multi-objective function. Add as many objectives as you would like to your script.
@@ -1,6 +1,6 @@
 Installation
 ============
 
-Select your preferences and follow the installation instructions provided on the `Installation page <../../../index.html#installation?platform=gpu&version=v2.1.0%2Bxpu>`_.
+Select your preferences and follow the installation instructions provided on the `Installation page <../../../index.html#installation?platform=gpu&version=v2.1.10%2Bxpu>`_.
 
 After successful installation, refer to the `Quick Start <getting_started.md>`_ and `Examples <examples.md>`_ sections to start using the extension in your code.