diff --git a/recipes_source/intel_neural_compressor_for_pytorch.rst b/recipes_source/intel_neural_compressor_for_pytorch.rst
index 02ce3d7b378..ee569382343 100755
--- a/recipes_source/intel_neural_compressor_for_pytorch.rst
+++ b/recipes_source/intel_neural_compressor_for_pytorch.rst
@@ -4,37 +4,20 @@ Ease-of-use quantization for PyTorch with Intel® Neural Compressor
 Overview
 --------
-Most deep learning applications are using 32-bits of floating-point precision
-for inference. But low precision data types, especially int8, are getting more
-focus due to significant performance boost. One of the essential concerns on
-adopting low precision is how to easily mitigate the possible accuracy loss
-and reach predefined accuracy requirement.
+Most deep learning applications use 32-bit floating-point precision for inference. However, low-precision data types such as FP8 are attracting more attention because of the significant performance boost they offer. A key concern in adopting low precision is mitigating accuracy loss while still meeting predefined accuracy requirements.
 
-Intel® Neural Compressor aims to address the aforementioned concern by extending
-PyTorch with accuracy-driven automatic tuning strategies to help user quickly find
-out the best quantized model on Intel hardware, including Intel Deep Learning
-Boost (`Intel DL Boost `_)
-and Intel Advanced Matrix Extensions (`Intel AMX `_).
+Intel® Neural Compressor addresses this concern by extending PyTorch with accuracy-driven automatic tuning strategies that help users quickly find the best quantized model on Intel hardware.
 
-Intel® Neural Compressor has been released as an open-source project
-at `Github `_.
+Intel® Neural Compressor is an open-source project available on `GitHub `_.
 
 Features
 --------
-- **Ease-of-use Python API:** Intel® Neural Compressor provides simple frontend
-  Python APIs and utilities for users to do neural network compression with few
-  line code changes.
-  Typically, only 5 to 6 clauses are required to be added to the original code.
+- **Ease-of-use API:** Intel® Neural Compressor reuses the PyTorch ``prepare`` and ``convert`` APIs, so users can adopt it with minimal code changes.
 
-- **Quantization:** Intel® Neural Compressor supports accuracy-driven automatic
-  tuning process on post-training static quantization, post-training dynamic
-  quantization, and quantization-aware training on PyTorch fx graph mode and
-  eager model.
+- **Accuracy-driven Tuning:** Intel® Neural Compressor supports an accuracy-driven automatic tuning process through its ``autotune`` API.
 
-*This tutorial mainly focuses on the quantization part. As for how to use Intel®
-Neural Compressor to do pruning and distillation, please refer to corresponding
-documents in the Intel® Neural Compressor github repo.*
+- **Kinds of Quantization:** Intel® Neural Compressor supports a variety of quantization methods, including classic INT8 quantization, weight-only quantization, and the popular FP8 quantization. It also provides emulation of the latest research data types, such as MX data type emulation quantization. For more details, please refer to the `Supported Matrix `_.
 
 Getting Started
 ---------------
@@ -45,329 +28,134 @@ Installation
 
 .. code:: bash
 
     # install stable version from pip
-    pip install neural-compressor
+    pip install neural-compressor-pt
+..
 
-    # install nightly version from pip
-    pip install -i https://test.pypi.org/simple/ neural-compressor
+**Note**: Neural Compressor provides automatic accelerator detection, covering HPU, Intel GPU, CUDA, and CPU. To specify the target device, setting the ``INC_TARGET_DEVICE`` environment variable is suggested, e.g., ``export INC_TARGET_DEVICE=cpu``.
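+
+For example, a shell session might pin Neural Compressor to the CPU before launching a quantization script (``quantize_model.py`` is a hypothetical script name, shown only for illustration):
+
+.. code-block:: bash
+
+    # Force CPU execution even when an accelerator is present
+    export INC_TARGET_DEVICE=cpu
+    python quantize_model.py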
 
-    # install stable version from from conda
-    conda install neural-compressor -c conda-forge -c intel
-
-*Supported python versions are 3.6 or 3.7 or 3.8 or 3.9*
+
+Examples
+~~~~~~~~
+
+This section shows examples of several kinds of quantization with Intel® Neural Compressor.
 
-Usages
-~~~~~~
+FP8 Quantization
+^^^^^^^^^^^^^^^^
 
-Minor code changes are required for users to get started with Intel® Neural Compressor
-quantization API. Both PyTorch fx graph mode and eager mode are supported.
+**FP8 Quantization** is supported by the Intel® Gaudi® 2 and Gaudi® 3 AI Accelerators (HPU). To prepare the environment, please refer to the `Intel® Gaudi® Documentation `_.
 
-Intel® Neural Compressor takes a FP32 model and a yaml configuration file as inputs.
-To construct the quantization process, users can either specify the below settings via
-the yaml configuration file or python APIs:
+Run the example:
 
-1. Calibration Dataloader (Needed for static quantization)
-2. Evaluation Dataloader
-3. Evaluation Metric
+.. code-block:: python
 
-Intel® Neural Compressor supports some popular dataloaders and evaluation metrics. For
-how to configure them in yaml configuration file, user could refer to `Built-in Datasets
-`_.
+    # FP8 Quantization Example
+    from neural_compressor.torch.quantization import (
+        FP8Config,
+        prepare,
+        convert,
+    )
 
-If users want to use a self-developed dataloader or evaluation metric, Intel® Neural
-Compressor supports this by the registration of customized dataloader/metric using python code.
+    import torch
+    import torchvision.models as models
 
-For the yaml configuration file format please refer to `yaml template
-`_.
+    # Load a pre-trained ResNet18 model
+    model = models.resnet18()
 
-The code changes that are required for *Intel® Neural Compressor* are highlighted with
-comments in the line above.
+    # Configure FP8 quantization
+    qconfig = FP8Config(fp8_config="E4M3")
+    model = prepare(model, qconfig)
 
-Model
-^^^^^
+    # Perform calibration (replace with actual calibration data)
+    calibration_data = torch.randn(1, 3, 224, 224).to("hpu")
+    model(calibration_data)
 
-In this tutorial, the LeNet model is used to demonstrate how to deal with *Intel® Neural Compressor*.
+    # Convert the model to FP8
+    model = convert(model)
 
-.. code-block:: python3
+    # Perform inference
+    input_data = torch.randn(1, 3, 224, 224).to("hpu")
+    output = model(input_data).to("cpu")
+    print(output)
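+
+In practice, calibration should run over a small set of representative samples rather than random tensors. A minimal sketch, assuming a ``torch.utils.data.DataLoader`` named ``calib_loader`` (hypothetical) built from your own dataset, used in place of the random-tensor step between ``prepare`` and ``convert``:
+
+.. code-block:: python
+
+    # Feed a handful of representative batches through the prepared model
+    # so the observers can record tensor statistics.
+    for i, (images, _) in enumerate(calib_loader):
+        model(images.to("hpu"))
+        if i == 10:
+            break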
-
-    # main.py
-    import torch
-    import torch.nn as nn
-    import torch.nn.functional as F
-
-    # LeNet Model definition
-    class Net(nn.Module):
-        def __init__(self):
-            super(Net, self).__init__()
-            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
-            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
-            self.conv2_drop = nn.Dropout2d()
-            self.fc1 = nn.Linear(320, 50)
-            self.fc1_drop = nn.Dropout()
-            self.fc2 = nn.Linear(50, 10)
-
-        def forward(self, x):
-            x = F.relu(F.max_pool2d(self.conv1(x), 2))
-            x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
-            x = x.reshape(-1, 320)
-            x = F.relu(self.fc1(x))
-            x = self.fc1_drop(x)
-            x = self.fc2(x)
-            return F.log_softmax(x, dim=1)
-
-    model = Net()
-    model.load_state_dict(torch.load('./lenet_mnist_model.pth', weights_only=True))
-
-The pretrained model weight `lenet_mnist_model.pth` comes from
-`here `_.
-
-Accuracy driven quantization
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Intel® Neural Compressor supports accuracy-driven automatic tuning to generate the optimal
-int8 model which meets a predefined accuracy goal.
-
-Below is an example of how to quantize a simple network on PyTorch
-`FX graph mode `_ by auto-tuning.
-
-.. code-block:: yaml
-
-    # conf.yaml
-    model:
-        name: LeNet
-        framework: pytorch_fx
-
-    evaluation:
-        accuracy:
-            metric:
-                topk: 1
-
-    tuning:
-      accuracy_criterion:
-        relative: 0.01
-
-.. code-block:: python3
-
-    # main.py
-    model.eval()
-
-    from torchvision import datasets, transforms
-    test_loader = torch.utils.data.DataLoader(
-        datasets.MNIST('./data', train=False, download=True,
-                       transform=transforms.Compose([
-                           transforms.ToTensor(),
-                       ])),
-        batch_size=1)
-
-    # launch code for Intel® Neural Compressor
-    from neural_compressor.experimental import Quantization
-    quantizer = Quantization("./conf.yaml")
-    quantizer.model = model
-    quantizer.calib_dataloader = test_loader
-    quantizer.eval_dataloader = test_loader
-    q_model = quantizer()
-    q_model.save('./output')
-
-In the `conf.yaml` file, the built-in metric `top1` of Intel® Neural Compressor is specified as
-the evaluation method, and `1%` relative accuracy loss is set as the accuracy target for auto-tuning.
-Intel® Neural Compressor will traverse all possible quantization config combinations on per-op level
-to find out the optimal int8 model that reaches the predefined accuracy target.
-
-Besides those built-in metrics, Intel® Neural Compressor also supports customized metric through
-python code:
-
-.. code-block:: yaml
-
-    # conf.yaml
-    model:
-        name: LeNet
-        framework: pytorch_fx
-
-    tuning:
-      accuracy_criterion:
-        relative: 0.01
-
-.. code-block:: python3
-
-    # main.py
-    model.eval()
-
-    from torchvision import datasets, transforms
-    test_loader = torch.utils.data.DataLoader(
-        datasets.MNIST('./data', train=False, download=True,
-                       transform=transforms.Compose([
-                           transforms.ToTensor(),
-                       ])),
-        batch_size=1)
-
-    # define a customized metric
-    class Top1Metric(object):
-        def __init__(self):
-            self.correct = 0
-        def update(self, output, label):
-            pred = output.argmax(dim=1, keepdim=True)
-            self.correct += pred.eq(label.view_as(pred)).sum().item()
-        def reset(self):
-            self.correct = 0
-        def result(self):
-            return 100. * self.correct / len(test_loader.dataset)
-
-    # launch code for Intel® Neural Compressor
-    from neural_compressor.experimental import Quantization
-    quantizer = Quantization("./conf.yaml")
-    quantizer.model = model
-    quantizer.calib_dataloader = test_loader
-    quantizer.eval_dataloader = test_loader
-    quantizer.metric = Top1Metric()
-    q_model = quantizer()
-    q_model.save('./output')
-
-In the above example, a `class` which contains `update()` and `result()` function is implemented
-to record per mini-batch result and calculate final accuracy at the end.
-
-Quantization aware training
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Besides post-training static quantization and post-training dynamic quantization, Intel® Neural
-Compressor supports quantization-aware training with an accuracy-driven automatic tuning mechanism.
-
-Below is an example of how to do quantization aware training on a simple network on PyTorch
-`FX graph mode `_.
-
-.. code-block:: yaml
-
-    # conf.yaml
-    model:
-        name: LeNet
-        framework: pytorch_fx
-
-    quantization:
-        approach: quant_aware_training
-
-    evaluation:
-        accuracy:
-            metric:
-                topk: 1
-
-    tuning:
-      accuracy_criterion:
-        relative: 0.01
-
-.. code-block:: python3
-
-    # main.py
-    model.eval()
-
-    from torchvision import datasets, transforms
-    train_loader = torch.utils.data.DataLoader(
-        datasets.MNIST('./data', train=True, download=True,
-                       transform=transforms.Compose([
-                           transforms.ToTensor(),
-                           transforms.Normalize((0.1307,), (0.3081,))
-                       ])),
-        batch_size=64, shuffle=True)
-    test_loader = torch.utils.data.DataLoader(
-        datasets.MNIST('./data', train=False, download=True,
-                       transform=transforms.Compose([
-                           transforms.ToTensor(),
-                           transforms.Normalize((0.1307,), (0.3081,))
-                       ])),
-        batch_size=1)
-
-    import torch.optim as optim
-    optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.1)
-
-    def training_func(model):
-        model.train()
-        for epoch in range(1, 3):
-            for batch_idx, (data, target) in enumerate(train_loader):
-                optimizer.zero_grad()
-                output = model(data)
-                loss = F.nll_loss(output, target)
-                loss.backward()
-                optimizer.step()
-                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
-                    epoch, batch_idx * len(data), len(train_loader.dataset),
-                    100. * batch_idx / len(train_loader), loss.item()))
-
-    # launch code for Intel® Neural Compressor
-    from neural_compressor.experimental import Quantization
-    quantizer = Quantization("./conf.yaml")
-    quantizer.model = model
-    quantizer.q_func = training_func
-    quantizer.eval_dataloader = test_loader
-    q_model = quantizer()
-    q_model.save('./output')
-
-Performance only quantization
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Intel® Neural Compressor supports directly yielding int8 model with dummy dataset for the
-performance benchmarking purpose.
-
-Below is an example of how to quantize a simple network on PyTorch
-`FX graph mode `_ with a dummy dataset.
-
-.. code-block:: yaml
-
-    # conf.yaml
-    model:
-        name: lenet
-        framework: pytorch_fx
-
-.. code-block:: python3
-
-    # main.py
-    model.eval()
-
-    # launch code for Intel® Neural Compressor
-    from neural_compressor.experimental import Quantization, common
-    from neural_compressor.experimental.data.datasets.dummy_dataset import DummyDataset
-    quantizer = Quantization("./conf.yaml")
-    quantizer.model = model
-    quantizer.calib_dataloader = common.DataLoader(DummyDataset([(1, 1, 28, 28)]))
-    q_model = quantizer()
-    q_model.save('./output')
-
-Quantization outputs
-~~~~~~~~~~~~~~~~~~~~
-
-Users could know how many ops get quantized from log printed by Intel® Neural Compressor
-like below:
-
-::
-
-    2021-12-08 14:58:35 [INFO] |********Mixed Precision Statistics*******|
-    2021-12-08 14:58:35 [INFO] +------------------------+--------+-------+
-    2021-12-08 14:58:35 [INFO] | Op Type                |  Total |  INT8 |
-    2021-12-08 14:58:35 [INFO] +------------------------+--------+-------+
-    2021-12-08 14:58:35 [INFO] | quantize_per_tensor    |   2    |   2   |
-    2021-12-08 14:58:35 [INFO] | Conv2d                 |   2    |   2   |
-    2021-12-08 14:58:35 [INFO] | max_pool2d             |   1    |   1   |
-    2021-12-08 14:58:35 [INFO] | relu                   |   1    |   1   |
-    2021-12-08 14:58:35 [INFO] | dequantize             |   2    |   2   |
-    2021-12-08 14:58:35 [INFO] | LinearReLU             |   1    |   1   |
-    2021-12-08 14:58:35 [INFO] | Linear                 |   1    |   1   |
-    2021-12-08 14:58:35 [INFO] +------------------------+--------+-------+
-
-The quantized model will be generated under `./output` directory, in which there are two files:
-1. best_configure.yaml
-2. best_model_weights.pt
-
-The first file contains the quantization configurations of each op, the second file contains
-int8 weights and zero point and scale info of activations.
-
-Deployment
-~~~~~~~~~~
-
-Users could use the below code to load quantized model and then do inference or performance benchmark.
-
-.. code-block:: python3
-
-    from neural_compressor.utils.pytorch import load
-    int8_model = load('./output', model)
+..
+
+Weight-only Quantization
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+**Weight-only Quantization** is also supported on the Intel® Gaudi® 2 and Gaudi® 3 AI Accelerators. A quantized model can be loaded as shown below.
+
+.. code-block:: python
+
+    import torch
+    from neural_compressor.torch.quantization import load
+
+    # The model name comes from the HuggingFace Model Hub.
+    model_name = "TheBloke/Llama-2-7B-GPTQ"
+    model = load(
+        model_name_or_path=model_name,
+        format="huggingface",
+        device="hpu",
+        torch_dtype=torch.bfloat16,
+    )
+..
+
+**Note:** Intel Neural Compressor converts the model from the auto-gptq format to the HPU format on the first load and saves ``hpu_model.safetensors`` to the local cache directory for subsequent loads, so the first load may take a while.
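+
+Once loaded, the model can be used like any other PyTorch module. A minimal sketch, assuming the loaded model behaves like a standard ``transformers`` causal language model and that the ``transformers`` library is installed:
+
+.. code-block:: python
+
+    from transformers import AutoTokenizer
+
+    # Tokenize a prompt and run greedy generation on the HPU.
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    inputs = tokenizer("What is quantization?", return_tensors="pt").to("hpu")
+    outputs = model.generate(**inputs, max_new_tokens=32)
+    print(tokenizer.decode(outputs[0], skip_special_tokens=True))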
+
+Static Quantization with PT2E Backend
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The PT2E path uses ``torch.dynamo`` to capture the eager model into an FX graph model and then inserts observers and Q/DQ pairs into it. Finally, it uses ``torch.compile`` to perform pattern matching and replace the Q/DQ pairs with optimized quantized operators.
+
+There are four steps to perform W8A8 static quantization with the PT2E backend: ``export``, ``prepare``, ``convert``, and ``compile``.
+
+.. code-block:: python
+
+    import torch
+    from neural_compressor.torch.export import export
+    from neural_compressor.torch.quantization import StaticQuantConfig, prepare, convert
+
+    # Prepare the float model and example inputs for exporting the model.
+    # UserFloatModel is a placeholder for your own floating-point model.
+    model = UserFloatModel()
+    example_inputs = ...
+
+    # Export the eager model into an FX graph model
+    exported_model = export(model=model, example_inputs=example_inputs)
+    # Quantize the model
+    quant_config = StaticQuantConfig()
+    prepared_model = prepare(exported_model, quant_config=quant_config)
+    # Calibrate (run_fn is a placeholder for your own calibration function)
+    run_fn(prepared_model)
+    q_model = convert(prepared_model)
+    # Compile the quantized model and replace the Q/DQ pattern with Q-operators
+    from torch._inductor import config
+
+    config.freezing = True
+    opt_model = torch.compile(q_model)
+..
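+
+For illustration, the placeholders above could be filled in as follows. This is a minimal sketch, assuming a toy two-layer network and random calibration batches rather than a real dataset:
+
+.. code-block:: python
+
+    import torch
+
+    # A toy floating-point model standing in for UserFloatModel.
+    class UserFloatModel(torch.nn.Module):
+        def __init__(self):
+            super().__init__()
+            self.fc1 = torch.nn.Linear(16, 32)
+            self.fc2 = torch.nn.Linear(32, 8)
+
+        def forward(self, x):
+            return self.fc2(torch.relu(self.fc1(x)))
+
+    # Example inputs used to trace the model during export.
+    example_inputs = (torch.randn(4, 16),)
+
+    # A stand-in calibration function: feed a few random batches through the
+    # prepared model so the observers can record tensor statistics.
+    def run_fn(prepared_model):
+        for _ in range(8):
+            prepared_model(torch.randn(4, 16))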
+
+Accuracy-driven Tuning
+^^^^^^^^^^^^^^^^^^^^^^
+
+To leverage accuracy-driven automatic tuning, a tuning space must be specified. The ``autotune`` API iterates over the tuning space, applies each configuration to the given high-precision model, and records and compares its evaluation result with the baseline. The tuning process stops when the exit policy is met.
+
+.. code-block:: python
+
+    from neural_compressor.torch.quantization import RTNConfig, TuningConfig, autotune
+
+
+    # Placeholder: return the model's accuracy (or another quality metric) as a float.
+    def eval_fn(model) -> float:
+        return ...
+
+
+    # Try symmetric and asymmetric RTN quantization with two group sizes; stop
+    # after 10 trials or once the accuracy loss is within the tolerable threshold.
+    tune_config = TuningConfig(
+        config_set=RTNConfig(use_sym=[False, True], group_size=[32, 128]),
+        tolerable_loss=0.2,
+        max_trials=10,
+    )
+    q_model = autotune(model, tune_config=tune_config, eval_fn=eval_fn)
+..
 
 Tutorials
 ---------
 
-Please visit `Intel® Neural Compressor Github repo `_
-for more tutorials.
+More detailed tutorials are available in the official Intel® Neural Compressor `documentation `_.