To align with the stock PyTorch feature definition concept, we will use Prototype, Beta, and Stable to replace the experimental and product feature definitions in the documentation.
-| Import Intel® Extension for PyTorch\* |`import intel_extension_for_pytorch as ipex`|
-| Capture a Verbose Log (Command Prompt) |`export ONEDNN_VERBOSE=1`|
-| Optimization During Training |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br>`model, optimizer = ipex.optimize(model, optimizer=optimizer)`|
-| Optimization During Inference |`model = ...`<br>`model.eval()`<br>`model = ipex.optimize(model)`|
-| Optimization Using the Low-Precision Data Type bfloat16 <br>During Training (Default FP32) |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br/><br/>`model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)`<br/><br/>`with torch.no_grad():`<br>`    with torch.cpu.amp.autocast():`<br>`        model(data)`|
-| Optimization Using the Low-Precision Data Type bfloat16 <br>During Inference (Default FP32) |`model = ...`<br>`model.eval()`<br/><br/>`model = ipex.optimize(model, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`    model(data)`|
+| Import Intel® Extension for PyTorch\* |`import intel_extension_for_pytorch as ipex`|
+| Capture a Verbose Log (Command Prompt) |`export ONEDNN_VERBOSE=1`|
+| Optimization During Training |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br>`model, optimizer = ipex.optimize(model, optimizer=optimizer)`|
+| Optimization During Inference |`model = ...`<br>`model.eval()`<br>`model = ipex.optimize(model)`|
+| Optimization Using the Low-Precision Data Type bfloat16 <br>During Training (Default FP32) |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br/><br/>`model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)`<br/><br/>`with torch.no_grad():`<br>`    with torch.cpu.amp.autocast():`<br>`        model(data)`|
+| Optimization Using the Low-Precision Data Type bfloat16 <br>During Inference (Default FP32) |`model = ...`<br>`model.eval()`<br/><br/>`model = ipex.optimize(model, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`    model(data)`|
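For convenience, here is the bfloat16 inference row of the cheat sheet assembled into one runnable sketch. The torchvision ResNet-50 and the random input are placeholders for illustration, not part of the cheat sheet:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Placeholder model and data; substitute your own workload.
model = models.resnet50(weights=None)
model.eval()

# Apply the extension's optimizations with the bfloat16 data type.
model = ipex.optimize(model, dtype=torch.bfloat16)

data = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    with torch.cpu.amp.autocast():
        output = model(data)
```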
docs/tutorials/features.rst (14 additions, 13 deletions)
@@ -48,7 +48,7 @@ Quantization
 Intel® Extension for PyTorch* provides built-in INT8 quantization recipes to deliver good statistical accuracy for most popular DL workloads including CNN, NLP and recommendation models on CPU side. On top of that, if users would like to tune for a higher accuracy than what the default recipe provides, a recipe tuning API powered by Intel® Neural Compressor is provided for users to try.
 
-Check more detailed information for `INT8 Quantization [CPU] <features/int8_overview.md>`_ and `INT8 recipe tuning API guide (Experimental, *NEW feature in 1.13.0* on CPU) <features/int8_recipe_tuning_api.md>`_ on CPU side.
+Check more detailed information for `INT8 Quantization [CPU] <features/int8_overview.md>`_ and `INT8 recipe tuning API guide (Prototype, *NEW feature in 1.13.0* on CPU) <features/int8_recipe_tuning_api.md>`_ on CPU side.
 
 Check more detailed information for `INT8 Quantization [XPU] <features/int8_overview_xpu.md>`_.
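As a minimal sketch of the static INT8 flow this hunk refers to. The API names (`prepare`, `convert`, `default_static_qconfig`) follow the extension's quantization docs, and the toy model and calibration loop are assumptions for illustration; verify against the installed version:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Toy model and data purely for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 30 * 30, 10)
).eval()
example_inputs = torch.rand(1, 3, 32, 32)

# Built-in static INT8 recipe.
qconfig = ipex.quantization.default_static_qconfig
prepared_model = prepare(model, qconfig, example_inputs=example_inputs)

# Calibration: run a few representative batches through the prepared model.
with torch.no_grad():
    for _ in range(10):
        prepared_model(torch.rand(1, 3, 32, 32))

converted_model = convert(prepared_model)
with torch.no_grad():
    traced_model = torch.jit.trace(converted_model, example_inputs)
    traced_model = torch.jit.freeze(traced_model)
```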
@@ -68,9 +68,9 @@ On Intel® GPUs, Intel® Extension for PyTorch* also provides INT4 and FP8 Quant
 Distributed Training
 --------------------
 
-To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs and CPUs are supported. Two alternative methodologies are available. Users can choose either to use PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_ or use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support (Experimental).
+To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs and CPUs are supported. Two alternative methodologies are available. Users can choose either to use PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_ or use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support (Prototype).
 
-For more detailed information, check `DDP <features/DDP.md>`_ and `Horovod (Experimental) <features/horovod.md>`_.
+For more detailed information, check `DDP <features/DDP.md>`_ and `Horovod (Prototype) <features/horovod.md>`_.
 
 .. toctree::
    :hidden:
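A minimal sketch of the DDP-with-oneCCL path described above. The `oneccl_bindings_for_pytorch` import name and the `PMI_*` environment variables follow the torch-ccl README; treat them, and the toy model, as assumptions to verify:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
import intel_extension_for_pytorch as ipex
import oneccl_bindings_for_pytorch  # registers the "ccl" backend

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# Rank and world size normally come from the MPI or launcher environment.
dist.init_process_group(
    backend="ccl",
    rank=int(os.environ.get("PMI_RANK", 0)),
    world_size=int(os.environ.get("PMI_SIZE", 1)),
)

model = nn.Linear(10, 10)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer)
model = nn.parallel.DistributedDataParallel(model)
```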
@@ -149,7 +149,7 @@ For more detailed information, check `Inductor <features/torch_compile_gpu.md>`_
 features/torch_compile_gpu
 
-Legacy Profiler Tool (Experimental)
+Legacy Profiler Tool (Prototype)
 -----------------------------------
 
 The legacy profiler tool is an extension of PyTorch* legacy profiler for profiling operators' overhead on XPU devices. With this tool, you can get the information in many fields of the run models or code scripts. Build Intel® Extension for PyTorch* with profiler support as default and enable this tool by adding a `with` statement before the code segment.
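The `with` statement the paragraph mentions presumably looks like the following sketch. The `use_xpu` flag and the `"xpu"` device string are assumptions based on the extension's XPU profiling docs, and the model is a toy placeholder:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

model = nn.Linear(10, 10).to("xpu")  # toy model on an XPU device
x = torch.rand(1, 10, device="xpu")

with torch.autograd.profiler_legacy.profile(use_xpu=True) as prof:
    y = model(x)

# Operator-level overhead, sorted by time spent on the XPU device.
print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```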
@@ -162,7 +162,7 @@ For more detailed information, check `Legacy Profiler Tool <features/profiler_le
 features/profiler_legacy
 
-Simple Trace Tool (Experimental)
+Simple Trace Tool (Prototype)
 --------------------------------
 
 Simple Trace is a built-in debugging tool that lets you control printing out the call stack for a piece of code. Once enabled, it can automatically print out verbose messages of called operators in a stack format with indenting to distinguish the context.
@@ -175,7 +175,7 @@ For more detailed information, check `Simple Trace Tool <features/simple_trace.m
 features/simple_trace
 
-Kineto Supported Profiler Tool (Experimental)
+Kineto Supported Profiler Tool (Prototype)
 ---------------------------------------------
 
 The Kineto supported profiler tool is an extension of PyTorch\* profiler for profiling operators' executing time cost on GPU devices. With this tool, you can get information in many fields of the run models or code scripts. Build Intel® Extension for PyTorch\* with Kineto support as default and enable this tool using the `with` statement before the code segment.
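A sketch of the `with` usage for the Kineto-based tool. `ProfilerActivity.XPU` is assumed to be registered by the extension's Kineto support, and the model is a toy placeholder:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Linear(10, 10).to("xpu")  # toy model on an XPU device
x = torch.rand(1, 10, device="xpu")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU]) as prof:
    y = model(x)

# Operator time cost, sorted by time spent on the XPU device.
print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```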
@@ -189,10 +189,10 @@ For more detailed information, check `Profiler Kineto <features/profiler_kineto.
 features/profiler_kineto
 
 
-Compute Engine (Experimental feature for debug)
+Compute Engine (Prototype feature for debug)
 -----------------------------------------------
 
-Compute engine is a experimental feature which provides the capacity to choose specific backend for operators with multiple implementations.
+Compute engine is a prototype feature that provides the capability to choose a specific backend for operators with multiple implementations.
 
 For more detailed information, check `Compute Engine <features/compute_engine.md>`_.
 - Spawn asynchronous tasks from both Python and C++ frontend.
 - Program core bindings for OpenMP threads from both Python and C++ frontend.
 
-.. note:: Intel® Extension for PyTorch* Runtime extension is still in the experimental stage. The API is subject to change. More detailed descriptions are available in the `API Documentation <api_doc.html>`_.
+.. note:: Intel® Extension for PyTorch* Runtime extension is still in the prototype stage. The API is subject to change. More detailed descriptions are available in the `API Documentation <api_doc.html>`_.
 
 For more detailed information, check `Runtime Extension <features/runtime_extension.md>`_.
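To make the bullets in this hunk concrete, a sketch of the Runtime extension API. The names `CPUPool`, `pin`, and `Task` follow the runtime_extension docs, but since the note above says the API is subject to change, treat this as an assumption:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

model = nn.Linear(10, 10).eval()  # toy model for illustration
x = torch.rand(1, 10)

# Bind OpenMP threads to a set of physical cores.
cpu_pool = ipex.cpu.runtime.CPUPool(core_ids=[0, 1, 2, 3])

# Run synchronously inside the pinned thread pool.
with ipex.cpu.runtime.pin(cpu_pool):
    y = model(x)

# Or spawn the same work as an asynchronous task bound to the pool.
task = ipex.cpu.runtime.Task(model, cpu_pool)
future = task(x)
y = future.get()
```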
@@ -239,7 +239,7 @@ For more detailed information, check `Runtime Extension <features/runtime_extens
 features/runtime_extension
 
 
-Codeless Optimization (Experimental, *NEW feature in 1.13.\**)
+Codeless Optimization (Prototype, *NEW feature in 1.13.\**)
 
 This feature enables users to get performance benefits from Intel® Extension for PyTorch* without changing Python scripts. It hopefully eases the usage and has been verified to work well with a broad scope of models, though in a few cases there could be small overhead compared to applying optimizations with Intel® Extension for PyTorch* APIs.
@@ -253,7 +253,7 @@ For more detailed information, check `Codeless Optimization <features/codeless_o
 features/codeless_optimization.md
 
 
-Graph Capture (Experimental, *NEW feature in 1.13.0\**)
+Graph Capture (Prototype, *NEW feature in 1.13.0\**)
 
 Since graph mode is key for deployment performance, this feature automatically captures graphs based on a set of technologies that PyTorch supports, such as TorchScript and TorchDynamo. Users won't need to learn and try different PyTorch APIs to capture graphs; instead, they can turn on a new boolean flag `--graph_mode` (default off) in `ipex.optimize` to get the best of graph optimization.
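In Python, the flag presumably maps to a `graph_mode` keyword argument on `ipex.optimize`, as in this sketch (the keyword spelling and the toy model are assumptions):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

model = nn.Linear(10, 10)  # toy model for illustration
model.eval()

# graph_mode=True asks ipex.optimize to capture a graph automatically
# (TorchScript/TorchDynamo under the hood); the default is off.
model = ipex.optimize(model, graph_mode=True)

with torch.no_grad():
    y = model(torch.rand(1, 10))
```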
@@ -267,10 +267,10 @@ For more detailed information, check `Graph Capture <features/graph_capture.md>`
 features/graph_capture
 
 
-HyperTune (Experimental, *NEW feature in 1.13.0\**)
+HyperTune (Prototype, *NEW feature in 1.13.0\**)
 
-HyperTune is an experimental feature to perform hyperparameter/execution configuration searching. The searching is used in various areas such as optimization of hyperparameters of deep learning models. The searching is extremely useful in real situations when the number of hyperparameters, including configuration of script execution, and their search spaces are huge that manually tuning these hyperparameters/configuration is impractical and time consuming. Hypertune automates this process of execution configuration searching for the `launcher <performance_tuning/launch_script.md>`_ and Intel® Extension for PyTorch*.
+HyperTune is a prototype feature to perform hyperparameter/execution configuration searching. The searching is used in various areas such as optimization of hyperparameters of deep learning models. The searching is extremely useful in real situations when the number of hyperparameters, including configuration of script execution, and their search spaces are so huge that manually tuning these hyperparameters/configurations is impractical and time consuming. HyperTune automates this process of execution configuration searching for the `launcher <performance_tuning/launch_script.md>`_ and Intel® Extension for PyTorch*.
 
 For more detailed information, check `HyperTune <features/hypertune.md>`_.
@@ -279,3 +279,4 @@ For more detailed information, check `HyperTune <features/hypertune.md>`_.
docs/tutorials/features/codeless_optimization.md (2 additions, 1 deletion)
@@ -1,4 +1,4 @@
-Codeless Optimization (Experimental)
+Codeless Optimization (Prototype)
 ====================================
 
 This feature aims to get inference performance benefits from Intel® Extension for PyTorch\* without changing code in your Python scripts, which can raise the Out-of-Box (OOB) experience and make it easy to get started with Intel® Extension for PyTorch\*. Users who already know how to apply optimizations with Intel® Extension for PyTorch\* APIs are not targeted for this feature, due to the inevitable overhead and limitations mentioned below.
@@ -104,3 +104,4 @@ with torch.no_grad():
 For 2 reasons:
 * The auto graph mode support has already been included in `ipex.optimize` with the graph-first API in 1.13.
 * Extra launch parameters and monkey patches are needed to support the above case. We will focus on the feasibility of the first use case in TorchVision and HuggingFace workloads.
docs/tutorials/features/horovod.md (2 additions, 1 deletion)
@@ -1,4 +1,4 @@
-Horovod with PyTorch (Experimental)
+Horovod with PyTorch (Prototype)
 ===================================
 
 Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. Horovod core principles are based on MPI concepts such as size, rank, local rank, allreduce, allgather, broadcast, and alltoall. To use Horovod with PyTorch, you need to install Horovod with PyTorch support first, and make Horovod-specific changes in your training script.
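The Horovod-specific changes are presumably the standard Horovod-PyTorch modifications, sketched below. This uses the standard Horovod API; the toy model and the learning-rate scaling by world size are illustrative conventions, not requirements:

```python
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()

model = nn.Linear(10, 10)  # toy model for illustration
# Scale the learning rate by the number of workers (common convention).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Average gradients across ranks via allreduce during optimizer.step().
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)

# Start every rank from identical model and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```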