To align with the stock PyTorch feature definition concept, we will use Prototype, Beta, and Stable to replace the experimental and product feature definitions in the documentation.
-| Import Intel® Extension for PyTorch\* |`import intel_extension_for_pytorch as ipex`|
-| Capture a Verbose Log (Command Prompt) |`export ONEDNN_VERBOSE=1`|
-| Optimization During Training |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br>`model, optimizer = ipex.optimize(model, optimizer=optimizer)`|
-| Optimization During Inference |`model = ...`<br>`model.eval()`<br>`model = ipex.optimize(model)`|
-| Optimization Using the Low-Precision Data Type bfloat16 <br>During Training (Default FP32) |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br/><br/>`model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)`<br/><br/>`with torch.no_grad():`<br>`    with torch.cpu.amp.autocast():`<br>`        model(data)`|
-| Optimization Using the Low-Precision Data Type bfloat16 <br>During Inference (Default FP32) |`model = ...`<br>`model.eval()`<br/><br/>`model = ipex.optimize(model, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`    model(data)`|
+| Import Intel® Extension for PyTorch\* |`import intel_extension_for_pytorch as ipex`|
+| Capture a Verbose Log (Command Prompt) |`export ONEDNN_VERBOSE=1`|
+| Optimization During Training |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br>`model, optimizer = ipex.optimize(model, optimizer=optimizer)`|
+| Optimization During Inference |`model = ...`<br>`model.eval()`<br>`model = ipex.optimize(model)`|
+| Optimization Using the Low-Precision Data Type bfloat16 <br>During Training (Default FP32) |`model = ...`<br>`optimizer = ...`<br>`model.train()`<br/><br/>`model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)`<br/><br/>`with torch.no_grad():`<br>`    with torch.cpu.amp.autocast():`<br>`        model(data)`|
+| Optimization Using the Low-Precision Data Type bfloat16 <br>During Inference (Default FP32) |`model = ...`<br>`model.eval()`<br/><br/>`model = ipex.optimize(model, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`    model(data)`|
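For convenience, here is the bfloat16 inference row of the cheat sheet assembled into one runnable sketch. The torchvision ResNet-50 and the random input are placeholders for illustration, not part of the cheat sheet:

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

# Placeholder model and data; substitute your own workload.
model = models.resnet50(weights=None)
model.eval()

# Apply the extension's optimizations with the bfloat16 data type.
model = ipex.optimize(model, dtype=torch.bfloat16)

data = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    with torch.cpu.amp.autocast():
        output = model(data)
```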
docs/tutorials/features.rst (14 additions, 13 deletions)
@@ -48,7 +48,7 @@ Quantization
 Intel® Extension for PyTorch* provides built-in INT8 quantization recipes to deliver good statistical accuracy for most popular DL workloads including CNN, NLP and recommendation models on CPU side. On top of that, if users would like to tune for a higher accuracy than what the default recipe provides, a recipe tuning API powered by Intel® Neural Compressor is provided for users to try.
 
-Check more detailed information for `INT8 Quantization [CPU] <features/int8_overview.md>`_ and `INT8 recipe tuning API guide (Experimental, *NEW feature in 1.13.0* on CPU) <features/int8_recipe_tuning_api.md>`_ on CPU side.
+Check more detailed information for `INT8 Quantization [CPU] <features/int8_overview.md>`_ and `INT8 recipe tuning API guide (Prototype, *NEW feature in 1.13.0* on CPU) <features/int8_recipe_tuning_api.md>`_ on CPU side.
 
 Check more detailed information for `INT8 Quantization [XPU] <features/int8_overview_xpu.md>`_.
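As a minimal sketch of the static INT8 flow this hunk refers to. The API names (`prepare`, `convert`, `default_static_qconfig`) follow the extension's quantization docs, and the toy model and calibration loop are assumptions for illustration; verify against the installed version:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Toy model and data purely for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 30 * 30, 10)
).eval()
example_inputs = torch.rand(1, 3, 32, 32)

# Built-in static INT8 recipe.
qconfig = ipex.quantization.default_static_qconfig
prepared_model = prepare(model, qconfig, example_inputs=example_inputs)

# Calibration: run a few representative batches through the prepared model.
with torch.no_grad():
    for _ in range(10):
        prepared_model(torch.rand(1, 3, 32, 32))

converted_model = convert(prepared_model)
with torch.no_grad():
    traced_model = torch.jit.trace(converted_model, example_inputs)
    traced_model = torch.jit.freeze(traced_model)
```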
@@ -68,9 +68,9 @@ On Intel® GPUs, Intel® Extension for PyTorch* also provides INT4 and FP8 Quant
 Distributed Training
 --------------------
 
-To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs and CPUs are supported. Two alternative methodologies are available. Users can choose either to use PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_ or use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support (Experimental).
+To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs and CPUs are supported. Two alternative methodologies are available. Users can choose either to use PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_ or use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support (Prototype).
 
-For more detailed information, check `DDP <features/DDP.md>`_ and `Horovod (Experimental) <features/horovod.md>`_.
+For more detailed information, check `DDP <features/DDP.md>`_ and `Horovod (Prototype) <features/horovod.md>`_.
 
 .. toctree::
    :hidden:
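A minimal sketch of the DDP-with-oneCCL path described above. The `oneccl_bindings_for_pytorch` import name and the `PMI_*` environment variables follow the torch-ccl README; treat them, and the toy model, as assumptions to verify:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
import intel_extension_for_pytorch as ipex
import oneccl_bindings_for_pytorch  # registers the "ccl" backend

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# Rank and world size normally come from the MPI or launcher environment.
dist.init_process_group(
    backend="ccl",
    rank=int(os.environ.get("PMI_RANK", 0)),
    world_size=int(os.environ.get("PMI_SIZE", 1)),
)

model = nn.Linear(10, 10)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer)
model = nn.parallel.DistributedDataParallel(model)
```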
@@ -149,7 +149,7 @@ For more detailed information, check `Inductor <features/torch_compile_gpu.md>`_
 features/torch_compile_gpu
 
-Legacy Profiler Tool (Experimental)
+Legacy Profiler Tool (Prototype)
 -----------------------------------
 
 The legacy profiler tool is an extension of PyTorch* legacy profiler for profiling operators' overhead on XPU devices. With this tool, you can get the information in many fields of the run models or code scripts. Build Intel® Extension for PyTorch* with profiler support as default and enable this tool by adding a `with` statement before the code segment.
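The `with` statement the paragraph mentions presumably looks like the following sketch. The `use_xpu` flag and the `"xpu"` device string are assumptions based on the extension's XPU profiling docs, and the model is a toy placeholder:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

model = nn.Linear(10, 10).to("xpu")  # toy model on an XPU device
x = torch.rand(1, 10, device="xpu")

with torch.autograd.profiler_legacy.profile(use_xpu=True) as prof:
    y = model(x)

# Operator-level overhead, sorted by time spent on the XPU device.
print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```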
@@ -162,7 +162,7 @@ For more detailed information, check `Legacy Profiler Tool <features/profiler_le
 features/profiler_legacy
 
-Simple Trace Tool (Experimental)
+Simple Trace Tool (Prototype)
 --------------------------------
 
 Simple Trace is a built-in debugging tool that lets you control printing out the call stack for a piece of code. Once enabled, it can automatically print out verbose messages of called operators in a stack format with indenting to distinguish the context.
@@ -175,7 +175,7 @@ For more detailed information, check `Simple Trace Tool <features/simple_trace.m
 features/simple_trace
 
-Kineto Supported Profiler Tool (Experimental)
+Kineto Supported Profiler Tool (Prototype)
 ---------------------------------------------
 
 The Kineto supported profiler tool is an extension of PyTorch\* profiler for profiling operators' executing time cost on GPU devices. With this tool, you can get information in many fields of the run models or code scripts. Build Intel® Extension for PyTorch\* with Kineto support as default and enable this tool using the `with` statement before the code segment.
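A sketch of the `with` usage for the Kineto-based tool. `ProfilerActivity.XPU` is assumed to be registered by the extension's Kineto support, and the model is a toy placeholder:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Linear(10, 10).to("xpu")  # toy model on an XPU device
x = torch.rand(1, 10, device="xpu")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU]) as prof:
    y = model(x)

# Operator time cost, sorted by time spent on the XPU device.
print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```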
@@ -189,10 +189,10 @@ For more detailed information, check `Profiler Kineto <features/profiler_kineto.
 features/profiler_kineto
 
 
-Compute Engine (Experimental feature for debug)
+Compute Engine (Prototype feature for debug)
 -----------------------------------------------
 
-Compute engine is a experimental feature which provides the capacity to choose specific backend for operators with multiple implementations.
+Compute engine is a prototype feature that provides the capability to choose a specific backend for operators with multiple implementations.
 
 For more detailed information, check `Compute Engine <features/compute_engine.md>`_.
 - Spawn asynchronous tasks from both Python and C++ frontend.
 - Program core bindings for OpenMP threads from both Python and C++ frontend.
 
-.. note:: Intel® Extension for PyTorch* Runtime extension is still in the experimental stage. The API is subject to change. More detailed descriptions are available in the `API Documentation <api_doc.html>`_.
+.. note:: Intel® Extension for PyTorch* Runtime extension is still in the prototype stage. The API is subject to change. More detailed descriptions are available in the `API Documentation <api_doc.html>`_.
 
 For more detailed information, check `Runtime Extension <features/runtime_extension.md>`_.
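To make the bullets in this hunk concrete, a sketch of the Runtime extension API. The names `CPUPool`, `pin`, and `Task` follow the runtime_extension docs, but since the note above says the API is subject to change, treat this as an assumption:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

model = nn.Linear(10, 10).eval()  # toy model for illustration
x = torch.rand(1, 10)

# Bind OpenMP threads to a set of physical cores.
cpu_pool = ipex.cpu.runtime.CPUPool(core_ids=[0, 1, 2, 3])

# Run synchronously inside the pinned thread pool.
with ipex.cpu.runtime.pin(cpu_pool):
    y = model(x)

# Or spawn the same work as an asynchronous task bound to the pool.
task = ipex.cpu.runtime.Task(model, cpu_pool)
future = task(x)
y = future.get()
```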
@@ -239,7 +239,7 @@ For more detailed information, check `Runtime Extension <features/runtime_extens
 features/runtime_extension
 
 
-Codeless Optimization (Experimental, *NEW feature in 1.13.\**)
+Codeless Optimization (Prototype, *NEW feature in 1.13.\**)
 
 This feature enables users to get performance benefits from Intel® Extension for PyTorch* without changing Python scripts. It hopefully eases the usage and has been verified to work well with a broad scope of models, though in a few cases there could be small overhead compared to applying optimizations with Intel® Extension for PyTorch* APIs.
@@ -253,7 +253,7 @@ For more detailed information, check `Codeless Optimization <features/codeless_o
 features/codeless_optimization.md
 
 
-Graph Capture (Experimental, *NEW feature in 1.13.0\**)
+Graph Capture (Prototype, *NEW feature in 1.13.0\**)
 
 Since graph mode is key for deployment performance, this feature automatically captures graphs based on a set of technologies that PyTorch supports, such as TorchScript and TorchDynamo. Users won't need to learn and try different PyTorch APIs to capture graphs; instead, they can turn on a new boolean flag `--graph_mode` (default off) in `ipex.optimize` to get the best of graph optimization.
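In Python, the flag presumably maps to a `graph_mode` keyword argument on `ipex.optimize`, as in this sketch (the keyword spelling and the toy model are assumptions):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

model = nn.Linear(10, 10)  # toy model for illustration
model.eval()

# graph_mode=True asks ipex.optimize to capture a graph automatically
# (TorchScript/TorchDynamo under the hood); the default is off.
model = ipex.optimize(model, graph_mode=True)

with torch.no_grad():
    y = model(torch.rand(1, 10))
```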
@@ -267,10 +267,10 @@ For more detailed information, check `Graph Capture <features/graph_capture.md>`
 features/graph_capture
 
 
-HyperTune (Experimental, *NEW feature in 1.13.0\**)
+HyperTune (Prototype, *NEW feature in 1.13.0\**)
 
-HyperTune is an experimental feature to perform hyperparameter/execution configuration searching. The searching is used in various areas such as optimization of hyperparameters of deep learning models. The searching is extremely useful in real situations when the number of hyperparameters, including configuration of script execution, and their search spaces are huge that manually tuning these hyperparameters/configuration is impractical and time consuming. Hypertune automates this process of execution configuration searching for the `launcher <performance_tuning/launch_script.md>`_ and Intel® Extension for PyTorch*.
+HyperTune is a prototype feature to perform hyperparameter/execution configuration searching. The searching is used in various areas such as optimization of hyperparameters of deep learning models. The searching is extremely useful in real situations when the number of hyperparameters, including configuration of script execution, and their search spaces are so huge that manually tuning these hyperparameters/configurations is impractical and time consuming. HyperTune automates this process of execution configuration searching for the `launcher <performance_tuning/launch_script.md>`_ and Intel® Extension for PyTorch*.
 
 For more detailed information, check `HyperTune <features/hypertune.md>`_.
@@ -279,3 +279,4 @@ For more detailed information, check `HyperTune <features/hypertune.md>`_.
docs/tutorials/features/codeless_optimization.md (2 additions, 1 deletion)
@@ -1,4 +1,4 @@
-Codeless Optimization (Experimental)
+Codeless Optimization (Prototype)
 ====================================
 
 This feature aims to get inference performance benefits from Intel® Extension for PyTorch\* without changing code in your Python scripts, which can raise the Out-of-Box (OOB) experience and make it easy to get started with Intel® Extension for PyTorch\*. Users who already know how to apply optimizations with Intel® Extension for PyTorch\* APIs are not targeted for this feature, due to the inevitable overhead and limitations mentioned below.
@@ -104,3 +104,4 @@ with torch.no_grad():
 For 2 reasons:
 * The auto graph mode support has already been included in `ipex.optimize` with the graph-first API in 1.13.
 * Extra launch parameters and monkey patches are needed to support the above case. We will focus on the feasibility of the first use case in TorchVision and HuggingFace workloads.
docs/tutorials/features/horovod.md (2 additions, 1 deletion)
@@ -1,4 +1,4 @@
-Horovod with PyTorch (Experimental)
+Horovod with PyTorch (Prototype)
 ===================================
 
 Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. Horovod core principles are based on MPI concepts such as size, rank, local rank, allreduce, allgather, broadcast, and alltoall. To use Horovod with PyTorch, you need to install Horovod with PyTorch support first, and make Horovod-specific changes in your training script.
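The Horovod-specific changes are presumably the standard Horovod-PyTorch modifications, sketched below. This uses the standard Horovod API; the toy model and the learning-rate scaling by world size are illustrative conventions, not requirements:

```python
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()

model = nn.Linear(10, 10)  # toy model for illustration
# Scale the learning rate by the number of workers (common convention).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Average gradients across ranks via allreduce during optimizer.step().
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)

# Start every rank from identical model and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```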