Commit 003ea5e

Sync index.rst, api_doc.rst, blogs_publications.md and features.rst to cover CPU & GPU merged content (#2118)
1 parent b3ea9a6 commit 003ea5e

9 files changed: 188 additions, 657 deletions

docs/images/Intel_Extension_for_PyTorch_Architecture.svg

Lines changed: 0 additions & 289 deletions
This file was deleted.

docs/images/intel_extension_for_pytorch_structure_gpu.svg

Lines changed: 0 additions & 282 deletions
This file was deleted.

docs/index.rst

Lines changed: 11 additions & 19 deletions

@@ -11,43 +11,35 @@ Intel® Extension for PyTorch* provides optimizations for both eager mode and gr
 
 The extension can be loaded as a Python module for Python programs or linked as a C++ library for C++ programs. In Python scripts users can enable it dynamically by importing `intel_extension_for_pytorch`.
 
--------------------------------------
+Intel® Extension for PyTorch* is structured as shown in the following figure:
 
-Intel® Extension for PyTorch* for CPU is structured as shown in the following figure:
-
-.. figure:: ./images/intel_extension_for_pytorch_structure_cpu.png
+.. figure:: ./images/intel_extension_for_pytorch_structure.png
    :width: 800
    :align: center
-   :alt: Structure of Intel® Extension for PyTorch* for CPU
-
+   :alt: Architecture of Intel® Extension for PyTorch*
 
-PyTorch components are depicted with white boxes while Intel Extensions are with blue boxes. Extra performance of the extension is delivered via both custom addons and overriding existing PyTorch components. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers and INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via the extended graph fusion passes. Intel® Extension for PyTorch* dispatches the operators into their underlying kernels automatically based on ISA that it detects and leverages vectorization and matrix acceleration units available in Intel hardware, as much as possible. oneDNN library is used for computation intensive operations. Intel Extension for PyTorch runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing.
+|
 
-Intel® Extension for PyTorch* for CPU has been released as an open–source project at `Github master branch <https://github.com/intel/intel-extension-for-pytorch/tree/master>`_. Check `CPU tutorial <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/>`_ for detailed information of Intel® Extension for PyTorch* for Intel® CPUs.
+Optimizations for both eager mode and graph mode contribute to extra performance accelerations with the extension. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers, and INT8 quantization APIs. Further performance boost is available by converting the eager-mode model into graph mode via extended graph fusion passes. In the graph mode, the fusions reduce operator/kernel invocation overheads, and thus increase performance. On CPU, Intel® Extension for PyTorch* dispatches the operators into their underlying kernels automatically based on ISA that it detects and leverages vectorization and matrix acceleration units available on Intel hardware. Intel® Extension for PyTorch* runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing. On GPU, optimized operators and kernels are implemented and registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel GPU hardware. Intel® Extension for PyTorch* for GPU utilizes the `DPC++ <https://github.com/intel/llvm#oneapi-dpc-compiler>`_ compiler that supports the latest `SYCL* <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html>`_ standard and also a number of extensions to the SYCL* standard, which can be found in the `sycl/doc/extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions>`_ directory.
 
--------------------------------------
+.. note:: GPU features are not included in CPU only packages.
 
-Intel® Extension for PyTorch* for GPU is structured as shown in the following figure:
-
-.. figure:: ./images/intel_extension_for_pytorch_structure_gpu.svg
-   :width: 800
-   :align: center
-   :alt: Architecture of Intel® Extension for PyTorch* for GPU
-
-PyTorch components are depicted with white boxes and Intel extensions are with blue boxes. Extra performance of the extension comes from optimizations for both eager mode and graph mode. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers, and INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via extended graph fusion passes. On GPU, optimized operators and kernels are implemented and registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel GPU hardware. In graph mode, further operator fusions are supported to reduce operator/kernel invocation overheads, and thus increase performance.
-
-Intel® Extension for PyTorch* for GPU utilizes the `DPC++ <https://github.com/intel/llvm#oneapi-dpc-compiler>`_ compiler that supports the latest `SYCL* <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html>`_ standard and also a number of extensions to the SYCL* standard, which can be found in the `sycl/doc/extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions>`_ directory. Intel® Extension for PyTorch* also integrates `oneDNN <https://github.com/oneapi-src/oneDNN>`_ and `oneMKL <https://github.com/oneapi-src/oneMKL>`_ libraries and provides kernels based on that. The oneDNN library is used for computation intensive operations. The oneMKL library is used for fundamental mathematical operations.
+Intel® Extension for PyTorch* for CPU has been released as an open–source project at `Github master branch <https://github.com/intel/intel-extension-for-pytorch/tree/master>`_. Check `CPU tutorial <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/>`_ for detailed information of Intel® Extension for PyTorch* for Intel® CPUs.
 
 Intel® Extension for PyTorch* for GPU has been released as an open–source project on `GitHub xpu-master branch <https://github.com/intel/intel-extension-for-pytorch/tree/xpu-master>`_. Check `GPU tutorial <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/>`_ for detailed information of Intel® Extension for PyTorch* for Intel® GPUs.
 
 .. toctree::
    :hidden:
    :maxdepth: 1
 
+   tutorials/getting_started
   tutorials/features
   tutorials/releases
   tutorials/installation
   tutorials/examples
   tutorials/api_doc
+   tutorials/performance_tuning
+   tutorials/technical_details
+   tutorials/blogs_publications
   tutorials/contribution
   tutorials/license
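The index.rst text above says the extension is enabled dynamically by importing `intel_extension_for_pytorch` and that extra performance comes from `optimize` plus graph-mode conversion. A minimal sketch of that flow (requires the `torch` and `intel_extension_for_pytorch` packages to be installed; the toy model and shapes are illustrative, not from the commit):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # enables the extension on import

# Any torch.nn.Module works; a toy MLP stands in for a real model here.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# ipex.optimize() applies the operator/optimizer replacements described above;
# dtype=torch.bfloat16 additionally prepares weights for bf16 inference.
model = ipex.optimize(model, dtype=torch.bfloat16)

# Run inference under CPU autocast so bf16 kernels are dispatched.
with torch.no_grad(), torch.cpu.amp.autocast():
    out = model(torch.randn(8, 64))
```

Converting the optimized model with `torch.jit.trace`/`torch.jit.freeze` afterwards triggers the graph-fusion passes the paragraph mentions.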

docs/tutorials/api_doc.rst

Lines changed: 52 additions & 23 deletions

@@ -1,11 +1,21 @@
 API Documentation
 #################
 
-General
-*******
+Device-Agnostic
+***************
 
 .. currentmodule:: intel_extension_for_pytorch
 .. autofunction:: optimize
+.. autofunction:: get_fp32_math_mode
+.. autofunction:: set_fp32_math_mode
+.. autoclass:: verbose
+
+GPU-Specific
+************
+
+Miscellaneous
+=============
+
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. StreamContext
 .. can_device_access_peer
@@ -31,10 +41,8 @@ General
 .. autofunction:: stream
 .. autofunction:: synchronize
 
-
-
 Random Number Generator
-***********************
+=======================
 
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. autofunction:: get_rng_state
@@ -47,10 +55,8 @@ Random Number Generator
 .. autofunction:: seed_all
 .. autofunction:: initial_seed
 
-
-
 Streams and events
-******************
+==================
 
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. autoclass:: Stream
@@ -60,7 +66,7 @@ Streams and events
    :members:
 
 Memory management
-*****************
+=================
 
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. autofunction:: empty_cache
@@ -82,27 +88,50 @@ Memory management
 .. caching_allocator_alloc
 .. caching_allocator_delete
 
-
-
 .. autofunction:: memory_stats_as_nested_dict
 .. autofunction:: reset_accumulated_memory_stats
 
-Other
-*****
-
-.. currentmodule:: intel_extension_for_pytorch.xpu
-.. autofunction:: get_fp32_math_mode
-.. autofunction:: set_fp32_math_mode
-
-
-.. .. automodule:: intel_extension_for_pytorch.quantization
-.. :members:
-
 C++ API
-*******
+=======
 
 .. doxygenenum:: xpu::FP32_MATH_MODE
 
 .. doxygenfunction:: xpu::set_fp32_math_mode
 
 .. doxygenfunction:: xpu::get_queue_from_stream
+
+
+CPU-Specific
+************
+
+Miscellaneous
+=============
+
+.. currentmodule:: intel_extension_for_pytorch
+.. autofunction:: enable_onednn_fusion
+
+Quantization
+============
+
+.. automodule:: intel_extension_for_pytorch.quantization
+.. autofunction:: prepare
+.. autofunction:: convert
+
+Experimental API, introduction is available at `feature page <./features/int8_recipe_tuning_api.md>`_.
+
+.. autofunction:: autotune
+
+CPU Runtime
+===========
+
+.. automodule:: intel_extension_for_pytorch.cpu.runtime
+.. autofunction:: is_runtime_ext_enabled
+.. autoclass:: CPUPool
+.. autoclass:: pin
+.. autoclass:: MultiStreamModuleHint
+.. autoclass:: MultiStreamModule
+.. autoclass:: Task
+.. autofunction:: get_core_list_of_node_id
+
+.. .. automodule:: intel_extension_for_pytorch.quantization
+.. :members:
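The api_doc.rst diff above newly documents the CPU quantization entry points `prepare` and `convert`. A minimal static INT8 sketch of how they fit together (requires `torch` and `intel_extension_for_pytorch`; the toy model, calibration loop, and `default_static_qconfig` recipe are illustrative assumptions, not part of the commit):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Toy eval-mode model and example inputs standing in for a real workload.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
example_inputs = torch.randn(8, 64)

# prepare() inserts observers according to the chosen qconfig recipe.
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_inputs, inplace=False)

# Calibrate the observers with representative data (random here for brevity).
for _ in range(4):
    prepared(torch.randn(8, 64))

# convert() produces the INT8 model; tracing/freezing enables graph-mode fusion.
quantized = convert(prepared)
traced = torch.jit.trace(quantized, example_inputs)
model = torch.jit.freeze(traced)
```

The experimental `autotune` function listed in the same section automates searching over such recipes, per the linked int8_recipe_tuning_api feature page.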
docs/tutorials/blogs_publications.md

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+Blogs & Publications
+====================
+
+* [Accelerating PyTorch with Intel® Extension for PyTorch\*](https://medium.com/pytorch/accelerating-pytorch-with-intel-extension-for-pytorch-3aef51ea3722)
+* [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
+* [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
+  * *Note*: APIs mentioned in it are deprecated.
+* [Scaling up BERT-like model Inference on modern CPU - Part 1 by the launcher of the extension](https://huggingface.co/blog/bert-cpu-scaling-part-1)
+* [KT Optimizes Performance for Personalized Text-to-Speech](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/KT-Optimizes-Performance-for-Personalized-Text-to-Speech/post/1337757)
