Commit 003ea5e

Sync index.rst, api_doc.rst, blogs_publications.md and features.rst to cover CPU & GPU merged content (#2118)
1 parent b3ea9a6 commit 003ea5e

9 files changed: 188 additions, 657 deletions

docs/images/Intel_Extension_for_PyTorch_Architecture.svg

Lines changed: 0 additions & 289 deletions
This file was deleted.

docs/images/intel_extension_for_pytorch_structure_gpu.svg

Lines changed: 0 additions & 282 deletions
This file was deleted.

docs/index.rst

Lines changed: 11 additions & 19 deletions

@@ -11,43 +11,35 @@ Intel® Extension for PyTorch* provides optimizations for both eager mode and gr
 
 The extension can be loaded as a Python module for Python programs or linked as a C++ library for C++ programs. In Python scripts users can enable it dynamically by importing `intel_extension_for_pytorch`.
 
--------------------------------------
+Intel® Extension for PyTorch* is structured as shown in the following figure:
 
-Intel® Extension for PyTorch* for CPU is structured as shown in the following figure:
-
-.. figure:: ./images/intel_extension_for_pytorch_structure_cpu.png
+.. figure:: ./images/intel_extension_for_pytorch_structure.png
    :width: 800
    :align: center
-   :alt: Structure of Intel® Extension for PyTorch* for CPU
-
+   :alt: Architecture of Intel® Extension for PyTorch*
 
-PyTorch components are depicted with white boxes while Intel Extensions are with blue boxes. Extra performance of the extension is delivered via both custom addons and overriding existing PyTorch components. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers and INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via the extended graph fusion passes. Intel® Extension for PyTorch* dispatches the operators into their underlying kernels automatically based on ISA that it detects and leverages vectorization and matrix acceleration units available in Intel hardware, as much as possible. oneDNN library is used for computation intensive operations. Intel Extension for PyTorch runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing.
+|
 
-Intel® Extension for PyTorch* for CPU has been released as an open–source project at `Github master branch <https://github.com/intel/intel-extension-for-pytorch/tree/master>`_. Check `CPU tutorial <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/>`_ for detailed information of Intel® Extension for PyTorch* for Intel® CPUs.
+Optimizations for both eager mode and graph mode contribute to extra performance accelerations with the extension. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers, and INT8 quantization APIs. Further performance boost is available by converting the eager-mode model into graph mode via extended graph fusion passes. In the graph mode, the fusions reduce operator/kernel invocation overheads, and thus increase performance. On CPU, Intel® Extension for PyTorch* dispatches the operators into their underlying kernels automatically based on ISA that it detects and leverages vectorization and matrix acceleration units available on Intel hardware. Intel® Extension for PyTorch* runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing. On GPU, optimized operators and kernels are implemented and registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel GPU hardware. Intel® Extension for PyTorch* for GPU utilizes the `DPC++ <https://github.com/intel/llvm#oneapi-dpc-compiler>`_ compiler that supports the latest `SYCL* <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html>`_ standard and also a number of extensions to the SYCL* standard, which can be found in the `sycl/doc/extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions>`_ directory.
 
--------------------------------------
+.. note:: GPU features are not included in CPU only packages.
 
-Intel® Extension for PyTorch* for GPU is structured as shown in the following figure:
-
-.. figure:: ./images/intel_extension_for_pytorch_structure_gpu.svg
-   :width: 800
-   :align: center
-   :alt: Architecture of Intel® Extension for PyTorch* for GPU
-
-PyTorch components are depicted with white boxes and Intel extensions are with blue boxes. Extra performance of the extension comes from optimizations for both eager mode and graph mode. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers, and INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via extended graph fusion passes. On GPU, optimized operators and kernels are implemented and registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel GPU hardware. In graph mode, further operator fusions are supported to reduce operator/kernel invocation overheads, and thus increase performance.
-
-Intel® Extension for PyTorch* for GPU utilizes the `DPC++ <https://github.com/intel/llvm#oneapi-dpc-compiler>`_ compiler that supports the latest `SYCL* <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html>`_ standard and also a number of extensions to the SYCL* standard, which can be found in the `sycl/doc/extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions>`_ directory. Intel® Extension for PyTorch* also integrates `oneDNN <https://github.com/oneapi-src/oneDNN>`_ and `oneMKL <https://github.com/oneapi-src/oneMKL>`_ libraries and provides kernels based on that. The oneDNN library is used for computation intensive operations. The oneMKL library is used for fundamental mathematical operations.
+Intel® Extension for PyTorch* for CPU has been released as an open–source project at `Github master branch <https://github.com/intel/intel-extension-for-pytorch/tree/master>`_. Check `CPU tutorial <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/>`_ for detailed information of Intel® Extension for PyTorch* for Intel® CPUs.
 
 Intel® Extension for PyTorch* for GPU has been released as an open–source project on `GitHub xpu-master branch <https://github.com/intel/intel-extension-for-pytorch/tree/xpu-master>`_. Check `GPU tutorial <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/>`_ for detailed information of Intel® Extension for PyTorch* for Intel® GPUs.
 
 .. toctree::
    :hidden:
    :maxdepth: 1
 
+   tutorials/getting_started
   tutorials/features
   tutorials/releases
   tutorials/installation
   tutorials/examples
   tutorials/api_doc
+   tutorials/performance_tuning
+   tutorials/technical_details
+   tutorials/blogs_publications
   tutorials/contribution
   tutorials/license
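The index.rst text above says the extension is enabled dynamically by importing `intel_extension_for_pytorch` and that extra performance comes from `optimize` plus graph-mode conversion. A minimal sketch of that flow (requires the `torch` and `intel_extension_for_pytorch` packages to be installed; the toy model and shapes are illustrative, not from the commit):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # enables the extension on import

# Any torch.nn.Module works; a toy MLP stands in for a real model here.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# ipex.optimize() applies the operator/optimizer replacements described above;
# dtype=torch.bfloat16 additionally prepares weights for bf16 inference.
model = ipex.optimize(model, dtype=torch.bfloat16)

# Run inference under CPU autocast so bf16 kernels are dispatched.
with torch.no_grad(), torch.cpu.amp.autocast():
    out = model(torch.randn(8, 64))
```

Converting the optimized model with `torch.jit.trace`/`torch.jit.freeze` afterwards triggers the graph-fusion passes the paragraph mentions.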

docs/tutorials/api_doc.rst

Lines changed: 52 additions & 23 deletions

@@ -1,11 +1,21 @@
 API Documentation
 #################
 
-General
-*******
+Device-Agnostic
+***************
 
 .. currentmodule:: intel_extension_for_pytorch
 .. autofunction:: optimize
+.. autofunction:: get_fp32_math_mode
+.. autofunction:: set_fp32_math_mode
+.. autoclass:: verbose
+
+GPU-Specific
+************
+
+Miscellaneous
+=============
+
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. StreamContext
 .. can_device_access_peer
@@ -31,10 +41,8 @@ General
 .. autofunction:: stream
 .. autofunction:: synchronize
 
-
-
 Random Number Generator
-***********************
+=======================
 
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. autofunction:: get_rng_state
@@ -47,10 +55,8 @@ Random Number Generator
 .. autofunction:: seed_all
 .. autofunction:: initial_seed
 
-
-
 Streams and events
-******************
+==================
 
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. autoclass:: Stream
@@ -60,7 +66,7 @@ Streams and events
    :members:
 
 Memory management
-*****************
+=================
 
 .. currentmodule:: intel_extension_for_pytorch.xpu
 .. autofunction:: empty_cache
@@ -82,27 +88,50 @@ Memory management
 .. caching_allocator_alloc
 .. caching_allocator_delete
 
-
-
 .. autofunction:: memory_stats_as_nested_dict
 .. autofunction:: reset_accumulated_memory_stats
 
-Other
-*****
-
-.. currentmodule:: intel_extension_for_pytorch.xpu
-.. autofunction:: get_fp32_math_mode
-.. autofunction:: set_fp32_math_mode
-
-
-.. .. automodule:: intel_extension_for_pytorch.quantization
-.. :members:
-
 C++ API
-*******
+=======
 
 .. doxygenenum:: xpu::FP32_MATH_MODE
 
 .. doxygenfunction:: xpu::set_fp32_math_mode
 
 .. doxygenfunction:: xpu::get_queue_from_stream
+
+
+CPU-Specific
+************
+
+Miscellaneous
+=============
+
+.. currentmodule:: intel_extension_for_pytorch
+.. autofunction:: enable_onednn_fusion
+
+Quantization
+============
+
+.. automodule:: intel_extension_for_pytorch.quantization
+.. autofunction:: prepare
+.. autofunction:: convert
+
+Experimental API, introduction is available at `feature page <./features/int8_recipe_tuning_api.md>`_.
+
+.. autofunction:: autotune
+
+CPU Runtime
+===========
+
+.. automodule:: intel_extension_for_pytorch.cpu.runtime
+.. autofunction:: is_runtime_ext_enabled
+.. autoclass:: CPUPool
+.. autoclass:: pin
+.. autoclass:: MultiStreamModuleHint
+.. autoclass:: MultiStreamModule
+.. autoclass:: Task
+.. autofunction:: get_core_list_of_node_id
+
+.. .. automodule:: intel_extension_for_pytorch.quantization
+.. :members:
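The api_doc.rst diff above newly documents the CPU quantization entry points `prepare` and `convert`. A minimal static INT8 sketch of how they fit together (requires `torch` and `intel_extension_for_pytorch`; the toy model, calibration loop, and `default_static_qconfig` recipe are illustrative assumptions, not part of the commit):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Toy eval-mode model and example inputs standing in for a real workload.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
example_inputs = torch.randn(8, 64)

# prepare() inserts observers according to the chosen qconfig recipe.
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_inputs, inplace=False)

# Calibrate the observers with representative data (random here for brevity).
for _ in range(4):
    prepared(torch.randn(8, 64))

# convert() produces the INT8 model; tracing/freezing enables graph-mode fusion.
quantized = convert(prepared)
traced = torch.jit.trace(quantized, example_inputs)
model = torch.jit.freeze(traced)
```

The experimental `autotune` function listed in the same section automates searching over such recipes, per the linked int8_recipe_tuning_api feature page.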
docs/tutorials/blogs_publications.md

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+Blogs & Publications
+====================
+
+* [Accelerating PyTorch with Intel® Extension for PyTorch\*](https://medium.com/pytorch/accelerating-pytorch-with-intel-extension-for-pytorch-3aef51ea3722)
+* [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
+* [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
+  * *Note*: APIs mentioned in it are deprecated.
+* [Scaling up BERT-like model Inference on modern CPU - Part 1 by the launcher of the extension](https://huggingface.co/blog/bert-cpu-scaling-part-1)
+* [KT Optimizes Performance for Personalized Text-to-Speech](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/KT-Optimizes-Performance-for-Personalized-Text-to-Speech/post/1337757)
