
Commit fc71104

tye1, ganyi1996ppo, and jingxu10 authored
[Doc] Update releases.md for 1.13.10+xpu (#2142)
* Update releases.md
* Add known issue md link
* Add GPU known issues

Co-authored-by: Pleaplusone <[email protected]>
Co-authored-by: Jing Xu <[email protected]>
1 parent 7585497 commit fc71104

4 files changed: +119 -7 lines changed

docs/tutorials/AOT.md

Lines changed: 8 additions & 5 deletions
@@ -7,14 +7,17 @@ Ahead of Time (AOT) Compilation
 
 ## Use case
 
-Intel® Extension for PyTorch\* provides build option `USE_AOT_DEVLIST` for users who install Intel® Extension for PyTorch\* via source compilation to configure device list for AOT compilation. The target device in device list is specified by DEVICE type of the target. Multi-target AOT compilation is supported by using a comma (,) as a delimiter in device list. See below table for the AOT setting targeting [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) & Intel® Arc™ series GPUs.
+Intel® Extension for PyTorch\* provides the build option `USE_AOT_DEVLIST` for users who install Intel® Extension for PyTorch\* via source compilation to configure the device list for AOT compilation. The target device in the device list is specified by the DEVICE type of the target. Multi-target AOT compilation is supported by using a comma (,) as a delimiter in the device list. See the table below for the AOT settings targeting [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) & Intel® Arc™ A-Series GPUs.
 
 | Supported HW | AOT Setting |
-| ------------ |---------------------|
-| Intel® Data Center GPU Flex Series 170 and <BR> Intel® Data Center GPU Max Series | USE_AOT_DEVLIST='ats-m150,pvc' |
-| Intel® Arc™ Series | USE_AOT_DEVLIST='dg2-g10'.<br />Depending on the driver version, the AOT devlist string might be `dg2-g10-c0` or `dg2`.<br />Please try `dg2-g10` first. If you encounter an AOT-related build error, try one of the other two strings. |
+| ------------ | ----------- |
+| Intel® Data Center GPU Flex Series 170 | USE_AOT_DEVLIST='ats-m150' |
+| Intel® Data Center GPU Max Series | USE_AOT_DEVLIST='pvc' |
+| Intel® Arc™ A-Series | USE_AOT_DEVLIST='ats-m150' |
 
-Intel® Extension for PyTorch\* enables AOT compilation for Intel GPU target devices in prebuilt wheel files. Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU Max Series are the enabled target devices in the current release, with Intel® Arc™ series GPUs having experimental support. If Intel® Extension for PyTorch\* is executed on a device which is not pre-configured in `USE_AOT_DEVLIST`, the application can still run because JIT compilation will be triggered automatically to allow execution on the current device. This causes additional compilation time during execution.
+**Note:** Multiple AOT settings can be used together by separating the setting texts with a comma (,) so that the compiled wheel file supports multiple AOT targets. E.g. a wheel file built with `USE_AOT_DEVLIST='ats-m150,pvc'` has both `ats-m150` and `pvc` AOT enabled.
+
+Intel® Extension for PyTorch\* enables AOT compilation for Intel GPU target devices in prebuilt wheel files. Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU Max Series are the enabled target devices in the current release, with Intel® Arc™ A-Series GPUs having experimental support. If Intel® Extension for PyTorch\* is executed on a device which is not pre-configured in `USE_AOT_DEVLIST`, the application can still run because JIT compilation will be triggered automatically to allow execution on the current device. This causes additional compilation time during execution.
 
 For more GPU platforms, please refer to [Use AOT for Integrated Graphics (Intel GPU)](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compilation/ahead-of-time-compilation.html).
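As a small illustration of the JIT fallback described above, here is a minimal sketch, assuming the `torch.xpu` namespace the extension exposes after import. It only prints the detected device; if that device was not covered by `USE_AOT_DEVLIST` at build time, kernels are JIT-compiled on first use rather than failing.

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  (registers the xpu device)

# Assumption: torch.xpu is available after importing the extension.
# If the device below was not in USE_AOT_DEVLIST when the wheel was built,
# the first kernel launches pay a one-time JIT compilation cost.
print(torch.xpu.get_device_name(0))
```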

docs/tutorials/features/DPC++_Extension.md

Lines changed: 0 additions & 2 deletions
@@ -204,9 +204,7 @@ Let’s go through the DPC++ code step by step:
 
 ```
 #include <torch/extension.h>
-
 #include <ipex.h>
-
 #include <vector>
 
 template <typename scalar_t>

docs/tutorials/performance_tuning/known_issues.md

Lines changed: 84 additions & 0 deletions
@@ -1,6 +1,90 @@
Known Issues
============

## GPU-Specific Known Issues

- [CRITICAL ERROR] Kernel 'XXX' removed due to usage of FP64 instructions unsupported by the targeted hardware

  FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform. If you run an AI workload on that platform and receive this error message, it means a kernel requiring FP64 instructions was removed and not executed, so the accuracy of the whole workload is incorrect.
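The docs give no workaround here; as a hypothetical mitigation sketch (not from the original text), keeping the workload entirely in FP32 avoids emitting FP64 kernels on hardware without native FP64 support:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401

# Hypothetical mitigation: keep modules and tensors in float32 so no
# FP64 kernels are required on platforms without native FP64 support.
model = torch.nn.Linear(64, 64).to("xpu", dtype=torch.float32)
x = torch.randn(8, 64, device="xpu", dtype=torch.float32)
y = model(x)  # runs entirely in FP32
```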
- Undefined symbol caused by `_GLIBCXX_USE_CXX11_ABI`

  ```bash
  ImportError: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev
  ```

  DPC++ does not support `_GLIBCXX_USE_CXX11_ABI=0`; Intel® Extension for PyTorch\* is always compiled with `_GLIBCXX_USE_CXX11_ABI=1`. This undefined-symbol issue appears when PyTorch\* is compiled with `_GLIBCXX_USE_CXX11_ABI=0`. Update the PyTorch\* CMake files to set `_GLIBCXX_USE_CXX11_ABI=1` and compile PyTorch\* with a compiler that supports `_GLIBCXX_USE_CXX11_ABI=1`. We recommend using the prebuilt wheels from the [download server](https://developer.intel.com/ipex-whl-stable-xpu) to avoid this issue.
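Before building the extension from source, you can check which ABI an installed PyTorch\* build uses; a small sketch using PyTorch\*'s public `torch.compiled_with_cxx11_abi()` helper:

```python
import torch

# True means PyTorch* was built with _GLIBCXX_USE_CXX11_ABI=1, which the
# DPC++-based extension requires; False predicts the undefined-symbol error.
print(torch.compiled_with_cxx11_abi())
```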
- Can't find the oneMKL library when building Intel® Extension for PyTorch\* without oneMKL

  ```bash
  /usr/bin/ld: cannot find -lmkl_sycl
  /usr/bin/ld: cannot find -lmkl_intel_ilp64
  /usr/bin/ld: cannot find -lmkl_core
  /usr/bin/ld: cannot find -lmkl_tbb_thread
  dpcpp: error: linker command failed with exit code 1 (use -v to see invocation)
  ```

  When PyTorch\* is built with the oneMKL library and Intel® Extension for PyTorch\* is built without it, this linker issue may occur. Resolve it by setting:

  ```bash
  export USE_ONEMKL=OFF
  export MKL_DPCPP_ROOT=${PATH_To_Your_oneMKL}/__release_lnx/mkl
  ```

  Then clean build Intel® Extension for PyTorch\*.

- undefined symbol: `mkl_lapack_dspevd`. Intel MKL FATAL ERROR: cannot load `libmkl_vml_avx512.so.2` or `libmkl_vml_def.so.2`

  This issue may occur when Intel® Extension for PyTorch\* is built with the oneMKL library and PyTorch\* is not built with any MKL library. The oneMKL kernel may incorrectly run on the CPU backend and trigger this issue. Resolve it by installing the MKL library from conda:

  ```bash
  conda install mkl
  conda install mkl-include
  ```

  Then clean build PyTorch\*.

- OSError: `libmkl_intel_lp64.so.1`: cannot open shared object file: No such file or directory

  The wrong MKL library is used when multiple MKL libraries exist in the system. Preload oneMKL by:

  ```bash
  export LD_PRELOAD=${MKL_DPCPP_ROOT}/lib/intel64/libmkl_intel_lp64.so.2:${MKL_DPCPP_ROOT}/lib/intel64/libmkl_intel_ilp64.so.2:${MKL_DPCPP_ROOT}/lib/intel64/libmkl_gnu_thread.so.2:${MKL_DPCPP_ROOT}/lib/intel64/libmkl_core.so.2:${MKL_DPCPP_ROOT}/lib/intel64/libmkl_sycl.so.2
  ```

  If you continue to see similar issues for other shared object files, add the corresponding files under `${MKL_DPCPP_ROOT}/lib/intel64/` to `LD_PRELOAD`. Note that the suffix of the libraries may change (e.g. from .1 to .2) if more than one oneMKL library is installed on the system.

- RuntimeError: Number of dpcpp devices should be greater than zero!

  Running some AI models (e.g. 3D-Unet inference) on Ubuntu 22.04 may trigger this runtime error, as oneAPI Base Toolkit 2023.0 fails to return an available GPU device on Ubuntu 22.04 in such a scenario. The workaround is to update the model script to make sure `import torch` and `import intel_extension_for_pytorch` happen before importing other libraries, as in the sketch below.
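A minimal sketch of that import-order workaround:

```python
# Workaround sketch: import torch and the extension first, before any
# other libraries the model script needs, so the GPU device stays visible.
import torch
import intel_extension_for_pytorch  # noqa: F401

import numpy as np  # noqa: F401  any other third-party imports come afterwards

print(torch.xpu.device_count())  # should be greater than zero
```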
- OpenMP library could not be found

  Building Intel® Extension for PyTorch\* on SLES15 SP3 with the default GCC 7.5, or on CentOS 8 with the default GCC 8.5, may trigger this build error.

  ```bash
  CMake Error at third_party/ideep/mkl-dnn/third_party/oneDNN/cmake/OpenMP.cmake:118 (message):
    OpenMP library could not be found.  Proceeding might lead to highly
    sub-optimal performance.
  Call Stack (most recent call first):
    third_party/ideep/mkl-dnn/third_party/oneDNN/CMakeLists.txt:117 (include)
  ```

  The root cause is that GCC 7.5 and 8.5 do not support the `-Wno-error=redundant-move` option. Uplifting to GCC version >= 9 solves this issue.

- Unit test failures on Intel® Data Center GPU Flex Series 170

  The following unit tests fail on Intel® Data Center GPU Flex Series 170 but pass on Intel® Data Center GPU Max Series. The root cause of the failures is under investigation.

  ```
  test_groupnorm.py::TestTorchMethod::test_group_norm_backward
  test_groupnorm_channels_last.py::TestTorchMethod::test_group_norm_backward
  test_fusion.py::TestNNMethod::test_conv_binary_mul
  ```

## CPU-Specific Known Issues

- If you find that a workload run with Intel® Extension for PyTorch\* occupies a remarkably large amount of memory, you can try to reduce the memory footprint by setting the `weights_prepack` parameter of the `ipex.optimize()` function to `False` (see the sketch after this list).

- Support of EmbeddingBag with INT8 when bag size > 1 is a work in progress.
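A minimal sketch of the `weights_prepack` setting referenced in the first item above, assuming the standard `ipex.optimize()` API:

```python
import torch
import intel_extension_for_pytorch as ipex

# Sketch: disable weight prepacking to reduce the memory footprint,
# at a possible cost in performance.
model = torch.nn.Linear(1024, 1024).eval()
model = ipex.optimize(model, weights_prepack=False)
```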

docs/tutorials/releases.md

Lines changed: 27 additions & 0 deletions
@@ -1,6 +1,33 @@
Releases
=============

## 1.13.10+xpu

Intel® Extension for PyTorch\* v1.13.10+xpu extends PyTorch\* 1.13 with up-to-date features and optimizations on `xpu` for an extra performance boost on Intel hardware. Optimizations take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs, as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, through the PyTorch\* `xpu` device, Intel® Extension for PyTorch\* provides easy GPU acceleration for Intel discrete GPUs with PyTorch\*.

### Highlights

This release introduces specific XPU solution optimizations on Intel discrete GPUs, which include Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU Max Series. Optimized operators and kernels are implemented and registered through the PyTorch\* dispatching mechanism for the `xpu` device. These operators and kernels are accelerated on Intel GPU hardware by the corresponding native vectorization and matrix calculation features. In graph mode, additional operator fusions are supported to reduce operator/kernel invocation overhead and thus increase performance.
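As a small illustration of that dispatching (a sketch, assuming a working `xpu` device): tensors placed on `xpu` route eager operators to the registered GPU kernels.

```python
import torch
import intel_extension_for_pytorch  # noqa: F401

a = torch.randn(128, 128, device="xpu")
b = torch.randn(128, 128, device="xpu")
c = torch.matmul(a, b)  # dispatched to the kernel registered for xpu
print(c.device)         # xpu:0
```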
This release provides the following features:
- Usability and Performance Features listed in the [Intel® Extension for PyTorch\* v1.13.0+cpu release](https://intel.github.io/intel-extension-for-pytorch/cpu/1.13.0+cpu/tutorials/releases.html#id1)
- Distributed Training
  - support of distributed training with DistributedDataParallel (DDP) on Intel GPU hardware
  - support of distributed training with Horovod (experimental) on Intel GPU hardware
- DLPack Solution
  - mechanism to share tensor data without copy when interoperating with other libraries on Intel GPU hardware (see the sketch after this list)
- Legacy Profiler Tool
  - an extension of the PyTorch\* legacy profiler for profiling operators' overhead on Intel GPU hardware
- Simple Trace Tool
  - built-in debugging tool to print out the call stack for a piece of code
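A minimal sketch of the DLPack mechanism mentioned in the list above, assuming the standard `torch.utils.dlpack` helpers work for `xpu` tensors in this release:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401
from torch.utils.dlpack import from_dlpack, to_dlpack

x = torch.randn(4, 4, device="xpu")
capsule = to_dlpack(x)    # export as a DLPack capsule, no copy
y = from_dlpack(capsule)  # a consumer reimports the same memory
y[0, 0] = 42.0
print(x[0, 0].item())     # 42.0 -- both views share storage
```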
This release adds the following fusion patterns in PyTorch\* JIT mode for Intel GPU:
- `Conv2D` + UnaryOp(`abs`, `sqrt`, `square`, `exp`, `log`, `round`, `GeLU`, `Log_Sigmoid`, `Hardswish`, `Mish`, `HardSigmoid`, `Tanh`, `Pow`, `ELU`, `hardtanh`)
- `Linear` + UnaryOp(`abs`, `sqrt`, `square`, `exp`, `log`, `round`, `Log_Sigmoid`, `Hardswish`, `HardSigmoid`, `Pow`, `ELU`, `SiLU`, `hardtanh`, `Leaky_relu`)
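A hedged sketch of how such a pattern is exercised: tracing a `Conv2D` + `ELU` block in JIT mode so the graph passes can fold the pair (whether the fusion actually fires depends on the device and build):

```python
import torch
import intel_extension_for_pytorch  # noqa: F401

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ELU(),
).to("xpu").eval()

with torch.no_grad():
    example = torch.randn(1, 3, 32, 32, device="xpu")
    traced = torch.jit.trace(model, example)  # JIT mode enables fusion passes
    out = traced(example)
print(out.shape)
```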
### Known Issues

Please refer to the [Known Issues webpage](./performance_tuning/known_issues.md).

## 1.10.200+gpu

Intel® Extension for PyTorch\* v1.10.200+gpu extends PyTorch\* 1.10 with up-to-date features and optimizations on XPU for an extra performance boost on Intel Graphics cards. XPU is a user visible device that is a counterpart of the well-known CPU and CUDA in the PyTorch\* community. XPU represents an Intel-specific kernel and graph optimizations for various “concrete” devices. The XPU runtime will choose the actual device when executing AI workloads on the XPU device. The default selected device is Intel GPU. XPU kernels from Intel® Extension for PyTorch\* are written in [DPC++](https://github.com/intel/llvm#oneapi-dpc-compiler) that supports [SYCL language](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html) and also a number of [DPC++ extensions](https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions).
