
Commit 082454e

Merge branch 'master' into chunyuan/lstm_dropout_fallback

Conflicts:
	intel_pytorch_extension_py/ops/lstm.py

2 parents: f4eeffa + e904bb3


55 files changed (+1854, -546 lines)

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -7,3 +7,6 @@
 [submodule "third_party/xsmm"]
 	path = third_party/xsmm
 	url = https://github.com/hfp/libxsmm.git
+[submodule "third_party/torch_ccl"]
+	path = third_party/torch_ccl
+	url = https://github.com/intel/torch-ccl.git

CMakeLists.txt

Lines changed: 6 additions & 0 deletions
@@ -7,6 +7,12 @@ set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 
 set(PLUGIN_NAME _torch_ipex)
 
+set(RPATH_VALUE $ORIGIN)
+set(CMAKE_SKIP_BUILD_RPATH FALSE)
+set(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE)
+set(CMAKE_INSTALL_RPATH "${RPATH_VALUE}/lib/")
+set(CMAKE_INSTALL_RPATH_USE_LINK_PATH FALSE)
+
 set(DPCPP_ROOT "${PROJECT_SOURCE_DIR}/torch_ipex/csrc")
 set(DPCPP_THIRD_PARTY_ROOT "${PROJECT_SOURCE_DIR}/third_party")
 
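These RPATH settings stamp an $ORIGIN-relative search path into the built module, so the bundled lib/ directory next to it is found at load time regardless of the install prefix. As a quick sanity check on a Linux host with binutils, the embedded runpath can be inspected from Python (the module filename below is hypothetical; adjust it to your build tree):

```python
import subprocess

# Hypothetical location of the built extension; adjust for your build.
module_path = "_torch_ipex.cpython-38-x86_64-linux-gnu.so"

# readelf -d dumps the dynamic section; with the settings above it should
# contain a RUNPATH (or RPATH) entry of $ORIGIN/lib/
dynamic = subprocess.run(
    ["readelf", "-d", module_path],
    capture_output=True, text=True, check=True,
).stdout

for line in dynamic.splitlines():
    if "RPATH" in line or "RUNPATH" in line:
        print(line.strip())
```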

README.md

Lines changed: 114 additions & 6 deletions
@@ -3,31 +3,33 @@
 Intel Extension for PyTorch is a Python package to extend official PyTorch. It is designed to improve the out-of-box user experience of PyTorch on CPU while achieving good performance. The extension will also serve as the PR (pull request) buffer for the Intel PyTorch framework dev team; the buffer will contain not only functions but also optimizations (for example, ones taking advantage of Intel's new hardware features).
 
 - [Installation](#installation)
-- [Install PyTorch from Source](#install-pytorch-from-source)
-- [Install Intel Extension for PyTorch from Source](#install-intel-extension-for-pytorch-from-source)
+  - [Install PyTorch from Source](#install-pytorch-from-source)
+  - [Install Intel Extension for PyTorch from Source](#install-intel-extension-for-pytorch-from-source)
 - [Getting Started](#getting-started)
 - [Automatically Mix Precision](#automatically-mix-precision)
+  - [BFloat16](#BFloat16)
+  - [INT8](#int8-quantization)
 - [Contribution](#contribution)
 - [License](#license)
 
 ## Installation
 
 ### Install PyTorch from Source
 
-1. Get PyTorch v1.5.0-rc3 source (refer to the [PyTorch guide](https://github.com/pytorch/pytorch#get-the-pytorch-source) for more details)
+1. Get PyTorch v1.7.0 source (refer to the [PyTorch guide](https://github.com/pytorch/pytorch#get-the-pytorch-source) for more details)
 ```bash
 git clone --recursive https://github.com/pytorch/pytorch
 cd pytorch
 
 # checkout source code to the specified version
-git checkout v1.5.0-rc3
+git checkout v1.7.0
 
 # update submodules for the specified PyTorch version
 git submodule sync
 git submodule update --init --recursive
 ```
 
-2. Get Intel PyTorch Extension source
+2. Get the source code of Intel Extension for PyTorch
 ```bash
 git clone --recursive https://github.com/intel/intel-extension-for-pytorch
 cd intel-extension-for-pytorch
@@ -41,7 +43,7 @@ Intel Extension for PyTorch is a Python package to extend official PyTorch. It i
 ```bash
 # Apply git patch to pytorch code
 cd ${pytorch_directory}
-git apply ${intel_extension_for_pytorch_directory}/torch_patches/dpcpp-v1.5-rc3.patch
+git apply ${intel_extension_for_pytorch_directory}/torch_patches/xpu-1.7.patch
 ```
 
 4. Build and install PyTorch (refer to the [PyTorch guide](https://github.com/pytorch/pytorch#install-pytorch) for more details)
@@ -109,6 +111,8 @@ res = model(input)
 In addition, Intel Extension for PyTorch supports mixed precision: some operators of a model may run with Float32 while other operators run with BFloat16 or INT8.
 Traditionally, to run a model with a low-precision type you must manually convert the parameters and input tensors to that type, and if the model contains operators that do not support it, convert back to Float32, round after round, until the model runs normally.
 The extension simplifies this: just enable auto-mix-precision as follows, and you benefit from the low precision. Currently, the extension only supports BFloat16.
+
+#### BFloat16
 ```python
 import torch
 import torch.nn as nn
@@ -130,6 +134,110 @@ model = Model().to(ipex.DEVICE)
 
 res = model(input)
 ```
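The hunk above elides the middle of the BFloat16 example, so the enabling step itself is not shown in this diff and is not reproduced here. A hedged sketch of what it plausibly looks like, reusing the ```ipex.AmpConf```/```ipex.AutoMixPrecision``` API from the INT8 section below (whether ```running_mode``` may be omitted for BFloat16 is an assumption), with ```AmpConf```'s ```mixed_dtype``` defaulting to ```torch.bfloat16``` per the ```__init__.py``` diff further down:

```python
import torch
import intel_pytorch_extension as ipex  # assumed import name of the installed package

# model = Model().to(ipex.DEVICE) and an input tensor are defined as in the diff above
conf = ipex.AmpConf(torch.bfloat16)   # mixed_dtype defaults to torch.bfloat16
with torch.no_grad():
    with ipex.AutoMixPrecision(conf):  # assumption: running_mode optional for BFloat16
        res = model(input.to(ipex.DEVICE))
```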
+#### INT8 Quantization
+Currently, Intel Extension for PyTorch supports static, symmetric quantization. Dynamic quantization is under development, and asymmetric quantization will be enabled once oneDNN is upgraded to v2.0 or higher.
+
+How to quantize the following model:
+```python
+import torch
+import torch.nn as nn
+
+class Model(nn.Module):
+    def __init__(self):
+        super(Model, self).__init__()
+        self.conv = nn.Conv2d(3, 64, 7, stride=2)
+
+    def forward(self, input):
+        return self.conv(input).relu()
+```
+First, run a calibration step against a representative dataset (set ```running_mode``` to ```calibration```):
+```python
+# Convert the model to the Extension device
+model = Model().to(ipex.DEVICE)
+
+# Create a configuration file to save quantization parameters.
+conf = ipex.AmpConf(torch.int8)
+with torch.no_grad():
+    for x in cali_dataset:
+        # Run the model under calibration mode to collect quantization parameters
+        with ipex.AutoMixPrecision(conf, running_mode='calibration'):
+            y = model(x.to(ipex.DEVICE))
+# Save the configuration file
+conf.save('configure.json')
+```
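The loop above iterates over a ```cali_dataset``` that the diff never defines; any iterable of representative inputs works. A minimal, hypothetical stand-in matching the (N, 3, H, W) shape that ```Model```'s ```Conv2d(3, 64, 7)``` expects:

```python
import torch

# Toy calibration set: a few random batches. Real calibration should use
# representative data, since the observed min/max values determine the scales.
cali_dataset = [torch.randn(1, 3, 224, 224) for _ in range(10)]
```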
+The content of the configuration file is as follows.
+
+```json
+[
+    {
+        "id": 0,
+        "name": "Convolution",
+        "algorithm": "min_max",
+        "weight_granularity": "per_channel",
+        "inputs_scale": [
+            25.05583953857422
+        ],
+        "outputs_scale": [
+            43.98969650268555
+        ],
+        "inputs_uint8_used": [
+            false
+        ],
+        "outputs_uint8_used": [
+            false
+        ],
+        "quantized": true
+    },
+    {
+        "id": 1,
+        "name": "Relu",
+        "algorithm": "min_max",
+        "weight_granularity": "per_channel",
+        "inputs_scale": [
+            43.98969650268555
+        ],
+        "outputs_scale": [
+            43.98969650268555
+        ],
+        "inputs_uint8_used": [
+            false
+        ],
+        "outputs_uint8_used": [
+            false
+        ],
+        "quantized": true
+    }
+]
+```
+- ```id``` is a sequence number of the operators that were statically quantized in the calibration step.
+  **Manually changing this value will cause unexpected behavior.**
+- ```name``` is the name of the operator to be quantized.
+- ```algorithm``` indicates how to calculate the scales of the observed tensors. Currently only ```min_max``` is supported.
+- ```weight_granularity``` controls how to quantize the operator weights. ```Convolution``` and ```Linear``` both support ```per_channel``` and ```per_tensor```; the other operators support only ```per_tensor```.
+- ```inputs_scale``` and ```outputs_scale``` are the scales used to quantize the input and output tensors, respectively.
+- ```inputs_uint8_used``` and ```outputs_uint8_used``` indicate whether ```uint8``` is used instead of ```int8```. The default is ```false```, meaning ```int8``` is used.
+- ```quantized``` determines whether this operator should be quantized during inference (see the sketch after this list).
+
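Because the configuration file is plain JSON with the fields listed above, per-operator behavior can be adjusted offline. A sketch of a hypothetical workflow, not part of this commit, that turns quantization off for the ```Relu``` entry while leaving ```id``` untouched:

```python
import json

with open('configure.json') as f:
    conf_entries = json.load(f)

# Flip the documented 'quantized' flag for Relu only; never edit 'id',
# since manually changing it causes unexpected behavior.
for entry in conf_entries:
    if entry["name"] == "Relu":
        entry["quantized"] = False

with open('configure.json', 'w') as f:
    json.dump(conf_entries, f, indent=4)
```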
+After the calibration step, we can use the saved configuration JSON file to run inference (set ```running_mode``` to ```inference```):
+```python
+conf = ipex.AmpConf(torch.int8, 'configure.json')
+with torch.no_grad():
+    for x in cali_dataset:
+        with ipex.AutoMixPrecision(conf, running_mode='inference'):
+            y = model(x.to(ipex.DEVICE))
+```
+
+Supported Quantization Operators:
+- ```Convolution```
+- ```BatchNorm```
+- ```MaxPooling```
+- ```AvgPooling```
+- ```AdaptivePooling```
+- ```Linear```
+- ```convolution + relu```
+- ```convolution + sum```
+- ```convolution + sum + relu```
+- ```convolution + BatchNorm```
 
 
 ## Contribution

cmake/CPU.cmake

Lines changed: 11 additions & 7 deletions
@@ -12,7 +12,7 @@ SET(DNNL_LIBRARY_TYPE STATIC CACHE STRING "" FORCE)
 
 set(DPCPP_CPU_ROOT "${PROJECT_SOURCE_DIR}/torch_ipex/csrc/cpu")
 add_subdirectory(${DPCPP_THIRD_PARTY_ROOT}/mkl-dnn)
-
+find_package(TorchCCL REQUIRED)
 list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake/Modules)
 
 FIND_PACKAGE(AVX)
@@ -125,9 +125,15 @@ set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-trapping-math")
 
 # includes
 
+# include mkl-dnn before PyTorch
+# Otherwise, path_to_pytorch/torch/include/dnnl.hpp will be used as the header
+include_directories(${PROJECT_SOURCE_DIR}/build/third_party/mkl-dnn/include)
+include_directories(${DPCPP_THIRD_PARTY_ROOT}/mkl-dnn/include)
+
 # Set installed PyTorch dir
 if(DEFINED PYTORCH_INSTALL_DIR)
   include_directories(${PYTORCH_INSTALL_DIR}/include)
+  include_directories(${PYTORCH_INSTALL_DIR}/include/torch/csrc/api/include/)
 else()
   message(FATAL_ERROR, "Cannot find installed PyTorch directory")
 endif()
@@ -136,9 +142,8 @@ include_directories(${PROJECT_SOURCE_DIR})
 include_directories(${PROJECT_SOURCE_DIR}/torch_ipex)
 include_directories(${PROJECT_SOURCE_DIR}/torch_ipex/csrc/)
 include_directories(${DPCPP_THIRD_PARTY_ROOT}/pybind11/include)
-include_directories(${PROJECT_SOURCE_DIR}/build/third_party/mkl-dnn/include)
-include_directories(${DPCPP_THIRD_PARTY_ROOT}/mkl-dnn/include)
 include_directories(${DPCPP_THIRD_PARTY_ROOT}/xsmm/include)
+include_directories(${TORCHCCL_INCLUDE_DIR})
 
 # sources
 set(DPCPP_SRCS)
@@ -167,7 +172,7 @@ set(DPCPP_SRCS ${DPCPP_ATEN_SRCS} ${DPCPP_COMMON_SRCS} ${DPCPP_CPU_SRCS} ${DPCPP
 pybind11_add_module(${PLUGIN_NAME} SHARED ${DPCPP_SRCS})
 target_link_libraries(${PLUGIN_NAME} PRIVATE ${DPCPP_THIRD_PARTY_ROOT}/xsmm/lib/libxsmm.a)
 
-link_directories(${PYTORCH_INSTALL_DIR}/lib)
+#link_directories(${PYTORCH_INSTALL_DIR}/lib)
 target_link_libraries(${PLUGIN_NAME} PUBLIC ${PYTORCH_INSTALL_DIR}/lib/libtorch_cpu.so)
 target_link_libraries(${PLUGIN_NAME} PUBLIC ${PYTORCH_INSTALL_DIR}/lib/libc10.so)
 
@@ -184,12 +189,11 @@ else()
 endif()
 
 add_dependencies(${PLUGIN_NAME} pybind11)
-
+add_dependencies(${PLUGIN_NAME} torch_ccl)
 add_dependencies(${PLUGIN_NAME} dnnl)
 target_link_libraries(${PLUGIN_NAME} PUBLIC dnnl)
-
 add_dependencies(${PLUGIN_NAME} xsmm)
-
+target_link_libraries(${PLUGIN_NAME} PUBLIC torch_ccl)
 link_directories(${PYTORCH_INSTALL_DIR}/lib)
 target_link_libraries(${PLUGIN_NAME} PUBLIC ${PYTORCH_INSTALL_DIR}/lib/libtorch_python.so)
 target_link_libraries(${PLUGIN_NAME} PUBLIC ${PYTORCH_INSTALL_DIR}/lib/libtorch_cpu.so)

cmake/Modules/FindTorchCCL.cmake

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# - Try to find torch-ccl
+#
+# The following are set after configuration is done:
+#  TORCHCCL_FOUND        : set to true if oneCCL is found.
+#  TORCHCCL_INCLUDE_DIR  : path to oneCCL include dir.
+#  TORCHCCL_LIBRARIES    : list of libraries for oneCCL
+#
+# The following variables are used:
+#  TORCHCCL_USE_NATIVE_ARCH : Whether native CPU instructions should be used in TORCHCCL. This should be turned off for
+#  general packaging to avoid incompatible CPU instructions. Default: OFF.
+
+IF (NOT TORCHCCL_FOUND)
+  SET(TORCHCCL_FOUND OFF)
+
+  SET(TORCHCCL_LIBRARIES)
+  SET(TORCHCCL_INCLUDE_DIR)
+
+  SET(TORCHCCL_ROOT "${PROJECT_SOURCE_DIR}/third_party/torch_ccl")
+
+  ADD_SUBDIRECTORY(${TORCHCCL_ROOT})
+  IF(NOT TARGET torch_ccl)
+    MESSAGE(FATAL_ERROR "Failed to include torch_ccl target")
+  ENDIF()
+  GET_TARGET_PROPERTY(INCLUDE_DIRS torch_ccl INCLUDE_DIRECTORIES)
+  SET(TORCHCCL_INCLUDE_DIR ${INCLUDE_DIRS})
+  SET(TORCHCCL_LIBRARIES torch_ccl)
+
+ENDIF(NOT TORCHCCL_FOUND)

docker/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -50,9 +50,9 @@ RUN --mount=type=cache,target=/opt/ccache \
     cd intel-extension-for-pytorch && git submodule sync && \
     git submodule update --init --recursive && \
     git clone https://github.com/pytorch/pytorch && \
-    cd pytorch && git checkout v1.5.1 && git submodule sync && \
+    cd pytorch && git checkout v1.7.0 && git submodule sync && \
     git submodule update --init --recursive && \
-    git apply ../torch_patches/dpcpp-v1.5.1.patch && \
+    git apply ../torch_patches/xpu-1.7.patch && \
     USE_MKLDNN=1 USE_CUDA=0 USE_NNPACK=0 USE_CUDNN=0 \
     CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" pip install -v . && \
     cd .. && pip install -v . && rm -rf *

intel_pytorch_extension_py/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -7,8 +7,9 @@
 from .optim import *
 from .ops import *
 import _torch_ipex as core
+core.enable_torch_ccl()
 
-DEVICE = 'dpcpp'
+DEVICE = 'xpu:0'
 
 class AmpConf(object):
     def __init__(self, mixed_dtype = torch.bfloat16, configure_file = None):
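Changing ```DEVICE``` from ```'dpcpp'``` to ```'xpu:0'``` is what the RNN-family fallbacks below key on: they now compare ```input.device.type``` against ```'xpu'``` instead of comparing full device objects. A quick check, assuming the extension is installed under the import name ```intel_pytorch_extension```:

```python
import torch
import intel_pytorch_extension as ipex  # assumed import name

x = torch.randn(2, 3).to(ipex.DEVICE)  # DEVICE is now 'xpu:0'
print(x.device.type)                    # prints 'xpu', so input.device.type == 'xpu' holds
```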

intel_pytorch_extension_py/ops/gru.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 VF_gru = _VF.gru
 
 def ipex_gru(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first):
-    if input.device == torch.device('dpcpp') and (dropout == 0 or training == False):
+    if input.device.type == 'xpu' and (dropout == 0 or training == False):
         return torch.ops.torch_ipex.gru(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first)
     else:
         return VF_gru(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first)

intel_pytorch_extension_py/ops/lstm.py

Lines changed: 2 additions & 2 deletions
@@ -34,9 +34,9 @@ def fallback_lstm(*args, device):
         else:
             item_cpu = item
         args_cpu.append(item_cpu)
-
+
     output = VF_lstm(*args_cpu)
-
+
     # move output to the original device
     output_device = []
     # output is a tuple which does not support item assignment
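```fallback_lstm``` copies its arguments to the CPU, calls the native ```_VF.lstm```, and moves tensor outputs back to the original device; this hunk only normalizes whitespace around that logic. The same pattern generalizes to any op lacking a device kernel; a minimal sketch (not the extension's code, and it ignores nested containers such as weight lists):

```python
import torch

def fallback_to_cpu(fn, *args, device):
    """Run fn on CPU copies of the tensor args, then move results back."""
    args_cpu = [a.to('cpu') if isinstance(a, torch.Tensor) else a for a in args]
    output = fn(*args_cpu)
    # output is a tuple, which does not support item assignment, so rebuild it
    return tuple(o.to(device) if isinstance(o, torch.Tensor) else o
                 for o in output)
```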

intel_pytorch_extension_py/ops/rnn.py

Lines changed: 2 additions & 2 deletions
@@ -10,13 +10,13 @@
 from torch import _VF
 
 def rnn_tanh(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first):
-    if input.device == torch.device('dpcpp') and (dropout == 0 or training == False):
+    if input.device.type == 'xpu' and (dropout == 0 or training == False):
         return torch.ops.torch_ipex.rnn_tanh(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first)
     else:
         return _VF.rnn_tanh(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first)
 
 def rnn_relu(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first):
-    if input.device == torch.device('dpcpp') and (dropout == 0 or training == False):
+    if input.device.type == 'xpu' and (dropout == 0 or training == False):
         return torch.ops.torch_ipex.rnn_relu(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first)
     else:
         return _VF.rnn_relu(input, hx, _flat_weights, bias, num_layers, dropout, training, bidirectional, batch_first)
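With these checks, RNN ops on the extension device are routed to the ```torch_ipex``` kernels only while dropout is inactive; otherwise they fall back to the stock ```_VF``` implementations, which is exactly the dropout fallback this branch is about. A usage sketch, assuming the extension is installed and registers the ```'xpu'``` device:

```python
import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumed import name

rnn = nn.RNN(10, 20, num_layers=2, dropout=0.5).to(ipex.DEVICE)
x = torch.randn(5, 3, 10).to(ipex.DEVICE)

rnn.train()  # dropout > 0 while training: rnn_tanh dispatches to _VF.rnn_tanh
rnn.eval()   # dropout inactive: torch.ops.torch_ipex.rnn_tanh handles the call
with torch.no_grad():
    y, h = rnn(x)
```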
