
Commit 32822b2

Merge remote-tracking branch 'ups/develop' into feature/libxsmm
2 parents 7bb67b6 + 0cefb94


41 files changed: +2012 −166 lines

cmake/external/grpc.cmake

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ ExternalProject_Add(
5050
UPDATE_COMMAND ""
5151
CONFIGURE_COMMAND ""
5252
BUILD_IN_SOURCE 1
53+
PATCH_COMMAND git apply ${PADDLE_SOURCE_DIR}/patches/grpc/fix_too_early_destory.patch
5354
# NOTE(yuyang18):
5455
# Disable -Werror, otherwise the compile will fail in MacOS.
5556
# It seems that we cannot configure that by make command.
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
Fixed-point quantization uses lower bits, for example 2-bit, 3-bit or 8-bit fixed point, to represent weights and activations, which are usually stored as 32-bit single-precision floating point. The fixed-point representation reduces memory bandwidth, power consumption, computational resources and model storage requirements. It is especially important for inference on embedded devices.
2+
3+
According to some experiments, the approach of directly quantizing a model trained in floating point works well for large models, such as VGG with its many parameters, but accuracy drops significantly for small models. To improve the tradeoff between accuracy and latency, many quantized training approaches have been proposed.
4+
5+
This document presents the design of a quantized training framework for Fluid. The first part introduces how to quantize, the second part describes the quantized training framework, and the last part illustrates how to calculate the quantization scale.
6+
7+
8+
### How to quantize
9+
10+
There are many ways to quantize a float value to a fixed-point value. For example:
11+
12+
$$ r = min(max(x, a), b)$$
13+
$$ s = \frac{b - a}{n - 1} $$
14+
$$ q = \left \lfloor \frac{r - a}{s} \right \rceil $$
15+
16+
where $x$ is the float value to be quantized, $[a, b]$ is the quantization range, $a$ is the minimum value and $b$ is the maximum value. $\left \lfloor \right \rceil$ denotes rounding to the nearest integer. If the quantization level is $k$, then $n$ is $2^k$; for example, for $k = 8$, $n$ is 256. $q$ is the quantized integer.
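
A minimal NumPy sketch of this *min-max* quantizer (not part of this commit; the function name and interface are illustrative):

```python
import numpy as np

def min_max_quantize(x, a, b, k=8):
    # quantize float values x to k-bit integers over the range [a, b]
    n = 2 ** k                       # number of quantization levels
    r = np.clip(x, a, b)             # r = min(max(x, a), b)
    s = (b - a) / (n - 1)            # step size
    return np.round((r - a) / s).astype(np.int64)

x = np.array([-1.2, -0.5, 0.0, 0.3, 0.99])
print(min_max_quantize(x, a=-1.0, b=1.0))    # e.g. [  0  64 128 166 254]
```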
17+
18+
19+
The quantization we apply is parameterized by the number of quantization levels and the maximum absolute value:
20+
21+
$$ M = max(abs(x)) $$
22+
$$ q = \left \lfloor \frac{x}{M} * (n - 1) \right \rceil $$
23+
24+
where $x$ is the float value to be quantized and $M$ is the maximum absolute value. $\left \lfloor \right \rceil$ denotes rounding to the nearest integer. For 8-bit quantization, $n=2^{8}=256$. $q$ is the quantized integer.
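
A corresponding NumPy sketch of the *max-abs* quantizer (again an illustration, not code from this commit):

```python
import numpy as np

def max_abs_quantize(x, k=8):
    n = 2 ** k                        # number of quantization levels
    m = np.max(np.abs(x))             # M = max(abs(x))
    q = np.round(x / m * (n - 1)).astype(np.int64)
    return q, m                       # return the scale so the caller can dequantize

x = np.array([-0.8, -0.1, 0.0, 0.4])
q, m = max_abs_quantize(x)
print(q)                              # e.g. [-255  -32    0  128]
print(q / 255.0 * m)                  # dequantized values, close to x
```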
25+
26+
27+
Whether *min-max* quantization or *max-abs* quantization is used, both can be represented as:
28+
29+
$q = scale * r + b$
30+
31+
We call the *min-max* or *max-abs* values the quantization arguments; they are also called the quantization scale or quantization range.
32+
33+
34+
How to calculate the quantization scale (or maximum absolute value) for inference will be described in the last part.
35+
36+
37+
### Training Framework
38+
39+
#### Forward pass
40+
41+
The forward pass uses simulated quantization; see Figure 1.
42+
43+
The training framework is shown in the following figure.
44+
45+
<p align="center">
46+
<img src="quantization_forward.png" width="300" height="340"><br/>
47+
Figure 1. Forward in training with simulated quantization.
48+
</p>
49+
50+
- First, both the input and the weight are quantized to 8-bit integers.
51+
- Second, perform the multiplication (or convolution) operation with integers.
52+
- Third, dequantize the multiplication (or convolution) results to 32-bit floating point.
53+
- Finally, perform the bias addition in 32-bit floating point; the bias is not quantized. (A sketch of these four steps follows.)
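
To make the four steps concrete, here is a minimal NumPy sketch of the simulated-quantization forward pass for a fully connected layer (not part of this commit; the function names and the use of int32 accumulation are illustrative assumptions):

```python
import numpy as np

def quantize(x, k=8):
    # max-abs quantization: map floats to integers in [-(n-1), n-1]
    n = 2 ** k
    m = np.max(np.abs(x))
    return np.round(x / m * (n - 1)).astype(np.int32), m

def simulated_quant_fc(X, W, bias, k=8):
    n = 2 ** k
    Xq, Xm = quantize(X, k)          # 1. quantize the input to 8-bit integers
    Wq, Wm = quantize(W, k)          #    ... and the weight as well
    Y = Xq @ Wq                      # 2. integer multiplication (int32 accumulation)
    Y_dq = Y.astype(np.float32) / ((n - 1) * (n - 1)) * Xm * Wm   # 3. dequantize to float32
    return Y_dq + bias               # 4. bias addition stays in float; bias is not quantized

X = np.random.randn(4, 8).astype(np.float32)
W = np.random.randn(8, 3).astype(np.float32)
b = np.random.randn(3).astype(np.float32)
print(simulated_quant_fc(X, W, b))
```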
54+
55+
For general matrix multiplication (GEMM), quantize $X$ and $W$:
56+
57+
$$ X_q = \left \lfloor \frac{X}{X_m} * (n - 1) \right \rceil $$
58+
$$ W_q = \left \lfloor \frac{W}{W_m} * (n - 1) \right \rceil $$
59+
60+
Do GEMM:
61+
62+
$$ Y = X_q * W_q $$
63+
64+
65+
Dequantize $Y$:
66+
67+
$$
68+
\begin{align}
69+
Y_{dq} &=\frac{Y}{(n - 1) * (n - 1)} * X_m * W_m \\\
70+
&=\frac{X_q * W_q}{(n - 1) * (n - 1)} * X_m * W_m \\\
71+
&=(\frac{X_q}{n - 1} * X_m) * (\frac{W_q}{n - 1} * W_m)
72+
\end{align}
73+
$$
74+
75+
From these formulas, the dequantization can also be moved before the GEMM: first dequantize $X_q$ and $W_q$, then do the GEMM. The forward workflow in training is equivalent to the following framework.
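
The equivalence can be checked numerically. A small sketch under the same max-abs assumptions (illustrative only, not code from this commit):

```python
import numpy as np

n = 256                                    # 8-bit quantization levels
X, W = np.random.randn(4, 8), np.random.randn(8, 3)
Xm, Wm = np.max(np.abs(X)), np.max(np.abs(W))
Xq = np.round(X / Xm * (n - 1))
Wq = np.round(W / Wm * (n - 1))

# Dequantize after the GEMM (Figure 1)
Y1 = (Xq @ Wq) / ((n - 1) * (n - 1)) * Xm * Wm
# Dequantize before the GEMM (Figure 2)
Y2 = (Xq / (n - 1) * Xm) @ (Wq / (n - 1) * Wm)

print(np.allclose(Y1, Y2))                 # True, up to floating-point rounding
```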
76+
77+
<p align="center">
78+
<img src="quantization_equivalent_forward.png" width="300" height="330"><br/>
79+
Figure 2. Equivalent forward in training with simulated quantization.
80+
</p>
81+
82+
We use this equivalent workflow in training. In our design, a quantization transpiler inserts the quantization operator and the de-quantization operator into the Fluid `ProgramDesc`. Since the outputs of the quantization and de-quantization operators are still in floating point, they are called fake quantization and de-quantization operators, and the training framework is called simulated quantization.
83+
84+
#### Backward pass
85+
86+
See Figure 3. The gradients are calculated from the dequantized weights and activations. All inputs and outputs are 32-bit floating point. In the weight-updating step, the gradients are added to the original weights, not to the quantized or dequantized weights.
87+
88+
<p align="center">
89+
<img src="quantization_backward_and_optimization.png"><br/>
90+
Figure 3. Backward and weight updating in training with simulated quantization.
91+
</p>
92+
93+
So the quantization transpiler changes some inputs of the corresponding backward operators.
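
A hedged NumPy sketch of this behavior, keeping a float master copy of the weights; the helper `quant_dequant` and the toy gradient are illustrative assumptions, not the transpiler's actual output:

```python
import numpy as np

n = 256   # 8-bit quantization levels

def quant_dequant(x):
    # fake quantization: quantize then immediately dequantize, output stays float
    m = np.max(np.abs(x))
    return np.round(x / m * (n - 1)) / (n - 1) * m

W = np.random.randn(8, 3).astype(np.float32)   # original (float) weights kept by the optimizer
X = np.random.randn(4, 8).astype(np.float32)

# forward: use dequantized (fake-quantized) weights and activations
X_dq, W_dq = quant_dequant(X), quant_dequant(W)
Y = X_dq @ W_dq

# backward: gradients are computed from the dequantized tensors ...
dY = np.ones_like(Y)                           # pretend upstream gradient
grad_W = X_dq.T @ dY

# ... but the update is applied to the original float weights
W -= 0.01 * grad_W
```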
94+
95+
### How to calculate quantization scale
96+
97+
There are two strategies to calculate the quantization scale; we call them the dynamic strategy and the static strategy. The dynamic strategy recalculates the quantization scale at each iteration, while the static strategy keeps the same quantization scale for different inputs.
98+
99+
For weights, we apply the dynamic strategy in training, that is, the quantization scale is recalculated at each iteration until training is finished.
100+
101+
For activations, the quantization scales are estimated during training, then used in inference. There are several different ways to estimate them:
102+
103+
104+
1. Calculate the mean of the maximum absolute values over a window.
105+
2. Calculate the max of the maximum absolute values over a window.
106+
3. Calculate the running mean of the maximum absolute values over a window, as follows:
107+
108+
$$ V_t = (1 - k) * V + k * V_{t-1} $$
109+
110+
where $V$ is the maximum absolute value of the current batch, $V_t$ is the running mean value, and $k$ is a factor, such as 0.9.
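
A minimal sketch of this third strategy (the running mean of the per-batch maximum absolute value); the variable names and the toy training loop are illustrative assumptions:

```python
import numpy as np

def update_running_scale(batch, prev_scale, k=0.9):
    # V_t = (1 - k) * V + k * V_{t-1}, where V is the current batch's max-abs value
    v = np.max(np.abs(batch))
    return (1.0 - k) * v + k * prev_scale

scale = 0.0
for _ in range(100):                        # pretend training iterations
    activations = np.random.randn(32, 64)   # one batch of activations
    scale = update_running_scale(activations, scale)
print(scale)                                # estimated scale, later reused for inference
```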
Three binary image files added (41.5 KB, 32.2 KB, 27.3 KB); previews not shown.

paddle/contrib/inference/CMakeLists.txt

Lines changed: 16 additions & 1 deletion
@@ -45,6 +45,10 @@ endfunction(inference_api_test)
4545
cc_library(paddle_inference_api
4646
SRCS paddle_inference_api.cc paddle_inference_api_impl.cc
4747
DEPS ${FLUID_CORE_MODULES} ${GLOB_OP_LIB})
48+
if(NOT APPLE)
49+
set(LINK_FLAGS "-Wl,--retain-symbols-file ${CMAKE_CURRENT_SOURCE_DIR}/paddle_inference_api.sym")
50+
set_target_properties(paddle_inference_api PROPERTIES LINK_FLAGS "${LINK_FLAGS}")
51+
endif()
4852

4953
# Here the shared library doesn't depend on other fluid libraries, or double free will occur.
5054
cc_library(paddle_inference_api_shared SHARED
@@ -53,8 +57,19 @@ add_dependencies(paddle_inference_api_shared ${FLUID_CORE_MODULES} ${GLOB_OP_LIB
5357
set_target_properties(paddle_inference_api_shared PROPERTIES OUTPUT_NAME paddle_inference_api)
5458

5559
if(NOT APPLE)
56-
set(LINK_FLAGS "-fPIC -fvisibility=hidden")
60+
set(LINK_FLAGS "-Wl,--version-script ${CMAKE_CURRENT_SOURCE_DIR}/paddle_inference_api.map")
5761
set_target_properties(paddle_inference_api_shared PROPERTIES LINK_FLAGS "${LINK_FLAGS}")
62+
FILE(WRITE ${CMAKE_CURRENT_BINARY_DIR}/check_symbol.cmake
63+
"execute_process(COMMAND bash -c \"${CMAKE_CURRENT_SOURCE_DIR}/check_symbol.sh"
64+
" ${CMAKE_CURRENT_BINARY_DIR}/libpaddle_inference_api.so\" RESULT_VARIABLE symbol_res)\n"
65+
"if(NOT \"\${symbol_res}\" STREQUAL \"0\")\n"
66+
" message(FATAL_ERROR \"Check symbol failed.\")\n"
67+
"endif()\n")
68+
add_custom_command(
69+
OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/.check_symbol"
70+
COMMAND ${CMAKE_COMMAND} -P "${CMAKE_CURRENT_BINARY_DIR}/check_symbol.cmake"
71+
DEPENDS paddle_inference_api_shared)
72+
add_custom_target(check_symbol ALL DEPENDS "${CMAKE_CURRENT_BINARY_DIR}/.check_symbol")
5873
endif()
5974

6075
cc_test(test_paddle_inference_api
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
#!/bin/bash
2+
3+
lib=$1
4+
if [ $# -ne 1 ]; then echo "No input library"; exit -1 ; fi
5+
6+
num_paddle_syms=$(nm -D --defined-only ${lib} | grep paddle | wc -l)
7+
num_google_syms=$(nm -D --defined-only ${lib} | grep google | wc -l)
8+
9+
if [ $num_paddle_syms -le 0 ]; then echo "Have no paddle symbols"; exit -1 ; fi
10+
if [ $num_google_syms -ge 1 ]; then echo "Have some google symbols"; exit -1 ; fi
11+
12+
exit 0

paddle/contrib/inference/demo/CMakeLists.txt

Lines changed: 0 additions & 2 deletions
@@ -13,8 +13,6 @@
1313
# limitations under the License.
1414
#
1515

16-
inference_api_test(simple_on_word2vec ARGS test_word2vec)
17-
1816
option(WITH_INFERENCE_DEMO "Compile with Inference demo" OFF)
1917
if(NOT WITH_INFERENCE_DEMO)
2018
return()
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
cmake_minimum_required(VERSION 3.0)
2+
3+
project(cpp_inference_demo CXX C)
4+
5+
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
6+
7+
if(NOT DEFINED PADDLE_LIB)
8+
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
9+
endif()
10+
if(NOT DEFINED DEMO_NAME)
11+
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
12+
endif()
13+
14+
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
15+
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
16+
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
17+
18+
if(WITH_GPU)
19+
set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
20+
endif()
21+
22+
include_directories("${PADDLE_LIB}")
23+
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
24+
include_directories("${PADDLE_LIB}/third_party/install/glog/include")
25+
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
26+
include_directories("${PADDLE_LIB}/third_party/install/snappy/include")
27+
include_directories("${PADDLE_LIB}/third_party/install/snappystream/include")
28+
include_directories("${PADDLE_LIB}/third_party/install/zlib/include")
29+
30+
include_directories("${PADDLE_LIB}/third_party/boost")
31+
include_directories("${PADDLE_LIB}/third_party/eigen3")
32+
33+
link_directories("${PADDLE_LIB}/third_party/install/snappy/lib")
34+
link_directories("${PADDLE_LIB}/third_party/install/snappystream/lib")
35+
link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib")
36+
link_directories("${PADDLE_LIB}/third_party/install/glog/lib")
37+
link_directories("${PADDLE_LIB}/third_party/install/gflags/lib")
38+
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
39+
40+
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
41+
42+
if(WITH_MKL)
43+
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
44+
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel.so
45+
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5.so)
46+
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
47+
if(EXISTS ${MKLDNN_PATH})
48+
include_directories("${MKLDNN_PATH}/include")
49+
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
50+
endif()
51+
else()
52+
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas.a)
53+
endif()
54+
55+
if(WITH_STATIC_LIB)
56+
set(DEPS
57+
"-Wl,--whole-archive"
58+
${PADDLE_LIB}/paddle/fluid/inference/libpaddle_fluid.a
59+
"-Wl,--no-whole-archive"
60+
${PADDLE_LIB}/contrib/inference/libpaddle_inference_api.a)
61+
else()
62+
# Note: libpaddle_inference_api.so must put before libpaddle_fluid.so
63+
set(DEPS
64+
${PADDLE_LIB}/contrib/inference/libpaddle_inference_api.so
65+
${PADDLE_LIB}/paddle/fluid/inference/libpaddle_fluid.so)
66+
endif()
67+
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
68+
69+
set(DEPS ${DEPS}
70+
${MATH_LIB} ${MKLDNN_LIB}
71+
glog gflags protobuf snappystream snappy z
72+
${EXTERNAL_LIB})
73+
if(WITH_GPU)
74+
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart.so)
75+
endif()
76+
77+
target_link_libraries(${DEMO_NAME} ${DEPS})
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
set -x
2+
PADDLE_ROOT=$1
3+
WITH_MKL=$2
4+
WITH_GPU=$3
5+
if [ $3 == "ON" ]; then
6+
use_gpu_list='true false'
7+
else
8+
use_gpu_list='false'
9+
fi
10+
11+
mkdir -p build
12+
cd build
13+
14+
for WITH_STATIC_LIB in false; do
15+
rm -rf *
16+
cmake .. -DPADDLE_LIB=${PADDLE_ROOT}/build/fluid_install_dir/ \
17+
-DWITH_MKL=$WITH_MKL \
18+
-DDEMO_NAME=simple_on_word2vec \
19+
-DWITH_GPU=$WITH_GPU \
20+
-DWITH_STATIC_LIB=$WITH_STATIC_LIB
21+
make
22+
for use_gpu in $use_gpu_list; do
23+
./simple_on_word2vec \
24+
--dirname=${PADDLE_ROOT}/build/python/paddle/fluid/tests/book/word2vec.inference.model \
25+
--use_gpu=$use_gpu
26+
done
27+
done
28+
if [ $? -eq 0 ]; then
29+
exit 0
30+
else
31+
echo "inference demo runs fail."
32+
exit 1
33+
fi
34+
set +x
