
Commit 628c8d5

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-boost-to-inference
2 parents: 2ddca71 + 6af0593


60 files changed: +1196 -321 lines

CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -25,7 +25,6 @@ message(STATUS "CXX compiler: ${CMAKE_CXX_COMPILER}, version: "
 message(STATUS "C compiler: ${CMAKE_C_COMPILER}, version: "
         "${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER_VERSION}")

-find_package(Sphinx)
 if(NOT CMAKE_CROSSCOMPILING)
     find_package(CUDA QUIET)
 endif(NOT CMAKE_CROSSCOMPILING)
@@ -226,5 +225,7 @@ if(WITH_PYTHON)
 endif()

 if(WITH_DOC)
+  find_package(Sphinx REQUIRED)
+  find_python_module(recommonmark REQUIRED)
   add_subdirectory(doc)
 endif()

doc/fluid/design/dist_train/async_update.md

Lines changed: 18 additions & 15 deletions
@@ -4,34 +4,37 @@

 For the typical synchronous distributed training, some significant steps are as follows:

-1. A Trainer will compute the gradients and SEND them to the Parameter Server(PServer) nodes.
-1. After the PServer node received gradients came from all the Trainers, It will aggregate the
+1. A trainer process will compute the gradients and **send** them to the parameter server (PS) nodes.
+1. After the PS node received gradients came from all the Trainers, It will aggregate the
 gradient variables for the same parameter into one gradient variable and then apply the aggregated
 gradient to the respective parameter, finally using an optimize algorithms(SGD, Monument...)
 to update the parameters.
-1. The Trainer would wait for the PServers finished the optimize stage, and GET the parameters from PServer,
+1. The Trainer would wait for the PS finished the optimize stage, and GET the parameters from PS,
 so all the Trainers would get the same parameters.

-In the synchronously distributed training, there should be a `Barrier` to synchronise the
-parameters after the optimizing stage. The performance of a distributed training job would
-depend on the slowest node if there were hundreds or thousands of training nodes in a
-Job, the performance of synchronously distributed training might be very poor because of
-the slow node. So this design doc would introduce an approach to implement
-*asynchronously* distributed training in PaddlePaddle Fluid.
+In Synchronous Distributed Training, there is a **barrier** on each PS to wait until all trainers processes
+have completed running current mini-batch. After that, all trainers can continue to run the next
+mini-batch. So, we can find that the overall performance of Synchronous Distributed Training depends
+on the slowest node.
+
+In Asynchronous Distributed Training, we don't need to wait for a global mini-bach, the optimizer on
+the PS will run immediately when the gradient is uploaded to the PS from one trainer. This mode would
+train such models that achieve scaling, better throughput. In this design doc, we will introduce how to
+implement the Asynchronous Distributed Training base on PaddlePaddle Fluid.

 ## Design

 <img src="./src/async_update.png" width="600"/>

-As the figure above, we describe a global view of asynchronously update process and use
+As the figure above, we describe a global view of the asynchronous update process and use
 the parameter `w1` as an example to introduce the steps:
 1. For each gradient variables, they may distribute on different GPU card and aggregate
 them while they are all calculated.
-1. Split the gradient variable into multiple blocks according to the number of PServer
+1. Split the gradient variable into multiple blocks according to the number of PS
 instances and then send them.
-1. PServer would run an `Optimize Block` using a specified optimize algorithm to update
+1. PS would run an `Optimize Block` using a specified optimize algorithm to update
 the specified parameter.
-1. The trainer will fetch latest parameter from PServer before running forward Op which depends
+1. The trainer will fetch the latest parameter from PS before running forward Op which depends
 on the specified parameter.
 1. Broadcast the received variable into multiple GPU cards and continue to run the next
 mini-batch.
@@ -40,8 +43,8 @@ mini-batch.

 - For the multiple devices distributed training, we need to aggregate the gradient
 variables which placed on different devices firstly and then schedule a `SendVars` Operator to
-send the gradient variables to the multiple PServer instances.
-- Schedule `FetchVars` operator to fetch the latest parameter from PServer before running
+send the gradient variables to the multiple PS instances.
+- Schedule `FetchVars` operator to fetch the latest parameter from PS before running
 the forward ops.
 - There could be a large number of gradient variables to be sent, so we need to use another
 thread pool(IO Threadpool) whose a number of the schedulable threads is larger than the
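To make the barrier-free behaviour described in the revised doc concrete, here is a small, self-contained C++ sketch (not part of this commit; the ParameterServerShard type and the plain-SGD update are illustrative assumptions): each trainer thread uploads a gradient and the server shard applies it immediately under a local lock, with no global barrier across trainers.

#include <mutex>
#include <thread>
#include <vector>

// Illustrative stand-in for one parameter shard held by a PS instance.
struct ParameterServerShard {
  std::vector<float> param;
  float lr = 0.01f;
  std::mutex mu;

  // Called whenever any trainer uploads a gradient block; there is no
  // barrier waiting for the other trainers' mini-batches.
  void ApplyGradient(const std::vector<float>& grad) {
    std::lock_guard<std::mutex> lock(mu);
    for (size_t i = 0; i < param.size(); ++i) param[i] -= lr * grad[i];
  }
};

int main() {
  ParameterServerShard shard;
  shard.param.assign(4, 1.0f);

  std::vector<std::thread> trainers;
  for (int t = 0; t < 3; ++t) {
    trainers.emplace_back([&shard] {
      std::vector<float> grad(4, 0.1f);  // pretend gradient from one mini-batch
      shard.ApplyGradient(grad);         // applied as soon as it arrives
    });
  }
  for (auto& th : trainers) th.join();
  return 0;
}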

doc/v2/build_and_install/build_from_source_cn.rst

Lines changed: 3 additions & 2 deletions
@@ -19,8 +19,9 @@
 ----------------

 PaddlePaddle needs to be built in a Docker environment, which saves you from installing the build dependencies separately. Docker images for the different optional build environments
-can be found `here <https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/>`_. Alternatively,
-follow the optional steps below to build a Docker image for compiling PaddlePaddle from source.
+can be found `here <https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/>`_, and you can also
+find how to build and use the paddle_manylinux_devel
+image `here <https://github.com/PaddlePaddle/Paddle/tree/develop/tools/manylinux1/>`_. Alternatively, follow the optional steps below to build a Docker image for compiling PaddlePaddle from source.

 If you choose not to use a Docker image, you need to install the `编译依赖`_ listed in the section below on your machine before you can start the build steps.


doc/v2/build_and_install/build_from_source_en.rst

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ How To Build
 You need to use Docker to build PaddlePaddle
 to avoid installing dependencies by yourself. We have several pre-built
 Docker images `here <https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/>`_ ,
+you can also find how to build and use paddle_manylinux_devel Docker image from
+`here <https://github.com/PaddlePaddle/Paddle/tree/develop/tools/manylinux1/>`_
 Or you can build your own image from source as the optional step below:

 .. code-block:: bash

paddle/fluid/framework/CMakeLists.txt

Lines changed: 3 additions & 3 deletions
@@ -5,11 +5,11 @@ proto_library(framework_proto SRCS framework.proto)
 cc_library(ddim SRCS ddim.cc DEPS eigen3 boost)
 cc_test(ddim_test SRCS ddim_test.cc DEPS ddim)
 nv_test(dim_test SRCS dim_test.cu DEPS ddim)
-
+cc_library(data_type SRCS data_type.cc DEPS framework_proto ddim device_context)
 if(WITH_GPU)
-  nv_library(tensor SRCS tensor.cc tensor_util.cu DEPS ddim place memory device_context framework_proto)
+  nv_library(tensor SRCS tensor.cc tensor_util.cu DEPS place memory data_type)
 else()
-  cc_library(tensor SRCS tensor.cc tensor_util.cc DEPS ddim place memory device_context framework_proto)
+  cc_library(tensor SRCS tensor.cc tensor_util.cc DEPS place memory data_type)
 endif()

 cc_test(tensor_test SRCS tensor_test.cc DEPS tensor)

paddle/fluid/framework/data_type.cc

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
+// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "paddle/fluid/framework/data_type.h"
+#include <stdint.h>
+#include <string>
+#include <unordered_map>
+
+namespace paddle {
+namespace framework {
+
+struct DataTypeMap {
+  std::unordered_map<std::type_index, proto::VarType::Type> cpp_to_proto_;
+  std::unordered_map<int, std::type_index> proto_to_cpp_;
+  std::unordered_map<int, std::string> proto_to_str_;
+  std::unordered_map<std::type_index, size_t> cpp_to_size_;
+};
+
+static DataTypeMap* InitDataTypeMap();
+static DataTypeMap& gDataTypeMap() {
+  static DataTypeMap* g_data_type_map_ = InitDataTypeMap();
+  return *g_data_type_map_;
+}
+
+template <typename T>
+static inline void RegisterType(DataTypeMap* map,
+                                proto::VarType::Type proto_type,
+                                const std::string& name) {
+  map->proto_to_cpp_.emplace(static_cast<int>(proto_type), typeid(T));
+  map->cpp_to_proto_.emplace(typeid(T), proto_type);
+  map->proto_to_str_.emplace(static_cast<int>(proto_type), name);
+  map->cpp_to_size_.emplace(typeid(T), sizeof(T));
+}
+
+static DataTypeMap* InitDataTypeMap() {
+  auto retv = new DataTypeMap();
+
+#define RegType(cc_type, proto_type) \
+  RegisterType<cc_type>(retv, proto_type, #cc_type)
+
+  // NOTE: Add your customize type here.
+  RegType(platform::float16, proto::VarType::FP16);
+  RegType(float, proto::VarType::FP32);
+  RegType(double, proto::VarType::FP64);
+  RegType(int, proto::VarType::INT32);
+  RegType(int64_t, proto::VarType::INT64);
+  RegType(bool, proto::VarType::BOOL);
+  RegType(size_t, proto::VarType::SIZE_T);
+  RegType(int16_t, proto::VarType::INT16);
+
+#undef RegType
+  return retv;
+}
+
+proto::VarType::Type ToDataType(std::type_index type) {
+  auto it = gDataTypeMap().cpp_to_proto_.find(type);
+  if (it != gDataTypeMap().cpp_to_proto_.end()) {
+    return it->second;
+  }
+  PADDLE_THROW("Not support %s as tensor type", type.name());
+}
+
+std::type_index ToTypeIndex(proto::VarType::Type type) {
+  auto it = gDataTypeMap().proto_to_cpp_.find(static_cast<int>(type));
+  if (it != gDataTypeMap().proto_to_cpp_.end()) {
+    return it->second;
+  }
+  PADDLE_THROW("Not support proto::VarType::Type(%d) as tensor type",
+               static_cast<int>(type));
+}
+
+std::string DataTypeToString(const proto::VarType::Type type) {
+  auto it = gDataTypeMap().proto_to_str_.find(static_cast<int>(type));
+  if (it != gDataTypeMap().proto_to_str_.end()) {
+    return it->second;
+  }
+  PADDLE_THROW("Not support proto::VarType::Type(%d) as tensor type",
+               static_cast<int>(type));
+}
+
+size_t SizeOfType(std::type_index type) {
+  auto it = gDataTypeMap().cpp_to_size_.find(type);
+  if (it != gDataTypeMap().cpp_to_size_.end()) {
+    return it->second;
+  }
+  PADDLE_THROW("Not support %s as tensor type", type.name());
+}
+
+} // namespace framework
+} // namespace paddle
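Illustrative usage of the new registry-backed helpers (not part of this commit; the Example function is hypothetical). Note that the registered name comes from the stringified C++ type (#cc_type), which is why the op_kernel_type_test expectation further down changes from "float32" to "float".

#include <string>
#include <typeindex>
#include "paddle/fluid/framework/data_type.h"

void Example() {
  namespace fw = paddle::framework;
  // float was registered via RegType(float, proto::VarType::FP32):
  fw::proto::VarType::Type t = fw::ToDataType(typeid(float));
  std::type_index idx = fw::ToTypeIndex(t);    // back to typeid(float)
  size_t bytes = fw::SizeOfType(idx);          // sizeof(float), captured at registration
  std::string name = fw::DataTypeToString(t);  // "float", from the #cc_type string
  (void)bytes;
  (void)name;
}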

paddle/fluid/framework/data_type.h

Lines changed: 5 additions & 62 deletions
@@ -17,51 +17,14 @@ limitations under the License. */
 #include <typeindex>
 #include "paddle/fluid/framework/framework.pb.h"
 #include "paddle/fluid/platform/enforce.h"
+
 #include "paddle/fluid/platform/float16.h"

 namespace paddle {
 namespace framework {

-inline proto::VarType::Type ToDataType(std::type_index type) {
-  if (typeid(platform::float16).hash_code() == type.hash_code()) {
-    return proto::VarType::FP16;
-  } else if (typeid(const float).hash_code() == type.hash_code()) {
-    // CPPLint complains Using C-style cast. Use static_cast<float>() instead
-    // One fix to this is to replace float with const float because
-    // typeid(T) == typeid(const T)
-    // http://en.cppreference.com/w/cpp/language/typeid
-    return proto::VarType::FP32;
-  } else if (typeid(const double).hash_code() == type.hash_code()) {
-    return proto::VarType::FP64;
-  } else if (typeid(const int).hash_code() == type.hash_code()) {
-    return proto::VarType::INT32;
-  } else if (typeid(const int64_t).hash_code() == type.hash_code()) {
-    return proto::VarType::INT64;
-  } else if (typeid(const bool).hash_code() == type.hash_code()) {
-    return proto::VarType::BOOL;
-  } else {
-    PADDLE_THROW("Not supported");
-  }
-}
-
-inline std::type_index ToTypeIndex(proto::VarType::Type type) {
-  switch (type) {
-    case proto::VarType::FP16:
-      return typeid(platform::float16);
-    case proto::VarType::FP32:
-      return typeid(float);
-    case proto::VarType::FP64:
-      return typeid(double);
-    case proto::VarType::INT32:
-      return typeid(int);
-    case proto::VarType::INT64:
-      return typeid(int64_t);
-    case proto::VarType::BOOL:
-      return typeid(bool);
-    default:
-      PADDLE_THROW("Not support type %d", type);
-  }
-}
+extern proto::VarType::Type ToDataType(std::type_index type);
+extern std::type_index ToTypeIndex(proto::VarType::Type type);

 template <typename Visitor>
 inline void VisitDataType(proto::VarType::Type type, Visitor visitor) {
@@ -89,32 +52,12 @@ inline void VisitDataType(proto::VarType::Type type, Visitor visitor) {
   }
 }

-inline std::string DataTypeToString(const proto::VarType::Type type) {
-  switch (type) {
-    case proto::VarType::FP16:
-      return "float16";
-    case proto::VarType::FP32:
-      return "float32";
-    case proto::VarType::FP64:
-      return "float64";
-    case proto::VarType::INT16:
-      return "int16";
-    case proto::VarType::INT32:
-      return "int32";
-    case proto::VarType::INT64:
-      return "int64";
-    case proto::VarType::BOOL:
-      return "bool";
-    default:
-      PADDLE_THROW("Not support type %d", type);
-  }
-}
-
+extern std::string DataTypeToString(const proto::VarType::Type type);
+extern size_t SizeOfType(std::type_index type);
 inline std::ostream& operator<<(std::ostream& out,
                                 const proto::VarType::Type& type) {
   out << DataTypeToString(type);
   return out;
 }
-
 } // namespace framework
 } // namespace paddle
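A hedged sketch of how the VisitDataType template retained in this header is typically used (not from this commit; it assumes the switch inside VisitDataType forwards to a templated operator() on the visitor, and PrintSizeVisitor is a made-up example):

#include <iostream>
#include "paddle/fluid/framework/data_type.h"

// Hypothetical visitor: reports the element size of the C++ type that a
// proto::VarType::Type value maps to.
struct PrintSizeVisitor {
  template <typename T>
  void operator()() const {
    std::cout << sizeof(T) << " bytes\n";
  }
};

void VisitExample() {
  namespace fw = paddle::framework;
  fw::VisitDataType(fw::proto::VarType::FP64, PrintSizeVisitor());  // prints "8 bytes"
}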

paddle/fluid/framework/framework.proto

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,8 @@ message VarType {
     FP16 = 4;
     FP32 = 5;
     FP64 = 6;
+    // Tensor<size_t> is used in C++.
+    SIZE_T = 19;

     // Other types that may need additional descriptions
     LOD_TENSOR = 7;

paddle/fluid/framework/op_kernel_type_test.cc

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ TEST(OpKernelType, ToString) {
                                      LibraryType::kCUDNN);

   ASSERT_EQ(paddle::framework::KernelTypeToString(op_kernel_type),
-            "data_type[float32]:data_layout[NCHW]:place[CPUPlace]:library_type["
+            "data_type[float]:data_layout[NCHW]:place[CPUPlace]:library_type["
             "CUDNN]");
 }


paddle/fluid/framework/operator.h

Lines changed: 4 additions & 0 deletions
@@ -192,6 +192,10 @@ class ExecutionContext {
     return op_.Attr<T>(name);
   }

+  bool HasInput(const std::string& name) const { return op_.HasInputs(name); }
+
+  bool HasOutput(const std::string& name) const { return op_.HasOutputs(name); }
+
   size_t InputSize(const std::string& name) const {
     return op_.Inputs(name).size();
   }
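A hedged sketch of how an op kernel might use the new ExecutionContext::HasInput/HasOutput helpers (not from this commit; the kernel class, the include, and the optional "Bias" input are hypothetical):

#include "paddle/fluid/framework/op_registry.h"

namespace paddle {
namespace operators {

template <typename T>
class ExampleKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    // Only touch the optional "Bias" input when the caller actually provided it.
    if (ctx.HasInput("Bias")) {
      auto* bias = ctx.Input<framework::Tensor>("Bias");
      (void)bias;
    }
  }
};

}  // namespace operators
}  // namespace paddle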
