Commit 03bfd76

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into lstm_fix
2 parents 1f53a72 + a343504 commit 03bfd76

77 files changed (+2075, -823 lines)

benchmark/IntelOptimizedPaddle.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Benchmark

Machine:

- Server
  - Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 sockets, 20 cores per socket
- Laptop
  - DELL XPS15-9560-R1745: i7-7700HQ, 8 GB RAM, 256 GB SSD
  - i5 MacBook Pro (Retina, 13-inch, Early 2015)
- Desktop
  - i7-6700k

System: CentOS release 6.3 (Final), Docker 1.12.1.

PaddlePaddle: paddlepaddle/paddle:latest (TODO: will rerun after 0.11.0)

- MKL-DNN tag v0.10
- MKLML 2018.0.20170720
- OpenBLAS v0.2.20

On each machine, we test and compare single-node training performance with MKL-DNN, MKLML, and OpenBLAS.

## Benchmark Model

### Server
Tested with batch sizes 64, 128, and 256 on Intel(R) Xeon(R) Gold 6148M CPU @ 2.40GHz.

Input image size: 3 * 224 * 224. Throughput is reported in images/second.

- VGG-19

| BatchSize | 64    | 128   | 256   |
|-----------|-------|-------|-------|
| OpenBLAS  | 7.82  | 8.62  | 10.34 |
| MKLML     | 11.02 | 12.86 | 15.33 |
| MKL-DNN   | 27.69 | 28.8  | 29.27 |


Chart for batch size 128:
TBD

- ResNet
- GoogLeNet

### Laptop
TBD
### Desktop
TBD

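The VGG-19 table supports a quick worked calculation of how much each backend gains over OpenBLAS. The C++ sketch below hard-codes the reported server throughputs (images/second) and divides each backend's number by the OpenBLAS number at the same batch size. It is an illustrative calculation only, not part of any benchmark tooling, and the constant and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// VGG-19 training throughput (images/second) on the Xeon Gold server,
// transcribed from the benchmark table, for batch sizes 64, 128 and 256.
constexpr std::array<int, 3> kBatchSizes = {64, 128, 256};
constexpr std::array<double, 3> kOpenBlas = {7.82, 8.62, 10.34};
constexpr std::array<double, 3> kMklml = {11.02, 12.86, 15.33};
constexpr std::array<double, 3> kMklDnn = {27.69, 28.8, 29.27};

// Throughput of `backend` divided by OpenBLAS throughput at batch-size index i.
inline double SpeedupOverOpenBlas(const std::array<double, 3>& backend,
                                  std::size_t i) {
  assert(i < backend.size());
  return backend[i] / kOpenBlas[i];
}
```

With these numbers, MKL-DNN comes out roughly 3.5x, 3.3x and 2.8x faster than OpenBLAS at batch sizes 64, 128 and 256. MKLML comes out roughly 1.4x to 1.5x faster.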
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# Building the PaddlePaddle Library for iOS
Cross-compiling the PaddlePaddle library for iOS must be done on macOS. This document describes how to cross-compile the PaddlePaddle library for iOS from source on macOS.

## Preparing the Cross-Compilation Environment
Apple provides a complete cross-compilation toolchain and integrated development environment for iOS development. Install Xcode from the App Store, or download it from the official [Xcode](https://developer.apple.com/cn/xcode/) site. After installation, run `xcodebuild -version` on the command line to check that the installation succeeded.

```bash
$ xcodebuild -version
Xcode 9.0
Build version 9A235
```

## Configuring Cross-Compilation Parameters

PaddlePaddle provides a toolchain configuration file for cross-compiling, [cmake/cross_compiling/ios.cmake](https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cross_compiling/ios.cmake). It supplies default compilers and compiler flags.

The following parameters must be set when cross-compiling PaddlePaddle for iOS:

- `CMAKE_SYSTEM_NAME`: the target platform of the CMake build; must be `iOS`. Once `CMAKE_SYSTEM_NAME=iOS` is set, PaddlePaddle's CMake system automatically builds all third-party dependencies. It also forces several PaddlePaddle options to fixed values (`WITH_C_API=ON`, `WITH_GPU=OFF`, `WITH_AVX=OFF`, `WITH_PYTHON=OFF`, `WITH_RDMA=OFF`).
- `WITH_C_API`: whether to build the C-API inference library; must be `ON`, because on iOS inference is supported only through the C-API.
- `WITH_SWIG_PY`: must be `OFF`, because training or inference through SWIG is not supported on iOS.

Optional iOS parameters:

- `IOS_PLATFORM`: `OS` or `SIMULATOR`; defaults to `OS`.
  - `OS`: builds for physical `arm` devices such as iPhone and iPad.
  - `SIMULATOR`: builds for the `x86` simulator platform.
- `IOS_ARCH`: the target architecture. The architectures available for each `IOS_PLATFORM` are:

  | IOS_PLATFORM | IOS_ARCH                       |
  |--------------|--------------------------------|
  | OS           | armv7, armv7s, arm64 (default) |
  | SIMULATOR    | i386, x86_64 (default)         |

- `IOS_DEPLOYMENT_TARGET`: the minimum iOS deployment version; defaults to `7.0`.
- `IOS_ENABLE_BITCODE`: whether to enable [Bitcode](https://developer.apple.com/library/content/documentation/IDEs/Conceptual/AppDistributionGuide/AppThinning/AppThinning.html#//apple_ref/doc/uid/TP40012582-CH35-SW3); `ON` or `OFF`, defaults to `ON`.
- `IOS_USE_VECLIB_FOR_BLAS`: whether to use the [vecLib](https://developer.apple.com/documentation/accelerate/veclib) framework for BLAS matrix computation; `ON` or `OFF`, defaults to `OFF`.
- `IOS_DEVELOPMENT_ROOT`: the `Developer` directory, which can be set explicitly as `/path/to/platform/Developer`. If it is not set, PaddlePaddle selects the `Developer` directory of the Xcode `platform` that matches `IOS_PLATFORM`.
- `IOS_SDK_ROOT`: the root directory of the `SDK` to use, which can be set explicitly as `/path/to/platform/Developer/SDKs/SDK`. If it is not set, PaddlePaddle selects the latest `SDK` version under `IOS_DEVELOPMENT_ROOT`.

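The `IOS_PLATFORM`/`IOS_ARCH` rules listed above amount to a small lookup table: each platform accepts a fixed set of architectures and has one default. The C++ sketch below encodes that table only to make the rules explicit. PaddlePaddle applies these options in its CMake scripts, not in C++, and the helper names (`PlatformArchs`, `IosArchTable`, `IsValidIosArch`, `DefaultIosArch`) are hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Valid architectures and the default architecture for one IOS_PLATFORM.
struct PlatformArchs {
  std::vector<std::string> valid;
  std::string default_arch;
};

// Hypothetical helper: the IOS_PLATFORM -> IOS_ARCH table from the document.
inline const std::map<std::string, PlatformArchs>& IosArchTable() {
  static const std::map<std::string, PlatformArchs> table = {
      {"OS", {{"armv7", "armv7s", "arm64"}, "arm64"}},
      {"SIMULATOR", {{"i386", "x86_64"}, "x86_64"}},
  };
  return table;
}

// True if `arch` is a valid IOS_ARCH for `platform`.
inline bool IsValidIosArch(const std::string& platform,
                           const std::string& arch) {
  auto it = IosArchTable().find(platform);
  if (it == IosArchTable().end()) return false;  // unknown IOS_PLATFORM
  const auto& valid = it->second.valid;
  return std::find(valid.begin(), valid.end(), arch) != valid.end();
}

// Default IOS_ARCH for `platform`, or "" if the platform is unknown.
inline std::string DefaultIosArch(const std::string& platform) {
  auto it = IosArchTable().find(platform);
  return it == IosArchTable().end() ? "" : it->second.default_arch;
}
```

For example, `IsValidIosArch("OS", "x86_64")` is false, because `x86_64` is only listed for `SIMULATOR`.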
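
Other parameters:

- `USE_EIGEN_FOR_BLAS`: whether to use the Eigen library for matrix computation; takes effect only when `IOS_USE_VECLIB_FOR_BLAS=OFF`. `ON` or `OFF`, defaults to `OFF`.
- `HOST_C/CXX_COMPILER`: the C/C++ compiler on the host machine. Defaults to the value of the `CC`/`CXX` environment variable; if `CC`/`CXX` is not set, `cc`/`c++` is used.
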
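The documented default for `HOST_C/CXX_COMPILER` is a simple fallback rule: use the environment variable if it is set, otherwise a fixed compiler name. The C++ sketch below restates that rule with a hypothetical `ResolveHostCompiler` helper. PaddlePaddle implements this in CMake, and treating an empty variable as unset is an assumption of the sketch.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper mirroring the HOST_C/CXX_COMPILER default:
// return the value of `env_var` (e.g. "CC" or "CXX") if it is set and
// non-empty, otherwise `fallback` (e.g. "cc" or "c++").
// Treating an empty variable as unset is an assumption of this sketch.
inline std::string ResolveHostCompiler(const char* env_var,
                                       const std::string& fallback) {
  const char* value = std::getenv(env_var);
  if (value != nullptr && value[0] != '\0') {
    return value;
  }
  return fallback;
}
```

Usage: `ResolveHostCompiler("CXX", "c++")` yields the contents of `CXX` when it is set and `c++` otherwise.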

A typical CMake configuration for a physical device (`arm64`):

```bash
cmake -DCMAKE_SYSTEM_NAME=iOS \
      -DIOS_PLATFORM=OS \
      -DIOS_ARCH="arm64" \
      -DIOS_ENABLE_BITCODE=ON \
      -DIOS_USE_VECLIB_FOR_BLAS=ON \
      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
      -DWITH_C_API=ON \
      -DWITH_TESTING=OFF \
      -DWITH_SWIG_PY=OFF \
      ..
```

And for the simulator (`x86_64`):

```bash
cmake -DCMAKE_SYSTEM_NAME=iOS \
      -DIOS_PLATFORM=SIMULATOR \
      -DIOS_ARCH="x86_64" \
      -DIOS_USE_VECLIB_FOR_BLAS=ON \
      -DCMAKE_INSTALL_PREFIX=your/path/to/install \
      -DWITH_C_API=ON \
      -DWITH_TESTING=OFF \
      -DWITH_SWIG_PY=OFF \
      ..
```

You can set other build parameters as needed. For example, to minimize the size of the generated library, set `CMAKE_BUILD_TYPE` to `MinSizeRel`. For the fastest execution, set `CMAKE_BUILD_TYPE` to `Release`. You can also influence the PaddlePaddle build by setting `CMAKE_C/CXX_FLAGS` manually.

**Performance tips**: for the fastest computation, we recommend the following CMake settings:

- Set `CMAKE_BUILD_TYPE` to `Release`.
- Set `IOS_USE_VECLIB_FOR_BLAS=ON` to use the BLAS functions provided by the `vecLib` framework for matrix computation.

## Building and Installing

After CMake is configured, run the following commands. PaddlePaddle automatically downloads and builds all third-party dependencies, then builds and installs the PaddlePaddle inference library.

```bash
$ make
$ make install
```

Note: if you have previously built PaddlePaddle for another platform in this source directory, first delete the `third_party` and `build` directories with `rm -rf`. This ensures that all third-party dependencies and PaddlePaddle code are rebuilt with the new CMake configuration.

After installation, `your/path/to/install` contains:

- `include`: all C-API header files
- `lib`: the PaddlePaddle C-API static library
- `third_party`: all third-party libraries it depends on

Note: we recommend installing the PaddlePaddle libraries for different architectures into separate directories. Then use the `lipo` tool to merge the static libraries into a single fat library that supports multiple architectures.

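For the `lipo` step, the command takes each per-architecture copy of a static library as input and writes one fat library, in the form `lipo -create <inputs> -output <file>`. The C++ sketch below only assembles that command line from a list of install directories. The directory names and the library path `lib/libexample.a` are hypothetical placeholders, because the document does not name the installed library files. Paths are not shell-quoted, so they must not contain spaces.

```cpp
#include <string>
#include <vector>

// Builds a `lipo -create ... -output ...` command that merges the same static
// library from several per-architecture install directories into one fat
// library. `install_dirs`, `lib_relpath` and `output` are caller-supplied
// placeholders; paths are not quoted, so they must not contain spaces.
inline std::string MakeLipoCommand(const std::vector<std::string>& install_dirs,
                                   const std::string& lib_relpath,
                                   const std::string& output) {
  std::string cmd = "lipo -create";
  for (const auto& dir : install_dirs) {
    cmd += " " + dir + "/" + lib_relpath;
  }
  cmd += " -output " + output;
  return cmd;
}
```

Running `lipo -info` on the resulting file lists the architectures it contains, which is a quick way to check the merge.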

The PaddlePaddle library is now installed. The merged fat library can be used in deep-learning iOS apps; see the C-API documentation for how to call it.

doc/howto/cross_compiling/cross_compiling_for_raspberry_cn.md

Lines changed: 1 addition & 1 deletion
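```diff
@@ -59,4 +59,4 @@ make install

 Note: If you have previously built PaddlePaddle for another platform in the source directory, first delete the `third_party` and `build` directories with `rm -rf` to ensure that all third-party dependencies and PaddlePaddle code are rebuilt for the new CMake configuration.

-After running the install command, `your/path/to/install` contains the `include` and `lib` directories: `include` holds the C-API header files, and `lib` holds a Raspberry Pi build of the library.
+After running the install command, `your/path/to/install` contains the `include` and `lib` directories: `include` holds the C-API header files, and `lib` holds a Raspberry Pi build of the library.
```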

doc/howto/cross_compiling/cross_compiling_for_raspberry_en.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -44,7 +44,7 @@ cmake -DCMAKE_SYSTEM_NAME=RPi \
 ..
 ```

-To build the inference library, please set the argument WITH_API to ON: `WITH_C_API=ON`.
+To build the inference library, please set the argument WITH\_C\_API to ON: `WITH_C_API=ON`.

 You can add more arguments. For example, to minimize the size of the generated inference library, you may use `CMAKE_BUILD_TYPE=MinSizeRel`. For performance optimization, you may use `CMAKE_BUILD_TYPE=Release`.

````

paddle/framework/attribute.cc

Lines changed: 3 additions & 7 deletions
```diff
@@ -19,7 +19,7 @@ limitations under the License. */
 namespace paddle {
 namespace framework {

-Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* program) {
+Attribute GetAttrValue(const OpDesc::Attr& attr_desc) {
   switch (attr_desc.type()) {
     case framework::AttrType::BOOLEAN: {
       return attr_desc.b();
@@ -61,13 +61,9 @@ Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* program) {
       }
       return val;
     }
-    case framework::AttrType::BLOCK: {
-      PADDLE_ENFORCE(program != nullptr,
-                     "Need to specify ProgramDesc when get a block attr");
-      return program->mutable_blocks(attr_desc.block_idx());
-    }
+    default:
+      PADDLE_THROW("Unsupport attr type %d", attr_desc.type());
   }
-  PADDLE_ENFORCE(false, "Unknown OpDesc::AttrDesc::type !");
   return boost::blank();
 }

```

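The `attribute.cc` change removes the `BLOCK` case, and with it the `ProgramDesc*` parameter. Any attribute type without a dedicated case now falls into a `default:` branch that throws. The self-contained C++17 sketch below illustrates that dispatch pattern using `std::variant`, a hypothetical `AttrType` enum and a hypothetical `AttrDesc` struct. It is not Paddle's `OpDesc::Attr` protobuf, its `boost::variant`-based `Attribute`, or `PADDLE_THROW`.

```cpp
#include <stdexcept>
#include <string>
#include <variant>

// Simplified, hypothetical stand-ins for the types used in attribute.cc.
enum class AttrType { kBoolean, kInt, kString, kBlock };

struct AttrDesc {
  AttrType type;
  bool b = false;
  int i = 0;
  std::string s;
};

using Attribute = std::variant<bool, int, std::string>;

// Converts a descriptor into a value. Types without a dedicated case (here,
// kBlock) reach `default` and throw, like the PADDLE_THROW branch in the diff.
inline Attribute GetAttrValue(const AttrDesc& desc) {
  switch (desc.type) {
    case AttrType::kBoolean:
      return desc.b;
    case AttrType::kInt:
      return desc.i;
    case AttrType::kString:
      return desc.s;
    default:
      throw std::invalid_argument("Unsupported attr type " +
                                  std::to_string(static_cast<int>(desc.type)));
  }
}
```

Because `default:` throws, every path through the `switch` either returns or throws. That is why the diff can also drop the trailing `PADDLE_ENFORCE(false, ...)`.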

paddle/framework/attribute.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -32,7 +32,7 @@ inline AttrType AttrTypeID() {
   return static_cast<AttrType>(tmp.which() - 1);
 }

-Attribute GetAttrValue(const OpDesc::Attr& attr_desc, ProgramDesc* desc);
+Attribute GetAttrValue(const OpDesc::Attr& attr_desc);

 class AttrReader {
  public:
```

paddle/framework/backward.cc

Lines changed: 30 additions & 5 deletions
```diff
@@ -18,6 +18,7 @@
 #include <deque>
 #include <list>
 #include <memory>
+#include <unordered_set>

 #include "paddle/framework/block_desc.h"
 #include "paddle/framework/op_registry.h"
@@ -285,6 +286,15 @@ static bool AllGradInSet(const std::vector<std::string>& names,
   return true;
 }

+static std::string FwdName(const std::string& grad_name) {
+  auto pos = grad_name.find("@GRAD");
+  if (pos == std::string::npos) {
+    return "";
+  } else {
+    return grad_name.substr(0, pos);
+  }
+}
+
 static void CreateGradVarInBlock(
     size_t grad_op_start_index,
     const std::unordered_map<std::string, std::string>& param_name_map,
@@ -294,15 +304,15 @@ static void CreateGradVarInBlock(
   for (size_t op_index = grad_op_start_index; op_index < ops.size();
        ++op_index) {
     bool need_infer_shape = false;
+    std::unordered_set<std::string> new_vars;
     ForEachVarName(ops[op_index]->Outputs(),
                    [&](const std::string& grad_var_name) {
                      if (block_desc->HasVar(grad_var_name)) {
                        return false;
                      }
                      need_infer_shape = true;
                      auto var = block_desc->Var(grad_var_name);
-                     // FIXME(qiao) infer the datatype
-                     var->SetDataType(framework::DataType::FP32);
+                     new_vars.insert(var->Name());
                      auto it = param_name_map.find(grad_var_name);
                      if (it == param_name_map.end()) {
                        return false;
@@ -316,6 +326,21 @@ static void CreateGradVarInBlock(
                    });
     if (need_infer_shape) {
       ops[op_index]->InferVarType(block_desc);
+      for (auto& arg : ops[op_index]->OutputArgumentNames()) {
+        if (new_vars.find(arg) == new_vars.end()) {
+          continue;
+        }
+        auto pname = FwdName(arg);
+        auto* param = block_desc->FindVar(pname);
+        auto* grad = block_desc->FindVar(arg);
+        if (param == nullptr) {
+          LOG(WARNING) << "Cannot find forward variable of " << arg
+                       << ". Set its gradient to FP32";
+          grad->SetDataType(DataType::FP32);
+        } else {
+          grad->SetDataType(param->GetDataType());
+        }
+      }
       ops[op_index]->InferShape(*block_desc);
     }
   }
@@ -368,7 +393,7 @@ std::vector<std::unique_ptr<OpDescBind>> MakeBlockBackward(
     ProgramDescBind& program_desc, int block_idx,
     std::unordered_set<std::string>* no_grad_vars,
     std::unordered_map<std::string, std::string>* grad_to_var) {
-  BlockDescBind* cur_block = program_desc.Block(block_idx);
+  BlockDescBind* cur_block = program_desc.MutableBlock(block_idx);
   std::vector<OpDescBind*> op_descs = cur_block->AllOps();
   std::unordered_map<std::string, std::vector<size_t>> dup_out_ops;
   size_t grad_desc_idx = 0;
@@ -443,7 +468,7 @@ ParamGradInfoMap AppendBackward(
   }

   const int root_block_idx = 0;
-  auto root_block = program_desc.Block(root_block_idx);
+  auto root_block = program_desc.MutableBlock(root_block_idx);

   // insert fill one op for target
   // TODO(qiao) add some check to the target.
@@ -492,7 +517,7 @@ ParamGradInfoMap AppendBackward(
   CreateGradVarInBlock(forward_op_num, grad_to_var, root_block, &retv);
   for (size_t block_index = forward_block_num;
        block_index < program_desc.Size(); ++block_index) {
-    CreateGradVarInBlock(0, grad_to_var, program_desc.Block(block_index),
+    CreateGradVarInBlock(0, grad_to_var, program_desc.MutableBlock(block_index),
                          &retv);
   }
   return retv;
```

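The new logic in `CreateGradVarInBlock` replaces the hard-coded FP32 with a data type taken from the forward variable. `FwdName` strips the `@GRAD` suffix to find that variable; if none exists, the gradient falls back to FP32 with a warning. The standalone C++ sketch below restates that rule over a plain `std::map` from variable name to a hypothetical `DataType` enum, in place of a `BlockDescBind`. Only `FwdName` follows the diff; `InferGradDataType` is a hypothetical helper.

```cpp
#include <iostream>
#include <map>
#include <string>

enum class DataType { FP16, FP32, FP64 };  // simplified, hypothetical

// Same rule as FwdName in the diff: "w@GRAD" -> "w"; no "@GRAD" -> "".
static std::string FwdName(const std::string& grad_name) {
  auto pos = grad_name.find("@GRAD");
  if (pos == std::string::npos) {
    return "";
  }
  return grad_name.substr(0, pos);
}

// Data type for a newly created gradient variable: copy the forward
// variable's type if it exists in `vars`, otherwise warn and use FP32.
static DataType InferGradDataType(const std::map<std::string, DataType>& vars,
                                  const std::string& grad_name) {
  auto it = vars.find(FwdName(grad_name));
  if (it == vars.end()) {
    std::cerr << "Cannot find forward variable of " << grad_name
              << ". Set its gradient to FP32\n";
    return DataType::FP32;
  }
  return it->second;
}
```

In the diff, this assignment is applied only to output arguments recorded in `new_vars`, so gradient variables that already existed in the block are left untouched.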

paddle/framework/backward_test.cc

Lines changed: 7 additions & 7 deletions
```diff
@@ -499,7 +499,7 @@ TEST(Backward, linear_net_intermediate_variable_has_no_grad) {

 TEST(Backward, simple_single_op) {
   f::ProgramDescBind program;
-  f::BlockDescBind *block = program.Block(0);
+  f::BlockDescBind *block = program.MutableBlock(0);

   f::OpDescBind *op = block->AppendOp();
   op->SetType("rowwise_add");
@@ -535,7 +535,7 @@ TEST(Backward, simple_single_op) {

 TEST(Backward, default_attribute) {
   f::ProgramDescBind program;
-  f::BlockDescBind *block = program.Block(0);
+  f::BlockDescBind *block = program.MutableBlock(0);
   f::OpDescBind *op = block->AppendOp();
   op->SetType("mul");
   op->SetInput("X", {"x"});
@@ -561,7 +561,7 @@ TEST(Backward, default_attribute) {

 TEST(Backward, simple_mult_op) {
   f::ProgramDescBind program;
-  f::BlockDescBind *block = program.Block(0);
+  f::BlockDescBind *block = program.MutableBlock(0);
   f::OpDescBind *op1 = block->AppendOp();
   op1->SetType("rowwise_add");
   op1->SetInput("X", {"x1"});
@@ -644,7 +644,7 @@ TEST(Backward, simple_mult_op) {

 TEST(Backward, intermedia_var_no_grad) {
   f::ProgramDescBind program;
-  f::BlockDescBind *block = program.Block(0);
+  f::BlockDescBind *block = program.MutableBlock(0);
   f::OpDescBind *op1 = block->AppendOp();
   op1->SetType("rowwise_add");
   op1->SetInput("X", {"x1"});
@@ -714,7 +714,7 @@ TEST(Backward, intermedia_var_no_grad) {

 TEST(Backward, var_no_grad) {
   f::ProgramDescBind program;
-  f::BlockDescBind *block = program.Block(0);
+  f::BlockDescBind *block = program.MutableBlock(0);
   f::OpDescBind *op1 = block->AppendOp();
   op1->SetType("mult_in_out");
   op1->SetInput("X", {"x1"});
@@ -790,7 +790,7 @@ TEST(Backward, var_no_grad) {

 TEST(Backward, shared_var) {
   f::ProgramDescBind program;
-  f::BlockDescBind *block = program.Block(0);
+  f::BlockDescBind *block = program.MutableBlock(0);
   f::OpDescBind *op1 = block->AppendOp();
   op1->SetType("rowwise_add");
   op1->SetInput("X", {"x1"});
@@ -880,7 +880,7 @@ TEST(Backward, shared_var) {

 TEST(Backward, half_backward) {
   f::ProgramDescBind program;
-  f::BlockDescBind *block = program.Block(0);
+  f::BlockDescBind *block = program.MutableBlock(0);
   auto *op1 = block->AppendOp();
   op1->SetType("minus");
   op1->SetInput("X", {"a"});
```

paddle/framework/block_desc.cc

Lines changed: 1 addition & 1 deletion
```diff
@@ -113,7 +113,7 @@ BlockDescBind *BlockDescBind::ParentBlock() const {
   if (this->desc_->parent_idx() == kNoneBlockIndex) {
     return nullptr;
   }
-  return prog_->Block(static_cast<size_t>(this->desc_->parent_idx()));
+  return prog_->MutableBlock(static_cast<size_t>(this->desc_->parent_idx()));
 }

 BlockDesc *BlockDescBind::Proto() {
```

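Across `backward.cc`, `backward_test.cc` and `block_desc.cc`, call sites that modify a block switch from `Block(i)` to `MutableBlock(i)`. The diff does not show the updated declarations in `ProgramDescBind`. The C++ sketch below therefore only illustrates the general const/mutable accessor convention that this naming suggests, using hypothetical `BlockSketch` and `ProgramSketch` types rather than PaddlePaddle's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; not PaddlePaddle's BlockDescBind/ProgramDescBind.
struct BlockSketch {
  std::vector<std::string> ops;
};

class ProgramSketch {
 public:
  ProgramSketch() { blocks_.push_back(std::make_unique<BlockSketch>()); }

  // Read-only access: usable through a const reference, cannot modify.
  const BlockSketch& Block(std::size_t idx) const {
    assert(idx < blocks_.size());
    return *blocks_[idx];
  }

  // Explicitly named mutable access, so mutation stands out at call sites.
  BlockSketch* MutableBlock(std::size_t idx) {
    assert(idx < blocks_.size());
    return blocks_[idx].get();
  }

  std::size_t Size() const { return blocks_.size(); }

 private:
  std::vector<std::unique_ptr<BlockSketch>> blocks_;
};
```

Under this convention, read-only code can take a `const` reference to the program and still inspect its blocks, while every modification goes through a method name that is easy to spot in review.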
