
Commit 7bb67b6

Merge remote-tracking branch 'ups/develop' into feature/libxsmm
2 parents e8ae020 + 938920c commit 7bb67b6

32 files changed: +388 additions, −275 deletions

CMakeLists.txt

Lines changed: 5 additions & 0 deletions
@@ -103,6 +103,11 @@ if(ANDROID OR IOS)
   add_definitions(-DPADDLE_MOBILE_INFERENCE)
 endif()
 
+if (APPLE OR WIN32)
+  set(WITH_MKL OFF CACHE STRING
+    "Disable MKL for building on mac and windows" FORCE)
+endif()
+
 set(THIRD_PARTY_PATH "${CMAKE_BINARY_DIR}/third_party" CACHE STRING
   "A path setting third party libraries download & build directories.")
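The FORCE keyword in the hunk above matters: a `WITH_MKL` value already stored in `CMakeCache.txt` from an earlier configure run would otherwise win over the new default. A minimal sketch of this platform-gating pattern (the option name `WITH_SOME_BACKEND` is a hypothetical placeholder, not from the codebase):

```cmake
# Platform gate: overwrite even a user-cached value, and explain why in
# the cache docstring. Without FORCE, set(... CACHE ...) is a no-op when
# the variable already exists in CMakeCache.txt.
if(APPLE OR WIN32)
  set(WITH_SOME_BACKEND OFF CACHE BOOL
      "Force-disabled: this backend does not build on macOS/Windows" FORCE)
endif()
```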

cmake/external/anakin.cmake

Lines changed: 11 additions & 1 deletion
@@ -7,7 +7,17 @@ set(ANAKIN_INSTALL_DIR "${THIRD_PARTY_PATH}/install/anakin" CACHE PATH
 set(ANAKIN_INCLUDE "${ANAKIN_INSTALL_DIR}" CACHE STRING "root of Anakin header files")
 set(ANAKIN_LIBRARY "${ANAKIN_INSTALL_DIR}" CACHE STRING "path of Anakin library")
 
-set(ANAKIN_COMPILE_EXTRA_FLAGS -Wno-error=unused-variable -Wno-error=format-extra-args -Wno-error=comment -Wno-error=format -Wno-error=switch -Wno-error=return-type -Wno-error=non-virtual-dtor -Wno-reorder -Wno-error=cpp)
+set(ANAKIN_COMPILE_EXTRA_FLAGS
+    -Wno-error=unused-variable -Wno-unused-variable
+    -Wno-error=format-extra-args -Wno-format-extra-args
+    -Wno-error=comment -Wno-comment
+    -Wno-error=format -Wno-format
+    -Wno-error=switch -Wno-switch
+    -Wno-error=return-type -Wno-return-type
+    -Wno-error=non-virtual-dtor -Wno-non-virtual-dtor
+    -Wno-sign-compare
+    -Wno-reorder
+    -Wno-error=cpp)
 
 set(ANAKIN_LIBRARY_URL "https://github.com/pangge/Anakin/releases/download/3.0/anakin_release_simple.tar.gz")

doc/v2/howto/capi/workflow_of_capi_cn.md

Lines changed: 9 additions & 9 deletions
@@ -28,9 +28,9 @@
 
 ### Preparing the inference model
 
-For the model-preparation part, we use the handwritten digit recognition task as an example. This task defines a [simple fully connected network with two hidden layers](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.cn.md#softmax回归softmax-regression); the network takes an image as input and classifies it into one of the labels 0 ~ 9. The complete code can be found in the scripts under [this directory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense).
+For the model-preparation part, we use the handwritten digit recognition task as an example. This task defines a [simple fully connected network with two hidden layers](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.cn.md#softmax回归softmax-regression); the network takes an image as input and classifies it into one of the labels 0 ~ 9. The complete code can be found in the scripts under [this directory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/legacy/capi/examples/model_inference/dense).
 
-Building an inference program with the C-API requires a trained model. Run the [mnist_v2.py](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/examples/model_inference/dense/mnist_v2.py) script in the [MNIST handwritten digit recognition directory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense) by executing `python mnist_v2.py` in a terminal; it trains on the built-in [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The trained model is saved by default in the `models` directory under the current working directory.
+Building an inference program with the C-API requires a trained model. Run the [mnist_v2.py](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/legacy/capi/examples/model_inference/dense/mnist_v2.py) script in the [MNIST handwritten digit recognition directory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/legacy/capi/examples/model_inference/dense) by executing `python mnist_v2.py` in a terminal; it trains on the built-in [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The trained model is saved by default in the `models` directory under the current working directory.
 
 Next, we convert the model saved after training into an inference model.
 
@@ -48,7 +48,7 @@
 dump_v2_config(predict, "trainer_config.bin", True)
 ```
 
-For the [handwritten digit recognition](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense) example, the [`mnist_v2.py`](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense/mnist_v2.py) script integrates the serialization of the network topology; run `python mnist_v2.py --task dump_config` to serialize it, and the result is written to `trainer_config.bin` in the current working directory.
+For the [handwritten digit recognition](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/legacy/capi/examples/model_inference/dense) example, the [`mnist_v2.py`](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/legacy/capi/examples/model_inference/dense/mnist_v2.py) script integrates the serialization of the network topology; run `python mnist_v2.py --task dump_config` to serialize it, and the result is written to `trainer_config.bin` in the current working directory.
 
 With this approach, **all learnable parameters of the network must be placed in one directory at runtime**; the C-API then loads the trained model by specifying the serialized topology file and the parameter directory separately.
 
@@ -68,7 +68,7 @@
 merge_v2_model(net, param_file, output_file)
 ```
 
-For the [handwritten digit recognition](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense) example, simply run `python` [merge_v2_model.py](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference/dense/merge_v2_model.py). The serialized result is written to `output.paddle.model` in the current working directory, and at runtime the C-API loads the inference model by specifying the path to that file.
+For the [handwritten digit recognition](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/legacy/capi/examples/model_inference/dense) example, simply run `python` [merge_v2_model.py](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/legacy/capi/examples/model_inference/dense/merge_v2_model.py). The serialized result is written to `output.paddle.model` in the current working directory, and at runtime the C-API loads the inference model by specifying the path to that file.
 
 #### Notes
 1. To use the C-API, the `binary` argument must be set to `True` when calling `dump_v2_config` to serialize the network topology.
@@ -77,19 +77,19 @@
 
 ### Writing the inference code
 
-For more detailed sample inference code, see the examples under the [C-API usage examples](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/capi/examples/model_inference) directory. This section explains the five steps of writing inference code shown in Figure 1.
+For more detailed sample inference code, see the examples under the [C-API usage examples](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/legacy/capi/examples/model_inference) directory. This section explains the five steps of writing inference code shown in Figure 1.
 
 #### step 1. Initialize the PaddlePaddle runtime environment
-The first step is to call [`paddle_init`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/main.h#L27) to initialize the PaddlePaddle runtime environment; this interface takes two arguments: the number of arguments and the argument list.
+The first step is to call [`paddle_init`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/legacy/capi/main.h#L27) to initialize the PaddlePaddle runtime environment; this interface takes two arguments: the number of arguments and the argument list.
 
 #### step 2. Load the model
 
 Here we introduce an important concept in C-API usage: the Gradient Machine.
 
 Conceptually, inside PaddlePaddle an object of the GradientMachine class manages a set of computation layers (PaddlePaddle Layers) to perform forward and backward computation and handles all related details. When calling the C-API for inference, only the forward computation is needed. In the rest of this document, `gradient machine` refers specifically to a GradientMachine object created through the PaddlePaddle C-API. Each `gradient machine` manages a trained model; the C-API provides the following two common ways to load a model:
 
-1. Call the [`paddle_gradient_machine_load_parameter_from_disk`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/gradient_machine.h#L61) interface to load the inference model from disk; the `gradient machine` then owns its own copy of the trained model.
-1. Call the [`paddle_gradient_machine_create_shared_param`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/gradient_machine.h#L88) interface to share an already loaded inference model with other `gradient machine`s. This is typically used in multi-threaded inference, where multiple threads share one model to reduce memory overhead; see [this example](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/examples/model_inference/multi_thread/main.c).
+1. Call the [`paddle_gradient_machine_load_parameter_from_disk`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/legacy/capi/gradient_machine.h#L61) interface to load the inference model from disk; the `gradient machine` then owns its own copy of the trained model.
+1. Call the [`paddle_gradient_machine_create_shared_param`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/legacy/capi/gradient_machine.h#L88) interface to share an already loaded inference model with other `gradient machine`s. This is typically used in multi-threaded inference, where multiple threads share one model to reduce memory overhead; see [this example](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/legacy/capi/examples/model_inference/multi_thread/main.c).
 
 - Notes
 
@@ -117,7 +117,7 @@ For all input data types supported by the C-API and how they are organized, see…
 
 #### step 4. Forward computation
 
-After the above preparation, call the [`paddle_gradient_machine_forward`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/gradient_machine.h#L73) interface to perform the forward computation of the network.
+After the above preparation, call the [`paddle_gradient_machine_forward`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/legacy/capi/gradient_machine.h#L73) interface to perform the forward computation of the network.
 
 #### step 5. Cleanup
 

paddle/contrib/inference/CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -49,7 +49,9 @@ cc_library(paddle_inference_api
 # Here the shared library doesn't depend on other fluid libraries, or double free will occur.
 cc_library(paddle_inference_api_shared SHARED
     SRCS paddle_inference_api.cc paddle_inference_api_impl.cc)
+add_dependencies(paddle_inference_api_shared ${FLUID_CORE_MODULES} ${GLOB_OP_LIB})
 set_target_properties(paddle_inference_api_shared PROPERTIES OUTPUT_NAME paddle_inference_api)
+
 if(NOT APPLE)
   set(LINK_FLAGS "-fPIC -fvisibility=hidden")
   set_target_properties(paddle_inference_api_shared PROPERTIES LINK_FLAGS "${LINK_FLAGS}")

paddle/contrib/inference/test_paddle_inference_api_impl.cc

Lines changed: 1 addition & 1 deletion
@@ -249,7 +249,7 @@ void MainThreadsImageClassification(bool use_gpu) {
       const size_t len = local_outputs[0].data.length();
       float* data = static_cast<float*>(local_outputs[0].data.data());
       float* ref_data = refs[tid].data<float>();
-      EXPECT_EQ(refs[tid].numel(), len / sizeof(float));
+      EXPECT_EQ((size_t)refs[tid].numel(), len / sizeof(float));
       for (int i = 0; i < refs[tid].numel(); ++i) {
        EXPECT_NEAR(ref_data[i], data[i], 1e-3);
      }
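The one-line change above fixes a `-Wsign-compare` diagnostic: `numel()` returns a signed count, while `len / sizeof(float)` has type `size_t`, and comparing the two directly promotes the signed operand to unsigned. A minimal standalone sketch of the pattern (the function name `sizes_match` is illustrative, not from the codebase):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compare a signed element count against an unsigned byte-length quotient.
// The explicit cast states that the conversion is intentional, which is
// what -Wsign-compare asks for; it is safe here because a tensor's
// element count is never negative.
bool sizes_match(std::int64_t numel, std::size_t byte_len) {
  return static_cast<std::size_t>(numel) == byte_len / sizeof(float);
}
```

The cast does not change the comparison's result for non-negative counts; it only makes the signed-to-unsigned conversion explicit and visible in review.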

paddle/fluid/framework/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor memory)
 nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor)
 
 cc_library(reader SRCS reader.cc DEPS lod_tensor ddim)
+cc_test(reader_test SRCS reader_test.cc DEPS reader)
 
 cc_test(variable_test SRCS variable_test.cc)

paddle/fluid/framework/reader.cc

Lines changed: 46 additions & 14 deletions
@@ -13,29 +13,61 @@
 // limitations under the License.
 
 #include "paddle/fluid/framework/reader.h"
+#include <deque>
 
 namespace paddle {
 namespace framework {
-ReaderBase::~ReaderBase() {}
 
-FileReader::FileReader(const std::vector<DDim> &dims) : dims_(dims) {}
-
-void FileReader::ReadNext(std::vector<LoDTensor> *out) {
+void ReaderBase::ReadNext(std::vector<LoDTensor> *out) {
+  std::lock_guard<std::mutex> lock(mu_);
+  PADDLE_ENFORCE_EQ(status_, ReaderStatus::kRunning);
   ReadNextImpl(out);
-  if (out->empty()) {
-    return;
-  }
+}
 
-  PADDLE_ENFORCE_EQ(out->size(), dims_.size());
-  for (size_t i = 0; i < dims_.size(); ++i) {
-    auto &actual = (*out)[i].dims();
-    auto &expect = dims_[i];
+void ReaderBase::InsertDecoratedReader(
+    const std::shared_ptr<ReaderBase> &decorated_reader) {
+  std::lock_guard<std::mutex> guard(mu_);
+  decorated_readers_.emplace_back(decorated_reader);
+}
 
-    PADDLE_ENFORCE_EQ(actual.size(), expect.size());
-    for (int j = 0; j < actual.size(); ++j) {
-      // PADDLE_ENFORCE(actual[i] == expect[i] || expect[i] == -1);
+std::unordered_set<ReaderBase *> ReaderBase::GetEndPoints() {
+  std::unordered_set<ReaderBase *> result;
+  std::deque<ReaderBase *> queue;
+  queue.emplace_back(this);
+  while (!queue.empty()) {  // BFS search
+    auto *front = queue.front();
+    queue.pop_front();
+    if (front->decorated_readers_.empty()) {
+      result.emplace(front);
+    } else {
+      for (auto &reader : front->decorated_readers_) {
+        if (auto *reader_ptr = reader.lock().get()) {
+          queue.emplace_back(reader_ptr);
+        }
+      }
     }
   }
+
+  return result;
 }
+
+void ReaderBase::Shutdown() {
+  std::lock_guard<std::mutex> lock(mu_);
+  if (status_ != ReaderStatus::kStopped) {
+    ShutdownImpl();
+    status_ = ReaderStatus::kStopped;
+  }
+}
+
+void ReaderBase::Start() {
+  std::lock_guard<std::mutex> lock(mu_);
+  if (status_ != ReaderStatus::kRunning) {
+    StartImpl();
+    status_ = ReaderStatus::kRunning;
+  }
+}
+
+ReaderBase::~ReaderBase() { Shutdown(); }
+
 }  // namespace framework
 }  // namespace paddle
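`GetEndPoints` walks the decorating chain breadth-first and keeps only nodes reachable through `weak_ptr::lock`. The traversal can be exercised in isolation with a plain node type (a sketch; `Node` and `EndPoints` are illustrative names, and unlike the original a node whose decorators have all expired still counts as an end point here):

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <unordered_set>
#include <vector>

// A node that knows its decorators through weak_ptrs, mirroring
// ReaderBase::decorated_readers_.
struct Node {
  std::vector<std::weak_ptr<Node>> decorators;
};

// BFS from `root`: a node with no live decorators is an end point.
// lock() skips decorators whose owning shared_ptr has already died.
std::unordered_set<Node*> EndPoints(Node* root) {
  std::unordered_set<Node*> result;
  std::deque<Node*> queue{root};
  while (!queue.empty()) {
    Node* front = queue.front();
    queue.pop_front();
    bool has_live_child = false;
    for (auto& w : front->decorators) {
      if (auto sp = w.lock()) {
        queue.push_back(sp.get());
        has_live_child = true;
      }
    }
    if (!has_live_child) result.insert(front);
  }
  return result;
}
```

Because the chain stores `weak_ptr`s, a decorator that goes out of scope simply drops out of the end-point set, which is exactly what the `reader_test.cc` added in this commit checks for `end_point3`.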

paddle/fluid/framework/reader.h

Lines changed: 76 additions & 20 deletions
@@ -15,6 +15,7 @@
 #pragma once
 
 #include <memory>
+#include <unordered_set>
 #include <vector>
 
 #include "paddle/fluid/framework/ddim.h"
@@ -24,61 +25,116 @@
 namespace paddle {
 namespace framework {
 
+enum ReaderStatus { kRunning, kStopped };
+
 class ReaderBase {
  public:
-  virtual void ReadNext(std::vector<LoDTensor>* out) = 0;
+  void ReadNext(std::vector<LoDTensor>* out);
+
+  void Shutdown();
 
-  virtual void ReInit() = 0;
+  void Start();
+
+  // Return the readers which are the end of decorating chain. Basically
+  // they are readers just before read op.
+  std::unordered_set<ReaderBase*> GetEndPoints();
 
   virtual ~ReaderBase();
+
+ protected:
+  virtual void ReadNextImpl(std::vector<LoDTensor>* out) = 0;
+
+  virtual void ShutdownImpl() {}
+
+  virtual void StartImpl() {}
+
+  ReaderStatus status_{kRunning};
+
+  mutable std::mutex mu_;
+
+ private:
+  friend class DecoratedReader;
+  // These methods can be only invoked inside DecoratedReader to record the
+  // decorating chain.
+  void InsertDecoratedReader(
+      const std::shared_ptr<ReaderBase>& decorated_reader);
+  // A set of which readers that decorated this reader.
+  std::vector<std::weak_ptr<ReaderBase>> decorated_readers_;
 };
 
-class DecoratedReader : public ReaderBase {
+class DecoratedReader : public ReaderBase,
+                        public std::enable_shared_from_this<DecoratedReader> {
  public:
   explicit DecoratedReader(const std::shared_ptr<ReaderBase>& reader)
       : ReaderBase(), reader_(reader) {
     PADDLE_ENFORCE_NOT_NULL(reader_);
   }
 
-  void ReInit() override { reader_->ReInit(); }
+  void RegisterDecorateChain() {
+    reader_->InsertDecoratedReader(shared_from_this());
+  }
 
  protected:
-  std::shared_ptr<ReaderBase> reader_;
-};
-
-class FileReader : public ReaderBase {
- public:
-  explicit FileReader(const std::vector<DDim>& dims);
-
-  void ReadNext(std::vector<LoDTensor>* out) override;
+  void ShutdownImpl() override { reader_->Shutdown(); }
 
- protected:
-  virtual void ReadNextImpl(std::vector<LoDTensor>* out) = 0;
+  void StartImpl() override { reader_->Start(); }
 
- private:
-  std::vector<DDim> dims_;
+  std::shared_ptr<ReaderBase> reader_;
 };
 
+// FileReader is just a conceptual class.
+class FileReader : public ReaderBase {};
+
 // The ReaderHolder is used as reader' unified wrapper,
 // making it easier to access different type reader in Variables.
 class ReaderHolder {
  public:
-  void Reset(ReaderBase* reader) { reader_.reset(reader); }
+  template <typename T>
+  void Reset(const std::shared_ptr<T>& reader) {
+    auto reader_base = std::dynamic_pointer_cast<ReaderBase>(reader);
+    PADDLE_ENFORCE_NOT_NULL(reader_base);
+    reader_ = reader_base;
+  }
 
-  std::shared_ptr<ReaderBase> Get() const { return reader_; }
+  const std::shared_ptr<ReaderBase>& Get() const { return reader_; }
 
   void ReadNext(std::vector<LoDTensor>* out) {
     PADDLE_ENFORCE_NOT_NULL(reader_);
     reader_->ReadNext(out);
   }
-  void ReInit() {
+
+  void ResetAll() {
+    auto end_readers = reader_->GetEndPoints();
+    for (auto* reader : end_readers) {
+      reader->Shutdown();
+    }
+    for (auto* reader : end_readers) {
+      reader->Start();
+    }
+  }
+
+  void Shutdown() {
+    PADDLE_ENFORCE_NOT_NULL(reader_);
+    reader_->Shutdown();
+  }
+
+  void Start() {
     PADDLE_ENFORCE_NOT_NULL(reader_);
-    reader_->ReInit();
+    reader_->Start();
   }
 
+  operator const std::shared_ptr<ReaderBase>&() const { return this->reader_; }
+
  private:
  std::shared_ptr<ReaderBase> reader_;
 };
 
+template <typename T, typename... ARGS>
+inline std::shared_ptr<DecoratedReader> MakeDecoratedReader(ARGS&&... args) {
+  std::shared_ptr<DecoratedReader> reader(new T(std::forward<ARGS>(args)...));
+  reader->RegisterDecorateChain();
+  return reader;
+}
+
 }  // namespace framework
 }  // namespace paddle
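`MakeDecoratedReader` constructs the reader first and registers it in the chain afterwards because `shared_from_this()` throws `std::bad_weak_ptr` if no `shared_ptr` owns the object yet, i.e. inside the constructor. A minimal sketch of the same two-phase pattern (type and function names are illustrative, not from the codebase):

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Base {
  std::vector<std::weak_ptr<Base>> children;  // like decorated_readers_
};

struct Decorated : Base, std::enable_shared_from_this<Decorated> {
  explicit Decorated(std::shared_ptr<Base> inner) : inner_(std::move(inner)) {}

  // Must run after a shared_ptr owns *this; calling shared_from_this()
  // in the constructor would throw std::bad_weak_ptr.
  void Register() { inner_->children.emplace_back(shared_from_this()); }

  std::shared_ptr<Base> inner_;
};

// Factory that hides the two-phase initialization, like MakeDecoratedReader.
template <typename T, typename... Args>
std::shared_ptr<T> MakeDecorated(Args&&... args) {
  auto r = std::make_shared<T>(std::forward<Args>(args)...);
  r->Register();  // safe: shared ownership is now established
  return r;
}
```

Funneling construction through the factory is what guarantees every decorator is recorded in its inner reader's chain, which `GetEndPoints` and `ResetAll` rely on.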

paddle/fluid/framework/reader_test.cc

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "paddle/fluid/framework/reader.h"
+#include <memory>
+#include "gtest/gtest.h"
+
+class StubDecoratedReader : public paddle::framework::DecoratedReader {
+ public:
+  explicit StubDecoratedReader(const std::shared_ptr<ReaderBase> &reader)
+      : DecoratedReader(reader) {}
+
+  void ReadNextImpl(std::vector<paddle::framework::LoDTensor> *out) override {}
+};
+
+class StubRootReader : public paddle::framework::ReaderBase {
+ public:
+  void ReadNextImpl(std::vector<paddle::framework::LoDTensor> *out) override {}
+};
+
+TEST(READER, decorate_chain) {
+  auto root = std::make_shared<StubRootReader>();
+  auto end_point1 =
+      paddle::framework::MakeDecoratedReader<StubDecoratedReader>(root);
+  auto end_point2 =
+      paddle::framework::MakeDecoratedReader<StubDecoratedReader>(root);
+
+  {
+    auto endpoints = root->GetEndPoints();
+    ASSERT_EQ(endpoints.size(), 2U);
+    ASSERT_NE(endpoints.count(end_point1.get()), 0);
+    ASSERT_NE(endpoints.count(end_point2.get()), 0);
+  }
+
+  {
+    auto end_point3 =
+        paddle::framework::MakeDecoratedReader<StubDecoratedReader>(root);
+    ASSERT_EQ(root->GetEndPoints().size(), 3U);
+  }
+  { ASSERT_EQ(root->GetEndPoints().size(), 2U); }
+}
