Add ParameterServerController for parameter server python api #1051
jacquesqiao merged 11 commits into PaddlePaddle:develop
paddle/pserver/PServerUtil.h
Outdated
| @@ -0,0 +1,70 @@
| /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
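I've become very wary of names like "util". I've found that people usually reach for "util" when they can't be bothered to work out what the thing should actually be called.
Here, it looks like the intended name is something like PServerController.

Cool, agreed.
PServerController is a good name.
Naming things is a real pain for programmers. When it hurts, people fall back on commonly used names such as Utils.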
paddle/pserver/PServerUtil.h
Outdated
| /**
|  * @brief start all pserver thread in this PServerUtil.
|  */
| void start();
How is our naming convention coming along? @reyoung
Here the constructor is in camel case, but the methods are all lowercase. That is clearly inconsistent.

The constructor has to match the class name, and class names must be UpperCamelCase, e.g. "SomeClass". Function names are lowerCamelCase, e.g. "someMethod".
The benefit is that we can tell from the casing of the first letter whether a name refers to a type.
For example:
class SomeClass {
public:
  class Helper {
  };
  static Helper helper();
};
SomeClass::Helper is a type, while SomeClass::helper is a function.
| @@ -0,0 +1,43 @@
| /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
There is a file above called PServerUtils.*, but this one is called ParameterServer. That is clearly inconsistent.

This config file is indeed used to configure the parameter server. At the moment pserverutil wraps several parameter server threads and creates them according to the config.

What I mean is: should it be called pserver or parameter server?

Renamed to ParameterServerController to follow the naming convention.
paddle/pserver/PServerUtil.h
Outdated
| void join();
|
| private:
| std::vector<std::shared_ptr<ParameterServer2>> pservers_;
Suggested change: std::vector<std::shared_ptr<ParameterServer2>> => std::vector<std::unique_ptr<ParameterServer2>>
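The suggestion above can be sketched as follows. This is only an illustration under stated assumptions: `Server` is a hypothetical stand-in for `ParameterServer2`, and `Controller` is a hypothetical owner class, not the code from this PR. The idea is that when one object is the sole owner of its servers, `std::unique_ptr` states that ownership directly and avoids the reference counting of `std::shared_ptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for ParameterServer2; it counts live instances
// so that ownership and destruction can be observed.
struct Server {
  static int& liveCount() {
    static int count = 0;
    return count;
  }
  Server() { ++liveCount(); }
  ~Server() { --liveCount(); }
};

// Hypothetical controller that exclusively owns its servers.
class Controller final {
public:
  void addServer() {
    // The raw pointer is wrapped immediately; C++11 has no std::make_unique.
    pservers_.push_back(std::unique_ptr<Server>(new Server()));
  }
  std::size_t size() const { return pservers_.size(); }

private:
  // Destroying the vector destroys every server it owns.
  std::vector<std::unique_ptr<Server>> pservers_;
};
```

Because `std::unique_ptr` is move-only, `Controller` becomes non-copyable without any extra code, which suits a class that owns running servers.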
paddle/pserver/PServerUtil.h
Outdated
|
| namespace paddle {
|
| class PServerUtil {
class PServerUtil => class PServerUtil final
paddle/pserver/PServerController.cpp
Outdated
| }
|
| PServerController* PServerController::createByGflags() {
| auto& pServerConfig = *paddle::PServerController::initConfigByGflags();
Couldn't this just use a stack variable? That is:
ParameterServerConfig config;
config.set_nics(FLAGS_nics);
...
Also, initConfigByGflags is only called by createByGflags, so there is no need to extract it into a private static member function.
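A minimal sketch of the stack-variable approach, under stated assumptions: `Config` is a hypothetical hand-written stand-in for the protobuf-generated `ParameterServerConfig`, the `FLAGS_*` globals stand in for gflags-defined flags, and the `port` field and its value are made up for illustration. Only `set_nics(FLAGS_nics)` appears in the review comment itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for flags that gflags would normally define.
static std::string FLAGS_nics = "eth0";
static int FLAGS_port = 20134;  // made-up field and value, for illustration only

// Hypothetical stand-in for the generated ParameterServerConfig message.
class Config {
public:
  void set_nics(const std::string& nics) { nics_ = nics; }
  void set_port(int port) { port_ = port; }
  const std::string& nics() const { return nics_; }
  int port() const { return port_; }

private:
  std::string nics_;
  int port_ = 0;
};

// Builds the config as a local (stack) object and returns it by value.
// No new, no delete, and nothing for the caller to free.
Config configFromFlags() {
  Config config;
  config.set_nics(FLAGS_nics);
  config.set_port(FLAGS_port);
  return config;
}
```

Returning by value is cheap here because the compiler can move or elide the copy, so the heap allocation in the original `initConfigByGflags` buys nothing.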
paddle/pserver/PServerController.cpp
Outdated
| PServerController::~PServerController() { this->join(); }
|
| ParameterServerConfig* PServerController::initConfigByGflags() {
| ParameterServerConfig* config = new ParameterServerConfig();
Try to avoid the C++ new keyword.
The easiest way is:
auto config = std::make_shared<ParameterServerConfig>();
or
auto config = std::make_unique<ParameterServerConfig>(); // since C++14
We don't have make_unique at the moment, but I'll add it later.

By the way, the closest thing in C++ to the semantics of Java's new is std::make_shared<TypeName>(args). It's just that std::make_unique is a bit faster.

Nice, thanks, I've changed it. How do you plan to bring in make_unique, though?

We can simply add the function ourselves: http://stackoverflow.com/questions/17902405/how-to-implement-make-unique-function-in-c11
Check the C++ version: if the compiler is older than C++14 (__cplusplus < 201402L), define make_unique.
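The version check described above might look like the sketch below. The `compat` namespace is a hypothetical name chosen for this example, and the fallback covers only the single-object form of `make_unique`, not the array overloads. It is one possible shape for the helper, not the code that was eventually added to Paddle.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace compat {  // hypothetical namespace, used only in this sketch

#if __cplusplus >= 201402L
// C++14 and later: reuse the standard implementation.
using std::make_unique;
#else
// Pre-C++14 fallback: single-object form only (no array support).
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
#endif

}  // namespace compat
```

Call sites then write `compat::make_unique<T>(args...)` regardless of the language standard in use. Note that some compilers report an old `__cplusplus` value unless configured otherwise; in that case the fallback is selected, which still behaves correctly.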
… add-pserver-util
… add-pserver-util
|
| namespace paddle {
|
| class ParameterServerController final {
| required int32 trainer_id = 1;
| }
|
| message ParameterServerConfig {
There should be a comment here explaining the purpose of this proto message.
|
| package paddle;
|
| message ParameterClientConfig {
There should be a comment here explaining the purpose of this proto message.
| }
|
| void ParameterServerController::start() {
| LOG(INFO) << "pserver sizes : " << pservers_.size();
"pserver sizes" ==> "number of pserver instances"?
| LOG(INFO) << "pserver sizes : " << pservers_.size();
| int i = 0;
| for (const auto& pserver : pservers_) {
| LOG(INFO) << "pserver started : " << i;
LOG(INFO) << "Starting pserver " << i;
|
| ParameterServerController::~ParameterServerController() { this->join(); }
|
| ParameterServerController* ParameterServerController::createByGflags() {
| }
| }
|
| void ParameterServerController::join() {
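Besides their regular communication, the trainer and the pserver also share synchronization control logic, whose core purpose is to implement async-sgd, sync-sgd, pass start, batch start, and similar core logic. Judging from this implementation, ParameterServerController mainly controls the startup of the pserver process and contains no synchronization control logic. Question:
The current version already has functionality similar to "ParameterServerController" coupled into the trainer, see: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/trainer/RemoteParameterUpdater.cpp#L623 It mainly focuses on logic control at the SGD algorithm level (sync-sgd, async-sgd).

Right. For now ParameterServerController is mainly a wrapper around the parameter server threads. It handles some initialization and simple control, and it can only control a single ps process.
That can be considered later. The controller in the trainer, however, appears to be training-side logic that communicates directly with ParameterServer2; the concrete implementation can be worked out later.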
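To make the scope described in this exchange concrete, here is a rough sketch of a controller that only starts and joins threads inside one process. Everything in it is hypothetical: `ThreadController`, its counter, and the trivial thread body stand in for the real pserver threads, and there is deliberately no sync-sgd or async-sgd coordination.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical controller: it owns worker threads within a single process
// and offers only start() and join(). No SGD-level synchronization.
class ThreadController final {
public:
  explicit ThreadController(std::size_t numServers) : numServers_(numServers) {}
  ThreadController(const ThreadController&) = delete;
  ThreadController& operator=(const ThreadController&) = delete;

  // Joining in the destructor mirrors the destructor shown in the diff.
  ~ThreadController() { join(); }

  void start() {
    for (std::size_t i = 0; i < numServers_; ++i) {
      // Stand-in for a pserver thread body: just record that it ran.
      threads_.emplace_back([this] { ++served_; });
    }
  }

  // Safe to call more than once: already-joined threads are skipped.
  void join() {
    for (auto& t : threads_) {
      if (t.joinable()) t.join();
    }
  }

  int served() const { return served_.load(); }

private:
  std::size_t numServers_;
  std::atomic<int> served_{0};
  std::vector<std::thread> threads_;
};
```

Any cross-process coordination, such as deciding when a batch or pass begins, would have to live elsewhere, which is the gap the question above points at.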
1. Code cleanup: ParameterServer2Main.cpp and TrainerMain.cpp contain duplicated code.
2. The duplicated code uses many gflags to control parameter server initialization, so this PR adds a wrapper for the pserver and replaces the gflags with a proto config.
refs #1039
More about this PR (copied from IM chat with @jacquesqiao):