
Commit fee90b5

Author: Yancey
Add async sgd document (#8474)
* add async sgd document
* fix ci
* update by comment
* update doc
1 parent e84615b commit fee90b5

File tree: 3 files changed, 69 additions (+) and 18 deletions (-)


doc/howto/cluster/cmd_argument_cn.md

Lines changed: 32 additions & 7 deletions
@@ -1,14 +1,17 @@
-## Startup arguments
+# Startup arguments

Below we use the code in `doc/howto/cluster/src/word2vec` as an example to introduce distributed training with the PaddlePaddle v2 API.

-### Starting the parameter server
+## Starting the parameter server
+
Run the following command to start a parameter server, which will wait to exchange data with the trainer nodes:
+
```bash
$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1
```

If you want to run the pserver program in the background and save its output to a log file, run:
+
```bash
$ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 &> pserver.log
```
@@ -20,8 +23,10 @@ $ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num
- ports_num_for_sparse: **required, default 0**, number of ports used for sparse parameter communication
- num_gradient_servers: **required, default 1**, total number of pservers in the current training job

-### Starting the trainer
+## Starting the trainer
+
Run the following command to start the trainer program, written in Python (the file can have any name, e.g. train.py):
+
```bash
$ python train.py
```
@@ -67,7 +72,7 @@ paddle.init(
- pservers: **required, default 127.0.0.1**, list of IPs of the pservers started for the current training job; separate multiple IPs with ","


-### Preparing the dataset
+## Preparing the dataset

Refer to the sample data-preparation script [prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py) to prepare the training and validation datasets. We use the paddle.dataset.imikolov dataset and, according to the distributed-training parallelism (the number of trainer nodes), set `SPLIT_COUNT` at the beginning of `prepare.py` to split the data into multiple parts.

@@ -84,7 +89,8 @@ for f in flist:
```

The example program `prepare.py` splits the training set and the test set into multiple files (3 in this example, with suffixes `-00000`, `-00001` and `-00002`):
-```
+
+```bash
train.txt
train.txt-00000
train.txt-00001
@@ -99,12 +105,13 @@ test.txt-00002

The training-data format and the training program's `reader()` can differ a lot between training jobs, so developers need to split the training data and write the `reader()` according to the actual scenario of their own training job.

-### Preparing the training program
+## Preparing the training program

For every training job we create a workspace on each node, containing the user's training program, its dependencies, and the mounted or downloaded training-data shards.

In the end, the workspace should look like this:
-```
+
+```bash
.
|-- my_lib.py
|-- word_dict.pickle
|-- word_dict.pickle
@@ -133,3 +140,21 @@ test.txt-00002

- `train_data_dir`: directory containing the training data; it can be mounted from distributed storage or downloaded locally before the job starts.
- `test_data_dir`: directory containing the test dataset.
+
+## Async SGD updates
+
+We can make the optimizer support async SGD updates by setting its parameters.
+For example, set the `is_async` and `async_lagged_grad_discard_ratio` parameters of the `AdaGrad` optimizer:
+
+```python
+adagrad = paddle.optimizer.AdaGrad(
+    is_async=True,
+    async_lagged_grad_discard_ratio=1.6,
+    learning_rate=3e-3,
+    regularization=paddle.optimizer.L2Regularization(8e-4))
+```
+
+- `is_async`: whether to use the async SGD update mode.
+- `async_lagged_grad_discard_ratio`: controls gradient commits for async SGD updates; after enough gradients
+  (`async_lagged_grad_discard_ratio * num_gradient_servers`) have been received, later lagged gradients are discarded.
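For context on the trainer side of these startup arguments, the doc's `paddle.init(` block (partially visible in the hunk above) passes the same settings programmatically. Below is a minimal sketch, assuming a single pserver on localhost; `use_gpu` and `trainer_count` are standard `paddle.init` arguments not shown in this diff, and all values are illustrative rather than part of this commit.

```python
# Sketch of a trainer-side init call; values are illustrative, and the
# keyword names mirror the command-line arguments documented above.
import paddle.v2 as paddle

paddle.init(
    use_gpu=False,              # assumed default, not from this diff
    trainer_count=1,            # assumed default, not from this diff
    port=7164,                  # pserver listening port
    ports_num=1,                # number of listening ports
    ports_num_for_sparse=1,     # ports for sparse parameter communication
    num_gradient_servers=1,     # total number of gradient servers in the job
    trainer_id=0,               # ID of this trainer, starting from 0
    pservers="127.0.0.1")       # comma-separated pserver IP list
```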

doc/howto/cluster/cmd_argument_en.md

Lines changed: 31 additions & 10 deletions
@@ -1,18 +1,19 @@
-## Command-line arguments
+# Command-line arguments

We'll take `doc/howto/cluster/src/word2vec` as an example to introduce distributed training using the PaddlePaddle v2 API.

-### Starting parameter server
+## Starting parameter server

Type the command below to start a parameter server, which will wait for trainers to connect:

```bash
-$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1
+$ paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 --nics=eth0
```

If you wish to run parameter servers in the background and save their output to a log file, you can type:
+
```bash
-$ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 &> pserver.log
+$ stdbuf -oL /usr/bin/nohup paddle pserver --port=7164 --ports_num=1 --ports_num_for_sparse=1 --num_gradient_servers=1 --nics=eth0 &> pserver.log &
```

Parameter Description
@@ -21,8 +22,10 @@ Parameter Description
- ports_num: **required, default 1**, total number of ports the pserver will listen on.
- ports_num_for_sparse: **required, default 0**, number of ports which serve sparse parameter updates.
- num_gradient_servers: **required, default 1**, total number of gradient servers.
+- nics: **optional, default xgbe0,xgbe1**, network device names which the parameter server will listen on.
+
+## Starting trainer

-### Starting trainer
Type the command below to start the trainer (name the file whatever you want, e.g. "train.py"):

```bash
@@ -70,7 +73,7 @@ Parameter Description
- trainer_id: **required, default 0**, ID of each trainer, starting from 0.
- pservers: **required, default 127.0.0.1**, list of IPs of parameter servers, separated by ",".

-### Prepare Training Dataset
+## Prepare Training Dataset

Here's some example code [prepare.py](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/cluster/src/word2vec/prepare.py); it downloads the public `imikolov` dataset and splits it into multiple files according to the job parallelism (trainer count). Modify `SPLIT_COUNT` at the beginning of `prepare.py` to change the number of output files.

@@ -88,7 +91,7 @@ for f in flist:

Example code `prepare.py` will split the training data and testing data into 3 files with numeric suffixes like `-00000`, `-00001` and `-00002`:

-```
+```bash
train.txt
train.txt-00000
train.txt-00001
@@ -103,13 +106,13 @@ When job started, every trainer needs to get it's own part of data. In some dist

Different training jobs may have different data formats and `reader()` functions, so developers may need to write different data-preparation scripts and `reader()` functions for their jobs.

-### Prepare Training program
+## Prepare Training program

We'll create a *workspace* directory on each node, storing your training program, its dependencies, and the mounted or downloaded dataset directory.

-
Your workspace may look like:
-```
+
+```bash
.
|-- my_lib.py
|-- word_dict.pickle
@@ -138,3 +141,21 @@ Your workspace may looks like:

- `train_data_dir`: directory containing the training data. Mount it from a storage service or copy the training data here.
- `test_data_dir`: directory containing the testing data.
+
+## Async SGD Update
+
+We can set some parameters of the optimizer to make it support async SGD updates.
+For example, we can set the `is_async` and `async_lagged_grad_discard_ratio` parameters of the `AdaGrad` optimizer:
+
+```python
+adagrad = paddle.optimizer.AdaGrad(
+    is_async=True,
+    async_lagged_grad_discard_ratio=1.6,
+    learning_rate=3e-3,
+    regularization=paddle.optimizer.L2Regularization(8e-4))
+```
+
+- `is_async`: whether to run in Async-SGD mode or not.
+- `async_lagged_grad_discard_ratio`: controls async SGD gradient commits; once
+  `async_lagged_grad_discard_ratio * num_gradient_servers` commits have passed, the current (lagged)
+  async gradient is discarded silently. For example, with `num_gradient_servers=5` and a ratio of 1.6,
+  a gradient that lags behind by more than 1.6 * 5 = 8 commits is dropped.
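Both documents leave the sharded `reader()` to the developer. Below is a minimal sketch of one possible implementation, assuming the `train.txt-0000N` layout produced by `prepare.py` and one sample of space-separated integer word ids per line; the helper name `cluster_reader` and the file format are assumptions, not part of this commit.

```python
# Sketch: a reader() that yields only this trainer's shards, so all trainers
# together cover the whole dataset exactly once.
import glob
import os


def cluster_reader(data_dir, trainer_id, trainer_count):
    """Return a reader() over the shards assigned to this trainer."""

    def reader():
        shards = sorted(glob.glob(os.path.join(data_dir, "train.txt-*")))
        for i, shard in enumerate(shards):
            # Round-robin assignment: trainer k reads shards k, k+N, k+2N, ...
            if i % trainer_count != trainer_id:
                continue
            with open(shard) as f:
                for line in f:
                    # Assumes space-separated integer word ids per line;
                    # adapt to the actual output format of prepare.py.
                    yield [int(w) for w in line.split()]

    return reader
```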

python/paddle/trainer_config_helpers/optimizers.py

Lines changed: 6 additions & 1 deletion
@@ -361,6 +361,7 @@ def settings(batch_size,
             learning_rate_decay_b=0.,
             learning_rate_schedule='poly',
             learning_rate_args='',
+             async_lagged_grad_discard_ratio=1.5,
             learning_method=None,
             regularization=None,
             is_async=False,
@@ -396,6 +397,10 @@ def settings(batch_size,
                                        value larger than some value, will be
                                        clipped.
    :type gradient_clipping_threshold: float
+    :param async_lagged_grad_discard_ratio: async SGD gradient commit control;
+        when async_lagged_grad_discard_ratio * num_gradient_servers commits have
+        passed, the current async SGD gradient is discarded.
+    :type async_lagged_grad_discard_ratio: float
    """
    if isinstance(regularization, BaseRegularization):
        regularization = [regularization]
@@ -409,7 +414,7 @@ def settings(batch_size,
    args = [
        'batch_size', 'learning_rate', 'learning_rate_decay_a',
        'learning_rate_decay_b', 'learning_rate_schedule', 'learning_rate_args',
-        'gradient_clipping_threshold'
+        'gradient_clipping_threshold', 'async_lagged_grad_discard_ratio'
    ]
    kwargs = dict()
    kwargs['algorithm'] = algorithm
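For the v1-style `trainer_config_helpers` configuration that this file serves, the new argument would be passed through `settings()` roughly as sketched below. This is only an illustration of the diff above: `batch_size` and the `AdaGradOptimizer()` defaults are assumed values, not recommendations from the commit.

```python
# Sketch of a v1-style trainer config using the new settings() argument.
from paddle.trainer_config_helpers import *

settings(
    batch_size=128,                       # illustrative value
    learning_rate=3e-3,
    learning_method=AdaGradOptimizer(),
    is_async=True,
    # Discard async gradients that lag behind by more than
    # async_lagged_grad_discard_ratio * num_gradient_servers commits.
    async_lagged_grad_discard_ratio=1.6)
```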
