Commit c42434d

collective
1 parent 9b286ef commit c42434d

4 files changed: +76 -3

doc/collective_mode.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# Running in Collective mode

If you want to use several GPUs at the same time to train your model faster, try the `single-machine multi-GPU` or `multi-machine multi-GPU` mode.

## Version requirements

Make sure that paddlepaddle-2.0.0-rc-gpu or a later version of the PaddlePaddle open-source framework is installed.

## Setting up config.yaml

First, add the use_fleet option to the model's yaml configuration and set it to True.

```yaml
runner:
  # the common options are not repeated here
  ...
  # use fleet
  use_fleet: True
```

## Single-machine multi-GPU training

### Choosing which GPUs to use

If nothing is set, all GPUs on the machine are used. To run on only some of them, set the CUDA_VISIBLE_DEVICES environment variable.
For example, if the machine has 8 GPUs and you only want to train on the first 4, set export CUDA_VISIBLE_DEVICES=0,1,2,3 and then run the training script.
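As a minimal sketch, this combines the environment variable with the dynamic-graph launch command shown in the next section:

```bash
# make only the first 4 GPUs visible to the training processes
export CUDA_VISIBLE_DEVICES=0,1,2,3
# paddle.distributed.launch starts one training process per visible GPU
python -m paddle.distributed.launch ../../../tools/trainer.py -m config.yaml
```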
### Running the training

```bash
# dynamic-graph training
python -m paddle.distributed.launch ../../../tools/trainer.py -m config.yaml
# static-graph training
python -m paddle.distributed.launch ../../../tools/static_trainer.py -m config.yaml
```

Note: when training with the static graph, make sure the create_optimizer function in the model's static_model.py sets up the distributed optimizer.

```python
def create_optimizer(self, strategy=None):
    optimizer = paddle.optimizer.Adam(learning_rate=self.learning_rate, lazy_mode=True)
    # obtain the distributed optimizer through the Fleet API, wrapping the basic Paddle optimizer
    if strategy is not None:
        import paddle.distributed.fleet as fleet
        optimizer = fleet.distributed_optimizer(optimizer, strategy)
    optimizer.minimize(self._cost)
```

## Multi-machine multi-GPU training

To train on multiple machines, you need one or more additional machines that can ping each other. Every machine needs paddlepaddle-2.0.0-rc-gpu or a later version of the PaddlePaddle open-source framework installed, and the PaddleRec model and dataset must be copied to each of them.

Going from single-machine to multi-machine training requires no code changes; you only need to pass the additional ips argument, a list of the machines' IP addresses, as shown below:

```bash
# dynamic-graph training
python -m paddle.distributed.launch --ips="xx.xx.xx.xx,yy.yy.yy.yy" --gpus 0,1,2,3,4,5,6,7 ../../../tools/trainer.py -m config.yaml
# static-graph training
python -m paddle.distributed.launch --ips="xx.xx.xx.xx,yy.yy.yy.yy" --gpus 0,1,2,3,4,5,6,7 ../../../tools/static_trainer.py -m config.yaml
```

models/rank/wide_deep/config.yaml

Lines changed: 4 additions & 2 deletions
```diff
@@ -17,9 +17,9 @@
 runner:
   train_data_dir: "data/sample_data/train"
   train_reader_path: "criteo_reader" # importlib format
-  use_gpu: False
+  use_gpu: True
   use_auc: True
-  train_batch_size: 2
+  train_batch_size: 50
   epochs: 3
   print_interval: 2
   #model_init_path: "output_model/0" # init model
@@ -34,6 +34,8 @@ runner:
   use_inference: False
   save_inference_feed_varnames: ["label","C1","C2","C3","C4","C5","C6","C7","C8","C9","C10","C11","C12","C13","C14","C15","C16","C17","C18","C19","C20","C21","C22","C23","C24","C25","C26","dense_input"]
   save_inference_fetch_varnames: ["cast_0.tmp_0"]
+  #use fleet
+  use_fleet: False
 
 # hyper parameters of user-defined network
 hyper_parameters:
```
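To run this sample in collective mode, the new doc above says to flip use_fleet to True; a minimal sketch of the relevant runner keys, reusing only values that appear in this diff:

```yaml
runner:
  train_data_dir: "data/sample_data/train"
  train_reader_path: "criteo_reader" # importlib format
  use_gpu: True
  use_auc: True
  train_batch_size: 50
  epochs: 3
  # enable collective training with Fleet
  use_fleet: True
```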

tools/static_trainer.py

Lines changed: 12 additions & 1 deletion
```diff
@@ -63,9 +63,9 @@ def main(args):
     input_data_names = [data.name for data in input_data]
 
     fetch_vars = static_model_class.net(input_data)
+
     #infer_target_var = model.infer_target_var
     logger.info("cpu_num: {}".format(os.getenv("CPU_NUM")))
-    static_model_class.create_optimizer()
 
     use_gpu = config.get("runner.use_gpu", True)
     use_auc = config.get("runner.use_auc", False)
@@ -79,6 +79,7 @@ def main(args):
     model_init_path = config.get("runner.model_init_path", None)
     batch_size = config.get("runner.train_batch_size", None)
     reader_type = config.get("runner.reader_type", "DataLoader")
+    use_fleet = config.get("runner.use_fleet", False)
     os.environ["CPU_NUM"] = str(config.get("runner.thread_num", 1))
     logger.info("**************common.configs**********")
     logger.info(
@@ -88,6 +89,16 @@ def main(args):
     logger.info("**************common.configs**********")
 
     place = paddle.set_device('gpu' if use_gpu else 'cpu')
+
+    if use_fleet:
+        from paddle.distributed import fleet
+        strategy = fleet.DistributedStrategy()
+        fleet.init(is_collective=True, strategy=strategy)
+    if use_fleet:
+        static_model_class.create_optimizer(strategy)
+    else:
+        static_model_class.create_optimizer()
+
     exe = paddle.static.Executor(place)
     # initialize
     exe.run(paddle.static.default_startup_program())
```

tools/trainer.py

Lines changed: 9 additions & 0 deletions
```diff
@@ -79,6 +79,7 @@ def main(args):
     train_batch_size = config.get("runner.train_batch_size", None)
     model_save_path = config.get("runner.model_save_path", "model_output")
     model_init_path = config.get("runner.model_init_path", None)
+    use_fleet = config.get("runner.use_fleet", False)
 
     logger.info("**************common.configs**********")
     logger.info(
@@ -102,6 +103,14 @@ def main(args):
     # to do : add optimizer function
     optimizer = dy_model_class.create_optimizer(dy_model, config)
 
+    # use fleet run collective
+    if use_fleet:
+        from paddle.distributed import fleet
+        strategy = fleet.DistributedStrategy()
+        fleet.init(is_collective=True, strategy=strategy)
+        optimizer = fleet.distributed_optimizer(optimizer)
+        dy_model = fleet.distributed_model(dy_model)
+
     logger.info("read data")
     train_dataloader = create_data_loader(config=config, place=place)
 
```
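For context, the lines added to tools/trainer.py follow the standard Fleet collective pattern for dynamic graphs. Below is a minimal, self-contained sketch of that pattern; the model, data, and loss are illustrative placeholders, not part of this commit:

```python
import paddle
from paddle.distributed import fleet


def train():
    # initialize the collective context; `python -m paddle.distributed.launch`
    # starts one of these processes per visible GPU
    strategy = fleet.DistributedStrategy()
    fleet.init(is_collective=True, strategy=strategy)

    # placeholder model and optimizer standing in for the PaddleRec dygraph model
    model = paddle.nn.Linear(13, 1)
    optimizer = paddle.optimizer.Adam(parameters=model.parameters())

    # wrap both with Fleet so gradients are synchronized across GPUs
    optimizer = fleet.distributed_optimizer(optimizer)
    model = fleet.distributed_model(model)

    for step in range(10):
        x = paddle.randn([50, 13])   # fake dense features
        y = paddle.randn([50, 1])    # fake labels
        loss = paddle.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()


if __name__ == "__main__":
    train()
```

Run it the same way as the trainers above, e.g. `python -m paddle.distributed.launch collective_sketch.py` (the file name here is illustrative).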