
Commit 0abf173

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into update_simple_distranspiler

2 parents: a4d88fb + 1af0b28


52 files changed: +1201 / −621 lines

CONTRIBUTING.md

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@ PaddlePaddle uses this [Git branching model](http://nvie.com/posts/a-successful-
  create mode 100644 233
 ```
 
+NOTE: The `yapf` installed by `pip install pre-commit` and `conda install -c conda-forge pre-commit` is slightly different. Paddle developers use `pip install pre-commit`.
+
 1. Build and test
 
 Users can build PaddlePaddle natively on Linux and Mac OS X. But to unify the building environment and to make it easy for debugging, the recommended way is [using Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/build_en.md).

benchmark/fluid/fluid_benchmark.py

Lines changed: 20 additions & 12 deletions
@@ -98,6 +98,8 @@ def parse_args():
         '--use_fake_data',
         action='store_true',
         help='If set ommit the actual read data operators.')
+    parser.add_argument(
+        '--profile', action='store_true', help='If set, profile a few steps.')
     parser.add_argument(
         '--update_method',
         type=str,
@@ -108,8 +110,8 @@ def parse_args():
     return args
 
 
-def append_nccl2_prepare():
-    if os.getenv("PADDLE_TRAINER_ID", None) != None:
+def append_nccl2_prepare(trainer_id):
+    if trainer_id >= 0:
         # append gen_nccl_id at the end of startup program
         trainer_id = int(os.getenv("PADDLE_TRAINER_ID"))
         port = os.getenv("PADDLE_PSERVER_PORT")
@@ -136,12 +138,12 @@ def append_nccl2_prepare():
         })
         return nccl_id_var, num_trainers, trainer_id
     else:
-        raise Exception(
-            "must set PADDLE_TRAINER_ID env variables for dist train.")
+        raise Exception("must set positive PADDLE_TRAINER_ID env variables for "
+                        "nccl-based dist train.")
 
 
-def dist_transpile():
-    if "PADDLE_TRAINING_ROLE" not in os.environ:
+def dist_transpile(trainer_id):
+    if trainer_id < 0:
         return None, None
 
     # the port of all pservers, needed by both trainer and pserver
@@ -158,9 +160,6 @@ def dist_transpile():
     trainers = int(os.getenv("PADDLE_TRAINERS"))
     # the IP of the local machine, needed by pserver only
    current_endpoint = os.getenv("PADDLE_CURRENT_IP", "") + ":" + port
-    # the unique trainer id, starting from 0, needed by trainer
-    # only
-    trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
     # the role, should be either PSERVER or TRAINER
     training_role = os.getenv("PADDLE_TRAINING_ROLE")
 
@@ -295,6 +294,11 @@ def train_parallel(avg_loss, infer_prog, optimizer, train_reader, test_reader,
         iters = 0
         start_time = time.time()
         for batch_id, data in enumerate(train_reader()):
+            if args.profile and pass_id == 0 and batch_id == 5:
+                profiler.start_profiler("All")
+            elif args.profile and pass_id == 0 and batch_id == 10:
+                profiler.stop_profiler("total", "/tmp/profile_%d" % trainer_id)
+
             if iters == args.skip_batch_num:
                 start_time = time.time()
                 num_samples = 0
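The new `--profile` flag wraps a short window of the first pass (batches 5 through 10) with Fluid's profiler and dumps the result to `/tmp/profile_<trainer_id>`, as the hunk above shows. Below is a minimal standalone sketch of the same start/stop pattern; the `paddle.fluid.profiler` import path and the dummy loop are assumptions, while the two profiler calls are the ones used in the hunk.

```python
import paddle.fluid.profiler as profiler  # assumed import path

trainer_id = 0  # hypothetical; fluid_benchmark.py derives this from PADDLE_TRAINER_ID

for batch_id in range(20):  # stand-in for `enumerate(train_reader())`
    if batch_id == 5:
        profiler.start_profiler("All")  # profile both CPU and GPU events
    elif batch_id == 10:
        # sort the report by total time and write it to a profile file
        profiler.stop_profiler("total", "/tmp/profile_%d" % trainer_id)
    # ... one training step would run here ...
```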
@@ -334,7 +338,11 @@ def print_arguments(args):
 def main():
     args = parse_args()
     print_arguments(args)
-    nccl_id_var, num_trainers, trainer_id = None, 1, 0
+
+    # the unique trainer id, starting from 0, needed by trainer
+    # only
+    nccl_id_var, num_trainers, trainer_id = (
+        None, 1, int(os.getenv("PADDLE_TRAINER_ID", "-1")))
 
     if args.use_cprof:
         pr = cProfile.Profile()
@@ -348,7 +356,7 @@ def main():
         fluid.memory_optimize(fluid.default_main_program())
 
     if args.update_method == "pserver":
-        train_prog, startup_prog = dist_transpile()
+        train_prog, startup_prog = dist_transpile(trainer_id)
         if not train_prog:
             raise Exception(
                 "Must configure correct environments to run dist train.")
@@ -364,7 +372,7 @@ def main():
         train_args.append(fluid.default_startup_program())
 
     if args.update_method == "nccl2":
-        nccl_id_var, num_trainers, trainer_id = append_nccl2_prepare()
+        nccl_id_var, num_trainers, trainer_id = append_nccl2_prepare(trainer_id)
     if args.gpus == 1:
         # NOTE: parallel executor use profiler interanlly
         if args.use_nvprof and args.device == 'GPU':
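Taken together, the `main()` hunks above change the contract for `PADDLE_TRAINER_ID`: it now defaults to `-1`, and only a non-negative value opts the process into distributed setup via `dist_transpile(trainer_id)` or `append_nccl2_prepare(trainer_id)`. A small sketch of that convention, hypothetical except for the environment variable name and the default taken from the diff:

```python
import os

# PADDLE_TRAINER_ID unset -> -1 -> local run; >= 0 -> distributed training.
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "-1"))

if trainer_id < 0:
    # dist_transpile(trainer_id) returns (None, None) in this case,
    # and append_nccl2_prepare(trainer_id) raises instead of guessing.
    print("PADDLE_TRAINER_ID not set, running single-node training")
else:
    print("trainer %d joining distributed training" % trainer_id)
```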

doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md

Lines changed: 34 additions & 34 deletions
@@ -86,7 +86,7 @@
 <br>
 
 <p align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/fluid_compiler.png" width=100%>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/fluid-compiler.png" width=100%>
 </p>
 
 ---
@@ -123,12 +123,12 @@
 <font size=5>
 
 - In scientific computing, the computation graph is a classic way to describe a computation. The figure below shows how the full graph is built, starting from the forward graph (blue) and adding the backward (red) and optimizer-related (green) operations:
--
+-
 <p align="center">
 <img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/graph_construction_example_all.png" width=60%>
 </p>
 
-
+
 - Fluid ==uses a `Program` rather than a computation graph== to describe the model and the optimization process. A `Program` is made up of `Block`s, `Operator`s, and `Variable`s; these concepts are explained in detail below.
 - At compile time, Fluid takes a forward-computation `Program` (for now, think of it simply as an ordered flow of computation) and appends the related `Operator`s and `Variable`s in the order forward -> backward -> gradient clip -> regularization -> optimization, completing the `Program` into the full computation.
@@ -328,7 +328,7 @@
 
 </font>
 
----
+---
 
 ### Compile-time concept: ==**[Transpiler](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/motivation/fluid_compiler.md)**==
 <font size=5>
@@ -402,7 +402,7 @@
 - `Scope`
 
 - Computation-related
-  - `Block`
+  - `Block`
   - `Kernel`, `OpWithKernel`, `OpWithoutKernel`
 
 <table>
@@ -439,7 +439,7 @@
 </tbody>
 </table>
 
-- Execution-related: `Executor`
+- Execution-related: `Executor`
 
 </font>
 
@@ -798,7 +798,7 @@ class GPUAllocator : public SystemAllocator {
 
 - step 1: Add a Place type, <span style="background-color:#DAB1D5;">implemented by the user and added to the framework</span>
   - A Place can be understood as an integer plus an enum: device id + device type
-
+
 <p align="center">
 <img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/place.png" width=40%>
 </p>
@@ -824,7 +824,7 @@ class GPUAllocator : public SystemAllocator {
 1. DataType, the data type used in execution: FP32/FP64/INT32/INT64
 1. Memory layout: the in-memory layout of the runtime Tensor, NCHW, NHWC
 1. The library being used
-
+
 to distinguish Kernels, so that multiple Kernels can be registered for the same operator.
 
 ```cpp
@@ -876,7 +876,7 @@ step 3: Runtime KernelType inference and Kernel switching, <span style="background-
 namespace framework {
 using LoDTensorArray = std::vector<LoDTensor>;
 }
-}
+}
 ```
 - On every loop iteration, a slice is "cut out" of the original input
 - LoDTensorArray is exposed on the Python side; it is one of the basic data structures supported by Fluid, and users can create and use it directly
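The last bullet above notes that LoDTensorArray is exposed on the Python side. As a hedged illustration (not part of this commit), a LoDTensorArray can be driven from Python through Fluid's tensor-array layers; the layer names and import path below are assumptions about the Fluid API of this era:

```python
import paddle.fluid as fluid  # assumed import path

# Build a tiny program that writes a tensor into a LoDTensorArray and reads it back.
i = fluid.layers.fill_constant(shape=[1], dtype='int64', value=0)
x = fluid.layers.fill_constant(shape=[2, 3], dtype='float32', value=1.0)

arr = fluid.layers.create_array(dtype='float32')  # backed by a LoDTensorArray
arr = fluid.layers.array_write(x, i, array=arr)   # arr[0] = x
y = fluid.layers.array_read(arr, i)               # read the slice back

exe = fluid.Executor(fluid.CPUPlace())
out, = exe.run(fetch_list=[y])
print(out)  # a 2x3 array of ones
```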
@@ -910,7 +910,7 @@ void Run(const framework::Scope &scope,
           false /*create_local_scope*/);
 }
 }
-
+
 ```
 
 </font>
@@ -951,7 +951,7 @@ void Run(const framework::Scope &scope,
 
 ---
 
-#### Memory in dynamicRNN
+#### Memory in dynamicRNN
 
 <font size=5>
 
@@ -961,7 +961,7 @@ void Run(const framework::Scope &scope,
 - `memory` runs its forward computation after operator A's forward computation
   - the forward computation of `memory` "points to" A's output LoDTensor
 - the output of `memory` can serve as the input of another operator, which forms the "recurrent" connection
-
+
 </font>
 
 ---
@@ -1107,7 +1107,7 @@ void Run(const framework::Scope &scope,
 <td>
 <p align="center">
 <img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/fluid_module_1.png" width=60%>
-</p>
+</p>
 </td>
 <td>
 <p align="center">
@@ -1127,13 +1127,13 @@
 <font size=5>
 
 - Design overview
-  - Refactoring overview [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/refactorization.md)
-  - fluid [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md)
+  - Refactoring overview [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/refactorization.md)
+  - fluid [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md)
   - fluid_compiler [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/motivation/fluid_compiler.md)
 - Core concepts
   - variable description [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/var_desc.md)
   - Tensor [->](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.md)
-  - LoDTensor [->](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md)
+  - LoDTensor [->](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md)
   - TensorArray [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md)
   - Program [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md)
   - Block [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md)
@@ -1152,7 +1152,7 @@
   - Supporting a new hardware device library [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/support_new_device.md)
   - Adding a new Operator [->](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_cn.md)
   - Adding a new Kernel [->](
-https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_kernel_en.md)
+https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_kernel_en.md)
 
 </font>
 
@@ -1167,10 +1167,10 @@ https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/new_op_kernel_
 <font size=5>
 
 Building the PaddlePaddle source with Docker: [->](http://www.paddlepaddle.org/docs/develop/documentation/fluid/zh/build_and_install/docker_install_cn.html)
-
+
 PaddlePaddle on Dockerhub: [->](
 https://hub.docker.com/r/paddlepaddle/paddle/tags/)
-
+
 1. Pull the PaddlePaddle Docker image
 ```bash
 docker pull paddlepaddle/paddle:latest-dev
@@ -1183,7 +1183,7 @@ PaddlePaddle on Dockerhub: [->](
 ```
 
 1. After entering the docker container, build from source; see the documentation [->]( http://www.paddlepaddle.org/docs/develop/documentation/fluid/zh/build_and_install/build_from_source_cn.html)
-
+
 </font>
 
 ---
@@ -1196,7 +1196,7 @@ PaddlePaddle on Dockerhub: [->](
 1. For development, use the image tagged `latest-dev`, which bundles all build dependencies. `latest` and `lastest-gpu` are production images, mainly used to run PaddlePaddle programs.
 2. To run GPU programs inside Docker, nvidia-docker is recommended; [otherwise the CUDA libraries and devices have to be mounted into the Docker container](http://www.paddlepaddle.org/docs/develop/documentation/fluid/zh/build_and_install/docker_install_cn.html).
 <font size=4>
-
+
 ```bash
 nvidia-docker run -it -v $PWD/Paddle:/paddle paddlepaddle/paddle:latest-dev /bin/bash
 ```
@@ -1353,9 +1353,9 @@ Op registration lives in the `.cc` file; CPU Kernel registration lives in the `.cc` file, while the CUDA
 }
 };
 ```
-
+
 </font>
-
+
 ---
 
 ###### Implementing an Operator with a Kernel, <span style="background-color:#c4e1e1;">step2</span>: define the Operator class
@@ -1420,11 +1420,11 @@ class ClipOp : public framework::OperatorWithKernel {
 2. Override the InferShape function (see [clip_op](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/clip_op.cc#L24))
 
 1. What is a `functor`?
-
+
   - A class or struct that only overloads `()`, usually a computation function that can be reused by multiple kernels.
 
 <font size=4>
-
+
 ```cpp
 template <typename T>
 class CrossEntropyFunctor<platform::CPUDeviceContext, T> {
@@ -1438,9 +1438,9 @@ class ClipOp : public framework::OperatorWithKernel {
 };
 ```
 </font>
-
+
 - Inside clip_op you can also see this pattern of abstracting a piece of computation into a functor: [->](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/clip_op.h#L27).
-
+
 </font>
 
 ---
@@ -1504,7 +1504,7 @@ class ClipKernel : public framework::OpKernel<T> {
 - Note that <span style="background-color:#e1c4c4;">Fluid does not distinguish between cost Ops and intermediate-layer Ops; every Op must correctly handle the gradients it receives</span>
 2. Outputs of the backward Op
   - the gradients of the learnable parameters
-  - the gradients of all inputs
+  - the gradients of all inputs
 
 
 </font>
@@ -1520,7 +1520,7 @@ class ClipKernel : public framework::OpKernel<T> {
 1. In the `.cc` file, register the forward and backward Op classes and register the CPU Kernel.
 
 <font size=4>
-
+
 ```cpp
 namespace ops = paddle::operators;
 REGISTER_OP(clip, ops::ClipOp, ops::ClipOpMaker<float>, clip_grad,
@@ -1530,13 +1530,13 @@ class ClipKernel : public framework::OpKernel<T> {
 REGISTER_OP_CPU_KERNEL(
     clip_grad, ops::ClipGradKernel<paddle::platform::CPUDeviceContext, float>);
 ```
-
+
 - In the code snippet above:
 
   1. `REGISTER_OP`: registers the `ops::ClipOp` class under the type name `clip`, with `ops::ClipOpMaker` as its `ProtoMaker`, and registers `ops::ClipOpGrad` under the type name `clip_grad`
   1. `REGISTER_OP_WITHOUT_GRADIENT`: registers an Op that has no backward pass, for example optimizer-related Ops
   1. `REGISTER_OP_CPU_KERNEL`: registers the `ops::ClipKernel` class with its template parameters specialized to `paddle::platform::CPUPlace` and `float`, and likewise registers `ops::ClipGradKernel`
-
+
 </font>
 1. Register the GPU Kernel in the `.cu` file in the same way
   - <span style="background-color:#e1c4c4;">If the CUDA Kernel implementation is based on Eigen, add the macro `#define EIGEN_USE_GPU` at the beginning of the `.cu` file</span>
@@ -1593,7 +1593,7 @@ class ClipKernel : public framework::OpKernel<T> {
 ```bash
 make test ARGS="-R test_mul_op -V"
 ```
-
+
 Or:
 
 ```
@@ -1613,7 +1613,7 @@ class ClipKernel : public framework::OpKernel<T> {
 - If several Ops depend on shared functions, those can be placed in a file that does not follow the `*_op.*` naming pattern, such as `gather.h`.
 
 </font>
-
+
 ---
 
 ### ==10.== Usage-related questions
@@ -1735,7 +1735,7 @@ class ClipKernel : public framework::OpKernel<T> {
 y_data = np.random.randint(0, 8, [1]).astype("int32")
 y_tensor = core.Tensor()
 y_tensor.set(y_data, place)
-
+
 x_data = np.random.uniform(0.1, 1, [11, 8]).astype("float32")
 x_tensor = core.Tensor()
 x_tensor.set(x_data, place)

doc/fluid/getstarted/index_cn.rst

Lines changed: 1 addition & 0 deletions
@@ -17,3 +17,4 @@
    :maxdepth: 1
 
    concepts/use_concepts_cn.rst
+   developer's_guide_to_paddle_fluid.md

doc/fluid/getstarted/index_en.rst

Lines changed: 1 addition & 0 deletions
@@ -16,3 +16,4 @@ Here is an example of linear regression. It introduces workflow of PaddlePaddle,
    :maxdepth: 1
 
    concepts/index_en.rst
+   developer's_guide_to_paddle_fluid.md
