
Commit 4e8fccf

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into seq_expand_op
2 parents: d697b6a + 43d6981

144 files changed: +3992 −1698 lines


Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ COPY ./paddle/scripts/docker/root/ /root/
 
 RUN apt-get update && \
     apt-get install -y \
-    git python-pip python-dev openssh-server bison \
+    git python-pip python-dev openssh-server bison libnccl-dev \
     wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \
    curl sed grep graphviz libjpeg-dev zlib1g-dev \
    python-matplotlib gcc-4.8 g++-4.8 \

doc/design/block.md

Lines changed: 1 addition & 1 deletion
@@ -189,7 +189,7 @@ OpDesc {
   inputs = {0} // the index of x in vars of BlockDesc above
   outputs = {5, 3} // indices of act and hidden_out in vars of BlockDesc above
   attrs {
-    "memories" : {1} // the index of h
+    "states" : {1} // the index of h
     "step_net" : <above step net>
   }
 };
(binary file changed, 500 Bytes, not shown)

doc/design/register_grad_op.md

Lines changed: 10 additions & 10 deletions
@@ -3,17 +3,17 @@
 
 ## The Problem Posed
 
-Currently, for each C++ operator class definition, there registers a *gradient operator creator* function, which takes a C++ operator instance and returns the corresponding gradient operator instance.
+Currently, for each C++ operator class definition, a *gradient operator creator* function is registered, which takes as input a C++ operator instance and returns the corresponding gradient operator instance.
 
-However, we noticed two problems with the current deisgn:
+However, we noticed two problems with the current design:
 
-1. As we decided to separate the *compilation* and *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and inserts corresponding `OpDesc` messages into the `ProgramDesc` message.
+1. As we decided to separate the *compilation* and the *execution* phases, we need to change the creator to take an `OpDesc` protobuf message in a `ProgramDesc` and insert corresponding `OpDesc` messages into the `ProgramDesc` message.
 
-1. Some operator's gradient computation requires more than one gradient operators. For example, the gradient of *minus* consists of two operators -- an identity operaotr and a scale operator. So we need to make the registration mechanism to support the mapping from an operator to a set of operators for gradient computation.
+1. For some operators, the gradient computation can be written in terms of existing operators. For example, the gradient of the *minus* operator consists of two operators -- an *identity* operator followed by a *scale* operator. Hence the registration mechanism needs to support mapping from an operator to a set of operators for the gradient computation.
 
 ## The Current Implementation
 
-The C++ class `OpInfos` store in a association map which key is the operator type. The `grad_op_type` indicate associated gradient operator type. Operator can create gradient operator by `OpInfo::creator_` of gradient. The pseudo code is
+Instances of the C++ class `OpInfo` are stored in an associative map whose key is the operator type. The `grad_op_type` indicates the associated gradient operator type. An operator can create the gradient operator by invoking `OpInfo::creator_` of the gradient operator. The pseudo code is as follows
 
 ```cpp
 struct OpInfo {
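For readers skimming this hunk, here is a minimal, self-contained sketch of the lookup-and-create flow that the "Current Implementation" paragraph describes. The `OpInfoMap()` helper and the simplified creator signature (taking the forward type as a string and no other arguments) are assumptions for illustration only; the real `CreateGradientOperator` shown in the next hunk takes an `OperatorBase&`.

```cpp
#include <functional>
#include <map>
#include <string>

struct OperatorBase;  // opaque in this sketch; never dereferenced

struct OpInfo {
  std::function<OperatorBase*()> creator_;  // builds an operator instance
  std::string grad_op_type_;                // empty when no gradient op exists
};

// Global registry keyed by operator type, as the design text describes
// (names are illustrative, not the actual Paddle API).
std::map<std::string, OpInfo>& OpInfoMap() {
  static std::map<std::string, OpInfo> registry;
  return registry;
}

// Creating a gradient operator is a lookup of the forward op's info,
// followed by calling the creator registered under grad_op_type_.
OperatorBase* CreateGradientOperator(const std::string& fwd_op_type) {
  const OpInfo& fwd_info = OpInfoMap().at(fwd_op_type);
  if (fwd_info.grad_op_type_.empty()) return nullptr;  // no gradient operator
  return OpInfoMap().at(fwd_info.grad_op_type_).creator_();
}
```

Note how this mapping is one-to-one: each forward type points to at most one gradient type via `grad_op_type_`, which is exactly the limitation problem 2 above calls out.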
@@ -31,16 +31,16 @@ OperatorBase* CreateGradientOperator(const OperatorBase& op) {
 
 ## Proposed Solution
 
-The mapping relationship between an operator and its gradient operators is a function. The interface of that function is:
+The mapping relationship between an operator and its gradient operators is a function. The interface of this function is:
 
 ```cpp
 // (OpDesc) --> vector<OpDesc>
 std::function<std::vector<OpDescBind>(const OpDescBind&)>;
 ```
 
-The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for protobuf message `OpDesc` to manipulate `OpDesc` fast.
+The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper for the protobuf message `OpDesc` for rapid manipulation of `OpDesc`.
 
-The `GradOpDescMaker` will be registered in `OpInfo`, to replace `grad_op_type_` field. The `OpInfo` should be
+The `GradOpDescMaker` will be registered in `OpInfo` and will replace the `grad_op_type_` field. The `OpInfo` should look like
 
 ```cpp
 struct OpInfo {
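As an illustration of the proposed function interface, a hedged sketch of a maker for the *minus* example mentioned earlier in this document. The bare `OpDescBind` stand-in (only a type field) and the unwired identity/scale descriptions are simplifications, not the actual Paddle types.

```cpp
#include <functional>
#include <string>
#include <vector>

// Minimal stand-in for the OpDescBind wrapper; only what the sketch needs.
struct OpDescBind {
  std::string type;
};

// (OpDesc) --> vector<OpDesc>, matching the interface proposed above.
using GradOpMaker = std::function<std::vector<OpDescBind>(const OpDescBind&)>;

// The gradient of "minus" is expressed with two existing operators:
// an identity op (gradient of the first input) and a scale op
// (gradient of the second input).
GradOpMaker minus_grad_maker = [](const OpDescBind& fwd_op) {
  (void)fwd_op;  // a real maker would copy input/output names from fwd_op
  return std::vector<OpDescBind>{OpDescBind{"identity"}, OpDescBind{"scale"}};
};
```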
@@ -49,7 +49,7 @@ struct OpInfo {
 };
 ```
 
-The `grad_op_maker_ ` is `nullptr` if the operator does not have associated gradient operators.
+The `grad_op_maker_` is a `nullptr` if the operator does not have any associated gradient operators.
 
 We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is
 
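A small sketch of how the `grad_op_maker_` field and the `nullptr` convention from this hunk might be used when generating gradient descriptions. The `MakeGradOps` helper and the minimal `OpDescBind` stand-in are assumptions for illustration, not part of the design doc.

```cpp
#include <functional>
#include <string>
#include <vector>

struct OpDescBind { std::string type; };  // stand-in, as in the sketch above
using GradOpMaker = std::function<std::vector<OpDescBind>(const OpDescBind&)>;

struct OpInfo {
  GradOpMaker grad_op_maker_;  // empty (nullptr-like) when no gradient ops exist
};

// Generating gradient descriptions becomes a null check plus one call.
std::vector<OpDescBind> MakeGradOps(const OpInfo& info, const OpDescBind& fwd) {
  if (!info.grad_op_maker_) return {};  // operator has no gradient operators
  return info.grad_op_maker_(fwd);
}
```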
@@ -74,7 +74,7 @@ func = [] (const OpDescBind& fwd_op) {
 
 We can write many helper functions since the `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forwarding operator.
 
-We should chagne register macros at the same time. In the current solution, there is no difference between forwarding operators and backward operators. So `REGISTER_OP` just register one operator. If the `REGISTER_OPERATOR ` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro contains `__VA_ARGS__`.
+We should change register macros at the same time. In the current solution, there is no difference between forward operators and backward operators, so `REGISTER_OP` just registers one operator. If `REGISTER_OPERATOR` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro containing `__VA_ARGS__`.
 
 The user interface should be
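A hedged sketch of how one variadic registration macro could cover both cases discussed above. The `OpRegistrar` template and the placeholder classes are hypothetical and only illustrate how `__VA_ARGS__` forwards however many maker classes are listed; this is not the actual `REGISTER_OPERATOR` implementation.

```cpp
#include <string>

// Hypothetical registrar: the listed classes (operator, proto maker,
// grad-op-desc maker, ...) become template arguments; a real registrar
// would instantiate them and fill the OpInfo map.
template <typename... ARGS>
struct OpRegistrar {
  explicit OpRegistrar(const char* op_type) { (void)op_type; }
};

// One macro serves both REGISTER_OPERATOR(op, OpClass) and
// REGISTER_OPERATOR(op, OpClass, ProtoMaker, GradOpDescMaker),
// because __VA_ARGS__ forwards whatever is listed.
#define REGISTER_OPERATOR(op_type, ...) \
  static OpRegistrar<__VA_ARGS__> __op_registrar_##op_type##__(#op_type)

// Usage sketch with placeholder classes:
struct MinusOp {};
struct MinusOpProtoMaker {};
struct MinusGradOpDescMaker {};
REGISTER_OPERATOR(minus, MinusOp, MinusOpProtoMaker, MinusGradOpDescMaker);
```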

doc/faq/local/index_cn.rst

Lines changed: 1 addition & 1 deletion
@@ -174,7 +174,7 @@ decoder_inputs = paddle.layer.fc(
 1. 两者都是对梯度的截断,但截断时机不同,前者在 :code:`optimzier` 更新网络参数时应用;后者在激活函数反向计算时被调用;
 2. 截断对象不同:前者截断可学习参数的梯度,后者截断回传给前层的梯度;
 
-除此之外,还可以通过减小学习律或者对数据进行归一化处理来解决这类问题
+除此之外,还可以通过减小学习率或者对数据进行归一化处理来解决这类问题
 
 5. 如何调用 infer 接口输出多个layer的预测结果
 -----------------------------------------------
