Commit 50c2a2e

Fix deadlinks in fluid api

1 parent 7c194ac commit 50c2a2e
25 files changed: +72 -79 lines changed

doc/fluid/design/algorithm/parameter_average.md (5 additions, 5 deletions)

@@ -49,18 +49,18 @@ In the new design, we propose to create a new operation for averaging parameter
 - the optimizer
 - the window_size to keep the updates
 
-The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for [Operators](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.h). We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU.
+The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for [Operators](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/rmsprop_op.h). We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU.
 
-The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) in the Python API.
+The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/modules/python_api.md#layer-function) in the Python API.
 
 ### Python API implementation for ParameterAverageOptimizer
 
 Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following:
 - Any optimizer (RMSProp, AdaGrad, etc.)
 - A window size. The op keeps accumulating updated parameter values over a window of N batches and takes an average. Move the averaged value to a buffer when the window is full to avoid loss of precision.
 
-Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions.
-We will have a wrapper written in Python that will support the functionality and implement the actual core computation in the C++ core, as we have done for other [Optimizers](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/rmsprop_op.cc).
+Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/modules/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions.
+We will have a wrapper written in Python that will support the functionality and implement the actual core computation in the C++ core, as we have done for other [Optimizers](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/rmsprop_op.cc).
 
 #### Creation of the ParameterAverageOptimizer operator
 There are two ways to create the ParameterAverageOptimizer op:
@@ -71,4 +71,4 @@ The proposal is to add the op immediately while building the computation graph.
 
 #### High-level API
 
-In the PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions.
+In the PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/modules/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions.
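
As context for the hunk above: the scheme it describes keeps the last `window_size` parameter snapshots and serves their mean (Polyak-style averaging over a sliding window). Below is a minimal NumPy sketch; the class and method names are invented for illustration, the doc's precision buffer is omitted, and this is not the actual ParameterAverageOptimizer kernel:

```python
import numpy as np
from collections import deque

class WindowedParamAverager:
    """Sketch of Polyak-style averaging over a window of N batches."""

    def __init__(self, window_size):
        # deque(maxlen=N) drops the oldest snapshot once N are stored.
        self.window = deque(maxlen=window_size)

    def record(self, params):
        # Call after each optimizer step with the freshly updated parameters.
        self.window.append(params.copy())

    def average(self):
        # Mean of the stored snapshots; this is what inference would use.
        return np.mean(np.stack(self.window), axis=0)

params = np.zeros(4)
averager = WindowedParamAverager(window_size=3)
for step in range(5):
    params = params + 0.1 * (step + 1)  # stand-in for an optimizer update
    averager.record(params)
print(averager.average())  # mean of the last 3 snapshots
```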

doc/fluid/design/concepts/block.md (2 additions, 2 deletions)

@@ -113,7 +113,7 @@ if (cond) {
 
 ```
 
-An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](./if_else_op.md) is as follows:
+An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](../execution/if_else_op.md) is as follows:
 
 ```python
 import paddle as pd
@@ -140,7 +140,7 @@ The difference is that variables in the C++ program contain scalar values, where
 
 ### Blocks with `for` and `RNNOp`
 
-The following RNN model in PaddlePaddle is from the [RNN design doc](./rnn.md):
+The following RNN model in PaddlePaddle is from the [RNN design doc](../dynamic_rnn/rnn.md):
 
 ```python
 x = sequence([10, 20, 30]) # shape=[None, 1]

doc/fluid/design/concepts/executor.md (1 addition, 1 deletion)

@@ -1,7 +1,7 @@
 # Executor Design Doc
 
 ## Motivation
-In [fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message
+In [fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/motivation/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message
 [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
 
 The executor runs the `ProgramDesc` like an interpreter. `ProgramDesc` contains the intrinsics (operators in this case) and variables which will be used; the executor explicitly executes the stored precompiled code.
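
To illustrate the "executor as interpreter" idea in the context line above: conceptually, the executor walks the ops recorded in a block and dispatches each to a kernel. The Python toy below is a sketch only; the real Executor is C++ and consumes the `ProgramDesc` protobuf, and the op registry and dict-based scope here are invented for illustration:

```python
# Toy interpreter: dispatch each op description to a registered kernel.
OP_KERNELS = {
    "fill": lambda scope, attrs: scope.__setitem__(attrs["out"], attrs["value"]),
    "add":  lambda scope, attrs: scope.__setitem__(
        attrs["out"], scope[attrs["x"]] + scope[attrs["y"]]),
}

def run_block(block, scope):
    """Execute the ops of a block in order against a shared scope."""
    for op in block:
        OP_KERNELS[op["type"]](scope, op["attrs"])

block = [
    {"type": "fill", "attrs": {"out": "a", "value": 1}},
    {"type": "fill", "attrs": {"out": "b", "value": 2}},
    {"type": "add",  "attrs": {"x": "a", "y": "b", "out": "c"}},
]
scope = {}
run_block(block, scope)
print(scope["c"])  # 3
```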

doc/fluid/design/concepts/program.md (1 addition, 1 deletion)

@@ -4,7 +4,7 @@
 
 A PaddlePaddle program consists of two parts -- the first generates a `ProgramDesc` protobuf message that describes the program, and the second runs this message using a C++ class `Executor`.
 
-A simple example PaddlePaddle program can be found in [graph.md](./graph.md):
+A simple example PaddlePaddle program can be found in [graph.md](../others/graph.md):
 
 ```python
 x = layer.data("images")

doc/fluid/design/concurrent/concurrent_programming.md (7 additions, 7 deletions)

@@ -1,6 +1,6 @@
 # Design Doc: Concurrent Programming with Fluid
 
-With PaddlePaddle Fluid, users describe a program rather than a model. The program is a [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto) protobuf message. TensorFlow/MxNet/Caffe2 applications generate protobuf messages too, but their protobuf messages represent the model, a graph of operators, but not the program that trains/uses the model.
+With PaddlePaddle Fluid, users describe a program rather than a model. The program is a [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/framework.proto) protobuf message. TensorFlow/MxNet/Caffe2 applications generate protobuf messages too, but their protobuf messages represent the model, a graph of operators, but not the program that trains/uses the model.
 
 Many know that when we program TensorFlow, we can specify the device on which each operator runs. This allows us to create a concurrent/parallel AI application. An interesting question is **how does a `ProgramDesc` represent a concurrent program?**
 
@@ -28,19 +28,19 @@ The following table compares concepts in Fluid and Go
 <tr>
 <td>control-flow and built-in functions </td>
 <td>
-<a href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators">intrinsics/operators</a></td>
+<a href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/fluid/operators">intrinsics/operators</a></td>
 <td></td>
 </tr>
 <tr>
 <td>goroutines, channels </td>
 <td>
-<a href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework/thread_pool.h">class ThreadPool</a></td>
+<a href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/fluid/framework/thread_pool.h">class ThreadPool</a></td>
 <td></td>
 </tr>
 <tr>
 <td>runtime </td>
 <td>
-<a href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h">class Executor</a></td>
+<a href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/executor.h">class Executor</a></td>
 <td></td>
 </tr>
 </tbody>
@@ -78,7 +78,7 @@ message ProgramDesc {
 }
 ```
 
-Then, the default `main` function calls `fluid.run()`, which creates an instance of the [`class Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h) and calls `Executor.Run(block[0])`, where `block[0]` is the first and only block defined in the above `ProgramDesc` message.
+Then, the default `main` function calls `fluid.run()`, which creates an instance of the [`class Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/executor.h) and calls `Executor.Run(block[0])`, where `block[0]` is the first and only block defined in the above `ProgramDesc` message.
 
 The default `main` function is defined as follows:
 
@@ -146,7 +146,7 @@ An explanation of the above program:
 
 - `fluid.k8s` is a package that provides access to the Kubernetes API.
 - `fluid.k8s.get_worker_addrs` returns the list of IPs and ports of all pods of the current job except for the current one (the master pod).
-- `fluid.tensor_array` creates a [tensor array](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor_array.h). `fluid.parallel_for` creates a `ParallelFor` intrinsic, which, when executed,
+- `fluid.tensor_array` creates a [tensor array](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/lod_tensor_array.h). `fluid.parallel_for` creates a `ParallelFor` intrinsic, which, when executed,
 
   1. creates `len(L)` scopes, each for the concurrent running of the sub-block (block 1 in this case), and initializes a variable named "index" in the scope to an integer value in the range `[0, len(L)-1]`, and
  2. creates `len(L)` threads by calling into the `ThreadPool` singleton, each thread
@@ -175,7 +175,7 @@ where
 1. listens on the current pod's IP address, as returned by `fluid.k8s.self_addr()`,
 2. once a connection is established,
    1. creates a scope of two parameters, "input" and "output",
-   2. reads a [Fluid variable](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h) and saves it into "input",
+   2. reads a [Fluid variable](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/variable.h) and saves it into "input",
    3. creates an Executor instance and calls `Executor.Run(block)`, where the block is generated by running the lambda specified as the second parameter of `fluid.listen_and_do`.
 
 ## Summarization
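
The `ParallelFor` semantics spelled out in this file (one scope and one thread per element of `L`, each running the same sub-block) map naturally onto a thread pool. A hedged Python sketch, where `concurrent.futures` stands in for the `ThreadPool` singleton and dicts stand in for scopes; none of this is Fluid API:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(L, sub_block):
    # One scope per element, holding "index" as the text above describes.
    scopes = [{"index": i, "elem": L[i]} for i in range(len(L))]
    with ThreadPoolExecutor(max_workers=len(L)) as pool:
        futures = [pool.submit(sub_block, s) for s in scopes]
        return [f.result() for f in futures]  # join all threads

def sub_block(scope):
    # Stand-in for "block 1": compute on this scope's element.
    return scope["elem"] * 2

print(parallel_for([10, 20, 30], sub_block))  # [20, 40, 60]
```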

doc/fluid/design/dist_train/distributed_architecture.md (1 addition, 1 deletion)

@@ -177,7 +177,7 @@ The local training architecture will be the same as the distributed training arc
 ### Training Data
 
 In PaddlePaddle v0.10.0, training data is typically read
-with [data reader](../reader/README.md) from Python. This approach is
+with [data reader](./README.md) from Python. This approach is
 no longer efficient when training distributedly since the Python
 process no longer runs on the same node with the trainer processes,
 the Python reader will need to read from the distributed filesystem

doc/fluid/design/dist_train/parameter_server.md (1 addition, 1 deletion)

@@ -65,7 +65,7 @@ For embedding layers, the gradient may have many rows containing only 0 when tra
 if the gradient uses a dense tensor to do parameter optimization,
 it could spend unnecessary memory, slow down the calculations and waste
 the bandwidth while doing distributed training.
-In Fluid, we introduce [SelectedRows](../selected_rows.md) to represent a list of rows containing
+In Fluid, we introduce [SelectedRows](../modules/selected_rows.md) to represent a list of rows containing
 non-zero gradient data. So when we do parameter optimization both locally and remotely,
 we only need to send those non-zero rows to the optimizer operators:
 
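
To make the relinked SelectedRows doc concrete: instead of sending the full dense gradient of an embedding table, only the touched rows and their indices travel to the optimizer. The NumPy stand-in below is a conceptual sketch, not the C++ `SelectedRows` class:

```python
import numpy as np

def to_selected_rows(dense_grad):
    # Indices of rows with any non-zero entry, plus just those rows.
    rows = np.flatnonzero(np.abs(dense_grad).sum(axis=1))
    return rows, dense_grad[rows]

# Embedding gradient where only rows 1 and 3 were touched this batch.
grad = np.zeros((5, 3))
grad[1] = [0.1, 0.2, 0.3]
grad[3] = [0.4, 0.5, 0.6]

rows, values = to_selected_rows(grad)
print(rows)          # [1 3]  -> only these rows go over the network
print(values.shape)  # (2, 3) instead of (5, 3)
```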

doc/fluid/design/dynamic_rnn/rnn.md (3 additions, 3 deletions)

@@ -22,7 +22,7 @@ There are several important concepts here:
 There could be local variables defined in each step-net. PaddlePaddle runtime realizes these variables in *step-scopes* which are created for each step.
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/rnn.png"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/rnn.png"/><br/>
 Figure 2 illustrates the RNN's data flow
 </p>
 
@@ -93,7 +93,7 @@ For example, we could have a 2-level RNN, where the top level corresponds to par
 The following figure illustrates feeding in text into the lower level, one sentence at a step, and feeding the step outputs to the top level. The final top level output is about the whole text.
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/2_level_rnn.png"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/2_level_rnn.png"/>
 </p>
 
 ```python
@@ -149,5 +149,5 @@ If the `output_all_steps` is set to False, it will only output the final time st
 
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/rnn_2level_data.png"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/rnn_2level_data.png"/>
 </p>

doc/fluid/design/index_cn.rst (1 addition, 1 deletion)

@@ -9,7 +9,7 @@
    concepts/index_cn.rst
    data_type/index_cn.rst
    memory/index_cn.rst
-   muti_devices/index_cn.rst
+   multi_devices/index_cn.rst
    dynamic_rnn/index_cn.rst
    concurrent/index_cn.rst
    algorithm/index_cn.rst

doc/fluid/design/index_en.rst (1 addition, 1 deletion)

@@ -9,7 +9,7 @@ Design
    concepts/index_en.rst
    data_type/index_en.rst
    memory/index_en.rst
-   muti_devices/index_en.rst
+   multi_devices/index_en.rst
    dynamic_rnn/index_en.rst
    concurrent/index_en.rst
    algorithm/index_en.rst
