Commit 5641859

Merge pull request #9641 from weixing02/img
Fix links error for github images
2 parents: f2c0b88 + fcb4844

File tree: 15 files changed, 34 additions (+), 31 deletions (−)
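Every hunk in this commit applies the same transformation: a `https://github.com/<owner>/<repo>/tree/<branch>/<path>` page URL (which serves an HTML page, not the image bytes) is rewritten to the `https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` form that serves the file directly. A hedged sketch of that rewrite, with an illustrative function name that is not part of the commit:

```python
import re

# Rewrite GitHub "tree" page URLs to raw-content URLs so that <img> tags
# embed the actual image bytes:
#   https://github.com/<owner>/<repo>/tree/<branch>/<path>
#     -> https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>
# Note: /blob/ links (e.g. to executor.h) are intentionally left untouched,
# matching what this commit does.
_TREE_URL = re.compile(r'https://github\.com/([^/\s"]+)/([^/\s"]+)/tree/')

def to_raw_url(text: str) -> str:
    """Replace every github.com tree URL in `text` with its raw equivalent."""
    return _TREE_URL.sub(r"https://raw.githubusercontent.com/\1/\2/", text)
```

Running this over each markdown file would produce exactly the `+` lines seen in the hunks below, which is why every changed line differs only in its URL prefix.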

doc/fluid/design/algorithm/parameter_average.md
2 additions, 2 deletions

@@ -5,10 +5,10 @@ In a large scale machine learning setup where the size of the training data is h
 
 Polyak and Juditsky (1992) showed that the test performance of simple average of parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values that are obtained by training the model over and over again, over the training dataset.
 
-Hence, to accelerate the speed of Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of parameters obtained by SGD, is used as the estimator for <img src="./images/theta_star.gif"/><br/> . The averaging is done as follows:
+Hence, to accelerate the speed of Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of parameters obtained by SGD, is used as the estimator for <img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/theta_star.gif"/><br/> . The averaging is done as follows:
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/asgd.gif"><br />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/asgd.gif"><br />
 </p>
 
 We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above.
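The averaging the context lines describe is the running mean of the SGD iterates used as the final estimator. A minimal illustrative Python sketch (the function names are hypothetical, not PaddlePaddle API):

```python
# Sketch of ASGD-style parameter averaging (illustrative only): after each
# plain SGD step, maintain the running mean of all iterates seen so far,
# and report that mean as the averaged estimator.

def sgd_step(theta, grad, lr=0.1):
    """One plain SGD update on a flat list of parameters."""
    return [t - lr * g for t, g in zip(theta, grad)]

def asgd(theta0, grads, lr=0.1):
    """Run SGD over a sequence of gradients; return (last iterate, average)."""
    theta = list(theta0)
    avg = list(theta0)
    for k, grad in enumerate(grads, start=1):
        theta = sgd_step(theta, grad, lr)
        # Incremental running mean: avg_k = avg_{k-1} + (theta_k - avg_{k-1}) / k
        avg = [a + (t - a) / k for a, t in zip(avg, theta)]
    return theta, avg
```

With gradients `[[1.0], [1.0]]` and `lr=0.1` starting from `[0.0]`, the iterates are −0.1 and −0.2, so the averaged estimate is their mean, −0.15.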

doc/fluid/design/concurrent/channel.md
2 additions, 2 deletions

@@ -114,13 +114,13 @@ current thread under two conditions:
 #### Channel Send
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/channel_send.png"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/channel_send.png"/><br/>
 </p>
 
 #### Channel Receive
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/channel_recv.png"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/channel_recv.png"/><br/>
 </p>
 
 ## Limitations and Considerations

doc/fluid/design/concurrent/concurrent_programming.md
4 additions, 0 deletions

@@ -23,21 +23,25 @@ The following table compares concepts in Fluid and Go
 <td>user-defined functions </td>
 <td>
 <a href="https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid">layers</a></td>
+<td></td>
 </tr>
 <tr>
 <td>control-flow and built-in functions </td>
 <td>
 <a href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators">intrinsics/operators</a></td>
+<td></td>
 </tr>
 <tr>
 <td>goroutines, channels </td>
 <td>
 <a href="https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework/thread_pool.h">class ThreadPool</a></td>
+<td></td>
 </tr>
 <tr>
 <td>runtime </td>
 <td>
 <a href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h">class Executor</a></td>
+<td></td>
 </tr>
 </tbody>
 </table>

doc/fluid/design/concurrent/select_op.md
1 addition, 1 deletion

@@ -254,7 +254,7 @@ only one case will be executed.
 ### select_op flow
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/select_op_workflow.png"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/select_op_workflow.png"/><br/>
 </p>
 
 The select algorithm is inspired by golang's select routine. Please refer to

doc/fluid/design/dist_train/distributed_architecture.md
5 additions, 5 deletions

@@ -40,11 +40,11 @@ computation is only specified in Python code which sits outside of PaddlePaddle,
 
 Similar to how a compiler uses an intermediate representation (IR) so that the programmer does not need to manually optimize their code for most of the cases, we can have an intermediate representation in PaddlePaddle as well. The compiler optimizes the IR as follows:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/compiler.png"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/compiler.png"/>
 
 PaddlePaddle can support model parallelism by converting the IR so that the user no longer needs to manually perform the computation and operations in the Python component:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/paddle-compile.png"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/paddle-compile.png"/>
 
 The IR for PaddlePaddle after refactoring is called a `Block`, it specifies the computation dependency graph and the variables used in the computation.
 
@@ -60,7 +60,7 @@ For a detailed explanation, refer to this document -
 
 The revamped distributed training architecture can address the above discussed limitations. Below is the illustration of how it does so:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/distributed_architecture.png"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/distributed_architecture.png"/>
 
 The major components are: *Python API*, *Distribute Transpiler* and *Remote Executor*.
 
@@ -152,7 +152,7 @@ for data in train_reader():
 `JobDesc` object describe the distributed job resource specification to run on
 Cluster environment.
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/remote_executor.png" width="500" align="center" />
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/remote_executor.png" width="500" align="center" />
 
 `RemoteExecutor.run` sends the `ProgramDesc` and
 [TrainingJob](https://github.com/PaddlePaddle/cloud/blob/unreleased-tpr/doc/autoscale/README.md#training-job-resource)
@@ -171,7 +171,7 @@ In the future, a more general placement algorithm should be implemented, which m
 
 The local training architecture will be the same as the distributed training architecture, the difference is that everything runs locally, and there is just one PaddlePaddle runtime:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/local_architecture.png"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/local_architecture.png"/>
 
 
 ### Training Data

doc/fluid/design/dist_train/multi_cpu.md
2 additions, 2 deletions

@@ -8,11 +8,11 @@ Op graph to a multi-CPU Op graph, and run `ParallelDo` Op to run the graph.
 
 ## Transpiler
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/[email protected]" width="300">
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/[email protected]" width="300">
 
 After converted:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/[email protected]" width="1000">
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/[email protected]" width="1000">
 
 ## Implement
doc/fluid/design/dist_train/parameter_server.md
3 additions, 4 deletions

@@ -41,11 +41,11 @@ We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*.
 Below is an example of converting the user defined graph to the
 subgraphs for the trainer and the parameter server:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/local-graph.png" width="300"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/local-graph.png" width="300"/>
 
 After converting:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/dist-graph.png" width="700"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/dist-graph.png" width="700"/>
 
 1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
@@ -69,8 +69,7 @@ In Fluid, we introduce [SelectedRows](../selected_rows.md) to represent a list o
 non-zero gradient data. So when we do parameter optimization both locally and remotely,
 we only need to send those non-zero rows to the optimizer operators:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/sparse_update.png" width="700" />
-
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/sparse_update.png" width="700" />
 ### Benefits
 
 - Model parallelism becomes easier to implement: it is an extension to
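The SelectedRows idea the context lines describe — ship only the rows with non-zero gradient to the optimizer — can be sketched outside of Fluid. The names below are illustrative, not Fluid API:

```python
# Illustrative sketch of a SelectedRows-style sparse update (not Fluid API):
# a sparse gradient is stored as (row indices, row values) rather than a
# dense matrix, and the optimizer touches only the listed rows.

def sparse_sgd_update(param, rows, grads, lr=0.1):
    """Apply SGD only to the rows named in `rows`.

    param: dense parameter W as a list of rows (each a list of floats).
    rows:  indices of rows with non-zero gradient.
    grads: gradient rows, aligned with `rows`.
    """
    for i, g in zip(rows, grads):
        param[i] = [w - lr * gv for w, gv in zip(param[i], g)]
    return param
```

For a lookup-table-style parameter where a minibatch touches few rows, this is the whole point of the design: only `len(rows)` rows cross the network to the parameter server instead of the full W.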

doc/fluid/design/dynamic_rnn/rnn.md
1 addition, 1 deletion

@@ -5,7 +5,7 @@ This document describes the RNN (Recurrent Neural Network) operator and how it i
 ## RNN Algorithm Implementation
 
 <p align="center">
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/rnn.jpg"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/rnn.jpg"/>
 </p>
 
 The above diagram shows an RNN unrolled into a full network.

doc/fluid/design/modules/batch_norm_op.md
2 additions, 2 deletions

@@ -66,7 +66,7 @@ As most C++ operators do, `batch_norm_op` is defined by inputs, outputs, attribu
 
 The following graph showes the training computational process of `batch_norm_op`:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/batch_norm_op_kernel.png" width="800"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/batch_norm_op_kernel.png" width="800"/>
 
 cudnn provides APIs to finish the whole series of computation, we can use them in our GPU kernel.
 
@@ -124,7 +124,7 @@ for pass_id in range(PASS_NUM):
 `is_infer` is an attribute. Once an operator is created, its attributes can not be changed. It suggests us that we shall maintain two `batch_norm_op` in the model, one's `is_infer` is `True`(we call it `infer_batch_norm_op`) and the other one's is `False`(we call it `train_batch_norm_op`). They share all parameters and variables, but be placed in two different branches. That is to say, if a network contains a `batch_norm_op`, it will fork into two branches, one go through `train_batch_norm_op` and the other one go through `infer_batch_norm_op`:
 
 <div align=center>
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/batch_norm_fork.png" width="500"/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/batch_norm_fork.png" width="500"/>
 </div>
 
 Just like what is shown in the above graph, the net forks before `batch_norm_op` and will never merge again. All the operators after `batch_norm_op` will duplicate.

doc/fluid/design/modules/regularization.md
5 additions, 5 deletions

@@ -6,17 +6,17 @@ A central problem in machine learning is how to design an algorithm that will pe
 ### Parameter Norm Penalties
 Most common regularization approaches in deep learning are based on limiting the capacity of the models by adding a parameter norm penalty to the objective function `J`. This is given as follows:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/loss_equation.png" align="center"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/loss_equation.png" align="center"/><br/>
 
 The parameter `alpha` is a hyperparameter that weights the relative contribution of the norm penalty term, `omega`, relative to the standard objective function `J`.
 
 The most commonly used norm penalties are the L2 norm penalty and the L1 norm penalty. These are given as follows:
 
 ##### L2 Regularization:
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/l2_regularization.png" align="center"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/l2_regularization.png" align="center"/><br/>
 
 ##### L1 Regularization
-<img src=".https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/l1_regularization.png" align="center"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/l1_regularization.png" align="center"/><br/>
 
 A much more detailed mathematical background of regularization can be found [here](http://www.deeplearningbook.org/contents/regularization.html).
 
@@ -40,11 +40,11 @@ The idea of building ops for regularization is in sync with the refactored Paddl
 
 Below is an example of a really simple feed forward neural network.
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/feed_forward.png" align="center"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/feed_forward.png" align="center"/><br/>
 
 The Python API will modify this computation graph to add regularization operators. The modified computation graph will look as follows:
 
-<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/feed_forward_regularized.png" align="center"/><br/>
+<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/feed_forward_regularized.png" align="center"/><br/>
 
 ### Python API implementation for Regularization
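The three equation images referenced in the regularization.md hunk have standard textbook forms. A hedged reconstruction in the notation of the surrounding prose (`J`, `alpha`, `omega`) — these are the conventional definitions, not transcriptions of the image files themselves:

```latex
% Penalized objective (presumably loss_equation.png):
\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \, \Omega(\theta)

% L2 norm penalty (presumably l2_regularization.png):
\Omega(\theta) = \tfrac{1}{2} \lVert w \rVert_2^2

% L1 norm penalty (presumably l1_regularization.png):
\Omega(\theta) = \lVert w \rVert_1 = \sum_i \lvert w_i \rvert
```

Here `alpha` weights the penalty `Omega` against the data-fit term `J`, exactly as the context line above the image describes.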