Commit 3a82578

Merge branch 'develop' into core_inference_prepare
2 parents fbd3604 + d139f2c commit 3a82578

186 files changed

+4223
-755
lines changed

benchmark/cluster/README.md

Lines changed: 133 additions & 15 deletions
@@ -36,23 +36,83 @@
3636
- Trainer Count: 100
3737
- Metrics: mini-batch / sec
3838

39-
| Batch Size | 32 | 64 | 128 | 256 |
40-
| -- | -- | -- | -- | -- |
41-
| PaddlePaddle Fluid | - | - | - | - |
42-
| PaddlePaddle v2 | - | - | - | - |
43-
| TensorFlow | - | - | - | - |
39+
40+
<table>
41+
<thead>
42+
<tr>
43+
<th>Batch Size </th>
44+
<th> 32</th>
45+
<th>64</th>
46+
<th>128 </th>
47+
<th>256</th>
48+
</tr>
49+
</thead>
50+
<tbody>
51+
<tr>
52+
<td> PaddlePaddle Fluid</td>
53+
<td>-</td>
54+
<td>- </td>
55+
<td>- </td>
56+
<td>- </td>
57+
</tr>
58+
<tr>
59+
<td>PaddlePaddle v2 </td>
60+
<td>- </td>
61+
<td>- </td>
62+
<td>- </td>
63+
<td>- </td>
64+
</tr>
65+
<tr>
66+
<td>TensorFlow </td>
67+
<td>- </td>
68+
<td>- </td>
69+
<td>- </td>
70+
<td>- </td>
71+
</tr>
72+
</tbody>
73+
</table>
4474

4575
### Measure the Performance for Different PServer Count
4676

4777
- Trainer Count: 100
4878
- Batch Size: 64
4979
- Metrics: mini-batch / sec
5080

51-
| PServer Count | 10 | 20 | 40 | 60 |
52-
| -- | -- | -- | -- | -- |
53-
| PaddlePaddle Fluid | - | - | - | - |
54-
| PaddlePaddle v2 | - | - | - | - |
55-
| TensorFlow | - | - | - | - |
81+
82+
<table>
83+
<thead>
84+
<tr>
85+
<th>PServer Count </th>
86+
<th>10</th>
87+
<th>20</th>
88+
<th>40 </th>
89+
<th>60</th>
90+
</tr>
91+
</thead>
92+
<tbody>
93+
<tr>
94+
<td> PaddlePaddle Fluid</td>
95+
<td>-</td>
96+
<td>- </td>
97+
<td>- </td>
98+
<td>- </td>
99+
</tr>
100+
<tr>
101+
<td>PaddlePaddle v2 </td>
102+
<td>- </td>
103+
<td>- </td>
104+
<td>- </td>
105+
<td>- </td>
106+
</tr>
107+
<tr>
108+
<td>TensorFlow </td>
109+
<td>- </td>
110+
<td>- </td>
111+
<td>- </td>
112+
<td>- </td>
113+
</tr>
114+
</tbody>
115+
</table>
56116

57117
### Measure Parallel Efficiency By Increasing Trainer Count
58118

@@ -67,11 +127,69 @@ The parallel efficiency is:
67127

68128
$E = \frac{S}{N}$
69129

70-
| Trainer Counter | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
71-
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
72-
| PaddlePaddle Fluid | - | - | - | - | - | - | - | - | - | - | - |
73-
| PaddlePaddle v2 | - | - | - | - | - | - | - | - | - | - | - | - |
74-
| TensorFlow | - | - | - | - | - | - | - | - | - | - | - | - | - |
130+
<table>
131+
<thead>
132+
<tr>
133+
<th>Trainer Count</th>
134+
<th>1</th>
135+
<th>10</th>
136+
<th>20 </th>
137+
<th>30</th>
138+
<th>40</th>
139+
<th>50</th>
140+
<th>60 </th>
141+
<th>70</th>
142+
<th>80</th>
143+
<th>90</th>
144+
<th>100 </th>
145+
</tr>
146+
</thead>
147+
<tbody>
148+
<tr>
149+
<td> PaddlePaddle Fluid</td>
150+
<td>-</td>
151+
<td>- </td>
152+
<td>- </td>
153+
<td>- </td>
154+
<td>-</td>
155+
<td>- </td>
156+
<td>- </td>
157+
<td>- </td>
158+
<td>-</td>
159+
<td>- </td>
160+
<td>- </td>
161+
</tr>
162+
<tr>
163+
<td>PaddlePaddle v2 </td>
164+
<td>- </td>
165+
<td>- </td>
166+
<td>- </td>
167+
<td>- </td>
168+
<td>-</td>
169+
<td>- </td>
170+
<td>- </td>
171+
<td>- </td>
172+
<td>-</td>
173+
<td>- </td>
174+
<td>- </td>
175+
</tr>
176+
<tr>
177+
<td>TensorFlow </td>
178+
<td>- </td>
179+
<td>- </td>
180+
<td>- </td>
181+
<td>- </td>
182+
<td>-</td>
183+
<td>- </td>
184+
<td>- </td>
185+
<td>- </td>
186+
<td>-</td>
187+
<td>- </td>
188+
<td>- </td>
189+
</tr>
190+
</tbody>
191+
</table>
192+
75193

76194
## Reproduce the benchmark
77195

benchmark/cluster/vgg16/README.md

Lines changed: 139 additions & 21 deletions
@@ -16,48 +16,166 @@ Setting environment variable: `MKL_NUM_THREADS=1`.
1616

1717
- Metrics: samples / sec
1818

19-
| Batch Size | 32 | 64 | 128 | 256 |
20-
| -- | -- | -- | -- | -- |
21-
| PaddlePaddle Fluid | 15.44 | 16.32 | 16.74 | 16.79 |
22-
| PaddlePaddle v2 | 15.97 | 17.04 | 17.60 | 17.83 |
23-
| TensorFlow | 9.09 | 9.10 | 9.24 | 8.66 |
19+
<table>
20+
<thead>
21+
<tr>
22+
<th>Batch Size </th>
23+
<th> 32</th>
24+
<th>64</th>
25+
<th>128 </th>
26+
<th>256</th>
27+
</tr>
28+
</thead>
29+
<tbody>
30+
<tr>
31+
<td> PaddlePaddle Fluid</td>
32+
<td> 15.44 </td>
33+
<td> 16.32 </td>
34+
<td> 16.74 </td>
35+
<td> 16.79 </td>
36+
</tr>
37+
<tr>
38+
<td>PaddlePaddle v2 </td>
39+
<td> 15.97 </td>
40+
<td> 17.04 </td>
41+
<td> 17.60 </td>
42+
<td> 17.83 </td>
43+
</tr>
44+
<tr>
45+
<td>TensorFlow </td>
46+
<td> 9.09 </td>
47+
<td> 9.10 </td>
48+
<td> 9.24 </td>
49+
<td> 8.66 </td>
50+
</tr>
51+
</tbody>
52+
</table>
53+
2454

2555
### Different Batch Size
2656

2757
- PServer Count: 10
2858
- Trainer Count: 20
2959
- Metrics: samples / sec
3060

31-
| Batch Size | 32 | 64 | 128 | 256 |
32-
| -- | -- | -- | -- | -- |
33-
| PaddlePaddle Fluid | 190.20 | 222.15 | 247.40 | 258.18 |
34-
| PaddlePaddle v2 | 170.96 | 233.71 | 256.14 | 329.23 |
35-
| TensorFlow | - | - | - | - |
36-
61+
<table>
62+
<thead>
63+
<tr>
64+
<th>Batch Size </th>
65+
<th> 32</th>
66+
<th>64</th>
67+
<th>128 </th>
68+
<th>256</th>
69+
</tr>
70+
</thead>
71+
<tbody>
72+
<tr>
73+
<td> PaddlePaddle Fluid</td>
74+
<td> 190.20 </td>
75+
<td> 222.15 </td>
76+
<td> 247.40 </td>
77+
<td> 258.18 </td>
78+
</tr>
79+
<tr>
80+
<td>PaddlePaddle v2 </td>
81+
<td> 170.96 </td>
82+
<td> 233.71 </td>
83+
<td> 256.14 </td>
84+
<td> 329.23 </td>
85+
</tr>
86+
<tr>
87+
<td>TensorFlow </td>
88+
<td> - </td>
89+
<td> - </td>
90+
<td> - </td>
91+
<td> - </td>
92+
</tr>
93+
</tbody>
94+
</table>
3795

3896
### Accelerate Rate
3997

4098
- Pserver Count: 20
4199
- Batch Size: 128
42100
- Metrics: samples / sec
43101

44-
| Trainer Count | 20 | 40 | 80 | 100 |
45-
| -- | -- | -- | -- | -- |
46-
| PaddlePaddle Fluid | 263.29 (78.64%) | 518.80 (77.47%) | 836.26 (62.44%) | 1019.29 (60.89%) |
47-
| PaddlePaddle v2 (need more tests) | 326.85 (92.85%) | 534.58 (75.93%) | 853.30 (60.60%) | 1041.99 (59.20%) |
48-
| TensorFlow | - | - | - | - |
102+
<table>
103+
<thead>
104+
<tr>
105+
<th>Trainer Count </th>
106+
<th>20</th>
107+
<th>40</th>
108+
<th>80</th>
109+
<th>100</th>
110+
</tr>
111+
</thead>
112+
<tbody>
113+
<tr>
114+
<td> PaddlePaddle Fluid</td>
115+
<td> 263.29 (78.64%) </td>
116+
<td> 518.80 (77.47%) </td>
117+
<td> 836.26 (62.44%) </td>
118+
<td> 1019.29 (60.89%) </td>
119+
</tr>
120+
<tr>
121+
<td>PaddlePaddle v2 (need more tests) </td>
122+
<td> 326.85 (92.85%) </td>
123+
<td> 534.58 (75.93%) </td>
124+
<td> 853.30 (60.60%) </td>
125+
<td> 1041.99 (59.20%) </td>
126+
</tr>
127+
<tr>
128+
<td>TensorFlow </td>
129+
<td> - </td>
130+
<td> - </td>
131+
<td> - </td>
132+
<td> - </td>
133+
</tr>
134+
</tbody>
135+
</table>
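
The percentages in parentheses look like parallel efficiency, that is, speedup over a single trainer divided by the trainer count. The following is a minimal sketch of that calculation, not the script used to produce the table. It assumes the batch-size-128 column of the first table in this README is the single-trainer baseline (16.74 samples/sec for Fluid, 17.60 for v2); the published percentages agree with that reading to within rounding. The names `SINGLE_TRAINER`, `CLUSTER`, `parallel_efficiency`, and `efficiency_table` are hypothetical.

```python
# Throughput in samples/sec, copied from the tables in this README (batch size 128).
# Assumption: the first table reports the single-trainer baseline.
SINGLE_TRAINER = {"PaddlePaddle Fluid": 16.74, "PaddlePaddle v2": 17.60}

# Cluster throughput keyed by trainer count (PServer count 20).
CLUSTER = {
    "PaddlePaddle Fluid": {20: 263.29, 40: 518.80, 80: 836.26, 100: 1019.29},
    "PaddlePaddle v2": {20: 326.85, 40: 534.58, 80: 853.30, 100: 1041.99},
}


def parallel_efficiency(throughput, baseline, trainers):
    """Return E = S / N, where S = throughput / baseline is the speedup."""
    speedup = throughput / baseline
    return speedup / trainers


def efficiency_table():
    """Map framework name -> {trainer count -> efficiency in percent}."""
    return {
        name: {
            n: 100.0 * parallel_efficiency(tput, SINGLE_TRAINER[name], n)
            for n, tput in by_count.items()
        }
        for name, by_count in CLUSTER.items()
    }


if __name__ == "__main__":
    for name, row in efficiency_table().items():
        cells = ", ".join(f"{n}: {e:.2f}%" for n, e in row.items())
        print(f"{name}: {cells}")
```

Keeping the baseline in a separate mapping makes it easy to recompute the percentages if the single-trainer numbers are re-measured.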
136+
49137

50138
### Different Pserver Count
51139

52140
- Trainer Count: 60
53141
- Batch Size: 128
54142
- Metrics: samples/ sec
55143

56-
| PServer Count | 3 | 6 |10 | 20 |
57-
| -- | -- | -- | -- | -- |
58-
| PaddlePaddle Fluid(should fix in next PR) | 589.1 | 592.6 | 656.4 | 655.8 |
59-
| PaddlePaddle v2 | 593.4 | 791.3 | 729.7 | 821.7 |
60-
| TensorFlow | - | - | - | - |
144+
<table>
145+
<thead>
146+
<tr>
147+
<th>PServer Count </th>
148+
<th>3</th>
149+
<th>6</th>
150+
<th>10</th>
151+
<th>20</th>
152+
</tr>
153+
</thead>
154+
<tbody>
155+
<tr>
156+
<td> PaddlePaddle Fluid(should fix in next PR) </td>
157+
<td> 589.1 </td>
158+
<td> 592.6 </td>
159+
<td> 656.4 </td>
160+
<td> 655.8 </td>
161+
</tr>
162+
<tr>
163+
<td>PaddlePaddle v2 (need more tests) </td>
164+
<td> 593.4 </td>
165+
<td> 791.3 </td>
166+
<td> 729.7 </td>
167+
<td> 821.7 </td>
168+
</tr>
169+
<tr>
170+
<td>TensorFlow </td>
171+
<td> - </td>
172+
<td> - </td>
173+
<td> - </td>
174+
<td> - </td>
175+
</tr>
176+
</tbody>
177+
</table>
178+
61179

62180
*The performance gap between Fluid and v2 comes from the network interference.*
63181

doc/fluid/api/layers.rst

Lines changed: 6 additions & 0 deletions
@@ -494,6 +494,12 @@ reshape
494494
.. autofunction:: paddle.fluid.layers.reshape
495495
:noindex:
496496

497+
pad
498+
---
499+
500+
.. autofunction:: paddle.fluid.layers.pad
501+
:noindex:
502+
497503
scale
498504
-----
499505

doc/fluid/design/algorithm/parameter_average.md

Lines changed: 4 additions & 2 deletions
@@ -5,9 +5,11 @@ In a large scale machine learning setup where the size of the training data is h
55

66
Polyak and Juditsky (1992) showed that the test performance of simple average of parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values that are obtained by training the model over and over again, over the training dataset.
77

8-
Hence, to accelerate the speed of Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of parameters obtained by SGD, is used as the estimator for <img src="./images/theta_star.gif"/><br/> . The averaging is done as follows:
8+
Hence, to accelerate the speed of Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of parameters obtained by SGD, is used as the estimator for <img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/theta_star.gif"/><br/> . The averaging is done as follows:
99

10-
<img src="./images/asgd.gif" align="center"/><br/>
10+
<p align="center">
11+
<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/asgd.gif"><br />
12+
</p>
1113

1214
We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above.
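
As an illustration of the running average described above, here is a minimal sketch in plain Python, not Fluid code. The exact formula in `asgd.gif` is not visible in this diff, so the sketch assumes the standard Polyak-Ruppert form, where the estimate after `t` steps is the arithmetic mean of the first `t` SGD iterates. The function name `sgd_with_averaging` and its parameters are hypothetical.

```python
import random


def sgd_with_averaging(grad, theta0, lr, steps, noise_std=1.0, seed=0):
    """Run noisy SGD on a scalar parameter and keep a running average.

    grad:      function returning the exact gradient at theta
    noise_std: standard deviation of the Gaussian noise added to each
               gradient to mimic mini-batch sampling (an assumption)
    Returns (last_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    theta = theta0
    theta_avg = 0.0
    for t in range(1, steps + 1):
        noisy_grad = grad(theta) + rng.gauss(0.0, noise_std)
        theta -= lr * noisy_grad
        # Incremental mean: avg_t = avg_{t-1} + (theta_t - avg_{t-1}) / t
        theta_avg += (theta - theta_avg) / t
    return theta, theta_avg


if __name__ == "__main__":
    # Minimise 0.5 * (theta - 3)^2, whose gradient is theta - 3.
    last, averaged = sgd_with_averaging(lambda th: th - 3.0, 0.0, 0.1, 2000)
    print(f"last iterate: {last:.4f}, averaged iterate: {averaged:.4f}")
```

The incremental update needs only one extra value per parameter, so the average can be maintained next to any optimizer's update step without storing past iterates.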
1315
