@@ -28,10 +28,10 @@ the parameter `w1` as an example to introduce the steps:
1. Each gradient variable may be distributed on different GPU cards; aggregate
   the pieces once they have all been calculated.
1. Split the gradient variable into multiple blocks according to the number of PServer
-   instances and then sent them.
+   instances and then send them (see the sketch after this list).
1. The PServer would run an `Optimize Block`, using a specified optimization algorithm, to update
   the specified parameter.
- 1. The trainer will fetch the parameter before running forward Op depends on the specified
+ 1. The trainer will fetch the parameter before running the forward Op which depends on the specified
   parameter.
1. Broadcast the received variable to multiple GPU cards and continue to run the next
   mini-batch.
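
As a rough illustration of the split step above, the sketch below shows how one aggregated gradient could be cut into roughly equal blocks, one per PServer instance, before being sent. It is only a sketch: `GradientSlice`, `SplitGradient`, and the `.block<i>` naming are assumptions for illustration, not the actual Fluid implementation.

```cpp
// Illustrative sketch only: these names are assumptions, not Fluid code.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct GradientSlice {
  std::string name;  // e.g. "w1@GRAD.block0" (naming is hypothetical)
  size_t offset;     // start offset inside the aggregated gradient
  size_t length;     // number of elements in this block
};

// Split one aggregated gradient of `total_len` elements into roughly equal
// blocks, one per PServer instance, so each block can be sent independently.
std::vector<GradientSlice> SplitGradient(const std::string& grad_name,
                                         size_t total_len,
                                         size_t num_pservers) {
  std::vector<GradientSlice> slices;
  const size_t base = total_len / num_pservers;
  const size_t rem = total_len % num_pservers;
  size_t offset = 0;
  for (size_t i = 0; i < num_pservers; ++i) {
    const size_t len = base + (i < rem ? 1 : 0);
    slices.push_back({grad_name + ".block" + std::to_string(i), offset, len});
    offset += len;
  }
  return slices;
}

int main() {
  // 10000 gradient elements split across 3 PServer instances.
  for (const auto& s : SplitGradient("w1@GRAD", 10000, 3)) {
    std::cout << s.name << ": offset=" << s.offset
              << " length=" << s.length << "\n";
  }
  return 0;
}
```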
@@ -44,15 +44,15 @@ send the gradient variables to the multiple PServer instances.
- Schedule the `FetchVars` operator to fetch the latest parameter from the PServer before running
  the forward ops.
- There could be a large number of gradient variables to be sent, so we need to use another
-   thread pool(IO Threadpool) which a number of the schedulable threads is larger than the
+   thread pool (IO Threadpool) whose number of schedulable threads is larger than that of the
  computing thread pool, so that sending does not compete with computing for thread resources (see the sketch below).
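
To make the IO Threadpool point concrete, here is a minimal sketch, assuming a simple task-queue pool. `SimpleThreadPool` and the pool sizes are illustrative assumptions, not the Fluid thread-pool implementation; the only point is that send/fetch RPC tasks run on a larger pool of their own and never occupy the computing threads.

```cpp
// Illustrative sketch only: SimpleThreadPool is an assumption, not Fluid code.
#include <algorithm>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SimpleThreadPool {
 public:
  explicit SimpleThreadPool(size_t num_threads) {
    for (size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
            if (stop_ && tasks_.empty()) return;
            task = std::move(tasks_.front());
            tasks_.pop();
          }
          task();  // run the task outside the lock
        }
      });
    }
  }

  void Enqueue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

  ~SimpleThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();  // drain remaining tasks, then exit
  }

 private:
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
};

int main() {
  const unsigned num_cores = std::max(1u, std::thread::hardware_concurrency());
  // Computing pool sized to the hardware; the IO pool is deliberately larger
  // so many concurrent send/fetch RPCs never starve the computing threads.
  SimpleThreadPool compute_pool(num_cores);
  SimpleThreadPool io_pool(4 * num_cores);

  compute_pool.Enqueue([] { std::cout << "run a backward op\n"; });
  io_pool.Enqueue([] { std::cout << "send w1@GRAD.block0 to PServer 0\n"; });
  return 0;  // pool destructors drain their queues and join the workers
}
```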

### Parameter Server

<img src="./src/async_pserver.png" width="750"/>

- There could be multiple trainer instances that want to optimize the same parameter at
-   the same time, to avoid the pollution, we need one `BlockingQueue` for each gradient
+   the same time; to avoid races, we need one `BlockingQueue` for each gradient
  variable to process them one by one.
- We need a `Map` structure to map a gradient variable name to the `OptimizeBlock` which
  can optimize the respective parameter.
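
A minimal sketch of the PServer-side bookkeeping described above, assuming a simple in-memory queue: one `BlockingQueue` per gradient variable so concurrent updates to the same parameter are applied one by one, plus a map from the gradient variable name to the `OptimizeBlock` that updates it. `GradientMessage`, `PServerState`, and the block-id map are illustrative assumptions, not the actual Fluid classes.

```cpp
// Illustrative sketch only: these names are assumptions, not Fluid code.
#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

struct GradientMessage {
  std::string grad_name;    // e.g. "w1@GRAD" (naming is hypothetical)
  std::vector<float> data;  // the received gradient block
};

// A minimal blocking queue: Pop() waits until a gradient message arrives.
class BlockingQueue {
 public:
  void Push(GradientMessage msg) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push_back(std::move(msg));
    }
    cv_.notify_one();
  }
  GradientMessage Pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    GradientMessage msg = std::move(q_.front());
    q_.pop_front();
    return msg;
  }

 private:
  std::deque<GradientMessage> q_;
  std::mutex mu_;
  std::condition_variable cv_;
};

struct PServerState {
  // One queue per gradient variable: updates to the same parameter are
  // serialized, while different parameters can be optimized in parallel.
  std::unordered_map<std::string, BlockingQueue> grad_queues;
  // Maps a gradient variable name to the index of the OptimizeBlock in the
  // PServer program that updates the corresponding parameter.
  std::unordered_map<std::string, int> optimize_block_of;
};

int main() {
  PServerState state;
  state.optimize_block_of["w1@GRAD"] = 0;  // block 0 would optimize w1

  // A gradient arrives from some trainer: queue it, pop it, and look up the
  // matching OptimizeBlock (actually running the block is left to the
  // executor and is omitted here).
  state.grad_queues["w1@GRAD"].Push({"w1@GRAD", std::vector<float>(8, 0.1f)});
  GradientMessage msg = state.grad_queues["w1@GRAD"].Pop();
  std::cout << "optimize " << msg.grad_name << " with block "
            << state.optimize_block_of[msg.grad_name] << "\n";
  return 0;
}
```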