Commit 845618e

Author: Yancey
Merge pull request #9068 from Yancey1989/large_model_design_doc
Add design doc for lookup remote table in Fluid
2 parents a0fefc2 + e343afb commit 845618e

3 files changed: +26 -0 lines changed

doc/fluid/design/dist_train/distributed_lookup_table_design.md

Lines changed: 26 additions & 0 deletions
@@ -119,6 +119,32 @@ optimization algorithm $f$ runs on the storage service.
- Con: the storage service needs to be able to run the optimization
  algorithm.

## Distributed Sparse Table in Fluid

As an alternative design, we can implement the distributed sparse table in Fluid itself,
so that no external storage component needs to be maintained during training.

Before going on, you may want to read the Fluid [Distributed Training Architecture](./distributed_architecture.md)
and [Parameter Server](./parameter_server.md) documents.

![fluid lookup remote table](./src/fluid_lookup_remote_table.png)

Partition a large table into multiple PServer instances:
1. `DistributeTranspiler` splits the table into small table blocks using a
   partitioning algorithm such as
   [RoundRobin](https://en.wikipedia.org/wiki/Round-robin_scheduling) or
   [Hash](https://en.wikipedia.org/wiki/Hash).
1. In some cases the range of input `Ids` is very wide and unpredictable, so the sparse
   table needs to be able to initialize the value of an id that has not appeared before,
   using zeros or a uniform random or Gaussian distribution.
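
The two points above can be illustrated with a minimal pure-Python sketch. It is not the `DistributeTranspiler` implementation: the names `round_robin_placement`, `hash_placement` and `SparseTableBlock`, as well as the initializer ranges, are hypothetical and chosen only for illustration.

```python
import random


def round_robin_placement(num_blocks, num_pservers):
    """Assign table block k to PServer k % num_pservers (round-robin)."""
    return [k % num_pservers for k in range(num_blocks)]


def hash_placement(row_id, num_pservers):
    """Assign a row id to a PServer by hashing; here simply id modulo shard count."""
    return row_id % num_pservers


class SparseTableBlock:
    """Hypothetical table block held by one PServer; rows are created on first access."""

    def __init__(self, dim, init="zero", seed=0):
        self.dim = dim
        self.init = init
        self.rows = {}
        self._rng = random.Random(seed)

    def _new_row(self):
        # The ranges below are assumptions, not values taken from the design doc.
        if self.init == "zero":
            return [0.0] * self.dim
        if self.init == "uniform":
            return [self._rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        if self.init == "gaussian":
            return [self._rng.gauss(0.0, 0.01) for _ in range(self.dim)]
        raise ValueError("unknown initializer: %s" % self.init)

    def lookup(self, row_id):
        # An id that never appeared before gets a freshly initialized row.
        if row_id not in self.rows:
            self.rows[row_id] = self._new_row()
        return self.rows[row_id]
```

Because rows are only materialized on first lookup, the block never has to allocate storage for the full, unpredictable id range up front.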

For each Trainer's training process:
1. In the forward pass, a `pre-fetch` op replaces the local `lookup_table` op: it fetches the
   parameter blocks for the input `Ids` from the PServers and then merges the blocks
   into a parameter `W`.
1. In the backward pass, the Trainer computes `GRAD@W'` using the pre-fetched `W` and sends it
   to the PServer, which executes the optimize pass.
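
The Trainer-side flow above can be sketched roughly as follows. This is an in-process simulation under stated assumptions, not Fluid code: `FakePServer`, `prefetch` and `send_grad` are hypothetical names, ids are placed by `id % num_pservers`, unseen rows are zero-initialized, and plain SGD stands in for the optimize pass.

```python
from collections import defaultdict


class FakePServer:
    """Hypothetical in-process stand-in for one PServer holding a table block."""

    def __init__(self, dim, lr=0.1):
        self.dim = dim
        self.lr = lr
        self.rows = {}

    def fetch(self, ids):
        # Unseen ids are zero-initialized on first access (an assumption).
        return {i: list(self.rows.setdefault(i, [0.0] * self.dim)) for i in ids}

    def optimize(self, grads):
        # Plain SGD, used here only for illustration.
        for i, g in grads.items():
            row = self.rows.setdefault(i, [0.0] * self.dim)
            self.rows[i] = [w - self.lr * gj for w, gj in zip(row, g)]


def prefetch(ids, pservers):
    """Forward pass: fetch the rows for `ids` from their PServers and merge them into W."""
    by_server = defaultdict(list)
    for i in set(ids):
        by_server[i % len(pservers)].append(i)
    fetched = {}
    for s, server_ids in by_server.items():
        fetched.update(pservers[s].fetch(server_ids))
    # W has one row per input id, in input order.
    return [fetched[i] for i in ids]


def send_grad(ids, grad_w, pservers):
    """Backward pass: route each gradient row to the PServer that owns its id."""
    by_server = defaultdict(dict)
    for i, g in zip(ids, grad_w):
        shard = by_server[i % len(pservers)]
        acc = shard.setdefault(i, [0.0] * len(g))
        # Gradients of a repeated id are summed before being sent.
        shard[i] = [a + gj for a, gj in zip(acc, g)]
    for s, grads in by_server.items():
        pservers[s].optimize(grads)
```

A Trainer step would then call `prefetch(ids, pservers)` to obtain `W`, compute the gradient rows locally, and call `send_grad(ids, grad_w, pservers)` so that the parameter update happens on the PServer side.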

## Conclusion

Let us do the "storage service does not optimize" solution first, as a
Binary file not shown.
317 KB