@@ -119,6 +119,33 @@ optimization algorithm $f$ runs on the storage service.
- Con: the storage service needs to be able to run the optimization
algorithm.
+ ## Distributed Sparse Table in Fluid
+
+ As an alternative design, we can implement a distributed sparse table in Fluid,
+ so that we do not need to maintain an external storage component while training.
+
+ Before reading this design, it would be useful for the reader to familiarize themselves
+ with the Fluid [Distributed Training Architecture](./distributed_architecture.md)
+ and [Parameter Server](./parameter_server.md) designs.
+
+ ![fluid lookup remote table](./src/fluid_lookup_remote_table.png)
+
+ Partitioning a large table across multiple PServer instances:
+ 1. `DistributeTranspiler` splits the table into small table blocks using a
+ partitioning algorithm such as
+ [RoundRobin](https://en.wikipedia.org/wiki/Round-robin_scheduling) or
+ [Hash](https://en.wikipedia.org/wiki/Hash).
+ 1. In some cases, the range of the input `Ids` is very wide and unpredictable, so the sparse
+ table should be able to fill a new value for an id that did not appear before with
+ zeros, a uniform random distribution, or a Gaussian distribution (see the sketch after
+ this list).
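+
+ The sketch below is a minimal, hypothetical Python illustration of the two points above
+ (it is not the actual Fluid implementation): ids are assigned to table blocks by a
+ round-robin or hash rule, and the row for an id that did not appear before is filled
+ lazily. The `SparseTableBlock` class and the partition helpers are names invented for
+ this illustration only.
+
+ ```python
+ import hashlib
+
+ import numpy as np
+
+
+ class SparseTableBlock(object):
+     """One partition of the distributed sparse table, held by one PServer."""
+
+     def __init__(self, width, init="uniform"):
+         self.width = width  # width of each parameter row
+         self.init = init    # how to fill the row of an unseen id
+         self.rows = {}      # id -> parameter row
+
+     def lookup(self, id):
+         # Fill a new value for an id that did not appear before.
+         if id not in self.rows:
+             if self.init == "zero":
+                 self.rows[id] = np.zeros(self.width)
+             elif self.init == "gaussian":
+                 self.rows[id] = np.random.normal(0.0, 0.01, self.width)
+             else:  # uniform random
+                 self.rows[id] = np.random.uniform(-0.1, 0.1, self.width)
+         return self.rows[id]
+
+
+ def round_robin_partition(id, num_blocks):
+     # Round-robin over the id space: consecutive ids go to consecutive blocks.
+     return id % num_blocks
+
+
+ def hash_partition(id, num_blocks):
+     # Hash partitioning: a deterministic hash of the id picks the block.
+     digest = hashlib.md5(str(id).encode("utf-8")).hexdigest()
+     return int(digest, 16) % num_blocks
+ ```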
+
+ For each Trainer's training process:
+ 1. In the forward pass, instead of the local `lookup_table` op, we use a `pre-fetch` op to
+ pre-fetch the parameter blocks from the PServers according to the input `Ids`, and then
+ merge the blocks into a parameter `W` (see the sketch after this list).
+ 1. In the backward pass, compute `GRAD@W'` using the pre-fetched `W` and send it to the
+ PServers to execute the optimization pass.
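+
+ The following is a hypothetical Python sketch of one training step from the trainer's
+ point of view, not the real Fluid operator API. It reuses the `SparseTableBlock` and
+ `hash_partition` helpers from the previous sketch; in the real system the pre-fetch and
+ gradient send are remote calls expressed as operators in the Fluid program, and the
+ PServers run the optimization pass.
+
+ ```python
+ import numpy as np
+
+
+ def train_step(ids, pservers, num_blocks, learning_rate=0.01):
+     """One simplified training step against a remotely stored sparse table."""
+     # Forward pass: pre-fetch the parameter rows for the input ids from the
+     # PServer blocks that own them, then merge them into a parameter W.
+     fetched = {}
+     for id in set(ids):
+         block = hash_partition(id, num_blocks)  # same rule used to split the table
+         fetched[id] = pservers[block].lookup(id)
+     W = np.stack([fetched[id] for id in ids])
+
+     # ... the rest of the forward and backward pass runs here; a random
+     # placeholder stands in for the computed gradient GRAD@W'.
+     grad_W = np.random.normal(0.0, 0.01, W.shape)
+
+     # Backward pass: send the sparse gradient rows back to the owning PServer
+     # blocks, which execute the optimization pass (plain SGD in this sketch).
+     for row, id in enumerate(ids):
+         block = hash_partition(id, num_blocks)
+         pservers[block].rows[id] -= learning_rate * grad_W[row]
+
+
+ # Example usage: four table blocks, each held by one PServer.
+ num_blocks = 4
+ pservers = [SparseTableBlock(width=8) for _ in range(num_blocks)]
+ train_step(ids=[3, 7, 3, 42], pservers=pservers, num_blocks=num_blocks)
+ ```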
+
## Conclusion
Let us do the "storage service does not optimize" solution first, as a