@@ -119,6 +119,32 @@ optimization algorithm $f$ runs on the storage service.
- Con: the storage service needs to be able to run the optimization
algorithm.
+ ## Distributed Sparse Table in Fluid
+
+ As an alternative design, we can implement a distributed sparse table in Fluid,
+ so that we do not need to maintain an external storage component during training.
+
+ You may need to read the Fluid [Distributed Training Architecture](./distributed_architecture.md)
+ and [Parameter Server](./parameter_server.md) before going on.
+
+ ![fluid lookup remote table](./src/fluid_lookup_remote_table.png)
+
+ We partition a large table across multiple PServer instances:
+ 1. `DistributeTranspiler` splits the large table into small table blocks and distributes
+    them to the PServer instances with a partitioning algorithm such as
+    [RoundRobin](https://en.wikipedia.org/wiki/Round-robin_scheduling) or
+    [Hash](https://en.wikipedia.org/wiki/Hash).
+ 1. In some cases, the range of the input `Ids` is very wide and unpredictable, so the sparse
+    table should be able to fill a new value for an id that did not appear before, using
+    zero, a uniform distribution, or a Gaussian distribution (see the sketch after this list).
+
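+ As a minimal illustration (not the real `DistributeTranspiler` or PServer code), the Python
+ sketch below shows the two points above: ids are assigned to table blocks by a simple
+ round-robin/hash rule, and an id that has not appeared before is filled lazily from a chosen
+ distribution. The names `EMB_DIM`, `shard_id`, and `SparseTableBlock` are hypothetical and
+ used only for illustration.
+
+ ```python
+ import numpy as np
+
+ EMB_DIM = 8          # width of one table row (embedding size)
+ NUM_PSERVERS = 4     # number of table blocks / PServer instances
+
+
+ def shard_id(id_, num_shards=NUM_PSERVERS):
+     """Map an id to a table block; `id_ % num_shards` serves as both the
+     round-robin and the simplest hash partitioning rule."""
+     return id_ % num_shards
+
+
+ class SparseTableBlock(object):
+     """One table block held by one PServer: a dict from id to a parameter row.
+     Rows of unseen ids are filled lazily with zero, uniform, or Gaussian values."""
+
+     def __init__(self, init="uniform"):
+         self.rows = {}
+         self.init = init
+
+     def _new_row(self):
+         if self.init == "zero":
+             return np.zeros(EMB_DIM, dtype="float32")
+         if self.init == "uniform":
+             return np.random.uniform(-0.1, 0.1, EMB_DIM).astype("float32")
+         return np.random.normal(0.0, 0.1, EMB_DIM).astype("float32")
+
+     def lookup(self, ids):
+         rows = []
+         for i in ids:
+             if i not in self.rows:      # fill a new value for an unseen id
+                 self.rows[i] = self._new_row()
+             rows.append(self.rows[i])
+         return np.stack(rows)
+
+
+ blocks = [SparseTableBlock() for _ in range(NUM_PSERVERS)]
+ # id 10 goes to block 2, id 11 to block 3, ..., even if never seen before.
+ print(blocks[shard_id(10)].lookup([10]).shape)   # (1, 8)
+ ```
+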
+ For each Trainer's training process:
+ 1. In the forward pass, instead of the local `lookup_table` op, we use a `pre-fetch` op to
+    fetch the parameter blocks from the PServers according to the input `Ids`, and then merge
+    the blocks into the parameter `W` (see the sketch after this list).
+ 1. In the backward pass, we compute `GRAD@W'` using the pre-fetched `W` and send it to the
+    PServers, which execute the optimize pass.
147
+
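+ The sketch below continues the one above (reusing `EMB_DIM`, `shard_id`, `SparseTableBlock`,
+ and `blocks`) and illustrates this trainer-side flow: prefetch the needed rows from each
+ PServer, merge them into `W`, and send `GRAD@W'` back for the optimize pass. The functions
+ `rpc_prefetch` and `rpc_send_grad` are hypothetical stand-ins for the trainer-to-PServer
+ communication, not the real `pre-fetch` and send ops.
+
+ ```python
+ import numpy as np
+
+
+ def rpc_prefetch(pserver, ids):
+     """Ask one PServer for the rows of the given ids in its table block."""
+     return pserver.lookup(ids)
+
+
+ def rpc_send_grad(pserver, ids, grad_rows, lr=0.01):
+     """Send GRAD@W' rows to a PServer, which runs the optimize pass (plain SGD here)."""
+     for i, g in zip(ids, grad_rows):
+         pserver.rows[i] -= lr * g
+
+
+ def forward_prefetch(ids, blocks):
+     """Group ids by table block, prefetch each group, and merge the results into W
+     so that row k of W corresponds to ids[k]."""
+     W = np.empty((len(ids), EMB_DIM), dtype="float32")
+     for s in range(len(blocks)):
+         pos = [k for k, i in enumerate(ids) if shard_id(i, len(blocks)) == s]
+         if pos:
+             W[pos] = rpc_prefetch(blocks[s], [ids[k] for k in pos])
+     return W
+
+
+ def backward_send(ids, grad_W, blocks):
+     """Scatter GRAD@W' back to the owning PServers for the optimize pass."""
+     for s in range(len(blocks)):
+         pos = [k for k, i in enumerate(ids) if shard_id(i, len(blocks)) == s]
+         if pos:
+             rpc_send_grad(blocks[s], [ids[k] for k in pos], grad_W[pos])
+
+
+ ids = [3, 10, 3, 7]                   # input Ids of one mini-batch
+ W = forward_prefetch(ids, blocks)     # forward pass: prefetch + merge
+ grad_W = np.ones_like(W)              # pretend GRAD@W' from the backward pass
+ backward_send(ids, grad_W, blocks)    # send grads; PServers run the optimize step
+ ```
+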
## Conclusion
Let us do the "storage service does not optimize" solution first, as a