@@ -119,6 +119,33 @@ optimization algorithm $f$ runs on the storage service.
- Con: the storage service needs to be able to run the optimization
algorithm.
+ ## Distributed Sparse Table in Fluid
+
+ As an alternative design, we can implement a distributed sparse table in Fluid,
+ so that we do not need to maintain an external storage component while training.
+
+ Before reading this design, it would be useful for the reader to familiarize themselves
+ with the Fluid [Distributed Training Architecture](./distributed_architecture.md)
+ and [Parameter Server](./parameter_server.md) designs.
+
+ ![fluid lookup remote table](./src/fluid_lookup_remote_table.png)
+
+ Partitioning a large table across multiple PServer instances:
+ 1. `DistributeTranspiler` splits the table into small table blocks using a
+ partitioning algorithm such as
+ [RoundRobin](https://en.wikipedia.org/wiki/Round-robin_scheduling) or
+ [Hash](https://en.wikipedia.org/wiki/Hash).
+ 1. In some cases, the range of the input `Ids` is very wide and unpredictable, so the sparse
+ table should be able to fill a new value for an id that did not appear before with
+ zeros, a uniform random distribution, or a Gaussian distribution (see the sketch after
+ this list).
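+
+ The sketch below is a minimal, hypothetical Python illustration of the two points above
+ (it is not the actual Fluid implementation): ids are assigned to table blocks by a
+ round-robin or hash rule, and the row for an id that did not appear before is filled
+ lazily. The `SparseTableBlock` class and the partition helpers are names invented for
+ this illustration only.
+
+ ```python
+ import hashlib
+
+ import numpy as np
+
+
+ class SparseTableBlock(object):
+     """One partition of the distributed sparse table, held by one PServer."""
+
+     def __init__(self, width, init="uniform"):
+         self.width = width  # width of each parameter row
+         self.init = init    # how to fill the row of an unseen id
+         self.rows = {}      # id -> parameter row
+
+     def lookup(self, id):
+         # Fill a new value for an id that did not appear before.
+         if id not in self.rows:
+             if self.init == "zero":
+                 self.rows[id] = np.zeros(self.width)
+             elif self.init == "gaussian":
+                 self.rows[id] = np.random.normal(0.0, 0.01, self.width)
+             else:  # uniform random
+                 self.rows[id] = np.random.uniform(-0.1, 0.1, self.width)
+         return self.rows[id]
+
+
+ def round_robin_partition(id, num_blocks):
+     # Round-robin over the id space: consecutive ids go to consecutive blocks.
+     return id % num_blocks
+
+
+ def hash_partition(id, num_blocks):
+     # Hash partitioning: a deterministic hash of the id picks the block.
+     digest = hashlib.md5(str(id).encode("utf-8")).hexdigest()
+     return int(digest, 16) % num_blocks
+ ```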
+
+ For each Trainer's training process:
+ 1. In the forward pass, instead of the local `lookup_table` op, we use a `pre-fetch` op to
+ pre-fetch the parameter blocks from the PServers according to the input `Ids`, and then
+ merge the blocks into a parameter `W` (see the sketch after this list).
+ 1. In the backward pass, compute `GRAD@W'` using the pre-fetched `W` and send it to the
+ PServers to execute the optimization pass.
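+
+ The following is a hypothetical Python sketch of one training step from the trainer's
+ point of view, not the real Fluid operator API. It reuses the `SparseTableBlock` and
+ `hash_partition` helpers from the previous sketch; in the real system the pre-fetch and
+ gradient send are remote calls expressed as operators in the Fluid program, and the
+ PServers run the optimization pass.
+
+ ```python
+ import numpy as np
+
+
+ def train_step(ids, pservers, num_blocks, learning_rate=0.01):
+     """One simplified training step against a remotely stored sparse table."""
+     # Forward pass: pre-fetch the parameter rows for the input ids from the
+     # PServer blocks that own them, then merge them into a parameter W.
+     fetched = {}
+     for id in set(ids):
+         block = hash_partition(id, num_blocks)  # same rule used to split the table
+         fetched[id] = pservers[block].lookup(id)
+     W = np.stack([fetched[id] for id in ids])
+
+     # ... the rest of the forward and backward pass runs here; a random
+     # placeholder stands in for the computed gradient GRAD@W'.
+     grad_W = np.random.normal(0.0, 0.01, W.shape)
+
+     # Backward pass: send the sparse gradient rows back to the owning PServer
+     # blocks, which execute the optimization pass (plain SGD in this sketch).
+     for row, id in enumerate(ids):
+         block = hash_partition(id, num_blocks)
+         pservers[block].rows[id] -= learning_rate * grad_W[row]
+
+
+ # Example usage: four table blocks, each held by one PServer.
+ num_blocks = 4
+ pservers = [SparseTableBlock(width=8) for _ in range(num_blocks)]
+ train_step(ids=[3, 7, 3, 42], pservers=pservers, num_blocks=num_blocks)
+ ```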
+
## Conclusion
Let us do the "storage service does not optimize" solution first, as a