
Commit 5d7c4c2

rchao authored and copybara-github committed
Update parameter server strategy doc to use legacy optimizer in order to use a constant learning rate.
PiperOrigin-RevId: 531325144
1 parent ba87988 commit 5d7c4c2

File tree

1 file changed: +5 -1 lines changed

site/en/tutorials/distribute/parameter_server_training.ipynb

Lines changed: 5 additions & 1 deletion
@@ -1292,7 +1292,11 @@
 "One common reason is that the parameter servers have unbalanced load and some heavily-loaded parameter servers have reached capacity. There can also be multiple root causes. Some simple methods to mitigate this issue are to:\n",
 "\n",
 "1. Shard your large model variables via specifying a `variable_partitioner` when constructing a `ParameterServerStrategy`.\n",
-"2. Avoid creating a hotspot variable that is required by all parameter servers in a single step if possible. For example, use a constant learning rate or subclass `tf.keras.optimizers.schedules.LearningRateSchedule` in optimizers since the default behavior is that the learning rate will become a variable placed on a particular parameter server and requested by all other parameter servers in each step.\n",
+"2. Avoid creating a hotspot variable that is required by all parameter servers in a single step, by both:\n",
+"\n",
+" 1) Using a constant learning rate or subclassing `tf.keras.optimizers.schedules.LearningRateSchedule` in optimizers. This is because the default behavior is that the learning rate will become a variable placed on a particular parameter server and requested by all other parameter servers in each step; and\n",
+"\n",
+" 2) Using a `tf.keras.optimizers.legacy.Optimizer` (the standard `tf.keras.optimizers.Optimizer`s could still lead to hotspot variables).\n",
 "3. Shuffle your large vocabularies before passing them to Keras preprocessing layers.\n",
 "\n",
 "Another possible reason for performance issues is the coordinator. The implementation of `schedule`/`join` is Python-based and thus may have threading overhead. Also, the latency between the coordinator and the workers can be large. If this is the case:\n",
