This repository was archived by the owner on Jul 10, 2025. It is now read-only.
1 file changed, +11 -3 lines changed
@@ -241,9 +241,17 @@ the MVP this is designed for:
    training dataset and be able to store that data on disk. Otherwise, a
    snapshot will never get created.
 
-2. In case there are multiple workers and the dataset is sharded across
-   workers, we assume that the number of workers remains the same from one run
-   to another. If the number changes, we’ll trigger another snapshot.
+2. In the cases where there are multiple workers and the dataset is sharded with
+   `Dataset.shard`, we assume that the number of workers remains the same from
+   the initial (writing) run through to the reading runs.
+
+   If the number of workers changes, then the `num_shards` parameter to
+   `Dataset.shard` will change, and this will result in a different graph
+   fingerprint and another snapshot write will be triggered.
+
+   If all workers use the exact same input pipeline with no sharding (e.g. all
+   workers will read from all the files), then snapshot will still be able to
+   read from previous snapshots even if the number of workers is different.
 
 3. Any `repeat`s in the dataset should be moved to after the `snapshot` op, to
    avoid writing large (or infinite) amounts of data during a snapshot writing
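
The behaviour described in item 2 of the diff can be illustrated with a toy model. The sketch below is not TensorFlow's fingerprinting algorithm; it only assumes that the fingerprint is some hash over the ops upstream of `snapshot` and their parameters. The helper names (`pipeline_fingerprint`, `sharded_pipeline`, `unsharded_pipeline`) and the file pattern are hypothetical and exist only for this illustration.

```python
import hashlib
import json


def pipeline_fingerprint(ops):
    """Toy fingerprint: hash a list of [op_name, params] pairs.

    This stands in for the real graph fingerprint, which is an assumption
    of this sketch and not the actual TensorFlow implementation.
    """
    encoded = json.dumps(ops, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def sharded_pipeline(num_workers, worker_index):
    """Ops upstream of snapshot when each worker reads only its own shard."""
    return [
        ["list_files", {"pattern": "train-*.tfrecord"}],  # hypothetical pattern
        ["shard", {"num_shards": num_workers, "index": worker_index}],
    ]


def unsharded_pipeline(num_workers):
    """Ops upstream of snapshot when every worker reads all the files.

    num_workers is accepted but deliberately unused: the worker count
    does not appear anywhere in the pipeline description.
    """
    return [
        ["list_files", {"pattern": "train-*.tfrecord"}],  # hypothetical pattern
    ]


# Same worker count on the writing and reading runs: fingerprints match,
# so worker 0 would find the snapshot it wrote earlier.
write_run = pipeline_fingerprint(sharded_pipeline(num_workers=4, worker_index=0))
read_run = pipeline_fingerprint(sharded_pipeline(num_workers=4, worker_index=0))

# Worker count changed from 4 to 8: num_shards differs, so the fingerprint
# differs and a new snapshot write would be triggered.
resized_run = pipeline_fingerprint(sharded_pipeline(num_workers=8, worker_index=0))

# No sharding: the fingerprint does not depend on the worker count.
unsharded_4 = pipeline_fingerprint(unsharded_pipeline(num_workers=4))
unsharded_8 = pipeline_fingerprint(unsharded_pipeline(num_workers=8))
```

In this model, `write_run` and `read_run` are equal, `resized_run` differs from both, and `unsharded_4` equals `unsharded_8`, which mirrors the three cases the revised text distinguishes.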