This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit 2f698f9

Update doc based on design review

1 parent 4c005e7 commit 2f698f9

1 file changed: +20 additions, -15 deletions


rfcs/20200113-tf-data-service.md

Lines changed: 20 additions & 15 deletions
@@ -1,11 +1,11 @@
 # Distributed tf.data service
 
-| Status        | Proposed |
+| Status        | Accepted |
 | :------------ | :------------------------------------------------------ |
 | **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |
 | **Author(s)** | Andrew Audibert ([email protected]) Rohan Jain ([email protected]) |
 | **Sponsor**   | Jiri Simsa ([email protected]) |
-| **Updated**   | 2019-01-24 |
+| **Updated**   | 2019-01-30 |
 
 ## Objective

@@ -143,14 +143,16 @@ here to implement datasets which produce per-replica elements, enabling
 idiomatic control flow.
 
 ```python
-def tf.data.experimental.service.distribute(address):
+def tf.data.experimental.service.distribute(address_or_resolver):
   """Marks that a dataset should be processed by the tf.data service.
 
   ds = ... # dataset to distribute
-  ds = ds.apply(tf.data.experimental.service.distribute(address))
+  ds = ds.apply(
+      tf.data.experimental.service.distribute(address_or_resolver))
 
   Args:
-    address: The address of the tf.data service master.
+    address_or_resolver: The address of the tf.data service master, or a
+      cluster resolver that can be used to determine the master address.
 
   Returns:
     A function that can be passed to `dataset.apply()`.
@@ -622,22 +624,25 @@ service. We will also provide a tutorial for using the tf.data service.
 *   How should we communicate that distributing a dataset will change the order
     in which elements are processed? If users' datasets rely on elements being
     processed in a certain order, they could face unpleasant surprises.
-    -   Current plan is to address this through documentation.
+    -   Final decision: Address this through documentation.
 *   Should we support splitting `skip`, `take`, and `scan` by having them
     operate at a per-task level (e.g. skip or take the first `N` elements within
     each task)?
-    -   Leaning towards supporting these operations at a per-task level. This is
-        consistent with how skip/take/scan behave today when using distribution
-        strategies to distribute a dataset.
+    -   Final decision: Prohibit distributing these transformations, and tell
+        users to instead use these transformations *after* applying the
+        `distribute` transformation.
 *   Is there a more user-friendly way to share iteration ids across consumers?
     Distribution strategy is well-equipped with collective ops to share the
     iteration ids, but sharing the iteration id could be a heavy burden for
     some users.
-    -   Distributing iteration ids is simple in the common case where a single
-        process builds the graph. If users are advanced enough to do distributed
-        training without distribution strategies, they will likely have a
-        different mechanism available for distributing iteration ids.
+    -   Final decision: It is a reasonable expectation for users to either use
+        distribution strategies, or distribute their own iteration ids.
+        TensorFlow will soon have public APIs for collective operations that
+        would make it easy to broadcast iteration ids.
 *   Can `service.distribute` take a `ClusterResolver` so that the master
     hostname isn't baked into the dataset definition?
-    -   We can achieve this by having the `distribute` transformation take a
-        master_address_or_resolver.
+    -   Final decision: Accept `master_address_or_resolver`, and wait to resolve
+        the master address until iteration begins. The `ClusterResolver` will be
+        stored in the Python `Dataset` object. In the future, we may want C++
+        implementations of `ClusterResolver` so that we can represent the
+        resolver within the dataset graph.
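The decision to prohibit distributing `skip`/`take`/`scan` and require them *after* `distribute` can be illustrated with a plain-Python sketch (no TensorFlow; `task_a`, `task_b`, and the interleaved merge are hypothetical stand-ins for worker tasks and the service's element stream). It shows why the two orderings differ: a per-task `take` bounds each worker's output, while take-after-distribute bounds the total the consumer sees.

```python
# Hypothetical sketch: two worker tasks, each producing elements.
task_a = [0, 1, 2, 3]
task_b = [4, 5, 6, 7]

# Per-task semantics (rejected in the RFC): taking 2 from *each* task
# yields 4 elements in total.
per_task = task_a[:2] + task_b[:2]

# Accepted semantics: apply `take` after `distribute`, i.e. on the merged
# stream the consumer sees, so exactly 2 elements are produced overall.
# (Round-robin interleave is an assumption for illustration only.)
merged = [x for pair in zip(task_a, task_b) for x in pair]
after_distribute = merged[:2]

print(per_task)          # [0, 1, 4, 5]
print(after_distribute)  # [0, 4]
```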

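The final `ClusterResolver` decision, resolving the master address only when iteration begins rather than at dataset construction, can be sketched in plain Python. The `ClusterResolver` and `DistributedDataset` classes below are illustrative stand-ins, not the real TensorFlow API; the point is only where resolution happens.

```python
# Hypothetical sketch of "resolve at iteration time": the resolver is
# stored on the dataset object, and the master address is looked up only
# when iteration begins, so the hostname is not baked into the dataset
# definition.

class ClusterResolver:
    """Stand-in for a TF ClusterResolver; returns the master address."""

    def __init__(self, address):
        self._address = address

    def master(self):
        return self._address


class DistributedDataset:
    """Stand-in dataset that defers address resolution until iteration."""

    def __init__(self, elements, address_or_resolver):
        self._elements = elements
        # No resolution here: the resolver is stored on the object, as the
        # RFC decision describes for the Python `Dataset` object.
        self._address_or_resolver = address_or_resolver

    def __iter__(self):
        # Resolution happens only now, when iteration begins.
        if isinstance(self._address_or_resolver, ClusterResolver):
            address = self._address_or_resolver.master()
        else:
            address = self._address_or_resolver
        print(f"connecting to tf.data service master at {address}")
        return iter(self._elements)


ds = DistributedDataset(range(3), ClusterResolver("grpc://master:5000"))
print(list(ds))  # resolution and "connection" happen here, not above
```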