
Commit afec5f9

Improve doc clarity
1 parent 910474a

1 file changed: +15 -5 lines changed


rfcs/20200113-tf-data-service.md

Lines changed: 15 additions & 5 deletions
@@ -5,7 +5,7 @@
 | **RFC #** | [195](https://github.com/tensorflow/community/pull/195) |
 | **Author(s)** | Andrew Audibert ([email protected]) Rohan Jain ([email protected]) |
 | **Sponsor** | Jiri Simsa ([email protected]) |
-| **Updated** | 2019-01-09 |
+| **Updated** | 2019-01-24 |

 ## Objective

@@ -98,20 +98,20 @@ provides dataset elements to consumers over RPC.
 **Consumer**: A machine which consumes data from the tf.data service. The
 consumer may be attached to a GPU or TPU, or use data for on-CPU training.

-#### Separate Cluster Architecture
+#### Option 1: Separate Cluster Architecture

 Each server is run on a separate host from the TensorFlow cluster. This
 configuration gives users a way to provide horizontally scaling CPU for
 processing their input pipelines and quickly feeding data to accelerators.

-#### Embedded Cluster Architecture
+#### Option 2: Embedded Cluster Architecture

 Each TensorFlow server runs the tf.data worker gRPC service, and one server also
 runs the master gRPC service. This lets users leverage the tf.data service
 without needing to provision additional compute resources, and gives all the
 benefits of the tf.data service except for horizontal scaling.

-#### Hybrid Architecture
+#### Option 3: Hybrid Architecture

 Users could run tf.data workers embedded in their TensorFlow cluster, and also
 run additional tf.data workers (and potentially the tf.data master) outside the
@@ -136,6 +136,12 @@ code. The steps for distributed iteration over a dataset are
 5. Create per-consumer iterators using `make_iterator`, and use these iterators
    to read data from the tf.data service.

+We move away from the idiomatic `for element in dataset` control flow because
+there is now an extra step when going from dataset to iterator: creating an
+iteration. A higher layer API such as tf.distribute may use the API presented
+here to implement datasets which produce per-replica elements, enabling
+idiomatic control flow.
+
 ```python
 def tf.data.experimental.service.distribute(address):
   """Marks that a dataset should be processed by the tf.data service.
@@ -237,7 +243,9 @@ It will be entirely in C++, and we don't currently have any plans to expose
 splitting through Python.

 The API focuses on producing and consuming `Split`s. A `Split` is a variant
-Tensor that can be subclassed to represent arbitrary types of splitting.
+Tensor that can be subclassed to represent arbitrary types of splitting. The
+`Split` base class is intentionally general so that subclasses have the
+flexibility to define splits however they like.

 ```cpp
 class Split {
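The added sentences say subclasses may define splits however they like. As a purely illustrative sketch (the hunk shows only the first line of `Split`, so the base-class body and the `FileRangeSplit` type here are assumptions, not the RFC's interface), a file-based dataset might represent a split as a byte range within one input file:

```cpp
#include <cstdint>
#include <string>
#include <utility>

// The real interface is not shown in this hunk; an empty base is assumed here.
class Split {
 public:
  virtual ~Split() = default;
};

// Hypothetical subclass: a file-based dataset could describe a split as a
// byte range within a single input file.
class FileRangeSplit : public Split {
 public:
  FileRangeSplit(std::string filename, int64_t start_offset, int64_t end_offset)
      : filename_(std::move(filename)),
        start_offset_(start_offset),
        end_offset_(end_offset) {}

  const std::string& filename() const { return filename_; }
  int64_t start_offset() const { return start_offset_; }
  int64_t end_offset() const { return end_offset_; }

 private:
  std::string filename_;   // which input file this split covers
  int64_t start_offset_;   // first byte of the range
  int64_t end_offset_;     // one past the last byte of the range
};
```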
@@ -614,6 +622,8 @@ service. We will also provide a tutorial for using the tf.data service.
 * How should we communicate that distributing a dataset will change the order
   in which elements are processed? If users' datasets rely on elements being
   processed in a certain order, they could face unpleasant surprises.
+* Should we support splitting `skip` and `take` by having them operate at a
+  per-task level (skip or take the first `N` elements within each task)?
 * Is there a more user-friendly way to share iteration data across consumers?
   Distribution strategy is well-equipped with collective ops to share the
   iteration data, but sharing the iteration data could be a heavy burden for
