This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit 230783e

replaced colab links with publicly accessible ones and clarify about TFX components does not always need to run TF.
1 parent 273c62a commit 230783e

1 file changed: +12 -2 lines changed

rfcs/20191017-tfx-standardized-inputs.md

Lines changed: 12 additions & 2 deletions
@@ -32,7 +32,7 @@ Status | Proposed
 # Motivation
 
 ## Interoperability of TFX libraries
-TFX offers a portfolio of libraries, including
+TFX offers a portfolio of libraries, including
 [TFT](https://github.com/tensorflow/transform),
 [TFDV](https://github.com/tensorflow/data-validation) and
 [TFMA](https://github.com/tensorflow/model-analysis). These libraries can be
@@ -169,6 +169,13 @@ The in-memory representation should:
 This type of workload is also significant in TFX, both in terms of CPU
 cycles and number of examples processed.
 
+Note that TFX libraries don't always need to run TF graphs. For example,
+TFDV, despite of its name, only analyzes the training data and (almost) does
+not call any TF API. Another example, TFMA, will support "blackbox"
+evaluation where the model being evaluated does not have to be a TF model.
+Therefore a TF-neutral in-memory representation that works well with plain
+Python code is desirable.
+
 * Be interoperable with the rest of the world.
 
 The OSS world should be able to use TFX components with little effort on
@@ -680,7 +687,10 @@ Remarks:
 2. Only when the backing buffers are aligned correctly. Currently both TF
 and Apache Arrow has 64-byte alignment. And this can be enforced by
 implementing our own Arrow MemoryPool wrapping a TF allocator.
-3. See the comparison in [this colab notebook](https://colab.sandbox.google.com/drive/1SCQs88J4Tc6HKk2AfpYvELaJDxa4h4IQ).
+[This colab notebook](https://colab.research.google.com/drive/1bM8gso7c8x4UXx5htDM4N1KUSTuRvIFL)
+shows that as long as the memory alignment is the same, feeding TF with an
+Arrow Array has very little overhead.
+3. See the comparison in [this colab notebook](https://colab.research.google.com/drive/1CvDjZCH3GQE8iojCmRHPuSqLTw8KgNf3).
 * It’s worth calling out that Arrow is meant to be a data analysis library
 and better data analysis support (for example, support for a “group-by”
 clause) will be added over time.
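
Remark 2 in the last hunk ties cheap hand-off between Arrow and TF to the backing buffers sharing the same 64-byte alignment. The sketch below is only an illustration of that idea in plain Python: it uses the standard-library `ctypes` module rather than TF or Arrow, and the helper names (`aligned_int64_buffer`, `is_aligned`, `zero_copy_view`) are hypothetical, not part of any TFX, TF or Arrow API. It allocates a buffer whose start address is a multiple of 64, then hands a second handle on the same bytes to a stand-in consumer without copying.

```python
import ctypes

ALIGNMENT = 64  # bytes; the figure the remark above gives for TF and Arrow


def aligned_int64_buffer(n, alignment=ALIGNMENT):
    """Allocate n int64 slots whose first byte sits on an `alignment` boundary."""
    nbytes = n * ctypes.sizeof(ctypes.c_int64)
    # Over-allocate by `alignment` bytes so an aligned start always fits inside.
    raw = (ctypes.c_char * (nbytes + alignment))()
    offset = (-ctypes.addressof(raw)) % alignment
    # from_buffer shares memory with `raw` (and keeps it alive); nothing is copied.
    return (ctypes.c_int64 * n).from_buffer(raw, offset)


def is_aligned(buf, alignment=ALIGNMENT):
    """True if the buffer's start address is a multiple of `alignment`."""
    return ctypes.addressof(buf) % alignment == 0


def zero_copy_view(buf):
    """A second handle on the same bytes, standing in for a consumer such as TF."""
    return type(buf).from_buffer(buf)


producer = aligned_int64_buffer(4)
for i, value in enumerate([10, 20, 30, 40]):
    producer[i] = value

consumer = zero_copy_view(producer)
producer[0] = 99  # a write through one handle is visible through the other
print(is_aligned(producer), list(consumer))
```

Because both handles point at the same address, the consumer sees the producer's write with no copy in between. Whether a real TF allocator accepts an Arrow buffer this way depends on the details the RFC and the linked notebooks cover; this sketch makes no claim about either library's behaviour.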
