@@ -32,7 +32,7 @@ Status | Proposed
 # Motivation
 
 ## Interoperability of TFX libraries
-TFX offers a portfolio of libraries, including
+TFX offers a portfolio of libraries, including
 [TFT](https://github.com/tensorflow/transform),
 [TFDV](https://github.com/tensorflow/data-validation) and
 [TFMA](https://github.com/tensorflow/model-analysis). These libraries can be
@@ -169,6 +169,13 @@ The in-memory representation should:
 This type of workload is also significant in TFX, both in terms of CPU
 cycles and number of examples processed.
 
+Note that TFX libraries don't always need to run TF graphs. For example,
+TFDV, despite its name, only analyzes the training data and makes almost
+no TF API calls. Similarly, TFMA will support "blackbox" evaluation, where
+the model being evaluated does not have to be a TF model. Therefore, a
+TF-neutral in-memory representation that works well with plain Python code
+is desirable.
+
 * Be interoperable with the rest of the world.
 
 The OSS world should be able to use TFX components with little effort on
@@ -680,7 +687,10 @@ Remarks:
 2. Only when the backing buffers are aligned correctly. Currently both TF
 and Apache Arrow have 64-byte alignment, and this can be enforced by
 implementing our own Arrow MemoryPool that wraps a TF allocator.
-3. See the comparison in [this colab notebook](https://colab.sandbox.google.com/drive/1SCQs88J4Tc6HKk2AfpYvELaJDxa4h4IQ).
+[This colab notebook](https://colab.research.google.com/drive/1bM8gso7c8x4UXx5htDM4N1KUSTuRvIFL)
+shows that as long as the memory alignment is the same, feeding TF with an
+Arrow Array has very little overhead.
+3. See the comparison in [this colab notebook](https://colab.research.google.com/drive/1CvDjZCH3GQE8iojCmRHPuSqLTw8KgNf3).
 * It’s worth calling out that Arrow is meant to be a data analysis library,
 and better data analysis support (for example, support for a “group-by”
 clause) will be added over time.