This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit 6faac9b

Relax API requirements.
1 parent b42798f commit 6faac9b

1 file changed

+17
-41
lines changed


rfcs/20190305-modular-tensorflow.md

Lines changed: 17 additions & 41 deletions
@@ -11,7 +11,7 @@

 TensorFlow is a very successful open source project. Since it has been open sourced, [1800+ contributors](https://github.com/tensorflow/tensorflow) have submitted code into TF from outside Google. However, as more and more developers contribute, it becomes more and more difficult to manage contributions in the single repository.

-This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that guarantee backwards compatibility. Thanks to the module APIs, these modules are now **managed/owned/released independently.**
+This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs**. Thanks to the module APIs, these modules are now **managed/owned/released independently.**

 ### Problems addressed

@@ -55,20 +55,27 @@ Having a monolithic repository means we need to rebuild all of our code for all

 ## Overview

-This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that guarantee backwards compatibility. Thanks to these APIs, these modules will be **managed/owned/released independently**. There will be different strategies to break apart pieces based on the languages, but below summarizes the approach for C++ and Python:
+This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that will evolve over time. Thanks to these APIs, these modules will be **managed/owned/released independently**. There will be different strategies to break apart pieces based on the languages, but below summarizes the approach for C++ and Python:


 ![alt_text](20190305-modular-tensorflow/big_picture.png "Overview of modular TensorFlow")

 A summary of the above is:

-
-
 * Core TF functionality will be implemented in C++
 * Core TF functionality can be extended using shared objects.
 * On top of the core C++ libraries, we will have the language bindings (Using the C API)
 * There can be more functionality built on top of the core TF bindings in different languages, which can be maintained and distributed separately.
-* All different pieces need to use Stable public APIs with backwards compatibility guarantees.
+* All different pieces need to use well defined public APIs.
+
+A few important points to clarify above are:
+
+* We will try our best to make sure the APIs will stay as close as possible to
+the current APIs.
+* We are aiming to avoid needing to change most existing custom op and kernel
+code.
+* The APIs will evolve over time. We will modify the APIs based on our and
+user's needs. These modifications are expected to reduce in frequency.


 ### Definitions
@@ -90,7 +97,7 @@ This project aims to implement similar plugin architectures for multiple compone

 1. Networking module, with verbs, gdr plugins initially
 1. Filesystems module, with GCP, AWS and HDFS support
-1. Kernels module,
+1. Kernels module,
 1. Optimizers/Graph rewrite module,
 1. Accelerator backends module

@@ -285,26 +292,11 @@ This section will describe the key design points for modular Python packages for

 Contains the base Python API, and "Core TF" C++ shared objects

-This package will be a subset of the current "tensorflow" pip package. It will include all of the core TF API except the high level API modules we will split up. It will define a public API for everything except for the required add on packages. This API is required to have backwards compatibility guarantees for minor version changes. With this guarantee, we expect the following:
-
-
-_"Given that the combination of these packages work: TF-base 1.n, and addon package 1.m work together, TF-base 1.(n+k) and add on package 1.m should always work together."_
-
-If we discover a violation of this guarantee, that will be treated as a P1 bug, and it will require a patch release for the base package 1.(n+k)
-
+This package will be a subset of the current "tensorflow" pip package. It will include all of the core TF API except the high level API modules we will split up. It will define a public API for everything except for the required add on packages.

 ### Required tensorflow addons

-These packages are planned to contain high level TF functionality that can be safely split up from TF. Examples for these are tensorboard, estimator and keras. Together with the base TF package, these packages will contain the full Python code of TF, except for top level API wiring.
-
-These packages have two constraints:
-
-
-
-1. They are only allowed to use public APIs exposed by their dependencies.
-1. They are required to provide backwards compatible public APIs.
-
-With the backwards compatible public APIs, we expect addons to be able to release independently as long as features they depend on are released in their dependencies.
+These packages are planned to contain high level TF functionality that can be safely split up from TF. Examples for these are tensorboard, estimator and keras. Together with the base TF package, these packages will contain the full Python code of TF, except for top level API wiring. As like any addons, these are only allowed to use public APIs exposed by their dependencies.

 These packages will have full control over the versions of their dependencies. We recommend they only set a minimum version for their dependencies. When they need new features, they will bump their minimum requirement to include the new API changes.

@@ -342,19 +334,7 @@ TENSORFLOW_DEPENDENCIES= [

 ### TF Public APIs

-As a part of the modularization, to be able to decouple development and releases for each of these packages, each package is required to expose a **public API with backwards compatibility guarantees**. What this means is, no API symbols in the public API cannot be changed in a backwards incompatible way, syntactically or semantically, between any minor versions. Below is a toy example of two packages explaining the guarantees we expect:
-
-
-![alt_text](20190305-modular-tensorflow/simple_package_deps.png "Just two example packages.")
-
-
-
-* P1 depends on P2
-* P2 is expected to provide a public API
-* All API symbols exposed by P2 version M.N is expected to work at version M.(N+K) for any non-negative integer K.
-* P2 is allowed to make breaking changes to its API between major releases (M to M+1)
-* If P1 version X.Y works with P2 version M.N, it should also work the same way with P2 version M.(N+K) However, there are no guarantees for it to work with P2 version (M+K).L
-* When P1 is releasing a new version, it should check which API symbols it needs from P2, and fix the minimum version requirement in its pip package for P2 accordingly.
+As a part of the modularization, to be able to decouple development and releases for each of these packages, each package is required to expose a **well defined, well documented public API**.


 ### Optional TF packages
@@ -363,17 +343,13 @@ Mostly expected to contain the C++ plugins defined in the previous section. Thes

 These shared objects will be automatically loaded by TF core if:

-
-
 * They correctly define the compatibility strings using `TF_PLATFORM_STRINGS`
 * They are compatible with the system tf core is running on
 * They have been properly built and signed (unless running in developer mode)


 ## Alternatives / Potential Issues

-
-
 * **Why do we not use C++ APIs instead of C**: Compilers have no guarantees for ABIs generated for C++ code. Any C++ API used will require each shared object to be compiled with the same compiler, using the same version of the compiler, with the same compiler flags ([See github issue 23561](https://github.com/tensorflow/tensorflow/issues/23561)).
 * **Why do not we statically link everything**: Single shared object for everything: Anywhere except google does not have access to the massively parallel build system we use here at google. This causes prohibitive build times, causing major developer pain for open source developers. There are many more issues, but the summary is while this is a great solution for google, outside google this is simply infeasible.
 * **TF will become a suite of multiple packages, built by multiple authorities. What if the bugs get blamed on TF team**: With the modular model, we expect testing of 3rd party code to become easier. This can also be mitigated if the error messages are better, and if they can clearly point out which module the issue stems from. Finally, we can create an apple-swift like testing model, where we run a Jenkins setup that people can donate their machines to, and we can run continuous integration tests on their plugins.
@@ -439,7 +415,7 @@ To summarize the above timeline:

 * Different packages set their own release cadences
 * Each package will set version boundaries for each of their dependencies.
-* Each package is responsible for ensuring that all of their public APIs are working without any changes until the next major release
+* Each package is responsible for ensuring that all of their public APIs are working as promised.
 * Packages do not need to modify the minimum version requirements unless they start using newly introduced public API symbols.
 * TF metapackage releases may choose to hold back individual packages in favor of faster releases. But dependency requirements have to be respected when doing so.
 * Major releases still need to be coordinated.
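
Reading aid for the timeline summary in the last hunk: the sketch below shows one way a package might check that its dependencies meet the minimum versions it declares, which is the mechanism the RFC recommends in place of the removed compatibility guarantee. It is not part of the RFC or of this commit. The package name `tensorflow-base`, the version numbers, and all function names are hypothetical, and the parser only handles plain numeric dotted versions (no pre-release or local tags).

```python
def parse_version(text):
    """Turn a numeric dotted version such as '1.14' into a 3-tuple (1, 14, 0)."""
    parts = [int(part) for part in text.split(".")]
    # Pad short versions so '1.14' and '1.14.0' compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_minimum(installed, minimum):
    """Return True when the installed version is at least the declared minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical minimum-version table for an addon package (assumption, not from the RFC).
ADDON_MINIMUMS = {"tensorflow-base": "1.14.0"}


def check_dependencies(installed_versions, minimums=ADDON_MINIMUMS):
    """Return the names of dependencies that are missing or older than their minimum."""
    problems = []
    for name, minimum in minimums.items():
        installed = installed_versions.get(name)
        if installed is None or not satisfies_minimum(installed, minimum):
            problems.append(name)
    return problems
```

With this shape, an addon that starts using a newly introduced public API symbol only needs to raise the entry in its minimum-version table; nothing else in the check changes.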
