rfcs/20190305-modular-tensorflow.md (17 additions, 41 deletions)
@@ -11,7 +11,7 @@
TensorFlow is a very successful open source project. Since it has been open sourced, [1800+ contributors](https://github.com/tensorflow/tensorflow) have submitted code into TF from outside Google. However, as more and more developers contribute, it becomes more and more difficult to manage contributions in the single repository.
- This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that guarantee backwards compatibility. Thanks to the module APIs, these modules are now **managed/owned/released independently.**
+ This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs**. Thanks to the module APIs, these modules are now **managed/owned/released independently.**
### Problems addressed
@@ -55,20 +55,27 @@ Having a monolithic repository means we need to rebuild all of our code for all
## Overview
- This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that guarantee backwards compatibility. Thanks to these APIs, these modules will be **managed/owned/released independently**. There will be different strategies to break apart pieces based on the languages, but below summarizes the approach for C++ and Python:
+ This project aims to split the TensorFlow codebase into **smaller, more focused**, repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that will evolve over time. Thanks to these APIs, these modules will be **managed/owned/released independently**. There will be different strategies to break apart pieces based on the languages, but below summarizes the approach for C++ and Python:

A summary of the above is:
* Core TF functionality will be implemented in C++
* Core TF functionality can be extended using shared objects.
* On top of the core C++ libraries, we will have the language bindings (Using the C API)
* There can be more functionality built on top of the core TF bindings in different languages, which can be maintained and distributed separately.
- * All different pieces need to use Stable public APIs with backwards compatibility guarantees.
+ * All different pieces need to use well defined public APIs.
+ A few important points to clarify above are:
+ * We will try our best to make sure the APIs will stay as close as possible to the current APIs.
+ * We are aiming to avoid needing to change most existing custom op and kernel code.
+ * The APIs will evolve over time. We will modify the APIs based on our and users' needs. These modifications are expected to become less frequent over time.
### Definitions
@@ -90,7 +97,7 @@ This project aims to implement similar plugin architectures for multiple compone
1. Networking module, with verbs, gdr plugins initially
1. Filesystems module, with GCP, AWS and HDFS support
- 1. Kernels module,
+ 1. Kernels module,
1. Optimizers/Graph rewrite module,
1. Accelerator backends module
@@ -285,26 +292,11 @@ This section will describe the key design points for modular Python packages for
Contains the base Python API, and "Core TF" C++ shared objects
- This package will be a subset of the current "tensorflow" pip package. It will include all of the core TF API except the high level API modules we will split up. It will define a public API for everything except for the required add on packages. This API is required to have backwards compatibility guarantees for minor version changes. With this guarantee, we expect the following:
- _"Given that the combination of these packages work: TF-base 1.n, and addon package 1.m work together, TF-base 1.(n+k) and add on package 1.m should always work together."_
- If we discover a violation of this guarantee, that will be treated as a P1 bug, and it will require a patch release for the base package 1.(n+k)
+ This package will be a subset of the current "tensorflow" pip package. It will include all of the core TF API except the high level API modules we will split up. It will define a public API for everything except for the required add on packages.
### Required tensorflow addons
- These packages are planned to contain high level TF functionality that can be safely split up from TF. Examples for these are tensorboard, estimator and keras. Together with the base TF package, these packages will contain the full Python code of TF, except for top level API wiring.
- These packages have two constraints:
- 1. They are only allowed to use public APIs exposed by their dependencies.
- 1. They are required to provide backwards compatible public APIs.
- With the backwards compatible public APIs, we expect addons to be able to release independently as long as features they depend on are released in their dependencies.
+ These packages are planned to contain high level TF functionality that can be safely split up from TF. Examples for these are tensorboard, estimator and keras. Together with the base TF package, these packages will contain the full Python code of TF, except for top level API wiring. Like any other addons, these are only allowed to use public APIs exposed by their dependencies.
These packages will have full control over the versions of their dependencies. We recommend they only set a minimum version for their dependencies. When they need new features, they will bump their minimum requirement to include the new API changes.
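For illustration, a minimal sketch of how an addon could express this through ordinary pip metadata. The package names and version numbers below are hypothetical and not part of the RFC:

```python
# Hypothetical setup.py for an addon package (names and versions are illustrative).
from setuptools import setup

setup(
    name="tensorflow-estimator",
    version="1.14.0",
    # Only a minimum version is pinned. The bound is bumped only when the addon
    # starts using public API symbols introduced in a newer base release.
    install_requires=["tensorflow-base>=1.13.0"],
)
```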
@@ -342,19 +334,7 @@ TENSORFLOW_DEPENDENCIES= [
### TF Public APIs
- As a part of the modularization, to be able to decouple development and releases for each of these packages, each package is required to expose a **public API with backwards compatibility guarantees**. What this means is, no API symbols in the public API cannot be changed in a backwards incompatible way, syntactically or semantically, between any minor versions. Below is a toy example of two packages explaining the guarantees we expect:
- 
- * P1 depends on P2
- * P2 is expected to provide a public API
- * All API symbols exposed by P2 version M.N is expected to work at version M.(N+K) for any non-negative integer K.
- * P2 is allowed to make breaking changes to its API between major releases (M to M+1)
- * If P1 version X.Y works with P2 version M.N, it should also work the same way with P2 version M.(N+K). However, there are no guarantees for it to work with P2 version (M+K).
- * When P1 is releasing a new version, it should check which API symbols it needs from P2, and fix the minimum version requirement in its pip package for P2 accordingly.
+ As a part of the modularization, to be able to decouple development and releases for each of these packages, each package is required to expose a **well defined, well documented public API**.
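As a rough sketch of what a well defined API surface can look like in Python (the module and symbol names here are hypothetical, and this is only one possible convention), a package can re-export its public names explicitly and treat everything else as private:

```python
# Hypothetical my_addon/__init__.py: the public API is exactly the names
# re-exported and listed below; all other modules are implementation details
# that may change between releases.
from my_addon.layers import DenseBlock   # hypothetical public class
from my_addon.losses import focal_loss   # hypothetical public function

__all__ = ["DenseBlock", "focal_loss"]
```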
### Optional TF packages
@@ -363,17 +343,13 @@ Mostly expected to contain the C++ plugins defined in the previous section. Thes
These shared objects will be automatically loaded by TF core if:
* They correctly define the compatibility strings using `TF_PLATFORM_STRINGS`
* They are compatible with the system tf core is running on
* They have been properly built and signed (unless running in developer mode)
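As a point of reference, a minimal sketch using the existing custom-op loading mechanism; the plugin file name is hypothetical, and the automatic discovery described above would replace this explicit call:

```python
import tensorflow as tf

# Load a kernels plugin that was built separately as a shared object.
# Ops defined by the plugin become available on the returned module object.
custom_ops = tf.load_op_library("./my_kernels_plugin.so")
```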
## Alternatives / Potential Issues
* **Why do we not use C++ APIs instead of C**: Compilers have no guarantees for ABIs generated for C++ code. Any C++ API used will require each shared object to be compiled with the same compiler, using the same version of the compiler, with the same compiler flags ([See github issue 23561](https://github.com/tensorflow/tensorflow/issues/23561)).
* **Why do we not statically link everything into a single shared object**: No one outside Google has access to the massively parallel build system we use internally. This causes prohibitive build times and major developer pain for open source developers. There are many more issues, but the summary is that while this is a great solution for Google, outside Google it is simply infeasible.
* **TF will become a suite of multiple packages, built by multiple authorities. What if the bugs get blamed on the TF team**: With the modular model, we expect testing of 3rd party code to become easier. This can also be mitigated if the error messages are better and can clearly point out which module an issue stems from. Finally, we can create an Apple Swift-like testing model, where we run a Jenkins setup that people can donate their machines to, and we run continuous integration tests on their plugins.
@@ -439,7 +415,7 @@ To summarize the above timeline:
* Different packages set their own release cadences
* Each package will set version boundaries for each of their dependencies.
- * Each package is responsible for ensuring that all of their public APIs are working without any changes until the next major release
+ * Each package is responsible for ensuring that all of their public APIs are working as promised.
* Packages do not need to modify the minimum version requirements unless they start using newly introduced public API symbols.
* TF metapackage releases may choose to hold back individual packages in favor of faster releases. But dependency requirements have to be respected when doing so.
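For illustration, a minimal sketch of how the thin "tensorflow" metapackage could express both the version boundaries and any held-back packages through its pip requirements. The `TENSORFLOW_DEPENDENCIES` name comes from the RFC; the package names and version bounds below are hypothetical:

```python
# Hypothetical setup.py for the "tensorflow" metapackage.
from setuptools import setup

TENSORFLOW_DEPENDENCIES = [
    "tensorflow-base>=1.14.0,<1.15",       # upper bound lets a release hold a package back
    "tensorflow-estimator>=1.14.0,<1.15",
    "tensorboard>=1.14.0,<1.15",
]

setup(
    name="tensorflow",
    version="1.14.0",
    install_requires=TENSORFLOW_DEPENDENCIES,
)
```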