Allow users to enable deterministic behavior in TensorFlow. This means that if the user runs a TensorFlow program multiple times, the model outputs and weights will be the same each time. Determinism will be supported on CPUs and GPUs.

To get deterministic behavior, users must do the following:
* Enable determinism using the API proposed in this doc.
* Use the same hardware in every run.
* Use the same software environment in every run (OS, checkpoints, version of TF, environment variables, etc.).
* Avoid nondeterministic constructs outside TensorFlow, such as Python’s `random` module or multiple threads/processes used in ways that influence TensorFlow’s behavior.

## Motivation
There are several mission-critical applications in life sciences, finance and automation that require deterministic behavior. Determinism is required so that the behavior of these applications can be accurately predicted and demonstrated in a variety of scenarios.

Lack of determinism prevents companies from launching products using models developed in TF. For a subset of these industries, deterministic behavior is a regulatory requirement.


In addition, determinism increases model development velocity by reducing run-to-run noise, and it simplifies the debugging workflow.

## Design Proposal

We will create a new flag, with a default value of `False`, that enables determinism. We will define two functions:


The first function takes a boolean value and allows the model developer to enable or disable determinism. The second function returns a boolean indicating whether determinism is enabled.

Some ops have both deterministic and nondeterministic kernels. For these ops, the flag selects which kernel runs: when determinism is enabled, the deterministic kernel is used.


For ops that do not yet have a deterministic implementation, TensorFlow will raise a `tf.errors.UnimplementedError` if the flag is enabled.

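
The flag and kernel-selection rules above can be modeled in a few lines of plain Python. This is a minimal sketch of the proposed semantics, not TensorFlow code: the setter name `enable_deterministic_execution`, the `KERNELS` registry, and `run_op` are hypothetical, the getter borrows the `deterministic_execution_enabled` name used later in this doc, and `UnimplementedError` is a local stand-in for `tf.errors.UnimplementedError`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical process-wide flag; per the proposal it defaults to False.
_deterministic = False


class UnimplementedError(Exception):
    """Local stand-in for tf.errors.UnimplementedError."""


def enable_deterministic_execution(enabled: bool) -> None:
    """Hypothetical setter: turn determinism on or off."""
    global _deterministic
    if not isinstance(enabled, bool):
        raise TypeError("enabled must be a bool")
    _deterministic = enabled


def deterministic_execution_enabled() -> bool:
    """Hypothetical getter: report whether determinism is enabled."""
    return _deterministic


Kernel = Callable[[List[float]], float]

# Hypothetical registry: op name -> (deterministic kernel or None, default kernel).
# The kernels are toy placeholders; only the dispatch logic matters here.
KERNELS: Dict[str, Tuple[Optional[Kernel], Kernel]] = {
    "reduce_sum": (lambda xs: sum(sorted(xs)), sum),
    "scatter_add": (None, sum),  # no deterministic implementation yet
}


def run_op(name: str, inputs: List[float]) -> float:
    deterministic_kernel, default_kernel = KERNELS[name]
    if not deterministic_execution_enabled():
        return default_kernel(inputs)
    if deterministic_kernel is None:
        # Fail loudly instead of silently running a nondeterministic kernel.
        raise UnimplementedError(f"{name} has no deterministic implementation")
    return deterministic_kernel(inputs)
```

Raising an error rather than silently falling back mirrors the proposal: a user who enables the flag learns immediately which op stands in the way of determinism.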
Enabling deterministic execution does not automatically make a user’s program deterministic. If users use nondeterministic constructs outside TensorFlow, such as threads or processes, in ways that influence TensorFlow’s behavior, their program will not be deterministic. To ensure their program is deterministic, users must both enable deterministic execution within TensorFlow and remove any sources of nondeterminism outside TensorFlow.

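
As an example of a source of nondeterminism outside TensorFlow, combining results from worker threads in completion order makes the result depend on thread scheduling, because floating-point addition is not associative. The standard-library sketch below is not TensorFlow-specific and its function names are hypothetical; it contrasts accumulating in completion order with reducing in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Sequence


def work(x: float) -> float:
    """Stand-in for a per-thread computation whose results are combined."""
    return x * 0.1


def total_in_completion_order(values: Sequence[float]) -> float:
    # The accumulation order follows whichever thread finishes first, so the
    # floating-point total can differ between runs.
    total = 0.0
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(work, v) for v in values]
        for future in as_completed(futures):
            total += future.result()
    return total


def total_in_fixed_order(values: Sequence[float]) -> float:
    # pool.map yields results in input order no matter which thread finishes
    # first, so the reduction order, and therefore the total, is fixed.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(work, values))
    total = 0.0
    for r in results:
        total += r
    return total
```

Both functions do the same work in parallel; only the second fixes the order in which results are combined, which is the property a deterministic program needs.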
Multiple environment variables exist today that control determinism. As part of …

* `TF_DETERMINISTIC_OPS`
* `TF_CUDNN_DETERMINISTIC`

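
For comparison, the existing environment variables are typically set at the very top of a script, before TensorFlow is imported, so that they are in place before any ops are created. A minimal sketch; `"1"` is the conventional value for turning them on, and which variables a given TensorFlow version still honors should be checked against that version's documentation.

```python
import os

# Set the existing determinism switches before TensorFlow is imported.
os.environ["TF_DETERMINISTIC_OPS"] = "1"
os.environ["TF_CUDNN_DETERMINISTIC"] = "1"

# import tensorflow as tf  # import only after the variables are set
```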

`tf.data` also has flags for determinism. The system will raise an error if the flags are out of sync, i.e., if `deterministic_execution_enabled` is true but the `tf.data` option `tf.data.Options.experimental_deterministic` is set to `False`. We’ll also add the necessary checks for `Dataset.map` and `Dataset.interleave`. See the [Random ops](#random-ops) section for how random Datasets, such as `tf.data.experimental.RandomDataset`, are handled.

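
The out-of-sync check described above can be sketched as a validation function. Everything here is hypothetical except the precedence rule it models: in `tf.data`, a transformation's `deterministic` argument (accepted by `Dataset.map` and `Dataset.interleave`) overrides the dataset option when it is set, and defers to the option when it is `None`.

```python
from typing import Iterable, Optional, Tuple


class DeterminismConflictError(Exception):
    """Hypothetical error for determinism settings that are out of sync."""


def effective_deterministic(option_value: bool,
                            transform_value: Optional[bool]) -> bool:
    # An explicit per-transformation value wins; None defers to the option.
    return option_value if transform_value is None else transform_value


def check_tf_data_determinism(
        determinism_enabled: bool,
        option_value: bool,
        transforms: Iterable[Tuple[str, Optional[bool]]] = ()) -> None:
    """Raise if tf.data settings contradict the global determinism flag.

    `option_value` models tf.data.Options.experimental_deterministic and
    `transforms` holds (name, deterministic argument) pairs.
    """
    if not determinism_enabled:
        return  # Nothing to enforce when the global flag is off.
    if not option_value:
        raise DeterminismConflictError(
            "determinism is enabled but experimental_deterministic is False")
    for name, transform_value in transforms:
        if not effective_deterministic(option_value, transform_value):
            raise DeterminismConflictError(
                f"determinism is enabled but {name} has deterministic=False")
```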
### Grappler changes
Grappler graph optimizations may add nondeterministic behavior. In particular, some optimizations time out if they take too long to run, so which optimizations complete can vary from run to run. When determinism is enabled, these timeouts will be disabled.

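
Why a timeout breaks determinism can be seen in a toy optimizer loop: with a wall-clock budget, how many passes run depends on machine speed and load. The sketch below is hypothetical and does not reflect Grappler's actual configuration options; it only shows the proposed rule that the deadline is ignored when determinism is enabled.

```python
import time
from typing import Callable, List

Graph = List[str]
Pass = Callable[[Graph], Graph]


def run_passes(graph: Graph, passes: List[Pass], timeout_s: float,
               deterministic: bool) -> Graph:
    """Apply optimization passes in order, optionally under a time budget."""
    start = time.monotonic()
    for optimization_pass in passes:
        elapsed = time.monotonic() - start
        if not deterministic and elapsed > timeout_s:
            # Which passes were applied now depends on timing, not just input.
            break
        graph = optimization_pass(graph)
    return graph


def make_pass(name: str) -> Pass:
    # Toy pass that records its name, so the result shows which passes ran.
    return lambda graph: graph + [name]
```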
### Random ops

Legacy random ops, such as `tf.random.normal`, are not deterministic if no seed is set, so such ops will raise an error when determinism is enabled. To fix this, the user should set a global seed with `tf.random.set_seed`. Since most models use legacy random ops, in practice users must call `tf.random.set_seed` when enabling deterministic behavior. Alternatively, users can pass a seed to every individual random operation, but doing so is less convenient.

Certain random ops, such as `tf.image.sample_distorted_bounding_box` and `tf.nn.fractional_max_pool`, ignore the global seed if a seed is not explicitly passed. For such ops, setting the global seed is not enough to avoid the error, so users must pass a seed directly to the op.
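
The seed rules in the two paragraphs above can be summarized as a small decision procedure. The `SeedState` class and `DeterminismError` below are hypothetical, and deriving per-op seeds from a counter that `set_seed` resets is a simplification of what TensorFlow actually does; the point is only which combinations are allowed when determinism is enabled.

```python
from typing import Optional, Tuple


class DeterminismError(Exception):
    """Hypothetical stand-in for the error raised when determinism is enabled."""


class SeedState:
    def __init__(self, deterministic: bool) -> None:
        self.deterministic = deterministic
        self.global_seed: Optional[int] = None
        self._op_counter = 0

    def set_seed(self, seed: int) -> None:
        # Analogous to tf.random.set_seed; also restarts per-op seed derivation.
        self.global_seed = seed
        self._op_counter = 0

    def resolve(self, op_seed: Optional[int] = None,
                ignores_global_seed: bool = False
                ) -> Tuple[Optional[int], Optional[int]]:
        """Return the (global, op) seed pair a random op would use."""
        if op_seed is not None:
            # An explicit op seed always yields a reproducible stream.
            return (self.global_seed, op_seed)
        if self.global_seed is not None and not ignores_global_seed:
            # Derive a per-op seed so repeated runs see the same sequence.
            self._op_counter += 1
            return (self.global_seed, self._op_counter)
        if self.deterministic:
            raise DeterminismError(
                "random op has no usable seed; pass a seed to the op"
                + ("" if ignores_global_seed else " or call set_seed"))
        return (None, None)  # Fresh, nondeterministic randomness.
```

The `ignores_global_seed` branch models ops such as `tf.image.sample_distorted_bounding_box`: a global seed alone does not satisfy them.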

As for TensorFlow 2 random number generation, `tf.random.Generator.from_non_deterministic_state` will raise an error if called when determinism is enabled. In such cases, users should check whether determinism is enabled and, if so, create the generator from a deterministic source instead, for example with `tf.random.Generator.from_seed`. `tf.random.get_global_generator` implicitly calls `from_non_deterministic_state` if no global generator is set, so it will also raise an error unless a global generator has first been set with `tf.random.set_global_generator`.

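
A toy model of the generator rules follows. The method and function names mirror `tf.random.Generator.from_seed`, `from_non_deterministic_state`, `tf.random.get_global_generator`, and `tf.random.set_global_generator`, but the class is a hypothetical stand-in backed by Python's `random.Random`, and `DETERMINISM_ENABLED`, `DeterminismError`, and `make_generator` are assumptions of the sketch.

```python
import os
import random
from typing import Optional

# Hypothetical flag for this sketch; in TensorFlow it would be the proposed API.
DETERMINISM_ENABLED = True


class DeterminismError(Exception):
    """Hypothetical stand-in for the error the proposal describes."""


class Generator:
    """Toy stand-in for tf.random.Generator, backed by random.Random."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    @classmethod
    def from_seed(cls, seed: int) -> "Generator":
        return cls(seed)

    @classmethod
    def from_non_deterministic_state(cls) -> "Generator":
        if DETERMINISM_ENABLED:
            raise DeterminismError(
                "from_non_deterministic_state is not allowed when determinism is enabled")
        # OS entropy makes every run different.
        return cls(int.from_bytes(os.urandom(8), "little"))

    def uniform(self) -> float:
        return self._rng.random()


_global_generator: Optional[Generator] = None


def set_global_generator(generator: Generator) -> None:
    global _global_generator
    _global_generator = generator


def get_global_generator() -> Generator:
    global _global_generator
    if _global_generator is None:
        # Implicit creation from a nondeterministic source: raises under determinism.
        _global_generator = Generator.from_non_deterministic_state()
    return _global_generator


def make_generator(seed: int) -> Generator:
    # Pattern from the text: pick a deterministic source when the flag is on.
    if DETERMINISM_ENABLED:
        return Generator.from_seed(seed)
    return Generator.from_non_deterministic_state()
```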
Stateless random functions, such as `tf.random.stateless_normal`, are always deterministic and so will never raise determinism-related errors.
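
Stateless functions are deterministic because their output is a pure function of their arguments, including the seed; no hidden state carries over between calls. The sketch below illustrates that property with a hypothetical `stateless_uniform` built on Python's `random.Random`. Like TensorFlow's stateless ops it takes a pair of integers as its seed, but it does not reproduce TensorFlow's algorithms or values.

```python
import random
from typing import List, Tuple


def stateless_uniform(n: int, seed: Tuple[int, int]) -> List[float]:
    """Return n floats in [0, 1) that depend only on n and seed."""
    if len(seed) != 2:
        raise ValueError("seed must be a pair of integers")
    # Fold the pair into one integer and build a fresh local RNG, so nothing
    # persists between calls.
    folded = ((seed[0] & 0xFFFFFFFF) << 32) | (seed[1] & 0xFFFFFFFF)
    rng = random.Random(folded)
    return [rng.random() for _ in range(n)]
```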

No error will be raised if a random op or generator is run before determinism is …

Use of parameter servers adds nondeterministic behavior. If a model constructs a `ParameterServerStrategy` while determinism is enabled, TensorFlow will raise an error. We’ll also note this in the documentation for the flag.

### Op Review and changes

As part of the implementation, we will review all ops to determine whether each one is deterministic or nondeterministic. Some of the ops known to be nondeterministic, at least when running on a GPU, include:


* `tf.nn.softmax_cross_entropy_with_logits`
* `tf.nn.sparse_softmax_cross_entropy_with_logits`
* `tf.image.resize` gradient with `method=ResizeMethod.NEAREST`
* `tf.image.crop_and_resize` gradient with respect to both `image` and `boxes`
* `tf.sparse.sparse_dense_matmul` forward pass

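
A common reason GPU kernels are nondeterministic is that they accumulate partial results with atomic additions, so the order in which floating-point values are added changes from run to run. Because floating-point addition is not associative, a different order can give a different answer. The pure-Python sketch below reproduces the effect by adding the same four float64 values in two orders; it does not run any TensorFlow op, and `sum_in_order` is a hypothetical helper.

```python
from typing import Iterable, Sequence


def sum_in_order(values: Sequence[float], order: Iterable[int]) -> float:
    """Accumulate values[i] for each i in `order`, left to right."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [1e16, 1.0, -1e16, 1.0]

# Adding 1.0 to 1e16 first is lost to rounding (float64 spacing near 1e16 is 2).
in_index_order = sum_in_order(values, [0, 1, 2, 3])
# Combining the two 1.0 values first keeps them: 1e16 + 2.0 is representable.
ones_first = sum_in_order(values, [1, 3, 0, 2])

print(in_index_order, ones_first)  # 1.0 2.0
```

A deterministic kernel fixes the accumulation order, for example with a fixed-shape reduction tree, which is one reason it can be slower than an atomic-add version.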

`tf.image.sample_distorted_bounding_box` has been observed to behave nondeterministically unless its `seed` parameter is set, even if `tf.random.set_seed` has been called. We will review this op as part of the change. Another case that needs review is "pulling a random number from a PRNG before its state has been initialized".


Given the large number of ops involved, there is a chance that we will fail to raise an error for some nondeterministic ops.