This repository was archived by the owner on Jul 10, 2025. It is now read-only.
rfcs/20210119-determinism.md
14 additions & 10 deletions
@@ -15,7 +15,7 @@ To get deterministic behavior, users must do the following:
* Enable determinism using the API proposed in this doc
* Use same hardware every run
* Use the same software environment every run (OS, checkpoints, version of TF, environmental variables, etc).
-* Not use constructs outside TensorFlow that are non-deterministic, such as Python’s `random` module or using multiple threads/processes in ways that influence TensorFlow’s behavior
+* Not use constructs outside TensorFlow that are nondeterministic, such as Python’s `random` module or using multiple threads/processes in ways that influence TensorFlow’s behavior
## Motivation
There are several mission critical applications in life-sciences, finance and automation that require deterministic behavior. Determinism is required so that the behavior of these applications can be accurately predicted & demonstrated in a variety of scenarios.
@@ -31,10 +31,10 @@ We will create a new flag with the default value of ‘False’ which enables de
* `tf.config.deterministic_execution_enabled()`
The first function takes in a boolean value, and allows the model developer to enable/disable determinism. The second function returns a bool indicating whether determinism is enabled.
-In some cases, we have deterministic and non-deterministic versions of the kernel. In such cases, we will use this flag to run the appropriate kernels.
+In some cases, we have deterministic and nondeterministic versions of the kernel. In such cases, we will use this flag to run the appropriate kernels.
For ops which do not yet have a deterministic implementation, TensorFlow will raise an error `tf.errors.UnimplementedError` if the flag is enabled.
-Enabling deterministic execution does not automatically cause a user’s program to become deterministic. If users use non-deterministic constructs outside TensorFlow, such as threads/process, in ways that influence TensorFlow’s behavior, their program will not be deterministic. In order for a user to ensure their program is deterministic, users must both enable deterministic execution within TensorFlow and remove any sources of non-determinism outside TensorFlow.
+Enabling deterministic execution does not automatically cause a user’s program to become deterministic. If users use nondeterministic constructs outside TensorFlow, such as threads/process, in ways that influence TensorFlow’s behavior, their program will not be deterministic. In order for a user to ensure their program is deterministic, users must both enable deterministic execution within TensorFlow and remove any sources of nondeterminism outside TensorFlow.
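
The flag-gated kernel selection described in this section can be modelled in a few lines of plain Python. This is only an illustrative sketch, not TensorFlow code: `DeterminismConfig`, `KERNELS`, `run_op`, and the op names are hypothetical, and the built-in `NotImplementedError` stands in for `tf.errors.UnimplementedError`.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Kernel = Callable[[Sequence[float]], float]


class DeterminismConfig:
    """Toy stand-in for the proposed flag (hypothetical, not a TensorFlow class)."""

    def __init__(self) -> None:
        self._enabled = False  # the RFC proposes a default of False

    def enable(self, enabled: bool) -> None:
        self._enabled = bool(enabled)

    def enabled(self) -> bool:
        return self._enabled


# Hypothetical registry: op name -> (default kernel, deterministic kernel or None).
KERNELS: Dict[str, Tuple[Kernel, Optional[Kernel]]] = {
    # Summing in sorted order fixes the accumulation order.
    "sum": (sum, lambda xs: sum(sorted(xs))),
    # An op that has no deterministic implementation yet.
    "fast_only_op": (max, None),
}


def run_op(config: DeterminismConfig, name: str, xs: Sequence[float]) -> float:
    """Pick a kernel based on the flag; fail loudly if none is deterministic."""
    default_kernel, deterministic_kernel = KERNELS[name]
    if not config.enabled():
        return default_kernel(xs)
    if deterministic_kernel is None:
        raise NotImplementedError(f"No deterministic kernel for op '{name}'")
    return deterministic_kernel(xs)
```

The point of raising instead of silently falling back is that a user who enabled the flag never gets a nondeterministic result without being told.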
### Existing Flags
Multiple environmental variables exist today that control determinism. As part of this change, we will deprecate then remove the following:
@@ -45,7 +45,7 @@ Multiple environmental variables exist today that control determinism. As part o
tf.data also has flags for determinism. The system will throw an error message if flags are out of sync i.e. if deterministic_execution_enabled is enabled but if the tf.data option is set to ‘false’, we will throw an error. (`tf.data.Options.experimental_deterministic`). We’ll also add the necessary checks for Dataset.map and Dataset.interleave.
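
The out-of-sync check described above might look roughly like the following. This is a sketch in plain Python rather than TensorFlow code: `DataOptions` and `check_flags_in_sync` are hypothetical names, and treating an unset option as acceptable is an assumption, since the text only specifies the case where the option is explicitly false.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataOptions:
    """Toy stand-in for the tf.data determinism option (hypothetical)."""

    deterministic: Optional[bool] = None  # None means "not set by the user"


def check_flags_in_sync(determinism_enabled: bool, options: DataOptions) -> None:
    """Raise if global determinism is on but the data option is explicitly off."""
    if determinism_enabled and options.deterministic is False:
        raise ValueError(
            "Deterministic execution is enabled, but the data pipeline "
            "option 'deterministic' is set to False."
        )
```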
### Grappler changes
-Grappler graph optimizations may add non-deterministic behavior. In particular some optimizations will time out if they take too long to run. When determinism is enabled, these time outs will be disabled.
+Grappler graph optimizations may add nondeterministic behavior. In particular some optimizations will time out if they take too long to run. When determinism is enabled, these time outs will be disabled.
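
To see why a time-out makes results depend on machine speed, consider this toy pass runner. It is not Grappler code: `run_passes`, the list-of-strings "graph", and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, List, Optional

Graph = List[str]
Pass = Callable[[Graph], Graph]


def run_passes(
    passes: List[Pass],
    graph: Graph,
    time_budget_s: Optional[float],
    deterministic: bool,
    clock: Callable[[], float] = time.monotonic,
) -> Graph:
    """Apply passes in order; skip the rest once the time budget is used up.

    With deterministic=True the budget is ignored, so every pass always runs.
    """
    use_deadline = (not deterministic) and time_budget_s is not None
    deadline = clock() + time_budget_s if use_deadline else None
    for optimization_pass in passes:
        if deadline is not None and clock() >= deadline:
            break  # which passes ran now depends on how fast the machine was
        graph = optimization_pass(graph)
    return graph
```

With a deadline, a slow run and a fast run can produce different optimized graphs from the same input. Without one, the set of applied passes is fixed.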
### Random ops
Legacy random ops, such as `tf.random.normal`, are not deterministic if no seed is set, and so such ops will raise an error when determinism is enabled. To fix, the user should set a global seed with `tf.random.set_seed`. Since most models use legacy random ops, in practice users must call `tf.random.set_seed` when enabling deterministic behavior. Alternatively, users can pass a seed to every individual random operation, but doing so is more inconvenient.
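
The seed rule above can be sketched without TensorFlow. `ToyRandom` is a hypothetical class that only models the check "a global seed or an op-level seed must be set when determinism is enabled". Unlike TensorFlow, it returns the same values on every call for a given seed pair, so it says nothing about how real random ops advance their state.

```python
import random
from typing import List, Optional


class ToyRandom:
    """Toy model of a legacy random op with global and op-level seeds."""

    def __init__(self, determinism_enabled: bool) -> None:
        self.determinism_enabled = determinism_enabled
        self.global_seed: Optional[int] = None

    def set_seed(self, seed: int) -> None:
        """Plays the role of a global seed setter."""
        self.global_seed = seed

    def normal(self, n: int, seed: Optional[int] = None) -> List[float]:
        if self.global_seed is None and seed is None:
            if self.determinism_enabled:
                raise RuntimeError(
                    "Random op used without a seed while determinism is enabled."
                )
            rng = random.Random()  # seeded from OS entropy: not reproducible
        else:
            # Combine both seeds into one reproducible string seed.
            rng = random.Random(f"{self.global_seed}:{seed}")
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

Setting one global seed up front is the less intrusive option, which matches the recommendation in the text. Passing `seed=` on every call also satisfies the check, but has to be repeated at each call site.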
@@ -59,17 +59,21 @@ In graph mode, ops will raise an error message when the random op is created. If
No error will be raised if a random op or generator is run before determinism is enabled (as is true for any other op), so users should take care to enable determinism before running any random ops or generators.
### Parameter Server
-Use of parameter servers adds non-deterministic behavior. In case a model constructs a ParameterServerStrategy, TensorFlow will throw an error. We’ll also document this in the documentation for the flag.
+Use of parameter servers adds nondeterministic behavior. In case a model constructs a ParameterServerStrategy, TensorFlow will throw an error. We’ll also document this in the documentation for the flag.
### Op Review and changes
-As part of the implementation, we will review all Ops to make a determination of their behaviour (deterministic vs non-deterministic). Some of the Ops that are known to be non-deterministic include:
+As part of the implementation, we will review all Ops to make a determination of their behaviour (deterministic vs nondeterministic). Some of the Ops that are known to be nondeterministic, at least when running on a GPU, include:
* `tf.nn.softmax_cross_entropy_with_logits`
* `tf.nn.sparse_softmax_cross_entropy_with_logits`
* `tf.image.resize` with method=ResizeMethod.NEAREST
-Given the large number of Ops involved, there is a chance that we might omit raising an error for a non-deterministic Op.
+
+`tf.image.sample_distorted_bounding_box` has been observed to behave nondeterministically unless you set its seed parameter, even if you call tf.random.set_seed. We will review this Op as part the change. Another case that needs review is "pulling a random number from a PRNG before its state has been initialized".
+
+Given the large number of Ops involved, there is a chance that we might omit raising an error for a nondeterministic Op.