This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit 8743797

Update 20210119-determinism.md
Updating the RFC based on comments from NVIDIA.
1 parent 2e72dd2 commit 8743797


1 file changed (+14 / -10 lines)


rfcs/20210119-determinism.md

@@ -15,7 +15,7 @@ To get deterministic behavior, users must do the following:
 * Enable determinism using the API proposed in this doc
 * Use same hardware every run
 * Use the same software environment every run (OS, checkpoints, version of TF, environmental variables, etc).
-* Not use constructs outside TensorFlow that are non-deterministic, such as Python’s `random` module or using multiple threads/processes in ways that influence TensorFlow’s behavior
+* Not use constructs outside TensorFlow that are nondeterministic, such as Python’s `random` module or using multiple threads/processes in ways that influence TensorFlow’s behavior

 ## Motivation
 There are several mission critical applications in life-sciences, finance and automation that require deterministic behavior. Determinism is required so that the behavior of these applications can be accurately predicted & demonstrated in a variety of scenarios.
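The last requirement in the list above concerns randomness outside TensorFlow. As a minimal, plain-Python illustration (not part of the proposed API; the helper `sample_batch_order` is hypothetical): the module-level functions in Python's `random` module share one hidden generator that is seeded from OS entropy (or the clock) at import unless `random.seed` is called, so results change from run to run. Giving each consumer its own explicitly seeded `random.Random` instance makes the draws repeatable.

```python
import random


def sample_batch_order(num_examples, seed=None):
    """Return a shuffled list of example indices (hypothetical helper).

    seed=None seeds the private generator from OS entropy, so the order
    generally differs between runs; an explicit seed makes it repeatable.
    """
    rng = random.Random(seed)  # private generator, not the shared module-level one
    order = list(range(num_examples))
    rng.shuffle(order)
    return order


# Same seed, same order, on every run.
first = sample_batch_order(10, seed=1234)
second = sample_batch_order(10, seed=1234)
print(first == second)  # True

# No seed: each call draws fresh entropy, so the two orders are not
# guaranteed to match.
print(sample_batch_order(10) == sample_batch_order(10))
```

A private generator also keeps other code (or other threads) that calls `random.random()` from shifting this component's sequence, which is one way thread scheduling can leak into results.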
@@ -31,10 +31,10 @@ We will create a new flag with the default value of ‘False’ which enables de
 * `tf.config.deterministic_execution_enabled()`

 The first function takes in a boolean value, and allows the model developer to enable/disable determinism. The second function returns a bool indicating whether determinism is enabled.
-In some cases, we have deterministic and non-deterministic versions of the kernel. In such cases, we will use this flag to run the appropriate kernels.
+In some cases, we have deterministic and nondeterministic versions of the kernel. In such cases, we will use this flag to run the appropriate kernels.
 For ops which do not yet have a deterministic implementation, TensorFlow will raise an error `tf.errors.UnimplementedError` if the flag is enabled.

-Enabling deterministic execution does not automatically cause a user’s program to become deterministic. If users use non-deterministic constructs outside TensorFlow, such as threads/process, in ways that influence TensorFlow’s behavior, their program will not be deterministic. In order for a user to ensure their program is deterministic, users must both enable deterministic execution within TensorFlow and remove any sources of non-determinism outside TensorFlow.
+Enabling deterministic execution does not automatically cause a user’s program to become deterministic. If users use nondeterministic constructs outside TensorFlow, such as threads/process, in ways that influence TensorFlow’s behavior, their program will not be deterministic. In order for a user to ensure their program is deterministic, users must both enable deterministic execution within TensorFlow and remove any sources of nondeterminism outside TensorFlow.

 ### Existing Flags
 Multiple environmental variables exist today that control determinism. As part of this change, we will deprecate then remove the following:
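The flag-plus-kernel-selection behavior described above can be modeled in a few lines of plain Python. This is a hypothetical sketch, not TensorFlow code: only `deterministic_execution_enabled` appears in the excerpt (as `tf.config.deterministic_execution_enabled()`); the setter name `enable_deterministic_execution`, the `KERNELS` registry, the op names, and the local `UnimplementedError` class (standing in for `tf.errors.UnimplementedError`) are assumptions made for illustration.

```python
class UnimplementedError(Exception):
    """Local stand-in for tf.errors.UnimplementedError."""


# Per the proposal, the flag defaults to False.
_deterministic = False


def enable_deterministic_execution(enabled):
    """Hypothetical setter; takes a bool, like the RFC's first function."""
    global _deterministic
    _deterministic = bool(enabled)


def deterministic_execution_enabled():
    """Return True if deterministic execution is enabled."""
    return _deterministic


def _default_sum(values):
    # Placeholder for a default kernel that may be nondeterministic on real hardware.
    return sum(values)


def _ordered_sum(values):
    # Placeholder for a deterministic kernel: fixed, sequential accumulation order.
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical registry: op name -> available kernels (None = not implemented).
KERNELS = {
    "op_with_both": {"default": _default_sum, "deterministic": _ordered_sum},
    "op_without_deterministic": {"default": _default_sum, "deterministic": None},
}


def run_op(name, values):
    """Choose a kernel from the flag; raise if no deterministic kernel exists."""
    kernels = KERNELS[name]
    if not deterministic_execution_enabled():
        return kernels["default"](values)
    if kernels["deterministic"] is None:
        raise UnimplementedError(f"{name} has no deterministic implementation")
    return kernels["deterministic"](values)


enable_deterministic_execution(True)
print(run_op("op_with_both", [1.0, 2.0, 3.0]))  # 6.0
try:
    run_op("op_without_deterministic", [1.0, 2.0])
except UnimplementedError as err:
    print("UnimplementedError:", err)
```

Raising instead of silently falling back to the default kernel is what lets a user trust that an enabled flag means every op that actually ran used a deterministic kernel, subject to the RFC's own caveat that some nondeterministic ops may be missed during review.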
@@ -45,7 +45,7 @@ Multiple environmental variables exist today that control determinism. As part o
 tf.data also has flags for determinism. The system will throw an error message if flags are out of sync i.e. if deterministic_execution_enabled is enabled but if the tf.data option is set to ‘false’, we will throw an error. (`tf.data.Options.experimental_deterministic`). We’ll also add the necessary checks for Dataset.map and Dataset.interleave.

 ### Grappler changes
-Grappler graph optimizations may add non-deterministic behavior. In particular some optimizations will time out if they take too long to run. When determinism is enabled, these time outs will be disabled.
+Grappler graph optimizations may add nondeterministic behavior. In particular some optimizations will time out if they take too long to run. When determinism is enabled, these time outs will be disabled.

 ### Random ops
 Legacy random ops, such as `tf.random.normal`, are not deterministic if no seed is set, and so such ops will raise an error when determinism is enabled. To fix, the user should set a global seed with `tf.random.set_seed`. Since most models use legacy random ops, in practice users must call `tf.random.set_seed` when enabling deterministic behavior. Alternatively, users can pass a seed to every individual random operation, but doing so is more inconvenient.
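Two of the checks described above, the tf.data consistency check and the seed requirement for legacy random ops, reduce to simple predicates. The sketch below models them as plain Python functions. `DeterminismConfigError`, the function names, and passing the settings in as arguments are assumptions for illustration; in the proposal the checks live inside TensorFlow itself. Following the RFC's wording ("set to ‘false’"), the sketch treats only an explicit `False` for the tf.data option as a conflict.

```python
class DeterminismConfigError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_dataset_option(determinism_enabled, dataset_deterministic):
    """Reject a tf.data option that contradicts the global flag.

    dataset_deterministic models tf.data.Options.experimental_deterministic:
    True, False, or None (left unset). Only an explicit False conflicts.
    """
    if determinism_enabled and dataset_deterministic is False:
        raise DeterminismConfigError(
            "determinism is enabled but tf.data experimental_deterministic is False")


def check_random_op_seed(determinism_enabled, global_seed, op_seed):
    """Reject a legacy random op that has neither a global nor an op-level seed."""
    if determinism_enabled and global_seed is None and op_seed is None:
        raise DeterminismConfigError(
            "unseeded random op: set a global seed or pass seed= to the op")


# Passing checks: either a global seed or an op-level seed is enough.
check_random_op_seed(True, global_seed=42, op_seed=None)
check_random_op_seed(True, global_seed=None, op_seed=7)
check_dataset_option(True, dataset_deterministic=None)

# Failing check: determinism enabled and no seed anywhere.
try:
    check_random_op_seed(True, global_seed=None, op_seed=None)
except DeterminismConfigError as err:
    print("error:", err)
```

Because these are checks rather than fixes, the user still has to act, typically by calling `tf.random.set_seed` once at startup, as the RFC recommends.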
@@ -59,17 +59,21 @@ In graph mode, ops will raise an error message when the random op is created. If
 No error will be raised if a random op or generator is run before determinism is enabled (as is true for any other op), so users should take care to enable determinism before running any random ops or generators.

 ### Parameter Server
-Use of parameter servers adds non-deterministic behavior. In case a model constructs a ParameterServerStrategy, TensorFlow will throw an error. We’ll also document this in the documentation for the flag.
+Use of parameter servers adds nondeterministic behavior. In case a model constructs a ParameterServerStrategy, TensorFlow will throw an error. We’ll also document this in the documentation for the flag.

 ### Op Review and changes
-As part of the implementation, we will review all Ops to make a determination of their behaviour (deterministic vs non-deterministic). Some of the Ops that are known to be non-deterministic include:
+As part of the implementation, we will review all Ops to make a determination of their behaviour (deterministic vs nondeterministic). Some of the Ops that are known to be nondeterministic, at least when running on a GPU, include:

 * `tf.nn.softmax_cross_entropy_with_logits`
 * `tf.nn.sparse_softmax_cross_entropy_with_logits`
 * `tf.image.resize` with method=ResizeMethod.NEAREST
-* `tf.math.segment_sum`, `tf.math.unsorted_segment_sum`
-* `tf.image.crop_and_resize gradient` to both image and boxes
+* `tf.math.segment_sum`, `tf.math.unsorted_segment_sum` forward
+* `tf.image.crop_and_resize` gradient to both image and boxes
+* `tf.sparse.sparse_dense_matmul` forward
+* `tf.math.unsorted_segment_mean`, `tf.math.unsorted_segment_prod` and `tf.math.unsorted_segment_sqrt`; all foward
 * `tf.sparse.sparse_dense_matmul`
-* `tf.math.unsorted_segment_mean`, `tf.math.unsorted_segment_prod`

-Given the large number of Ops involved, there is a chance that we might omit raising an error for a non-deterministic Op.
+
+`tf.image.sample_distorted_bounding_box` has been observed to behave nondeterministically unless you set its seed parameter, even if you call tf.random.set_seed. We will review this Op as part the change. Another case that needs review is "pulling a random number from a PRNG before its state has been initialized".
+
+Given the large number of Ops involved, there is a chance that we might omit raising an error for a nondeterministic Op.
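The RFC does not say why the listed ops are nondeterministic on a GPU. A commonly cited cause (background, not stated in this RFC) is that their GPU kernels accumulate floating-point values with atomic additions whose order depends on thread scheduling, and floating-point addition is not associative. The plain-Python sketch below shows the underlying effect with no TensorFlow or GPU involved: adding the same three numbers in two different orders gives two different float64 results. The helper `sum_in_order` is hypothetical.

```python
def sum_in_order(values, order):
    """Accumulate values[i] for i in `order`, left to right, in float64."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]

# Two accumulation orders, standing in for two different thread schedules.
a = sum_in_order(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
b = sum_in_order(values, [1, 2, 0])  # (0.2 + 0.3) + 0.1

print(a)       # 0.6000000000000001
print(b)       # 0.6
print(a == b)  # False
```

A deterministic kernel avoids this by always reducing in the same order, which often gives up some parallelism; that trade-off is one reason a separate deterministic variant may exist alongside the default kernel.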
