Update 20210119-determinism.md

pkanwar23 · web-flow · commit 8743797be65b · 2021-01-31T11:35:23.000-08:00
Updating the RFC based on comments from NVIDIA.
diff --git a/rfcs/20210119-determinism.md b/rfcs/20210119-determinism.md
@@ -15,7 +15,7 @@ To get deterministic behavior, users must do the following:
  * Enable determinism using the API proposed in this doc
  * Use same hardware every run
  * Use the same software environment every run (OS, checkpoints, version of TF, environmental variables, etc).
- * Not use constructs outside TensorFlow that are non-deterministic, such as Python’s `random` module or using multiple threads/processes in ways that influence TensorFlow’s behavior
+ * Not use constructs outside TensorFlow that are nondeterministic, such as Python’s `random` module or using multiple threads/processes in ways that influence TensorFlow’s behavior
 
 ## Motivation
 There are several mission critical applications in life-sciences, finance and automation that require deterministic behavior. Determinism is required so that the behavior of these applications can be accurately predicted & demonstrated in a variety of scenarios. 
@@ -31,10 +31,10 @@ We will create a new flag with the default value of ‘False’ which enables de
 * `tf.config.deterministic_execution_enabled()`
 
 The first function takes in a boolean value, and allows the model developer to enable/disable determinism. The second function returns a bool indicating whether determinism is enabled.
-In some cases, we have deterministic and non-deterministic versions of the kernel. In such cases, we will use this flag to run the appropriate kernels.
+In some cases, we have deterministic and nondeterministic versions of the kernel. In such cases, we will use this flag to run the appropriate kernels.
 For ops which do not yet have a deterministic implementation, TensorFlow will raise an error `tf.errors.UnimplementedError` if the flag is enabled.
 
-Enabling deterministic execution does not automatically cause a user’s program to become deterministic. If users use non-deterministic constructs outside TensorFlow, such as threads/process, in ways that influence TensorFlow’s behavior, their program will not be deterministic. In order for a user to ensure their program is deterministic, users must both enable deterministic execution within TensorFlow and remove any sources of non-determinism outside TensorFlow.
+Enabling deterministic execution does not automatically cause a user’s program to become deterministic. If users use nondeterministic constructs outside TensorFlow, such as threads/process, in ways that influence TensorFlow’s behavior, their program will not be deterministic. In order for a user to ensure their program is deterministic, users must both enable deterministic execution within TensorFlow and remove any sources of nondeterminism outside TensorFlow.
 
 ### Existing Flags
 Multiple environmental variables exist today that control determinism. As part of this change, we will deprecate then remove the following:
@@ -45,7 +45,7 @@ Multiple environmental variables exist today that control determinism. As part o
 tf.data also has flags for determinism. The system will throw an error message if flags are out of sync i.e. if deterministic_execution_enabled is enabled but if the tf.data option is set to ‘false’, we will throw an error. (`tf.data.Options.experimental_deterministic`). We’ll also add the necessary checks for Dataset.map and Dataset.interleave.
 
 ### Grappler changes
-Grappler graph optimizations may add non-deterministic behavior. In particular some optimizations will time out if they take too long to run. When determinism is enabled, these time outs will be disabled.
+Grappler graph optimizations may add nondeterministic behavior. In particular some optimizations will time out if they take too long to run. When determinism is enabled, these time outs will be disabled.
 
 ### Random ops
 Legacy random ops, such as `tf.random.normal`, are not deterministic if no seed is set, and so such ops will raise an error when determinism is enabled. To fix, the user should set a global seed with `tf.random.set_seed`. Since most models use legacy random ops, in practice users must call `tf.random.set_seed` when enabling deterministic behavior. Alternatively, users can pass a seed to every individual random operation, but doing so is more inconvenient.
@@ -59,17 +59,21 @@ In graph mode, ops will raise an error message when the random op is created. If
 No error will be raised if a random op or generator is run before determinism is enabled (as is true for any other op), so users should take care to enable determinism before running any random ops or generators.
 
 ### Parameter Server
-Use of parameter servers adds non-deterministic behavior. In case a model constructs a ParameterServerStrategy, TensorFlow will throw an error. We’ll also document this in the documentation for the flag.
+Use of parameter servers adds nondeterministic behavior. In case a model constructs a ParameterServerStrategy, TensorFlow will throw an error. We’ll also document this in the documentation for the flag.
 
 ### Op Review and changes
-As part of the implementation, we will review all Ops to make a determination of their behaviour (deterministic vs non-deterministic). Some of the Ops that are known to be non-deterministic include:
+As part of the implementation, we will review all Ops to make a determination of their behaviour (deterministic vs nondeterministic). Some of the Ops that are known to be nondeterministic, at least when running on a GPU, include:
 
 * `tf.nn.softmax_cross_entropy_with_logits` 
 * `tf.nn.sparse_softmax_cross_entropy_with_logits` 
 * `tf.image.resize` with method=ResizeMethod.NEAREST 
-* `tf.math.segment_sum`, `tf.math.unsorted_segment_sum`
-* `tf.image.crop_and_resize gradient` to both image and boxes 
+* `tf.math.segment_sum`, `tf.math.unsorted_segment_sum` forward
+* `tf.image.crop_and_resize` gradient to both image and boxes 
+* `tf.sparse.sparse_dense_matmul` forward
+* `tf.math.unsorted_segment_mean`, `tf.math.unsorted_segment_prod` and `tf.math.unsorted_segment_sqrt`; all foward
 * `tf.sparse.sparse_dense_matmul`
-* `tf.math.unsorted_segment_mean`, `tf.math.unsorted_segment_prod`
 
-Given the large number of Ops involved, there is a chance that we might omit raising an error for a non-deterministic Op.
+
+`tf.image.sample_distorted_bounding_box` has been observed to behave nondeterministically unless you set its seed parameter, even if you call tf.random.set_seed. We will review this Op as part the change. Another case that needs review is "pulling a random number from a PRNG before its state has been initialized".
+
+Given the large number of Ops involved, there is a chance that we might omit raising an error for a nondeterministic Op.