This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit 2406126

reedwm and duncanriach authored
Make changes suggested by @duncanriach
Co-authored-by: Duncan Riach <[email protected]>
1 parent a315f84 commit 2406126

1 file changed

+7
-7
lines changed

rfcs/20210119-determinism.md

Lines changed: 7 additions & 7 deletions
@@ -4,7 +4,7 @@
 :---------------|:-----------------------------------------------------------------------------|
 | **Author(s)** | Pankaj Kanwar (Google), Duncan Riach (NVIDIA), Reed Wanderman-Milne (Google) |
 | **Sponsor** | Sanjoy Das (Google) |
-| **Updated** | 2021-03-10 |
+| **Updated** | 2021-03-12 |
 
 ## Objective

@@ -16,7 +16,7 @@ There are several mission critical applications in medicine, finance and automat
 
 The lack of determinism in certain ops prevents companies from launching products using models developed in TF. For a subset of these industries having deterministic behavior is a regulatory requirement.
 
-In addition, deterministic ops increases model velocity development by reducing noise, while also simplifying the debugging workflow.
+In addition, deterministic functionality, enabled by deterministic ops, increases model velocity development by reducing noise, while also simplifying the debugging workflow.
 
 ## Design Proposal

@@ -27,7 +27,7 @@ We will create a new flag with the default value of "False" which enables determ
 
 The first function takes in a boolean value, and allows the model developer to enable/disable deterministic ops. The second function returns a bool indicating whether deterministic ops is enabled.
 
-Once enabled, every built-in op will either be made deterministic or raise an error if determinism is not supported. For ops which we have not yet implemented a deterministic version, a `NotImplementedError` will be raised. In the long term, we plan on adding a deterministic version to all such ops. For ops which are inherently nondeterministic such as `tf.random.normal` without a seed, a `FailedPreconditionError` will be raised (the precondition being that determinism must be disabled). Certain ops will only raise an error for certain input shapes or attributes. Depending on the op, in graph mode, the error will either be raised when the op is constructed or when the op is run.
+Once enabled, every built-in op will either be made deterministic or raise an error if determinism is not supported. A `tf.errors.UnimplementedError` will be raised by ops for which we have not yet implemented a deterministic version. In the long term, we plan on adding a deterministic version to all such ops. For ops which are inherently nondeterministic such as `tf.random.normal` without a seed, a `tf.errors.FailedPreconditionError` will be raised (the precondition being that determinism must be disabled). Some ops will only raise an error on a subset of input shapes, attributes, data types, or codepaths through the op. Depending on the op, in graph mode, the error will either be raised when the op is constructed or when the op is run.
 
 By "deterministic", we mean that if an op is run multiple times with the same inputs and attributes, it produces the same outputs. The op must be run with the same hardware configuration on the same device each time. The software environment must be the same every run as well (OS, TF and CUDA version, environmental variables, etc). For stateful ops, the all relevant state must be identical each run (values of `tf.Variable`s, checkpoints, etc).

@@ -38,8 +38,8 @@ This API only makes ops deterministic, not other parts of TensorFlow. For exampl
 The API allows users to write deterministic models. To do so, users must:
 
 * Enable deterministic ops with `tf.config.enable_deterministic_ops`.
-* Use same hardware configuration in every run.
-* Use the same software environment every run (OS, checkpoints, version of CUDA and TF, environmental variables, etc).
+* Use the same hardware configuration in every run.
+* Use the same software environment in every run (OS, checkpoints, version of CUDA and TF, environmental variables, etc).
 * Not use nondeterministic parts of TensorFlow (besides ops), such as `ParameterServerStrategy`.
 * Not use constructs outside TensorFlow that are nondeterministic, such as Python’s `random` module (without a fixed seed) or using multiple threads/processes in ways that influence TensorFlow’s behavior.
 * Not use nondeterministic custom ops.
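
The checklist in the hunk above mentions Python's `random` module as a source of nondeterminism when no fixed seed is used. As a minimal sketch of that one point, using only the standard library and not TensorFlow, the hypothetical helper `sample_with_seed` below shows that a generator built from a fixed seed reproduces the same sequence on every call. The helper name and the seed values are assumptions made for illustration.

```python
import random


def sample_with_seed(seed, n=5):
    """Draw n floats from a private generator built from a fixed seed."""
    rng = random.Random(seed)  # private generator; the global RNG state is untouched
    return [rng.random() for _ in range(n)]


# The same seed gives the same sequence, so anything derived from it is repeatable.
first = sample_with_seed(42)
second = sample_with_seed(42)
print(first == second)

# A generator created without a seed is seeded from OS entropy or the clock,
# so its output generally differs from run to run.
unseeded = [random.Random().random() for _ in range(5)]
print(unseeded)
```

A private `random.Random(seed)` instance is used here rather than `random.seed(...)` so that unrelated code drawing from the global generator cannot shift the sequence.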
@@ -77,7 +77,7 @@ It is also possible Grappler is nondeterministic due to nondeterministic iterati
 
 ### Random ops
 
-Legacy random ops, such as `tf.random.normal`, are not deterministic if no seed is set, and so such ops will raise a `FailedPreconditionError` when determinism is enabled. To fix, the user should set a global seed with `tf.random.set_seed`. Since most models use legacy random ops (for variable initialization and various other uses), in practice users must call `tf.random.set_seed` when enabling deterministic ops. Alternatively, users can pass a seed to every individual random operation, but doing so is more inconvenient.
+Legacy random ops, such as `tf.random.normal`, are not deterministic if no seed is set, and so such ops will raise a `tf.errors.FailedPreconditionError` when determinism is enabled. To fix, the user should set a global seed with `tf.random.set_seed`. Since most models use legacy random ops (for variable initialization and various other uses), in practice users must call `tf.random.set_seed` when enabling deterministic ops. Alternatively, users can pass a seed to every individual random operation, but doing so is more inconvenient.
 
 Certain random ops, such as `tf.image.sample_distorted_bounding_box` and `tf.nn.fractional_max_pool`, ignore the global seed if a seed is not explicitly passed. For such ops, setting the global seed is not enough to avoid the error, so users must pass a seed directly to the op.

@@ -95,7 +95,7 @@ We must ensure that every op will either run deterministically or raise an error
 
 2. We will add a special mode to TensorFlow where every time a non-stateful op is run, TensorFlow will rerun the op several times and assert the outputs are the same each time. We will then run the TensorFlow unit tests with this mode as part of the nightly tests. Doing so ensures that for each op that is run as part of a unit test, it will be tested for determinism.
 
-3. When adding determinism to an op which previously was nondeterministic, an explicit unit test will be added that checks for determinism. This is slightly redundant with the special mode described above, but the explicit unit test can be part of the presubmit tests instead of the nightly tests, and can test on inputs that are very likely to demonstrate nondeterminism if it exists.
+3. When adding determinism to an op which previously was nondeterministic, an explicit unit test will be added that checks for determinism. Unlike running unit tests with the special mode above, the explicit unit tests can be part of the presubmit tests instead of the nightly tests, and can test on inputs that are very likely to demonstrate nondeterminism if it exists.
 
 ### Op Review and changes
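
Item 2 in the hunk above describes a test mode that reruns each non-stateful op several times and asserts that the outputs match. The RFC text shown here does not say how that mode is implemented, so the following is only a rough sketch of the idea in plain Python, not TensorFlow's mechanism. The names `assert_deterministic`, `stable_sum`, and `noisy_sum` are hypothetical and exist only for this illustration.

```python
import random


def assert_deterministic(op, *args, runs=3):
    """Call op(*args) `runs` times and raise if any output differs from the first."""
    reference = op(*args)
    for _ in range(runs - 1):
        result = op(*args)
        if result != reference:
            raise AssertionError(
                f"{op.__name__} returned differing outputs for identical inputs"
            )
    return reference


def stable_sum(values):
    """Deterministic stand-in for an op: sums the values in a fixed (sorted) order."""
    return sum(sorted(values))


def noisy_sum(values):
    """Nondeterministic stand-in for an op: adds noise from an OS-entropy RNG."""
    return sum(values) + random.SystemRandom().random()


print(assert_deterministic(stable_sum, [3, 1, 2]))

try:
    assert_deterministic(noisy_sum, [1, 2])
except AssertionError as err:
    print("caught:", err)
```

A check like this can only show nondeterminism when it happens to occur within the chosen number of runs. That is one reason the explicit unit tests in item 3 use inputs that are very likely to expose it.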
