# Concurrency Semantics For TensorFlow Resource Variables

| Status        | Implemented                                           |
| :------------ | :---------------------------------------------------- |
| **Author(s)** | Sanjoy Das ([email protected])                        |
| **Sponsor**   | Alexandre Passos ([email protected])                  |
| **Updated**   | 2019-06-10                                            |

This document aims to assign semantics to _concurrent accesses_ to resource variables in TensorFlow. _Concurrent accesses_ refer to resource variable loads and stores that are not totally ordered by control or data dependencies.
The semantics presented in this doc will probably be unsurprising to people familiar with TensorFlow. However, a concrete, written-down semantics is a useful foundation for writing transformations and analyses over TensorFlow graphs.
We do not discuss ref variables, which are deprecated.
## Semantics
We define an _operation instance_ as a `(node, frame)` tuple, where `node` is a node in the TensorFlow graph and `frame` is a frame (see [Dynamic control flow in large-scale machine learning](https://dl.acm.org/citation.cfm?id=3190551)) in which the node was executed.
The operation instances in an execution of a graph are put in a total order such that:
1. The total order is consistent with the partial order imposed by the control and data dependencies within the graph. That is, if `(n1, f1)` depends on `(n0, f0)` then in the total order `(n1, f1)` will be ordered after `(n0, f0)`.
1. The total order respects critical sections. That is, if two nodes, `n0` and `n1`, are in the same critical section, then the following total order is illegal: [(`n0`, `f0`), (`n0`, `f1`), (`n1`, `f0`), (`n1`, `f1`)]. On the other hand, the following total order is fine: [(`n0`, `f0`), (`n1`, `f0`), (`n0`, `f1`), (`n1`, `f1`)].
1. If there are `N` graphs running in the same [TensorFlow session](https://www.tensorflow.org/api_docs/python/tf/Session), with source nodes `sub_source_i` and sink nodes `sub_sink_i` (with `i` in `[0, N)`) then the total order is consistent with a “supergraph” with a source node that has control edges to all `sub_source_i` and a sink node that has control edges from all `sub_sink_i`.
1. The total order is consistent with the blocking semantics of TensorFlow ops. For example, if an instance of `QueueDequeueV2` can complete only after a matching `QueueEnqueueV2` has completed, then the total order cannot put the `QueueDequeueV2` instance before the `QueueEnqueueV2` instance.
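
The rules above can be exercised with a small sketch (plain Python; the node and frame names are invented for illustration) that enumerates the total orders consistent with rule 1 for a toy set of operation instances:

```python
from itertools import permutations

def consistent_total_orders(instances, deps):
    """Yield every total order of `instances` that is consistent with
    the partial order `deps`, a set of (before, after) pairs (rule 1)."""
    for order in permutations(instances):
        pos = {inst: i for i, inst in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in deps):
            yield list(order)

# Three hypothetical operation instances; (n1, f0) depends on (n0, f0),
# while (n2, f0) is unordered relative to both.
instances = [("n0", "f0"), ("n1", "f0"), ("n2", "f0")]
deps = {(("n0", "f0"), ("n1", "f0"))}
orders = list(consistent_total_orders(instances, deps))
# 3 of the 6 permutations keep (n0, f0) before (n1, f0).
```

Brute-force enumeration is only feasible for toy graphs, but it makes the "all total orders consistent with the partial order" framing concrete.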
Resource variable reads and writes follow sequential semantics as if they were executed in this total order. The writes and reads themselves are atomic -- a read never sees a partial write, even when it is racing with the write.
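
A minimal model of this atomicity guarantee, with a Python lock standing in for the executor's internal synchronization (the `ResourceVariable` class here is a toy, not TensorFlow's):

```python
import threading

class ResourceVariable:
    """Toy model: loads and stores are atomic, so a racing read never
    observes a partially written tensor."""
    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = list(value)

    def read(self):
        with self._lock:          # the entire load is one atomic step
            return list(self._value)

    def assign(self, value):
        with self._lock:          # the entire store is one atomic step
            self._value = list(value)

x = ResourceVariable([0] * 4)
writer = threading.Thread(
    target=lambda: [x.assign([i] * 4) for i in range(100)])
writer.start()
# Racing reads: every snapshot is homogeneous -- old or new, never a mix.
snapshots = [x.read() for _ in range(100)]
writer.join()
torn = [s for s in snapshots if len(set(s)) != 1]  # torn reads, if any
```

Without the lock, a reader could observe a tensor that is half old value and half new value; the lock is what makes each load and store a single step in the total order.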
This total order is not fixed from run to run and can vary depending on external factors such as operating system thread scheduling, hardware behavior, etc.
Operations that read and write resource variables at the same time, like `ResourceApplyGradientDescent`, are modeled by decomposing them into a read, compute and write sequence which does not execute atomically as a unit. This behavior is weaker than what the executor implements and this weakening simplifies XLA’s auto-clustering.
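
The decomposition can be sketched as follows (a toy `Var` class, not TensorFlow's API; the three steps are deliberately separate calls, so another operation instance may interleave between any two of them):

```python
class Var:
    """Toy resource variable holding a scalar."""
    def __init__(self, v):
        self.v = v
    def read(self):
        return self.v
    def assign(self, v):
        self.v = v

def apply_gradient_descent(var, lr, grad):
    """Model of ResourceApplyGradientDescent as a read/compute/write
    sequence; the three steps do NOT execute atomically as a unit."""
    value = var.read()          # read
    value = value - lr * grad   # compute
    var.assign(value)           # write

x = Var(1.0)
apply_gradient_descent(x, lr=0.1, grad=2.0)
# x.v is now 0.8, but under this model a concurrent update interleaved
# between the read and the write could be lost.
```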
## Examples & Consequences
### Simplest Non-Trivial Example
![alt_text](20190610-resource-variable-semantics/simplest_non_trival_example.png "Simple non-trivial example.")
The graph above does two resource writes and two resource reads. Assuming that `X` and `Y` are `0` to start with, control dependencies on the graph guarantee that _if_ `r0` is `2` _then_ `r1` is `1`. That is, the result `r0 == 2 && r1 == 0` is disallowed.
Note that this means stores that are constrained to run in a specific order via control or data dependencies cannot be reordered.
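
One way to check such a claim mechanically is to enumerate every total order consistent with the control edges. The graph below is a hypothetical reconstruction of the figure as a message-passing pattern (`*Y = 1` then `*X = 2` on the writer side, `r0 = *X` then `r1 = *Y` on the reader side); the node names and exact stored values are invented:

```python
from itertools import permutations

def execute(order):
    """Run the four operation instances in the given total order."""
    mem = {"X": 0, "Y": 0}
    regs = {}
    steps = {
        "w_y": lambda: mem.update(Y=1),           # *Y = 1
        "w_x": lambda: mem.update(X=2),           # *X = 2
        "r_x": lambda: regs.update(r0=mem["X"]),  # r0 = *X
        "r_y": lambda: regs.update(r1=mem["Y"]),  # r1 = *Y
    }
    for name in order:
        steps[name]()
    return regs["r0"], regs["r1"]

results = set()
for order in permutations(["w_y", "w_x", "r_x", "r_y"]):
    pos = {n: i for i, n in enumerate(order)}
    # Keep only total orders consistent with the two control edges.
    if pos["w_y"] < pos["w_x"] and pos["r_x"] < pos["r_y"]:
        results.add(execute(order))
# (2, 0) never appears: observing the second store implies the first
# store has already happened, so the second load must see it.
```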
### Resource Update Operations
Resource update operations, like `ResourceApplyGradientDescent`, are modeled as a read, compute, write sequence. For simplicity we invent a resource update operation, `*X+=1`, that increments the tensor stored in the resource variable `X` by `1`. With this framing, two concurrent `*X+=1` operations can be modeled as the graph below. All total orders that are consistent with the partial order in the graph are valid.
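
Under this model, the set of reachable final values can be computed by brute force over interleavings (plain Python; each `*X += 1` is lowered into three steps following the decomposition above):

```python
from itertools import permutations

def run(schedule):
    """Execute two `*X += 1` instances, each lowered to
    read/compute/write, in the interleaving given by `schedule`
    (a sequence of instance ids, three steps per instance)."""
    x = 0
    local = {0: 0, 1: 0}   # per-instance temporary
    step = {0: 0, 1: 0}    # per-instance program counter
    for t in schedule:
        if step[t] == 0:       # read *X
            local[t] = x
        elif step[t] == 1:     # compute: add 1
            local[t] += 1
        else:                  # write *X
            x = local[t]
        step[t] += 1
    return x

# All interleavings of the six steps; identical labels preserve
# per-instance step order automatically.
schedules = set(permutations([0, 0, 0, 1, 1, 1]))
finals = {run(s) for s in schedules}
# finals == {1, 2}: a lost update is a legal outcome under this model.
```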
![alt_text](20190610-resource-variable-semantics/resource_update_operations.png "Lowered resource update operation.")
This behavior is weaker than the memory model actually implemented by the graph executor. Since `ResourceApplyGradientDescent` et al. [lock out](https://github.com/tensorflow/tensorflow/blob/a53365719e445edc5b48f0877f1d85b8d5837384/tensorflow/core/kernels/training_ops.cc#L556) the resource variable being updated for the entire duration of their execution, the graph executor guarantees that the two increments happen atomically and `*X` is always `2` after the graph execution.
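
The executor's stronger behavior can be sketched by holding a lock across the whole update (plain Python; `LockedVar` is a toy stand-in for the kernel's locking, not TensorFlow's API):

```python
import threading

class LockedVar:
    """Sketch of the executor's actual behavior: the update op holds
    the variable's lock for its whole read/compute/write duration."""
    def __init__(self, v):
        self.v = v
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:   # read, compute, and write as one atomic unit
            self.v += 1

x = LockedVar(0)
threads = [threading.Thread(target=x.increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock held across the update, x.v is always 2.
```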
### Load/Store Reordering
![alt_text](20190610-resource-variable-semantics/load_store_reordering.png "Load/store reordering example.")
Just like two stores ordered via control dependence cannot be reordered, a load and a store ordered via control edges cannot be reordered either. For instance, given the semantics in this document, when the execution of the graph above ends, if `X` is `1` then `r0` must be `5`. This invariant breaks if we allow `r0 = *Y` to happen before `*X = 1`.
This invariant is relevant for XLA; it means wrapping an arbitrary TensorFlow graph in an `_XlaCompile` / `_XlaRun` sequence or an `XlaLaunch` op is generally incorrect -- these XLA ops hoist all the resource variable reads to the beginning of the computation and sink all the writes to the end of the computation which may violate load/store ordering present in the TensorFlow graph.
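
The effect of hoisting loads above stores can be illustrated with a hypothetical Dekker-style graph (not the one in the figure above): each side stores one variable and then, via a control edge, loads the other. Under the sequential semantics of this document, both loads cannot miss both stores; hoisting both loads to the start of the computation would make that outcome observable.

```python
from itertools import permutations

def execute(order):
    """Run the four operation instances in the given total order."""
    mem = {"X": 0, "Y": 0}
    regs = {}
    ops = {
        "a_store": lambda: mem.update(X=1),            # side A: *X = 1
        "a_load":  lambda: regs.update(r0=mem["Y"]),   # side A: r0 = *Y
        "b_store": lambda: mem.update(Y=1),            # side B: *Y = 1
        "b_load":  lambda: regs.update(r1=mem["X"]),   # side B: r1 = *X
    }
    for name in order:
        ops[name]()
    return regs["r0"], regs["r1"]

results = set()
for order in permutations(["a_store", "a_load", "b_store", "b_load"]):
    pos = {n: i for i, n in enumerate(order)}
    # Keep only total orders consistent with the store->load control edges.
    if pos["a_store"] < pos["a_load"] and pos["b_store"] < pos["b_load"]:
        results.add(execute(order))
# (0, 0) is unreachable: at least one load must see the other's store.
# Hoisting both loads above both stores would produce exactly (0, 0).
```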