@@ -946,22 +946,39 @@ Limitations of the Linux-kernel memory model (LKMM) include:
carrying a dependency, then the compiler can break that dependency
by substituting a constant of that value.

- Conversely, LKMM sometimes doesn't recognize that a particular
- optimization is not allowed, and as a result, thinks that a
- dependency is not present (because the optimization would break it).
- The memory model misses some pretty obvious control dependencies
- because of this limitation. A simple example is:
+ Conversely, LKMM will sometimes overestimate the amount of
+ reordering compilers and CPUs can carry out, leading it to miss
+ some pretty obvious cases of ordering. A simple example is:

    r1 = READ_ONCE(x);
    if (r1 == 0)
        smp_mb();
    WRITE_ONCE(y, 1);

- There is a control dependency from the READ_ONCE to the WRITE_ONCE,
- even when r1 is nonzero, but LKMM doesn't realize this and thinks
- that the write may execute before the read if r1 != 0. (Yes, that
- doesn't make sense if you think about it, but the memory model's
- intelligence is limited.)
+ The WRITE_ONCE() does not depend on the READ_ONCE(), and as a
+ result, LKMM does not claim ordering. However, even though no
+ dependency is present, the WRITE_ONCE() will not be executed before
+ the READ_ONCE(). There are two reasons for this:
+
+     The presence of the smp_mb() in one of the branches
+     prevents the compiler from moving the WRITE_ONCE()
+     up before the "if" statement, since the compiler has
+     to assume that r1 will sometimes be 0 (but see the
+     comment below);
+
+     CPUs do not execute stores before po-earlier conditional
+     branches, even in cases where the store occurs after the
+     two arms of the branch have recombined.
+
+ It is not dangerous in the slightest for LKMM to make weaker
+ guarantees than the architectures do. In fact, it is
+ desirable, because it leaves compilers room for optimizations.
+ For instance, suppose that a 0 value in r1 would trigger undefined
+ behavior elsewhere. Then a clever compiler might deduce that r1
+ can never be 0 in the "if" condition. As a result, such a
+ compiler might deem it safe to optimize away the smp_mb(),
+ eliminating the branch and any ordering an architecture would
+ guarantee otherwise.

2. Multiple access sizes for a single variable are not supported,
and neither are misaligned or partially overlapping accesses.
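As a sketch that is not part of the patch, the example in the revised text can be embedded in a herd7 litmus test for the LKMM in tools/memory-model. The test name, the pointer-argument form, and the second process P1 (a load-buffering partner with an unconditional smp_mb()) are assumptions added for illustration. The exists clause asks whether P0 can read 1 from x, so that its smp_mb() is skipped, while P1 reads 1 from y. Forbidding that outcome would require P0's WRITE_ONCE() to be ordered after its READ_ONCE().

```
C LB+condmb

(*
 * Hypothetical test, not part of the patch: P0 is the example from
 * the text, and P1 is an assumed load-buffering partner whose
 * smp_mb() is unconditional.
 *)

{}

P0(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*x);
	if (r1 == 0)
		smp_mb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r2;

	r2 = READ_ONCE(*y);
	smp_mb();
	WRITE_ONCE(*x, 1);
}

exists (0:r1=1 /\ 1:r2=1)
```

Following the tools/memory-model README, such a file would be run with something like `herd7 -conf linux-kernel.cfg LB+condmb.litmus`. Given the revised text, one would expect LKMM to report this outcome as reachable, because it claims no ordering between P0's accesses when r1 is nonzero, even though the patch argues that real compilers and CPUs will not produce it. Moving the smp_mb() in P0 out of the "if" gives both processes a full barrier, a load-buffering pattern that LKMM forbids.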