
Commit 192a5e0

Merge tag 'lkmm.2023.02.15a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull LKMM (Linux Kernel Memory Model) updates from Paul McKenney:

 "Documentation updates.

  Add read-modify-write sequences, which means that stronger primitives
  more consistently result in stronger ordering, while still remaining
  in the envelope of the hardware that supports Linux.

  Address, data, and control dependencies used to ignore data that was
  stored in temporaries. This update extends these dependency chains to
  include unmarked intra-thread stores and loads. Note that these
  unmarked stores and loads should not be concurrently accessed from
  multiple threads, and doing so will cause LKMM to flag such accesses
  as data races"

* tag 'lkmm.2023.02.15a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools: memory-model: Make plain accesses carry dependencies
  Documentation: Fixed a typo in atomic_t.txt
  tools: memory-model: Add rmw-sequences to the LKMM
  locking/memory-barriers.txt: Improve documentation for writel() example

2 parents a5c95ca + 9ba7d3b

File tree

6 files changed: +90 -15 lines

Documentation/atomic_t.txt

Lines changed: 1 addition & 1 deletion
@@ -324,7 +324,7 @@ atomic operations.

 Specifically 'simple' cmpxchg() loops are expected to not starve one another
 indefinitely. However, this is not evident on LL/SC architectures, because
-while an LL/SC architecure 'can/should/must' provide forward progress
+while an LL/SC architecture 'can/should/must' provide forward progress
 guarantees between competing LL/SC sections, such a guarantee does not
 transfer to cmpxchg() implemented using LL/SC. Consider:

Documentation/memory-barriers.txt

Lines changed: 11 additions & 11 deletions
@@ -1910,7 +1910,8 @@ There are some more advanced barrier functions:

      These are for use with consistent memory to guarantee the ordering
      of writes or reads of shared memory accessible to both the CPU and a
-     DMA capable device.
+     DMA capable device. See Documentation/core-api/dma-api.rst file for more
+     information about consistent memory.

      For example, consider a device driver that shares memory with a device
      and uses a descriptor status value to indicate if the descriptor belongs

@@ -1931,22 +1932,21 @@ There are some more advanced barrier functions:
 		/* assign ownership */
 		desc->status = DEVICE_OWN;

-		/* notify device of new descriptors */
+		/* Make descriptor status visible to the device followed by
+		 * notify device of new descriptor
+		 */
 		writel(DESC_NOTIFY, doorbell);
 	}

-     The dma_rmb() allows us guarantee the device has released ownership
+     The dma_rmb() allows us to guarantee that the device has released ownership
      before we read the data from the descriptor, and the dma_wmb() allows
      us to guarantee the data is written to the descriptor before the device
      can see it now has ownership. The dma_mb() implies both a dma_rmb() and
-     a dma_wmb(). Note that, when using writel(), a prior wmb() is not needed
-     to guarantee that the cache coherent memory writes have completed before
-     writing to the MMIO region. The cheaper writel_relaxed() does not provide
-     this guarantee and must not be used here.
-
-     See the subsection "Kernel I/O barrier effects" for more information on
-     relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for
-     more information on consistent memory.
+     a dma_wmb().
+
+     Note that the dma_*() barriers do not provide any ordering guarantees for
+     accesses to MMIO regions. See the later "KERNEL I/O BARRIER EFFECTS"
+     subsection for more information about I/O accessors and MMIO ordering.

 (*) pmem_wmb();
tools/memory-model/Documentation/explanation.txt

Lines changed: 38 additions & 1 deletion
@@ -1007,6 +1007,36 @@ order. Equivalently,

 where the rmw relation links the read and write events making up each
 atomic update. This is what the LKMM's "atomic" axiom says.

+Atomic rmw updates play one more role in the LKMM: They can form "rmw
+sequences". An rmw sequence is simply a bunch of atomic updates where
+each update reads from the previous one. Written using events, it
+looks like this:
+
+	Z0 ->rf Y1 ->rmw Z1 ->rf ... ->rf Yn ->rmw Zn,
+
+where Z0 is some store event and n can be any number (even 0, in the
+degenerate case). We write this relation as: Z0 ->rmw-sequence Zn.
+Note that this implies Z0 and Zn are stores to the same variable.
+
+Rmw sequences have a special property in the LKMM: They can extend the
+cumul-fence relation. That is, if we have:
+
+	U ->cumul-fence X ->rmw-sequence Y
+
+then also U ->cumul-fence Y. Thinking about this in terms of the
+operational model, U ->cumul-fence X says that the store U propagates
+to each CPU before the store X does. Then the fact that X and Y are
+linked by an rmw sequence means that U also propagates to each CPU
+before Y does. In an analogous way, rmw sequences can also extend
+the w-post-bounded relation defined below in the PLAIN ACCESSES AND
+DATA RACES section.
+
+(The notion of rmw sequences in the LKMM is similar to, but not quite
+the same as, that of release sequences in the C11 memory model. They
+were added to the LKMM to fix an obscure bug; without them, atomic
+updates with full-barrier semantics did not always guarantee ordering
+at least as strong as atomic updates with release-barrier semantics.)

 THE PRESERVED PROGRAM ORDER RELATION: ppo
 -----------------------------------------

@@ -2545,7 +2575,7 @@ smp_store_release() -- which is basically how the Linux kernel treats
 them.

 Although we said that plain accesses are not linked by the ppo
-relation, they do contribute to it indirectly. Namely, when there is
+relation, they do contribute to it indirectly. Firstly, when there is
 an address dependency from a marked load R to a plain store W,
 followed by smp_wmb() and then a marked store W', the LKMM creates a
 ppo link from R to W'. The reasoning behind this is perhaps a little

@@ -2554,6 +2584,13 @@ for this source code in which W' could execute before R. Just as with
 pre-bounding by address dependencies, it is possible for the compiler
 to undermine this relation if sufficient care is not taken.

+Secondly, plain accesses can carry dependencies: If a data dependency
+links a marked load R to a store W, and the store is read by a load R'
+from the same thread, then the data loaded by R' depends on the data
+loaded originally by R. Thus, if R' is linked to any access X by a
+dependency, R is also linked to access X by the same dependency, even
+if W' or R' (or both!) are plain.
+
 There are a few oddball fences which need special treatment:
 smp_mb__before_atomic(), smp_mb__after_atomic(), and
 smp_mb__after_spinlock(). The LKMM uses fence events with special

tools/memory-model/linux-kernel.bell

Lines changed: 6 additions & 0 deletions
@@ -82,3 +82,9 @@ flag ~empty different-values(srcu-rscs) as srcu-bad-nesting
 let Marked = (~M) | IW | Once | Release | Acquire | domain(rmw) | range(rmw) |
 	LKR | LKW | UL | LF | RL | RU
 let Plain = M \ Marked
+
+(* Redefine dependencies to include those carried through plain accesses *)
+let carry-dep = (data ; rfi)*
+let addr = carry-dep ; addr
+let ctrl = carry-dep ; ctrl
+let data = carry-dep ; data

tools/memory-model/linux-kernel.cat

Lines changed: 3 additions & 2 deletions
@@ -74,8 +74,9 @@ let ppo = to-r | to-w | fence | (po-unlock-lock-po & int)

 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = (rfe ; [Marked])? ; r
+let rmw-sequence = (rf ; rmw)*
 let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
-	po-unlock-lock-po) ; [Marked]
+	po-unlock-lock-po) ; [Marked] ; rmw-sequence
 let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
 	[Marked] ; rfe? ; [Marked]

@@ -174,7 +175,7 @@ let vis = cumul-fence* ; rfe? ; [Marked] ;
 let w-pre-bounded = [Marked] ; (addr | fence)?
 let r-pre-bounded = [Marked] ; (addr | nonrw-fence |
 	([R4rmb] ; fencerel(Rmb) ; [~Noreturn]))?
-let w-post-bounded = fence? ; [Marked]
+let w-post-bounded = fence? ; [Marked] ; rmw-sequence
 let r-post-bounded = (nonrw-fence | ([~Noreturn] ; fencerel(Rmb) ; [R4rmb]))? ;
 	[Marked]

tools/memory-model/litmus-tests/dep+plain.litmus (new file; the page omitted the filename, which is inferred from the test's "C dep+plain" header and the litmus-tests directory convention)

Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
+C dep+plain
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that in LKMM, plain accesses
+ * carry dependencies much like accesses to registers:
+ * The data stored to *z1 and *z2 by P0() originates from P0()'s
+ * READ_ONCE(), and therefore using that data to compute the
+ * conditional of P0()'s if-statement creates a control dependency
+ * from that READ_ONCE() to P0()'s WRITE_ONCE().
+ *)
+
+{}
+
+P0(int *x, int *y, int *z1, int *z2)
+{
+	int a = READ_ONCE(*x);
+	*z1 = a;
+	*z2 = *z1;
+	if (*z2 == 1)
+		WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+	int r = smp_load_acquire(y);
+	smp_store_release(x, r);
+}
+
+exists (x=1 /\ y=1)
