Commit b8fb65e

Merge tag 'lkmm.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull LKMM (Linux Kernel Memory Model) updates from Paul McKenney:
 "Several documentation updates"

* tag 'lkmm.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools/memory-model: Clarify LKMM's limitations in litmus-tests.txt
  docs/memory-barriers.txt: Fixup long lines
  docs/memory-barriers.txt: Fix confusing name of 'data dependency barrier'
2 parents dda0ba4 + be94ecf commit b8fb65e

2 files changed, 122 insertions(+), 92 deletions(-)


Documentation/memory-barriers.txt

Lines changed: 95 additions & 82 deletions
@@ -52,7 +52,7 @@ CONTENTS
 
      - Varieties of memory barrier.
      - What may not be assumed about memory barriers?
-     - Data dependency barriers (historical).
+     - Address-dependency barriers (historical).
      - Control dependencies.
      - SMP barrier pairing.
      - Examples of memory barrier sequences.
@@ -187,9 +187,9 @@ As a further example, consider this sequence of events:
         B = 4;          Q = P;
         P = &B;         D = *Q;
 
-There is an obvious data dependency here, as the value loaded into D depends on
-the address retrieved from P by CPU 2. At the end of the sequence, any of the
-following results are possible:
+There is an obvious address dependency here, as the value loaded into D depends
+on the address retrieved from P by CPU 2. At the end of the sequence, any of
+the following results are possible:
 
         (Q == &A) and (D == 1)
         (Q == &B) and (D == 2)
@@ -391,58 +391,62 @@ Memory barriers come in four basic varieties:
      memory system as time progresses. All stores _before_ a write barrier
      will occur _before_ all the stores after the write barrier.
 
-     [!] Note that write barriers should normally be paired with read or data
-     dependency barriers; see the "SMP barrier pairing" subsection.
+     [!] Note that write barriers should normally be paired with read or
+     address-dependency barriers; see the "SMP barrier pairing" subsection.
 
 
- (2) Data dependency barriers.
+ (2) Address-dependency barriers (historical).
 
-     A data dependency barrier is a weaker form of read barrier. In the case
-     where two loads are performed such that the second depends on the result
-     of the first (eg: the first load retrieves the address to which the second
-     load will be directed), a data dependency barrier would be required to
-     make sure that the target of the second load is updated after the address
-     obtained by the first load is accessed.
+     An address-dependency barrier is a weaker form of read barrier. In the
+     case where two loads are performed such that the second depends on the
+     result of the first (eg: the first load retrieves the address to which
+     the second load will be directed), an address-dependency barrier would
+     be required to make sure that the target of the second load is updated
+     after the address obtained by the first load is accessed.
 
-     A data dependency barrier is a partial ordering on interdependent loads
-     only; it is not required to have any effect on stores, independent loads
-     or overlapping loads.
+     An address-dependency barrier is a partial ordering on interdependent
+     loads only; it is not required to have any effect on stores, independent
+     loads or overlapping loads.
 
      As mentioned in (1), the other CPUs in the system can be viewed as
      committing sequences of stores to the memory system that the CPU being
-     considered can then perceive. A data dependency barrier issued by the CPU
-     under consideration guarantees that for any load preceding it, if that
-     load touches one of a sequence of stores from another CPU, then by the
-     time the barrier completes, the effects of all the stores prior to that
-     touched by the load will be perceptible to any loads issued after the data
-     dependency barrier.
+     considered can then perceive. An address-dependency barrier issued by
+     the CPU under consideration guarantees that for any load preceding it,
+     if that load touches one of a sequence of stores from another CPU, then
+     by the time the barrier completes, the effects of all the stores prior to
+     that touched by the load will be perceptible to any loads issued after
+     the address-dependency barrier.
 
      See the "Examples of memory barrier sequences" subsection for diagrams
      showing the ordering constraints.
 
-     [!] Note that the first load really has to have a _data_ dependency and
+     [!] Note that the first load really has to have an _address_ dependency and
      not a control dependency. If the address for the second load is dependent
      on the first load, but the dependency is through a conditional rather than
      actually loading the address itself, then it's a _control_ dependency and
      a full read barrier or better is required. See the "Control dependencies"
      subsection for more information.
 
-     [!] Note that data dependency barriers should normally be paired with
+     [!] Note that address-dependency barriers should normally be paired with
      write barriers; see the "SMP barrier pairing" subsection.
 
+     [!] Kernel release v5.9 removed kernel APIs for explicit address-
+     dependency barriers. Nowadays, APIs for marking loads from shared
+     variables such as READ_ONCE() and rcu_dereference() provide implicit
+     address-dependency barriers.
 
  (3) Read (or load) memory barriers.
 
-     A read barrier is a data dependency barrier plus a guarantee that all the
-     LOAD operations specified before the barrier will appear to happen before
-     all the LOAD operations specified after the barrier with respect to the
-     other components of the system.
+     A read barrier is an address-dependency barrier plus a guarantee that all
+     the LOAD operations specified before the barrier will appear to happen
+     before all the LOAD operations specified after the barrier with respect to
+     the other components of the system.
 
      A read barrier is a partial ordering on loads only; it is not required to
      have any effect on stores.
 
-     Read memory barriers imply data dependency barriers, and so can substitute
-     for them.
+     Read memory barriers imply address-dependency barriers, and so can
+     substitute for them.
 
      [!] Note that read barriers should normally be paired with write barriers;
      see the "SMP barrier pairing" subsection.
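
The pairing rule quoted in this hunk (a write barrier on one side, a read barrier on the other) can be sketched in userspace with C11 fences. This is only an analogy under stated assumptions: the release and acquire fences below stand in for smp_wmb() and smp_rmb() and are somewhat stronger than either, and the names payload, ready, producer() and consumer() are hypothetical and do not come from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state, for illustration only. */
static int payload;
static atomic_bool ready;

/* Writer side: store the data, then a fence, then the flag.
 * The release fence plays the role of the write barrier. */
void producer(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: load the flag, then a fence, then the data.
 * The acquire fence plays the role of the paired read barrier.
 * Returns false if the flag has not been observed yet. */
bool consumer(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

If either fence is dropped, the pairing is broken: the writer's ordering alone does not constrain the order in which the reader performs its two loads.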
@@ -550,17 +554,21 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
         Documentation/core-api/dma-api.rst
 
 
-DATA DEPENDENCY BARRIERS (HISTORICAL)
--------------------------------------
+ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
+----------------------------------------
 
 As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
 DEC Alpha, which means that about the only people who need to pay attention
 to this section are those working on DEC Alpha architecture-specific code
 and those working on READ_ONCE() itself. For those who need it, and for
 those who are interested in the history, here is the story of
-data-dependency barriers.
+address-dependency barriers.
+
+[!] While address dependencies are observed in both load-to-load and
+load-to-store relations, address-dependency barriers are not necessary
+for load-to-store situations.
 
-The usage requirements of data dependency barriers are a little subtle, and
+The requirement of address-dependency barriers is a little subtle, and
 it's not always obvious that they're needed. To illustrate, consider the
 following sequence of events:
 
@@ -570,11 +578,14 @@ following sequence of events:
         B = 4;
         <write barrier>
         WRITE_ONCE(P, &B);
-                              Q = READ_ONCE(P);
+                              Q = READ_ONCE_OLD(P);
                               D = *Q;
 
-There's a clear data dependency here, and it would seem that by the end of the
-sequence, Q must be either &A or &B, and that:
+[!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
+doesn't imply an address-dependency barrier.
+
+There's a clear address dependency here, and it would seem that by the end of
+the sequence, Q must be either &A or &B, and that:
 
         (Q == &A) implies (D == 1)
         (Q == &B) implies (D == 4)
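
As a userspace sketch of the sequence in this hunk, the pointer publication can be written with C11 atomics. The assumptions are loud ones: this is not kernel code, cpu1_publish() and cpu2_read() are hypothetical names, the release store stands in for "<write barrier>; WRITE_ONCE(P, &B)", and memory_order_consume is used for the dependent load (compilers commonly implement it as acquire).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Initial state from the example: A == 1, B == 2, P == &A. */
static int A = 1;
static int B = 2;
static _Atomic(int *) P = &A;

/* CPU 1: update B, then publish its address.
 * The release store orders "B = 4" before the store to P. */
void cpu1_publish(void)
{
    B = 4;
    atomic_store_explicit(&P, &B, memory_order_release);
}

/* CPU 2: load the pointer, then load through it.
 * The second load is address-dependent on the first. */
int cpu2_read(int **q_out)
{
    int *q = atomic_load_explicit(&P, memory_order_consume);

    assert(q_out != NULL);
    *q_out = q;
    return *q;
}
```

With this ordering, a reader that observes &B must also observe B == 4, which is the implication the text expects.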
@@ -588,8 +599,8 @@ While this may seem like a failure of coherency or causality maintenance, it
 isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
 Alpha).
 
-To deal with this, a data dependency barrier or better must be inserted
-between the address load and the data load:
+To deal with this, READ_ONCE() provides an implicit address-dependency barrier
+since kernel release v4.15:
 
         CPU 1                 CPU 2
         ===============       ===============
@@ -598,7 +609,7 @@ between the address load and the data load:
         <write barrier>
         WRITE_ONCE(P, &B);
                               Q = READ_ONCE(P);
-                              <data dependency barrier>
+                              <implicit address-dependency barrier>
                               D = *Q;
 
 This enforces the occurrence of one of the two implications, and prevents the
@@ -615,26 +626,26 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
 but the old value of the variable B (2).
 
 
-A data-dependency barrier is not required to order dependent writes
-because the CPUs that the Linux kernel supports don't do writes
-until they are certain (1) that the write will actually happen, (2)
-of the location of the write, and (3) of the value to be written.
+An address-dependency barrier is not required to order dependent writes
+because the CPUs that the Linux kernel supports don't do writes until they
+are certain (1) that the write will actually happen, (2) of the location of
+the write, and (3) of the value to be written.
 But please carefully read the "CONTROL DEPENDENCIES" section and the
-Documentation/RCU/rcu_dereference.rst file: The compiler can and does
-break dependencies in a great many highly creative ways.
+Documentation/RCU/rcu_dereference.rst file: The compiler can and does break
+dependencies in a great many highly creative ways.
 
         CPU 1                 CPU 2
         ===============       ===============
         { A == 1, B == 2, C = 3, P == &A, Q == &C }
         B = 4;
         <write barrier>
         WRITE_ONCE(P, &B);
-                              Q = READ_ONCE(P);
+                              Q = READ_ONCE_OLD(P);
                               WRITE_ONCE(*Q, 5);
 
-Therefore, no data-dependency barrier is required to order the read into
+Therefore, no address-dependency barrier is required to order the read into
 Q with the store into *Q. In other words, this outcome is prohibited,
-even without a data-dependency barrier:
+even without an implicit address-dependency barrier of modern READ_ONCE():
 
         (Q == &B) && (B == 4)
 
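
The load-to-store case in this hunk can also be sketched in userspace C11, again as an analogy rather than kernel code. The assumptions: cpu1_publish(), cpu2_record() and read_b() are hypothetical names, the targets are atomic_int so that the two stores to B do not form a data race under the C11 model, and the dependent load uses memory_order_consume where the kernel text relies on the hardware honouring the address dependency.

```c
#include <assert.h>
#include <stdatomic.h>

/* Initial state from the example: A == 1, B == 2, P == &A. */
static atomic_int A = 1;
static atomic_int B = 2;
static _Atomic(atomic_int *) P = &A;

/* CPU 1: B = 4; <write barrier>; WRITE_ONCE(P, &B); */
void cpu1_publish(void)
{
    atomic_store_explicit(&B, 4, memory_order_relaxed);
    atomic_store_explicit(&P, &B, memory_order_release);
}

/* CPU 2: Q = READ_ONCE(P); WRITE_ONCE(*Q, value);
 * The store's address depends on the loaded pointer. */
atomic_int *cpu2_record(int value)
{
    atomic_int *q = atomic_load_explicit(&P, memory_order_consume);

    atomic_store_explicit(q, value, memory_order_relaxed);
    return q;
}

/* Helper to inspect the final value of B. */
int read_b(void)
{
    return atomic_load_explicit(&B, memory_order_relaxed);
}
```

If cpu2_record(5) returns &B, the store of 5 is ordered after CPU 1's store of 4, so B cannot end up as 4. That matches the outcome the text describes as prohibited.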
@@ -645,12 +656,12 @@ can be used to record rare error conditions and the like, and the CPUs'
 naturally occurring ordering prevents such records from being lost.
 
 
-Note well that the ordering provided by a data dependency is local to
+Note well that the ordering provided by an address dependency is local to
 the CPU containing it. See the section on "Multicopy atomicity" for
 more information.
 
 
-The data dependency barrier is very important to the RCU system,
+The address-dependency barrier is very important to the RCU system,
 for example. See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h. This permits the current target of an RCU'd
 pointer to be replaced with a new modified target, without the replacement
@@ -667,20 +678,21 @@ not understand them. The purpose of this section is to help you prevent
 the compiler's ignorance from breaking your code.
 
 A load-load control dependency requires a full read memory barrier, not
-simply a data dependency barrier to make it work correctly. Consider the
-following bit of code:
+simply an (implicit) address-dependency barrier to make it work correctly.
+Consider the following bit of code:
 
         q = READ_ONCE(a);
+        <implicit address-dependency barrier>
         if (q) {
-                <data dependency barrier>  /* BUG: No data dependency!!! */
+                /* BUG: No address dependency!!! */
                 p = READ_ONCE(b);
         }
 
-This will not have the desired effect because there is no actual data
+This will not have the desired effect because there is no actual address
 dependency, but rather a control dependency that the CPU may short-circuit
 by attempting to predict the outcome in advance, so that other CPUs see
-the load from b as having happened before the load from a. In such a
-case what's actually required is:
+the load from b as having happened before the load from a. In such a case
+what's actually required is:
 
         q = READ_ONCE(a);
         if (q) {
@@ -927,9 +939,9 @@ General barriers pair with each other, though they also pair with most
 other types of barriers, albeit without multicopy atomicity. An acquire
 barrier pairs with a release barrier, but both may also pair with other
 barriers, including of course general barriers. A write barrier pairs
-with a data dependency barrier, a control dependency, an acquire barrier,
+with an address-dependency barrier, a control dependency, an acquire barrier,
 a release barrier, a read barrier, or a general barrier. Similarly a
-read barrier, control dependency, or a data dependency barrier pairs
+read barrier, control dependency, or an address-dependency barrier pairs
 with a write barrier, an acquire barrier, a release barrier, or a
 general barrier:
 
@@ -948,7 +960,7 @@ Or:
         a = 1;
         <write barrier>
         WRITE_ONCE(b, &a);    x = READ_ONCE(b);
-                              <data dependency barrier>
+                              <implicit address-dependency barrier>
                               y = *x;
 
 Or even:
@@ -968,8 +980,8 @@ Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
 
 [!] Note that the stores before the write barrier would normally be expected to
-match the loads after the read barrier or the data dependency barrier, and vice
-versa:
+match the loads after the read barrier or the address-dependency barrier, and
+vice versa:
 
         CPU 1                   CPU 2
         ===================     ===================
@@ -1021,8 +1033,8 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
                    V
 
 
-Secondly, data dependency barriers act as partial orderings on data-dependent
-loads. Consider the following sequence of events:
+Secondly, address-dependency barriers act as partial orderings on address-
+dependent loads. Consider the following sequence of events:
 
         CPU 1                   CPU 2
         ======================= =======================
@@ -1067,8 +1079,8 @@ effectively random order, despite the write barrier issued by CPU 1:
 In the above example, CPU 2 perceives that B is 7, despite the load of *C
 (which would be B) coming after the LOAD of C.
 
-If, however, a data dependency barrier were to be placed between the load of C
-and the load of *C (ie: B) on CPU 2:
+If, however, an address-dependency barrier were to be placed between the load
+of C and the load of *C (ie: B) on CPU 2:
 
         CPU 1                   CPU 2
         ======================= =======================
@@ -1078,7 +1090,7 @@ and the load of *C (ie: B) on CPU 2:
         <write barrier>
         STORE C = &B            LOAD X
         STORE D = 4             LOAD C (gets &B)
-                                <data dependency barrier>
+                                <address-dependency barrier>
                                 LOAD *C (reads B)
 
 then the following will occur:
@@ -1101,7 +1113,7 @@ then the following will occur:
                                         |       +-------+       |       |
                                         |       | X->9  |------>|       |
                                         |       +-------+       |       |
-          Makes sure all effects --->   \   ddddddddddddddddd   |       |
+          Makes sure all effects --->   \   aaaaaaaaaaaaaaaaa   |       |
           prior to the store of C        \      +-------+       |       |
           are perceptible to              ----->| B->2  |------>|       |
           subsequent loads                      +-------+       |       |
@@ -1292,7 +1304,7 @@ Which might appear as this:
         LOAD with immediate effect      :       :       +-------+
 
 
-Placing a read barrier or a data dependency barrier just before the second
+Placing a read barrier or an address-dependency barrier just before the second
 load:
 
         CPU 1                   CPU 2
@@ -1816,20 +1828,20 @@ which may then reorder things however it wishes.
 CPU MEMORY BARRIERS
 -------------------
 
-The Linux kernel has eight basic CPU memory barriers:
+The Linux kernel has seven basic CPU memory barriers:
 
-        TYPE            MANDATORY               SMP CONDITIONAL
-        =============== ======================= ===========================
-        GENERAL         mb()                    smp_mb()
-        WRITE           wmb()                   smp_wmb()
-        READ            rmb()                   smp_rmb()
-        DATA DEPENDENCY                         READ_ONCE()
+        TYPE                    MANDATORY       SMP CONDITIONAL
+        ======================= =============== ===============
+        GENERAL                 mb()            smp_mb()
+        WRITE                   wmb()           smp_wmb()
+        READ                    rmb()           smp_rmb()
+        ADDRESS DEPENDENCY                      READ_ONCE()
 
 
-All memory barriers except the data dependency barriers imply a compiler
-barrier. Data dependencies do not impose any additional compiler ordering.
+All memory barriers except the address-dependency barriers imply a compiler
+barrier. Address dependencies do not impose any additional compiler ordering.
 
-Aside: In the case of data dependencies, the compiler would be expected
+Aside: In the case of address dependencies, the compiler would be expected
 to issue the loads in the correct order (eg. `a[b]` would have to load
 the value of b before loading a[b]), however there is no guarantee in
 the C specification that the compiler may not speculate the value of b
@@ -2749,7 +2761,8 @@ is discarded from the CPU's cache and reloaded. To deal with this, the
 appropriate part of the kernel must invalidate the overlapping bits of the
 cache on each CPU.
 
-See Documentation/core-api/cachetlb.rst for more information on cache management.
+See Documentation/core-api/cachetlb.rst for more information on cache
+management.
 
 
 CACHE COHERENCY VS MMIO
@@ -2889,8 +2902,8 @@ AND THEN THERE'S THE ALPHA
 The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to have
 two semantically-related cache lines updated at separate times. This is where
-the data dependency barrier really becomes necessary as this synchronises both
-caches with the memory coherence system, thus making it seem like pointer
+the address-dependency barrier really becomes necessary as this synchronises
+both caches with the memory coherence system, thus making it seem like pointer
 changes vs new data occur in the right order.
 
 The Alpha defines the Linux kernel's memory model, although as of v4.15
