Commit 203185f

akiyks authored and paulmckrcu committed
docs/memory-barriers.txt: Fix confusing name of 'data dependency barrier'
The term "data dependency barrier", which has been in memory-barriers.txt
ever since it was first authored by David Howells, has become confusing
due to the fact that in LKMM's explanations.txt and elsewhere, "data
dependency" is used mostly for load-to-store data dependency.

To prevent further confusions, do the changes listed below:

  - substitute "data dependency barrier" with "address-dependency
    barrier";
  - add note on the removal of kernel APIs for explicit
    address-dependency barriers in kernel release v5.9;
  - note that address-dependency barriers are not necessary for
    load-to-store situations;
  - use READ_ONCE_OLD() for pre-4.15 READ_ONCE() (no implicit
    address-dependency barrier);
  - fix count of kernel memory barrier APIs;
  - and a few more context adjustments.

Note: Cleanups of long lines are deferred to a followup patch.

Reported-by: "Michael S. Tsirkin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Akira Yokosawa <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Andrea Parri <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: David Howells <[email protected]>
Cc: Daniel Lustig <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

1 parent 568035b commit 203185f

1 file changed: +64 -52 lines changed

Documentation/memory-barriers.txt

Lines changed: 64 additions & 52 deletions
@@ -52,7 +52,7 @@ CONTENTS
 
      - Varieties of memory barrier.
      - What may not be assumed about memory barriers?
-     - Data dependency barriers (historical).
+     - Address-dependency barriers (historical).
      - Control dependencies.
      - SMP barrier pairing.
      - Examples of memory barrier sequences.
@@ -187,7 +187,7 @@ As a further example, consider this sequence of events:
 	B = 4;		Q = P;
 	P = &B;		D = *Q;
 
-There is an obvious data dependency here, as the value loaded into D depends on
+There is an obvious address dependency here, as the value loaded into D depends on
 the address retrieved from P by CPU 2. At the end of the sequence, any of the
 following results are possible:
 
@@ -391,57 +391,61 @@ Memory barriers come in four basic varieties:
      memory system as time progresses. All stores _before_ a write barrier
      will occur _before_ all the stores after the write barrier.
 
-     [!] Note that write barriers should normally be paired with read or data
-     dependency barriers; see the "SMP barrier pairing" subsection.
+     [!] Note that write barriers should normally be paired with read or
+     address-dependency barriers; see the "SMP barrier pairing" subsection.
 
 
- (2) Data dependency barriers.
+ (2) Address-dependency barriers (historical).
 
-     A data dependency barrier is a weaker form of read barrier. In the case
+     An address-dependency barrier is a weaker form of read barrier. In the case
      where two loads are performed such that the second depends on the result
      of the first (eg: the first load retrieves the address to which the second
-     load will be directed), a data dependency barrier would be required to
+     load will be directed), an address-dependency barrier would be required to
      make sure that the target of the second load is updated after the address
      obtained by the first load is accessed.
 
-     A data dependency barrier is a partial ordering on interdependent loads
+     An address-dependency barrier is a partial ordering on interdependent loads
      only; it is not required to have any effect on stores, independent loads
      or overlapping loads.
 
      As mentioned in (1), the other CPUs in the system can be viewed as
      committing sequences of stores to the memory system that the CPU being
-     considered can then perceive. A data dependency barrier issued by the CPU
+     considered can then perceive. An address-dependency barrier issued by the CPU
      under consideration guarantees that for any load preceding it, if that
      load touches one of a sequence of stores from another CPU, then by the
      time the barrier completes, the effects of all the stores prior to that
-     touched by the load will be perceptible to any loads issued after the data
+     touched by the load will be perceptible to any loads issued after the address-
      dependency barrier.
 
      See the "Examples of memory barrier sequences" subsection for diagrams
      showing the ordering constraints.
 
-     [!] Note that the first load really has to have a _data_ dependency and
+     [!] Note that the first load really has to have an _address_ dependency and
      not a control dependency. If the address for the second load is dependent
      on the first load, but the dependency is through a conditional rather than
      actually loading the address itself, then it's a _control_ dependency and
      a full read barrier or better is required. See the "Control dependencies"
      subsection for more information.
 
-     [!] Note that data dependency barriers should normally be paired with
+     [!] Note that address-dependency barriers should normally be paired with
      write barriers; see the "SMP barrier pairing" subsection.
 
+     [!] Kernel release v5.9 removed kernel APIs for explicit address-
+     dependency barriers. Nowadays, APIs for marking loads from shared
+     variables such as READ_ONCE() and rcu_dereference() provide implicit
+     address-dependency barriers.
 
  (3) Read (or load) memory barriers.
 
-     A read barrier is a data dependency barrier plus a guarantee that all the
+     A read barrier is an address-dependency barrier plus a guarantee that all the
      LOAD operations specified before the barrier will appear to happen before
      all the LOAD operations specified after the barrier with respect to the
      other components of the system.
 
      A read barrier is a partial ordering on loads only; it is not required to
      have any effect on stores.
 
-     Read memory barriers imply data dependency barriers, and so can substitute
+     Read memory barriers imply address-dependency barriers, and so can substitute
      for them.
 
      [!] Note that read barriers should normally be paired with write barriers;
@@ -550,17 +554,21 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
 	Documentation/core-api/dma-api.rst
 
 
-DATA DEPENDENCY BARRIERS (HISTORICAL)
--------------------------------------
+ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
+----------------------------------------
 
 As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
 DEC Alpha, which means that about the only people who need to pay attention
 to this section are those working on DEC Alpha architecture-specific code
 and those working on READ_ONCE() itself. For those who need it, and for
 those who are interested in the history, here is the story of
-data-dependency barriers.
+address-dependency barriers.
+
+[!] While address dependencies are observed in both load-to-load and
+load-to-store relations, address-dependency barriers are not necessary
+for load-to-store situations.
 
-The usage requirements of data dependency barriers are a little subtle, and
+The requirement of address-dependency barriers is a little subtle, and
 it's not always obvious that they're needed. To illustrate, consider the
 following sequence of events:
 
@@ -570,10 +578,13 @@ following sequence of events:
 	B = 4;
 	<write barrier>
 	WRITE_ONCE(P, &B);
-			      Q = READ_ONCE(P);
+			      Q = READ_ONCE_OLD(P);
 			      D = *Q;
 
-There's a clear data dependency here, and it would seem that by the end of the
+[!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
+doesn't imply an address-dependency barrier.
+
+There's a clear address dependency here, and it would seem that by the end of the
 sequence, Q must be either &A or &B, and that:
 
 	(Q == &A) implies (D == 1)
@@ -588,8 +599,8 @@ While this may seem like a failure of coherency or causality maintenance, it
 isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
 Alpha).
 
-To deal with this, a data dependency barrier or better must be inserted
-between the address load and the data load:
+To deal with this, READ_ONCE() provides an implicit address-dependency
+barrier since kernel release v4.15:
 
 	CPU 1		      CPU 2
 	===============	      ===============
@@ -598,7 +609,7 @@ between the address load and the data load:
 	<write barrier>
 	WRITE_ONCE(P, &B);
 			      Q = READ_ONCE(P);
-			      <data dependency barrier>
+			      <implicit address-dependency barrier>
 			      D = *Q;
 
 This enforces the occurrence of one of the two implications, and prevents the
@@ -615,7 +626,7 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
 but the old value of the variable B (2).
 
 
-A data-dependency barrier is not required to order dependent writes
+An address-dependency barrier is not required to order dependent writes
 because the CPUs that the Linux kernel supports don't do writes
 until they are certain (1) that the write will actually happen, (2)
 of the location of the write, and (3) of the value to be written.
@@ -629,12 +640,12 @@ break dependencies in a great many highly creative ways.
 	B = 4;
 	<write barrier>
 	WRITE_ONCE(P, &B);
-			      Q = READ_ONCE(P);
+			      Q = READ_ONCE_OLD(P);
 			      WRITE_ONCE(*Q, 5);
 
-Therefore, no data-dependency barrier is required to order the read into
+Therefore, no address-dependency barrier is required to order the read into
 Q with the store into *Q. In other words, this outcome is prohibited,
-even without a data-dependency barrier:
+even without an implicit address-dependency barrier of modern READ_ONCE():
 
 	(Q == &B) && (B == 4)
 
@@ -645,12 +656,12 @@ can be used to record rare error conditions and the like, and the CPUs'
 naturally occurring ordering prevents such records from being lost.
 
 
-Note well that the ordering provided by a data dependency is local to
+Note well that the ordering provided by an address dependency is local to
 the CPU containing it. See the section on "Multicopy atomicity" for
 more information.
 
 
-The data dependency barrier is very important to the RCU system,
+The address-dependency barrier is very important to the RCU system,
 for example. See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h. This permits the current target of an RCU'd
 pointer to be replaced with a new modified target, without the replacement
@@ -667,16 +678,17 @@ not understand them. The purpose of this section is to help you prevent
 the compiler's ignorance from breaking your code.
 
 A load-load control dependency requires a full read memory barrier, not
-simply a data dependency barrier to make it work correctly. Consider the
+simply an (implicit) address-dependency barrier to make it work correctly. Consider the
 following bit of code:
 
 	q = READ_ONCE(a);
+	<implicit address-dependency barrier>
 	if (q) {
-		<data dependency barrier>  /* BUG: No data dependency!!! */
+		/* BUG: No address dependency!!! */
 		p = READ_ONCE(b);
 	}
 
-This will not have the desired effect because there is no actual data
+This will not have the desired effect because there is no actual address
 dependency, but rather a control dependency that the CPU may short-circuit
 by attempting to predict the outcome in advance, so that other CPUs see
 the load from b as having happened before the load from a. In such a
@@ -927,9 +939,9 @@ General barriers pair with each other, though they also pair with most
 other types of barriers, albeit without multicopy atomicity. An acquire
 barrier pairs with a release barrier, but both may also pair with other
 barriers, including of course general barriers. A write barrier pairs
-with a data dependency barrier, a control dependency, an acquire barrier,
+with an address-dependency barrier, a control dependency, an acquire barrier,
 a release barrier, a read barrier, or a general barrier. Similarly a
-read barrier, control dependency, or a data dependency barrier pairs
+read barrier, control dependency, or an address-dependency barrier pairs
 with a write barrier, an acquire barrier, a release barrier, or a
 general barrier:
 
@@ -948,7 +960,7 @@ Or:
 	a = 1;
 	<write barrier>
 	WRITE_ONCE(b, &a);    x = READ_ONCE(b);
-			      <data dependency barrier>
+			      <implicit address-dependency barrier>
 			      y = *x;
 
 Or even:
@@ -968,7 +980,7 @@ Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
 
 [!] Note that the stores before the write barrier would normally be expected to
-match the loads after the read barrier or the data dependency barrier, and vice
+match the loads after the read barrier or the address-dependency barrier, and vice
 versa:
 
 	CPU 1                               CPU 2
@@ -1021,7 +1033,7 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
 	               V
 
 
-Secondly, data dependency barriers act as partial orderings on data-dependent
+Secondly, address-dependency barriers act as partial orderings on address-dependent
 loads. Consider the following sequence of events:
 
 	CPU 1			CPU 2
@@ -1067,7 +1079,7 @@ effectively random order, despite the write barrier issued by CPU 1:
 In the above example, CPU 2 perceives that B is 7, despite the load of *C
 (which would be B) coming after the LOAD of C.
 
-If, however, a data dependency barrier were to be placed between the load of C
+If, however, an address-dependency barrier were to be placed between the load of C
 and the load of *C (ie: B) on CPU 2:
 
 	CPU 1			CPU 2
@@ -1078,7 +1090,7 @@ and the load of *C (ie: B) on CPU 2:
 	<write barrier>
 	STORE C = &B		LOAD X
 	STORE D = 4		LOAD C (gets &B)
-				<data dependency barrier>
+				<address-dependency barrier>
 				LOAD *C (reads B)
 
 then the following will occur:
@@ -1101,7 +1113,7 @@ then the following will occur:
 	                               |        +-------+       |       |
 	                               |        | X->9  |------>|       |
 	                               |        +-------+       |       |
-	  Makes sure all effects --->   \   ddddddddddddddddd   |       |
+	  Makes sure all effects --->   \   aaaaaaaaaaaaaaaaa   |       |
 	  prior to the store of C        \      +-------+       |       |
 	  are perceptible to              ----->| B->2  |------>|       |
 	  subsequent loads                      +-------+       |       |
@@ -1292,7 +1304,7 @@ Which might appear as this:
 	  LOAD with immediate effect    :       :       +-------+
 
 
-Placing a read barrier or a data dependency barrier just before the second
+Placing a read barrier or an address-dependency barrier just before the second
 load:
 
 	CPU 1			CPU 2
@@ -1816,20 +1828,20 @@ which may then reorder things however it wishes.
 CPU MEMORY BARRIERS
 -------------------
 
-The Linux kernel has eight basic CPU memory barriers:
+The Linux kernel has seven basic CPU memory barriers:
 
-	TYPE		MANDATORY		SMP CONDITIONAL
-	===============	=======================	===========================
-	GENERAL		mb()			smp_mb()
-	WRITE		wmb()			smp_wmb()
-	READ		rmb()			smp_rmb()
-	DATA DEPENDENCY				READ_ONCE()
+	TYPE			MANDATORY	SMP CONDITIONAL
+	=======================	===============	===============
+	GENERAL			mb()		smp_mb()
+	WRITE			wmb()		smp_wmb()
+	READ			rmb()		smp_rmb()
+	ADDRESS DEPENDENCY			READ_ONCE()
 
 
-All memory barriers except the data dependency barriers imply a compiler
-barrier. Data dependencies do not impose any additional compiler ordering.
+All memory barriers except the address-dependency barriers imply a compiler
+barrier. Address dependencies do not impose any additional compiler ordering.
 
-Aside: In the case of data dependencies, the compiler would be expected
+Aside: In the case of address dependencies, the compiler would be expected
 to issue the loads in the correct order (eg. `a[b]` would have to load
 the value of b before loading a[b]), however there is no guarantee in
 the C specification that the compiler may not speculate the value of b
@@ -2889,7 +2901,7 @@ AND THEN THERE'S THE ALPHA
 The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to have
 two semantically-related cache lines updated at separate times. This is where
-the data dependency barrier really becomes necessary as this synchronises both
+the address-dependency barrier really becomes necessary as this synchronises both
 caches with the memory coherence system, thus making it seem like pointer
 changes vs new data occur in the right order.
 

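Note: the pointer-publication pattern that the patched text uses to illustrate address dependencies can be approximated in userspace C11 for experimentation. The following is only a sketch under stated assumptions, not kernel code: `publish_b()` and `read_through_p()` are hypothetical helper names, `memory_order_release` stands in for `<write barrier>` plus `WRITE_ONCE()`, and `memory_order_consume` stands in for the implicit address-dependency barrier of modern `READ_ONCE()`. Current compilers generally implement consume as the stronger acquire ordering.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Initial state mirrors the example in the text: A == 1, B == 2, P == &A. */
static int A = 1;
static int B = 2;
static _Atomic(int *) P = &A;

/* "CPU 1" side: update B, then publish its address through P. */
static void publish_b(void)
{
	B = 4;
	/* Assumption: release ordering plays the role of
	 * <write barrier> followed by WRITE_ONCE(P, &B). */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* "CPU 2" side: load the pointer, then load through it (Q = P; D = *Q). */
static int read_through_p(int **q_out)
{
	/* Assumption: consume ordering plays the role of READ_ONCE(P) with
	 * its implicit address-dependency barrier. */
	int *q = atomic_load_explicit(&P, memory_order_consume);

	if (q_out)
		*q_out = q;
	return *q;	/* address-dependent load */
}
```

If the two helpers run on different threads, a reader that observes `&B` through `read_through_p()` should also observe the value 4, which corresponds to the "(Q == &B) implies (D == 4)" outcome in the text; a reader that still observes `&A` reads 1.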