@@ -187,9 +187,9 @@ As a further example, consider this sequence of events:
B = 4; Q = P;
P = &B; D = *Q;

- There is an obvious address dependency here, as the value loaded into D depends on
- the address retrieved from P by CPU 2. At the end of the sequence, any of the
- following results are possible:
+ There is an obvious address dependency here, as the value loaded into D depends
+ on the address retrieved from P by CPU 2. At the end of the sequence, any of
+ the following results are possible:

(Q == &A) and (D == 1)
(Q == &B) and (D == 2)
@@ -397,25 +397,25 @@ Memory barriers come in four basic varieties:

(2) Address-dependency barriers (historical).

- An address-dependency barrier is a weaker form of read barrier. In the case
- where two loads are performed such that the second depends on the result
- of the first (eg: the first load retrieves the address to which the second
- load will be directed), an address-dependency barrier would be required to
- make sure that the target of the second load is updated after the address
- obtained by the first load is accessed.
+ An address-dependency barrier is a weaker form of read barrier. In the
+ case where two loads are performed such that the second depends on the
+ result of the first (eg: the first load retrieves the address to which
+ the second load will be directed), an address-dependency barrier would
+ be required to make sure that the target of the second load is updated
+ after the address obtained by the first load is accessed.

- An address-dependency barrier is a partial ordering on interdependent loads
- only; it is not required to have any effect on stores, independent loads
- or overlapping loads.
+ An address-dependency barrier is a partial ordering on interdependent
+ loads only; it is not required to have any effect on stores, independent
+ loads or overlapping loads.

As mentioned in (1), the other CPUs in the system can be viewed as
committing sequences of stores to the memory system that the CPU being
- considered can then perceive. An address-dependency barrier issued by the CPU
- under consideration guarantees that for any load preceding it, if that
- load touches one of a sequence of stores from another CPU, then by the
- time the barrier completes, the effects of all the stores prior to that
- touched by the load will be perceptible to any loads issued after the address-
- dependency barrier.
+ considered can then perceive. An address-dependency barrier issued by
+ the CPU under consideration guarantees that for any load preceding it,
+ if that load touches one of a sequence of stores from another CPU, then
+ by the time the barrier completes, the effects of all the stores prior to
+ that touched by the load will be perceptible to any loads issued after
+ the address-dependency barrier.

See the "Examples of memory barrier sequences" subsection for diagrams
showing the ordering constraints.
@@ -437,16 +437,16 @@ Memory barriers come in four basic varieties:

(3) Read (or load) memory barriers.

- A read barrier is an address-dependency barrier plus a guarantee that all the
- LOAD operations specified before the barrier will appear to happen before
- all the LOAD operations specified after the barrier with respect to the
- other components of the system.
+ A read barrier is an address-dependency barrier plus a guarantee that all
+ the LOAD operations specified before the barrier will appear to happen
+ before all the LOAD operations specified after the barrier with respect to
+ the other components of the system.

A read barrier is a partial ordering on loads only; it is not required to
have any effect on stores.

- Read memory barriers imply address-dependency barriers, and so can substitute
- for them.
+ Read memory barriers imply address-dependency barriers, and so can
+ substitute for them.

[!] Note that read barriers should normally be paired with write barriers;
see the "SMP barrier pairing" subsection.
@@ -584,8 +584,8 @@ following sequence of events:
[!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
doesn't imply an address-dependency barrier.

- There's a clear address dependency here, and it would seem that by the end of the
- sequence, Q must be either &A or &B, and that:
+ There's a clear address dependency here, and it would seem that by the end of
+ the sequence, Q must be either &A or &B, and that:

(Q == &A) implies (D == 1)
(Q == &B) implies (D == 4)
@@ -599,8 +599,8 @@ While this may seem like a failure of coherency or causality maintenance, it
isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
Alpha).

- To deal with this, READ_ONCE() provides an implicit address-dependency
- barrier since kernel release v4.15:
+ To deal with this, READ_ONCE() provides an implicit address-dependency barrier
+ since kernel release v4.15:

CPU 1 CPU 2
=============== ===============
@@ -627,12 +627,12 @@ but the old value of the variable B (2).


An address-dependency barrier is not required to order dependent writes
- because the CPUs that the Linux kernel supports don't do writes
- until they are certain (1) that the write will actually happen, (2)
- of the location of the write, and (3) of the value to be written.
+ because the CPUs that the Linux kernel supports don't do writes until they
+ are certain (1) that the write will actually happen, (2) of the location of
+ the write, and (3) of the value to be written.
But please carefully read the "CONTROL DEPENDENCIES" section and the
- Documentation/RCU/rcu_dereference.rst file: The compiler can and does
- break dependencies in a great many highly creative ways.
+ Documentation/RCU/rcu_dereference.rst file: The compiler can and does break
+ dependencies in a great many highly creative ways.

CPU 1 CPU 2
=============== ===============
@@ -678,8 +678,8 @@ not understand them. The purpose of this section is to help you prevent
the compiler's ignorance from breaking your code.

A load-load control dependency requires a full read memory barrier, not
- simply an (implicit) address-dependency barrier to make it work correctly. Consider the
- following bit of code:
+ simply an (implicit) address-dependency barrier to make it work correctly.
+ Consider the following bit of code:

q = READ_ONCE(a);
<implicit address-dependency barrier>
@@ -691,8 +691,8 @@ following bit of code:
This will not have the desired effect because there is no actual address
dependency, but rather a control dependency that the CPU may short-circuit
by attempting to predict the outcome in advance, so that other CPUs see
- the load from b as having happened before the load from a. In such a
- case what's actually required is:
+ the load from b as having happened before the load from a. In such a case
+ what's actually required is:

q = READ_ONCE(a);
if (q) {
@@ -980,8 +980,8 @@ Basically, the read barrier always has to be there, even though it can be of
the "weaker" type.

[!] Note that the stores before the write barrier would normally be expected to
- match the loads after the read barrier or the address-dependency barrier, and vice
- versa:
+ match the loads after the read barrier or the address-dependency barrier, and
+ vice versa:

CPU 1 CPU 2
=================== ===================
@@ -1033,8 +1033,8 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
V


- Secondly, address-dependency barriers act as partial orderings on address-dependent
- loads. Consider the following sequence of events:
+ Secondly, address-dependency barriers act as partial orderings on address-
+ dependent loads. Consider the following sequence of events:

CPU 1 CPU 2
======================= =======================
@@ -1079,8 +1079,8 @@ effectively random order, despite the write barrier issued by CPU 1:
In the above example, CPU 2 perceives that B is 7, despite the load of *C
(which would be B) coming after the LOAD of C.

- If, however, an address-dependency barrier were to be placed between the load of C
- and the load of *C (ie: B) on CPU 2:
+ If, however, an address-dependency barrier were to be placed between the load
+ of C and the load of *C (ie: B) on CPU 2:

CPU 1 CPU 2
======================= =======================
@@ -2761,7 +2761,8 @@ is discarded from the CPU's cache and reloaded. To deal with this, the
appropriate part of the kernel must invalidate the overlapping bits of the
cache on each CPU.

- See Documentation/core-api/cachetlb.rst for more information on cache management.
+ See Documentation/core-api/cachetlb.rst for more information on cache
+ management.


CACHE COHERENCY VS MMIO
@@ -2901,8 +2902,8 @@ AND THEN THERE'S THE ALPHA
The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
some versions of the Alpha CPU have a split data cache, permitting them to have
two semantically-related cache lines updated at separate times. This is where
- the address-dependency barrier really becomes necessary as this synchronises both
- caches with the memory coherence system, thus making it seem like pointer
+ the address-dependency barrier really becomes necessary as this synchronises
+ both caches with the memory coherence system, thus making it seem like pointer
changes vs new data occur in the right order.

The Alpha defines the Linux kernel's memory model, although as of v4.15
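
The sequence that the reflowed text describes, in which CPU 1 stores B = 4 and
then publishes P = &B while CPU 2 loads Q = P and then D = *Q, can be
approximated in userspace with C11 atomics. What follows is a minimal sketch
under stated assumptions, not kernel code: release/acquire ordering stands in
for the kernel's write barrier and READ_ONCE() (acquire is stronger than an
address-dependency barrier), the initial values A == 1, B == 2, P == &A are
taken from the surrounding examples, and the names writer, reader,
struct observation, and run_once are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared state, assumed initial values: A == 1, B == 2, P == &A. */
static int A = 1;
static int B = 2;
static _Atomic(int *) P = &A;

/* What "CPU 2" observed: the pointer it loaded and the value behind it. */
struct observation {
	int *q;
	int d;
};

/* "CPU 1": update B, then publish its address through P. */
static void *writer(void *unused)
{
	(void)unused;
	B = 4;
	/* Assumption: a release store stands in for <write barrier> + WRITE_ONCE(). */
	atomic_store_explicit(&P, &B, memory_order_release);
	return NULL;
}

/* "CPU 2": load the pointer, then perform the address-dependent load. */
static void *reader(void *arg)
{
	struct observation *obs = arg;

	/* Assumption: acquire is used here; it is stronger than dependency ordering. */
	obs->q = atomic_load_explicit(&P, memory_order_acquire);
	obs->d = *obs->q;
	return NULL;
}

/*
 * Hypothetical helper: reset the state, run both threads once, and return 1
 * if the outcome is one of the two permitted by correct barrier pairing:
 * (Q == &A, D == 1) or (Q == &B, D == 4).
 */
int run_once(void)
{
	pthread_t w, r;
	struct observation obs = { 0 };
	int rc;

	B = 2;
	atomic_store_explicit(&P, &A, memory_order_relaxed);

	/* Sketch only: thread-creation failure is treated as a fatal error. */
	rc = pthread_create(&r, NULL, reader, &obs);
	assert(rc == 0);
	rc = pthread_create(&w, NULL, writer, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(w, NULL);
	pthread_join(r, NULL);

	return (obs.q == &A && obs.d == 1) || (obs.q == &B && obs.d == 4);
}
```

Calling run_once() repeatedly from a main() (built with thread support, for
example -pthread) should always yield 1. The outcome that the ordering rules
out is (Q == &B) with D == 2, which is the counter-intuitive result the
document attributes to CPUs such as the DEC Alpha when no barrier is present.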