@@ -553,12 +553,12 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
DATA DEPENDENCY BARRIERS (HISTORICAL)
-------------------------------------

- As of v4.15 of the Linux kernel, an smp_read_barrier_depends() was
- added to READ_ONCE(), which means that about the only people who
- need to pay attention to this section are those working on DEC Alpha
- architecture-specific code and those working on READ_ONCE() itself.
- For those who need it, and for those who are interested in the history,
- here is the story of data-dependency barriers.
+ As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
+ DEC Alpha, which means that about the only people who need to pay attention
+ to this section are those working on DEC Alpha architecture-specific code
+ and those working on READ_ONCE() itself. For those who need it, and for
+ those who are interested in the history, here is the story of
+ data-dependency barriers.

The usage requirements of data dependency barriers are a little subtle, and
it's not always obvious that they're needed. To illustrate, consider the
it's not always obvious that they're needed. To illustrate, consider the
@@ -2708,144 +2708,6 @@ the properties of the memory window through which devices are accessed and/or
the use of any special device communication instructions the CPU may have.


- CACHE COHERENCY
- ---------------
-
- Life isn't quite as simple as it may appear above, however: for while the
- caches are expected to be coherent, there's no guarantee that that coherency
- will be ordered. This means that while changes made on one CPU will
- eventually become visible on all CPUs, there's no guarantee that they will
- become apparent in the same order on those other CPUs.
-
-
- Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
- has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
-
-                 :
-                 :                          +--------+
-                 :      +---------+         |        |
-     +--------+  : +--->| Cache A |<------->|        |
-     |        |  : |    +---------+         |        |
-     |  CPU 1 |<---+                        |        |
-     |        |  : |    +---------+         |        |
-     +--------+  : +--->| Cache B |<------->|        |
-                 :      +---------+         |        |
-                 :                          | Memory |
-                 :      +---------+         | System |
-     +--------+  : +--->| Cache C |<------->|        |
-     |        |  : |    +---------+         |        |
-     |  CPU 2 |<---+                        |        |
-     |        |  : |    +---------+         |        |
-     +--------+  : +--->| Cache D |<------->|        |
-                 :      +---------+         |        |
-                 :                          +--------+
-                 :
-
- Imagine the system has the following properties:
-
- (*) an odd-numbered cache line may be in cache A, cache C or it may still be
-     resident in memory;
-
- (*) an even-numbered cache line may be in cache B, cache D or it may still be
-     resident in memory;
-
- (*) while the CPU core is interrogating one cache, the other cache may be
-     making use of the bus to access the rest of the system - perhaps to
-     displace a dirty cacheline or to do a speculative load;
-
- (*) each cache has a queue of operations that need to be applied to that cache
-     to maintain coherency with the rest of the system;
-
- (*) the coherency queue is not flushed by normal loads to lines already
-     present in the cache, even though the contents of the queue may
-     potentially affect those loads.
-
- Imagine, then, that two writes are made on the first CPU, with a write barrier
- between them to guarantee that they will appear to reach that CPU's caches in
- the requisite order:
-
-     CPU 1           CPU 2           COMMENT
-     =============== =============== =======================================
-                                     u == 0, v == 1 and p == &u, q == &u
-     v = 2;
-     smp_wmb();                      Make sure change to v is visible before
-                                      change to p
-     <A:modify v=2>                  v is now in cache A exclusively
-     p = &v;
-     <B:modify p=&v>                 p is now in cache B exclusively
-
- The write memory barrier forces the other CPUs in the system to perceive that
- the local CPU's caches have apparently been updated in the correct order. But
- now imagine that the second CPU wants to read those values:
-
-     CPU 1           CPU 2           COMMENT
-     =============== =============== =======================================
-     ...
-                     q = p;
-                     x = *q;
-
- The above pair of reads may then fail to happen in the expected order, as the
- cacheline holding p may get updated in one of the second CPU's caches while
- the update to the cacheline holding v is delayed in the other of the second
- CPU's caches by some other cache event:
-
-     CPU 1           CPU 2           COMMENT
-     =============== =============== =======================================
-                                     u == 0, v == 1 and p == &u, q == &u
-     v = 2;
-     smp_wmb();
-     <A:modify v=2>  <C:busy>
-                     <C:queue v=2>
-     p = &v;         q = p;
-                     <D:request p>
-     <B:modify p=&v> <D:commit p=&v>
-                     <D:read p>
-                     x = *q;
-                     <C:read *q>     Reads from v before v updated in cache
-                     <C:unbusy>
-                     <C:commit v=2>
-
- Basically, while both cachelines will be updated on CPU 2 eventually, there's
- no guarantee that, without intervention, the order of update will be the same
- as that committed on CPU 1.
-
-
- To intervene, we need to interpolate a data dependency barrier or a read
- barrier between the loads (which as of v4.15 is supplied unconditionally
- by the READ_ONCE() macro). This will force the cache to commit its
- coherency queue before processing any further requests:
-
-     CPU 1           CPU 2           COMMENT
-     =============== =============== =======================================
-                                     u == 0, v == 1 and p == &u, q == &u
-     v = 2;
-     smp_wmb();
-     <A:modify v=2>  <C:busy>
-                     <C:queue v=2>
-     p = &v;         q = p;
-                     <D:request p>
-     <B:modify p=&v> <D:commit p=&v>
-                     <D:read p>
-                     smp_read_barrier_depends()
-                     <C:unbusy>
-                     <C:commit v=2>
-                     x = *q;
-                     <C:read *q>     Reads from v after v updated in cache
-
-
- This sort of problem can be encountered on DEC Alpha processors as they have a
- split cache that improves performance by making better use of the data bus.
- While most CPUs do imply a data dependency barrier on the read when a memory
- access depends on a read, not all do, so it may not be relied on.
-
- Other CPUs may also have split caches, but must coordinate between the various
- cachelets for normal memory accesses. The semantics of the Alpha removes the
- need for hardware coordination in the absence of memory barriers, which
- permitted Alpha to sport higher CPU clock rates back in the day. However,
- please note that (again, as of v4.15) smp_read_barrier_depends() should not
- be used except in Alpha arch-specific code and within the READ_ONCE() macro.
-
-
CACHE COHERENCY VS DMA
----------------------

@@ -3009,10 +2871,8 @@ caches with the memory coherence system, thus making it seem like pointer
changes vs new data occur in the right order.

The Alpha defines the Linux kernel's memory model, although as of v4.15
- the Linux kernel's addition of smp_read_barrier_depends() to READ_ONCE()
- greatly reduced Alpha's impact on the memory model.
-
- See the subsection on "Cache Coherency" above.
+ the Linux kernel's addition of smp_mb() to READ_ONCE() on Alpha greatly
+ reduced its impact on the memory model.


VIRTUAL MACHINE GUESTS