Commit 227b09d

Document false sharing
1 parent 9bce842 commit 227b09d

1 file changed

+126
-0
lines changed

README.md

Lines changed: 126 additions & 0 deletions
@@ -88,6 +88,7 @@ is distributed under the [ISC license](LICENSE.md).
- [Understanding transactions](#understanding-transactions)
- [A three-stack lock-free queue](#a-three-stack-lock-free-queue)
- [A rehashable lock-free hash table](#a-rehashable-lock-free-hash-table)
- [Avoid false sharing](#avoid-false-sharing)
- [Beware of torn reads](#beware-of-torn-reads)

## A quick tour
@@ -1984,6 +1985,131 @@ What we have here is a lock-free hash table with rehashing that should not be
highly prone to starvation. In other respects this is a fairly naive hash table
implementation. You might want to think about various ways to improve upon it.

### Avoid false sharing

[False sharing](https://en.wikipedia.org/wiki/False_sharing) is a form of
contention that arises when a location that is written to by at least one core
happens to reside in the same cache line aligned region of memory as another
location that is accessed, read or written, by other cores.

Perhaps contrary to how it is often described, false sharing doesn't require the
use of atomic variables or atomic instructions. Consider the following example:

```ocaml
# type state = { mutable counter : int; mutable finished : bool }
type state = { mutable counter : int; mutable finished : bool; }

# let state = { counter = 1_000; finished = false }
val state : state = {counter = 1000; finished = false}

# let reader = Domain.spawn @@ fun () ->
    while not state.finished do
      Domain.cpu_relax ()
    done
val reader : unit Domain.t = <abstr>

# while 0 < state.counter do
    state.counter <- state.counter - 1
  done;
- : unit = ()

# state.finished <- true;
- : unit = ()

# Domain.join reader
- : unit = ()
```

The `state` is a record with two fields, `counter` and `finished`, next to each
other, which makes it rather likely for them to reside in the same cache line
aligned region of memory. The main domain repeatedly mutates the `counter` field
and the other domain repeatedly reads the `finished` field. What this means in
practice is that the reads of the `finished` field by the other domain will be
very expensive, because the cache line holding it is repeatedly invalidated by
the `counter` updates done by the main domain.

The above example is contrived, of course, but this sort of false sharing can
happen very easily. Cache lines are typically relatively large: 8, 16, or even
32 words wide. Typically many, if not most, heap allocated objects in OCaml are
smaller than a cache line, which means that false sharing may easily happen even
between seemingly unrelated objects.

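As a rough illustration of the sizes involved, the following sketch compares the
footprint of a couple of small heap blocks to a cache line. The 64 byte cache
line size is an assumption (it varies between processors), the helper
`block_words` is made up for this example, and `Obj` is used only to inspect
block sizes.

```ocaml
(* ASSUMPTION: 64-byte cache lines; common, but not universal. *)
let cache_line_bytes = 64

(* Size of a machine word in bytes: 8 on 64-bit systems. *)
let word_bytes = Sys.word_size / 8

let words_per_cache_line = cache_line_bytes / word_bytes

(* Hypothetical helper: number of words occupied by a heap block, including
   its one-word header. Only meaningful for boxed (heap allocated) values. *)
let block_words v = 1 + Obj.size (Obj.repr v)

type state = { mutable counter : int; mutable finished : bool }

let () =
  Printf.printf "cache line: %d words\n" words_per_cache_line;
  Printf.printf "int ref: %d words\n" (block_words (ref 0));
  Printf.printf "state record: %d words\n"
    (block_words { counter = 1_000; finished = false })
```

Under these assumptions, on a 64-bit system a cache line holds 8 words, while a
reference cell takes 2 words and the `state` record takes 3, so several such
blocks can share a single cache line.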

To completely avoid false sharing one would basically need to make sure that
mutable locations (atomic or otherwise) are not allocated next to locations that
might be accessed from other domains. Unfortunately, that is difficult to
achieve without being expensive in itself, as it tends to increase memory usage
and the number of initializing stores.
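
To make that cost concrete, here is a minimal sketch of padding done by hand
using only the standard library. It is not how kcas implements `~padded:true`.
The names `padded_fields`, `make_padded`, `get`, `set`, and `block_words` are
hypothetical, and the sketch assumes that growing a block to 16 words is enough
to keep later allocations off its cache line, which by itself guarantees
neither alignment nor anything about what was allocated before the block.

```ocaml
(* ASSUMPTION: 15 fields plus the one-word header give a 16-word block
   (128 bytes on 64-bit systems). The alignment of the block is not
   controlled here. *)
let padded_fields = 15

(* Hypothetical padded mutable cell: only slot 0 holds the value and the
   remaining slots are padding. [Array.make] initializes every slot, which
   is the extra cost in initializing stores. *)
let make_padded (v : int) : int array = Array.make padded_fields v

let get (cell : int array) : int = cell.(0)
let set (cell : int array) (v : int) : unit = cell.(0) <- v

(* Words occupied by a heap block, including its one-word header. *)
let block_words v = 1 + Obj.size (Obj.repr v)

let () =
  let plain = ref 1_000 in
  let padded = make_padded 1_000 in
  set padded (get padded - 1);
  Printf.printf "plain ref: %d words\n" (block_words plain);
  Printf.printf "padded cell: %d words holding %d\n"
    (block_words padded) (get padded)
```

In this sketch a single integer that would fit in a 2-word reference cell takes
16 words and 15 initializing stores, which is why padding every mutable
location is rarely attractive.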

The
[`Loc.make`](https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Loc/index.html#val-make)
function takes an optional `padded` argument, which can be explicitly specified
as `~padded:true` to request that the location be allocated in a way that avoids
false sharing. Using `~padded:true` on long lived shared memory locations that
are being repeatedly modified can improve performance significantly. Using
`~padded:true` on short lived shared memory locations is not recommended.

Using `~padded:true` does not eliminate all false sharing, however. Consider the
following sketch of a queue data structure:

```ocaml
type 'a queue = {
  head: 'a list Loc.t;
  tail: 'a list Loc.t
}
```

Even if you allocate the locations with padding

```ocaml
# let queue () = {
    head = Loc.make ~padded:true [];
    tail = Loc.make ~padded:true []
  }
val queue : unit -> 'a queue = <fun>
```

the queue record will still be vulnerable to the same kind of false sharing as
in the earlier example:

```ocaml
# let a_queue : int queue = queue ()
val a_queue : int queue = {head = <abstr>; tail = <abstr>}

# let counter = ref 1_000
val counter : int ref = {contents = 1000}
```

In the above, the reference cell for `counter` may well end up next to the queue
record in memory. Updates to `counter` might then exhibit false sharing with the
queue record (which is read-only) and significantly degrade the performance of
the queue for passing messages between domains.

To avoid the above kind of problems, a strategic approach is to also allocate
the queue record in a way that avoids false sharing. Unfortunately OCaml does
not currently provide a standard way to do so. The
[multicore-magic](https://github.com/ocaml-multicore/multicore-magic) library
provides a function
[`copy_as_padded`](https://ocaml-multicore.github.io/multicore-magic/doc/multicore-magic/Multicore_magic/index.html#val-copy_as_padded)
for the purpose. Using `copy_as_padded` one would write

```ocaml
# let queue () =
    Multicore_magic.copy_as_padded {
      head = Loc.make ~padded:true [];
      tail = Loc.make ~padded:true []
    }
val queue : unit -> 'a queue = <fun>
```

to allocate the queue record in a way that avoids false sharing.

Note that allocating long lived data structures, such as queues used for
inter-domain communication, in the way described above does not eliminate all
false sharing, but it is likely to reduce false sharing significantly with
relatively low effort.

### Beware of torn reads

The algorithm underlying **kcas** ensures that it is not possible to read
