content/learning-paths/servers-and-cloud-computing/memory_consistency/_index.md
Lines changed: 10 additions & 8 deletions
Original file line number
Diff line number
Diff line change
@@ -1,19 +1,21 @@
1
1
---
2
-
title: Explore The Arm Memory Model & Thread Synchronization
2
+
title: Explore Thread Synchronization in the Arm Memory Model
3
3
4
4
minutes_to_complete: 150
5
5
6
-
who_is_this_for: This is an advanced topic for engineers looking for a practical way to test different thread synchronization approaches within the context of the Arm memory model.
6
+
who_is_this_for: This is an advanced topic for developers seeking practical ways to test thread synchronization approaches in the Arm memory model.
7
7
8
8
learning_objectives:
9
-
- Test snippets of thread synchronization assembly against the formal definition of the Arm Memory Model
10
-
- Test snippets of thread synchronization assembly on Arm hardware to compare against the formal Arm Memory Model
9
+
- Test thread synchronization assembly snippets against the formal definition of the Arm Memory Model.
10
+
- Test thread synchronization assembly snippets on Arm hardware.
11
+
- Compare the results of different thread synchronization approaches.
11
12
12
13
prerequisites:
13
-
- An understanding of different memory consistency models like Sequential Consistency, Weak Ordering, Relaxed Consistency, Processor Consistency, etc.
14
+
- An understanding of memory consistency models (such as Sequential Consistency, Weak Ordering, Relaxed Consistency, and Processor Consistency).
14
15
- An understanding of thread synchronization.
15
-
- An understanding of Arm assembly, general-purpose registers, and how to find information on Arm assembly instructions.
16
-
- An understanding of memory barriers including acquire-release semantics.
16
+
- Familiarity with Arm Assembly Language, and the ability to find relevant information on Arm assembly instructions.
17
+
- Familiarity with general-purpose registers.
18
+
- Familiarity with memory barriers, including Acquire-Release Semantics.
content/learning-paths/servers-and-cloud-computing/memory_consistency/arm_mem_model.md
Lines changed: 27 additions & 24 deletions
Original file line number
Diff line number
Diff line change
@@ -4,62 +4,65 @@ weight: 2
4
4
layout: "learningpathall"
5
5
---
6
6
7
-
## CPU Memory Model vs Language/Runtime Memory Models
7
+
## The Memory Consistency Model
8
8
9
-
Majority of developers will not need to be deeply familiar with the memory consistency model of the CPU their code will execute on. This is because programming languages and runtime engines abstract away the CPUs memory model by presenting the programmer with a language/runtime memory model. This abstraction is achieved by providing developers with a set of language/runtime specific memory ordering rules, synchronization constructs, and supporting libraries. As long as the developer uses these correctly, language compilers and runtime engines will make sure the code executes correctly on any CPU regardless of how strong or weakly ordered it is.
9
+
Most developers don't need deep knowledge of a CPU's memory consistency model. Programming languages and runtimes abstract away the CPU's memory model by providing their own memory ordering rules, synchronization constructs, and libraries. As long as the developer uses these correctly, compilers and runtime engines ensure that the code executes correctly on any CPU, whether its memory ordering is strong or weak.
10
10
11
-
That said, developers may want to dig deeper into this topic for various reasons including:
11
+
That said, developers might want to dig deeper into this topic for various reasons, including:
12
12
13
-
-Extract more performance from weaklyordered CPUs (like Arm CPUs).
14
-
-Compilers and runtimes will do a good job of maximizing performance. Only in well understood niche cases will there be a potential for performance gain by going beyond what the compiler/runtime would do to higher level code. For most cases, all it takes to get more performance is to use the latest compilers, compiler switches, and runtimes.
15
-
-Develop confidence in the correctness of synchronization constructs.
16
-
-Develop an understanding of how compilers and runtimes select different machine instructions while still honoring the memory ordering rules of the language/runtime and CPU.
13
+
- Extracting additional performance from weakly ordered CPUs, such as Arm CPUs:
14
+
  - Although compilers and runtimes typically do a good job of maximizing performance, manual tuning in well-understood niche cases might provide further improvements. In most cases, all it takes to improve performance is using the latest compilers, compiler switches, and runtimes.
15
+
- Gaining confidence in the correctness of synchronization constructs.
16
+
- Understanding how compilers and runtimes select machine instructions while still honoring the memory ordering rules.
17
17
18
-
In this Learning Path, you will use publicly available tools to explore thread synchronization on Arm CPUs. You will gain enough working knowledge of the tools to be able to explore thread synchronization concepts. At the end of this Learning Path, you will be able access more information to get a deeper understanding of this subject.
18
+
In this Learning Path, you will use publicly available tools to explore thread synchronization on Arm CPUs. You will gain enough working knowledge of these tools to explore thread synchronization concepts.
19
+
20
+
{{% notice Learning Tip %}}
21
+
At the end of this Learning Path, you can find details of further resources that you can consult to gain a deeper understanding of this subject.{{% /notice %}}
19
22
20
23
## The Formal Definition of the Arm Memory Model
21
24
22
-
The formal definition of the Arm memory model is in the [Arm Architecture Reference Manual for A-profile architecture](https://developer.arm.com/documentation/ddi0487/la) (Arm ARM) under the section called `The AArch64 Application Level Memory Model`. The ordering requirements defined in the Arm ARM is a transliteration of the `aarch64.cat` file hosted on the Arm [herd7 simulator tool](https://developer.arm.com/herd7). In fact, this `aarch64.cat` file is the authoritative definition of the Arm memory model. As it is a formal definition, it is complex.
25
+
The formal definition of the Arm memory model is described in the [Arm Architecture Reference Manual for A-profile architecture](https://developer.arm.com/documentation/ddi0487/la) (often referred to as "the Arm ARM") under the section called `The AArch64 Application Level Memory Model`. The ordering requirements defined in the Arm ARM are a transliteration of the `aarch64.cat` file hosted on the Arm [herd7 simulator tool](https://developer.arm.com/herd7). In fact, the `aarch64.cat` file is the authoritative definition of the Arm memory model. As a formal definition, it is inherently complex.
23
26
24
-
## Herd7 Simulator & Litmus7 Tool
27
+
## Herd7 Simulator and Litmus7 Tool
25
28
26
-
The herd7 simulator provides a way to test snippets of Arm assembly against the formal definition of the Arm memory model (the `aarch64.cat` file mentioned above). The litmus7 tool can take the same snippets of assembly that run on herd7 and run them on actual Arm hardware. This allows for comparing the formal memory model to the memory model of an actual Arm CPU. These snippets of assembly are called litmus tests.
29
+
The herd7 simulator tests snippets of Arm assembly against the formal definition of the Arm memory model (the `aarch64.cat` file mentioned above). The litmus7 tool runs the same snippets of assembly on actual Arm hardware, enabling a direct comparison between the formal model and the real-world behavior of an actual Arm CPU. These snippets of assembly are called litmus tests.
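To make the term concrete, here is a minimal sketch of what a litmus test file might look like, written from the shell with a here-document. It uses the classic message-passing (MP) shape in the AArch64 litmus format. The test itself, the register choices, and the file name `test.litmus` are illustrative assumptions, not content taken from this Learning Path.

```bash
# Write an illustrative message-passing (MP) litmus test to ./test.litmus.
# Assumption: this is a hypothetical example test, not one from the Learning Path.
cat > test.litmus <<'EOF'
AArch64 MP
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;
exists (1:X0=1 /\ 1:X2=0)
EOF

# Show the file that was written.
cat test.litmus
```

In this sketch, P0 writes `x` and then `y`, while P1 reads `y` and then `x`. The final `exists` clause asks whether P1 can observe the new value of `y` together with the old value of `x`.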
27
30
28
-
It's important to understand that it is possible for an implementation of an Arm CPU to be more strongly ordered than the formally defined Arm memory model. This case is not a violation of the memory model because it will still execute code in a way that is compliant with the memory ordering rules.
31
+
It's important to note that an Arm CPU implementation might exhibit stronger ordering than the formal memory model. This is not a violation; the CPU still executes code in a way that is compliant with the memory ordering rules.
29
32
30
33
## Install the Tools
31
34
32
-
Herd7 and Litmus7 are part of the [diy7](http://diy.inria.fr/) tool suite. The diy7 tool suite can be installed by following the [installation instructions](http://diy.inria.fr/sources/index.html). You will install the tools on an Arm Linux system so that both herd7 and litmus7 can be compared side by side on the same system.
35
+
Herd7 and Litmus7 are part of the [diy7](http://diy.inria.fr/) tool suite. You can install the diy7 tool suite by following the [installation instructions](http://diy.inria.fr/sources/index.html). Install the tools on an Arm Linux system so that both herd7 and litmus7 can be compared side by side.
33
36
34
37
Start an Arm-based cloud instance. This example uses a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS, but other instance types are possible.
35
38
36
39
If you are new to cloud-based virtual machines, refer to [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).
37
40
38
-
First confirm you are using a Arm-based instance with the following command.
41
+
First, confirm you are using an Arm-based instance:
39
42
40
43
```bash
41
44
uname -m
42
45
```
43
-
You should see the following output.
46
+
You should see the following output:
44
47
45
48
```output
46
49
aarch64
47
50
```
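If you want to script this check, the following sketch branches on the reported architecture. The variable names `arch` and `arch_status` are hypothetical, and accepting `arm64` in addition to `aarch64` is an assumption to cover systems that report the architecture under that name.

```bash
# Branch on the machine architecture reported by uname.
# Assumption: "arm64" is accepted as an alternative spelling of "aarch64".
arch="$(uname -m)"
case "$arch" in
  aarch64|arm64)
    arch_status="arm"
    echo "Arm-based system detected ($arch)"
    ;;
  *)
    arch_status="other"
    echo "Not an Arm-based system ($arch); switch to an Arm instance before continuing"
    ;;
esac
```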
48
51
49
-
Next, install OCaml Package Manager (opam). You will need `opam` to install the `herdtools7` tool suite:
52
+
Install OCaml Package Manager (opam). You need `opam` to install the `herdtools7` tool suite:
50
53
```bash
51
54
sudo apt update
52
55
sudo apt install opam -y
53
56
```
54
57
55
-
Setup `opam` to install the tools:
58
+
Set up `opam` to install the tools:
56
59
```bash
57
60
opam init
58
61
opam update
59
62
eval $(opam env)
60
63
```
61
64
62
-
Now install the `herdtool7` tool suite which include both `litmus7` and `herd7`:
65
+
Now install the `herdtools7` tool suite (which includes both `litmus7` and `herd7`):
63
66
64
67
```bash
65
68
opam install herdtools7
@@ -68,7 +71,7 @@ opam install herdtools7
68
71
69
72
## Herd7 and Litmus7 Example Commands
70
73
71
-
You can run `--help` on both the tools to review all the options available.
74
+
You can run `--help` on both tools to review all available options:
72
75
```
73
76
herd7 --help
74
77
```
@@ -80,27 +83,27 @@ The input to both `herd7` and `litmus7` tools are snippets of assembly code, cal
80
83
81
84
Shown below are some examples of running the tools with a litmus test. In the next section, you will go through an actual litmus test example.
82
85
83
-
Example of running herd7.
86
+
Run herd7 with a litmus test:
84
87
```
85
88
herd7 ./test.litmus
86
89
```
87
90
88
-
Example of running litmus7.
91
+
Run litmus7 with a litmus test:
89
92
```
90
93
litmus7 ./test.litmus
91
94
```
92
95
93
-
Example of running litmus7 with 5,000,000 test iterations (default is 1,000,000).
96
+
Run litmus7 with 5,000,000 test iterations (default is 1,000,000):
94
97
```
95
98
litmus7 ./test.litmus -s 5000000
96
99
```
97
100
98
-
Example of running a litmus7 tests in parallel on 8 CPUs.
101
+
Run a litmus7 test in parallel on 8 CPUs:
99
102
```
100
103
litmus7 ./test.litmus -a 8
101
104
```
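The `-s` and `-a` options shown above can also be given in a single invocation. The sketch below assumes that `litmus7` is on your `PATH` and that `./test.litmus` exists. The variable names `test_file` and `run_status` are hypothetical, and the script prints a hint instead of failing if either assumption does not hold.

```bash
# Run a litmus test with a larger iteration count on 8 CPUs, if possible.
# Assumption: ./test.litmus is a placeholder file name, as in the commands above.
test_file="./test.litmus"
if ! command -v litmus7 >/dev/null 2>&1; then
  run_status="no-tool"
  echo "litmus7 not found on PATH; try: eval \$(opam env)"
elif [ ! -f "$test_file" ]; then
  run_status="no-test"
  echo "$test_file not found; create a litmus test first"
else
  run_status="ran"
  litmus7 "$test_file" -s 5000000 -a 8
fi
```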
102
105
103
-
Example of running litmus7 and asking GCC to emit atomic instructions as required by the litmus test (the possible need for this is explained later).
106
+
Run litmus7 with GCC emitting atomic instructions as required by the litmus test (explained later):