Commit c530a41

Added learning path on thread syncing
New learning path on using herd7 and litmus7 tools to explore the Arm memory model and thread synchronization. The idea is to enable people to explore their thread sync constructs within the context of the Arm memory model.
1 parent 212a06f commit c530a41

6 files changed: +572 -0 lines changed
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
---
title: Explore The Arm Memory Model & Thread Synchronization

minutes_to_complete: 150

who_is_this_for: This is an advanced topic for engineers looking for a practical way to test different thread synchronization approaches within the context of the Arm memory model.

learning_objectives:
    - Test snippets of thread synchronization assembly against the formal definition of the Arm Memory Model
    - Test snippets of thread synchronization assembly on Arm hardware to compare against the formal Arm Memory Model

prerequisites:
    - An understanding of different memory consistency models like Sequential Consistency, Weak Ordering, Relaxed Consistency, Processor Consistency, etc.
    - An understanding of thread synchronization.
    - An understanding of Arm assembly, general-purpose registers, and how to find information on Arm assembly instructions.
    - An understanding of memory barriers including acquire-release semantics.

author: Julio Suarez

skilllevels: Advanced
subjects: Performance and Architecture
cloud_service_providers:
armips:
    - Neoverse
operatingsystems:
    - Linux
tools_software_languages:
    - Herd7
    - Litmus7
    - Arm ISA

test_images:
    - ubuntu:latest
test_link: null
test_maintenance: false
test_status:
    - passed

further_reading:
    - resource:
        title: Arm Architecture Reference Manual for A-profile architecture
        link: https://developer.arm.com/documentation/ddi0487/la
        type: documentation
    - resource:
        title: Armv8 Barriers
        link: https://developer.arm.com/documentation/100941/0101/Barriers
        type: documentation
    - resource:
        title: Barrier Litmus Tests and Cookbook
        link: https://developer.arm.com/documentation/100941/0101/Barriers
        type: documentation
    - resource:
        title: diy7 documentation
        link: https://diy.inria.fr/doc/index.html
        type: documentation

weight: 1
layout: learningpathall
learning_path_main_page: 'yes'
---
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
---
title: "Thread Synchronization, Arm Memory Model, and Tools"
weight: 2
layout: "learningpathall"
---

## CPU Memory Model vs Language/Runtime Memory Models

It's important to understand that most developers will not need to concern themselves with the memory consistency model of the CPU their code executes on. This is because programming languages and runtime engines abstract away the CPU's memory model by presenting the programmer with a language/runtime memory model. This abstraction is achieved by providing developers with a set of language/runtime-specific memory ordering rules, synchronization constructs, and supporting libraries. So long as the developer uses these correctly, language compilers and runtime engines will make sure the code executes correctly on any CPU, regardless of how strongly or weakly ordered it is.

That said, developers may want to dig deeper into this topic for various reasons, including:

- Extract more performance from weakly ordered CPUs (like Arm).
  - Keep in mind that compilers and runtimes will do a good job of maximizing performance. Only in well-understood niche cases is there potential for a performance gain from going beyond what the compiler/runtime would do with higher-level code. In most cases, all it takes to get more performance is to use the latest compilers, compiler switches, and runtimes.
- Develop confidence in the correctness of synchronization constructs.
- Develop an understanding of how compilers and runtimes select different machine instructions while still honoring the memory ordering rules of the language/runtime and CPU.
- General learning.

## What We Will Do

In this Learning Path, we will use publicly available tools to explore thread synchronization on Arm CPUs. This will provide the reader with enough working knowledge of the tools to be able to explore on their own. At the end of this Learning Path, we will provide more information on where readers can find a more formal (and complex) treatment of this subject.

## The Formal Definition of the Arm Memory Model

The formal definition of the Arm memory model is in the [Arm Architecture Reference Manual for A-profile architecture](https://developer.arm.com/documentation/ddi0487/la) (Arm ARM) under the section called `The AArch64 Application Level Memory Model`. The ordering requirements defined in the Arm ARM are a transliteration of the `aarch64.cat` file hosted on the Arm [herd7 simulator tool](https://developer.arm.com/herd7). In fact, this `aarch64.cat` file is the authoritative definition of the Arm memory model. Since it is a formal definition, it is complex.

## Herd7 Simulator & Litmus7 Tool

The herd7 simulator provides a way to test snippets of Arm assembly against the formal definition of the Arm memory model (the `aarch64.cat` file mentioned above). The litmus7 tool can take the same snippets of assembly that run on herd7 and run them on actual Arm hardware. This allows the formal memory model to be compared with the behavior of an actual Arm CPU. These snippets of assembly are called litmus tests.

It's important to understand that an implementation of an Arm CPU can be more strongly ordered than the formally defined Arm memory model. This is not a violation of the memory model, because such a CPU still executes code in a way that complies with the memory ordering rules.

## Installing the Tools

Herd7 and Litmus7 are part of the [diy7](http://diy.inria.fr/) tool suite. The diy7 tool suite can be installed by following the [installation instructions](http://diy.inria.fr/sources/index.html). We suggest installing it on an Arm system so that herd7 and litmus7 results can be compared side by side on the same system.
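
After installation, it can be useful to confirm that both tools are reachable from the shell. The following is a minimal sketch, assuming a POSIX shell; the function name `check_diy7_tools` is hypothetical and is not part of diy7.

```shell
# Hypothetical helper (not part of diy7): report whether each tool is on PATH.
check_diy7_tools() {
    for tool in herd7 litmus7; do
        if path="$(command -v "$tool" 2>/dev/null)"; then
            echo "$tool: found at $path"
        else
            echo "$tool: not found on PATH"
        fi
    done
}

check_diy7_tools
```

If a tool is reported as missing, revisit the installation instructions and check that the install location is on `PATH`.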

## Running Herd7 and Litmus7 Example Commands

The test file is assumed to be called `test.litmus` and is in the current directory.
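
For readers who do not yet have a litmus test at hand, the sketch below writes a small message-passing (MP) test to `test.litmus`. This test does not come from the text above; it is an illustrative example, and its layout follows the litmus format as we understand it from the diy7 documentation, so treat the details as assumptions to verify there.

```shell
# Write an illustrative message-passing (MP) litmus test to ./test.litmus.
# Assumption: this example test is not taken from the original text.
# P0 stores 1 to x (the data), then stores 1 to y (the flag).
# P1 loads y, then loads x. The "exists" clause asks whether P1 can
# observe the flag write (X0=1) without observing the data write (X2=0).
cat > test.litmus <<'EOF'
AArch64 MP
{
0:X1=x; 0:X3=y;
1:X1=y; 1:X3=x;
}
 P0          | P1          ;
 MOV W0,#1   | LDR W0,[X1] ;
 STR W0,[X1] | LDR W2,[X3] ;
 MOV W2,#1   |             ;
 STR W2,[X3] |             ;
exists
(1:X0=1 /\ 1:X2=0)
EOF
```

The quoted `'EOF'` delimiter keeps the shell from interpreting the backslash in the `exists` clause.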

The help menu shows all options.
```
herd7 --help
```
```
litmus7 --help
```

Example of running herd7.
```
herd7 ./test.litmus
```

Example of running litmus7.
```
litmus7 ./test.litmus
```

Example of running litmus7 with 5,000,000 test iterations (default is 1,000,000).
```
litmus7 ./test.litmus -s 5000000
```

Example of running litmus7 tests in parallel on 8 CPUs.
```
litmus7 ./test.litmus -a 8
```

Example of running litmus7 and asking GCC to emit atomic instructions as required by the litmus test (the possible need for this is explained later).
```
litmus7 ./test.litmus -ccopts="-mcpu=native"
```
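
To compare the formal model with real hardware, it can help to capture the output of both tools for the same test file. The following is a minimal sketch that uses only the plain invocations shown above; the helper names `run_tool` and `compare_model_and_hw` and the `.out` file names are hypothetical choices, not part of diy7.

```shell
# Hypothetical helper: run one tool on a litmus test and save its output.
# Skips the tool with a message if it is not installed.
run_tool() {
    tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" "$@" > "${tool}.out" 2>&1
        echo "$tool: output saved to ${tool}.out"
    else
        echo "$tool: not found on PATH, skipped"
    fi
}

# Run the same test through the formal model (herd7) and on hardware (litmus7).
compare_model_and_hw() {
    test_file="${1:-./test.litmus}"
    run_tool herd7 "$test_file"
    run_tool litmus7 "$test_file"
}

compare_model_and_hw ./test.litmus
```

Reading `herd7.out` and `litmus7.out` side by side shows which outcomes the model allows and which ones the hardware actually produced during the run.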