Commit 410f405

Merge pull request #1760 from madeline-underwood/C++MemMod
C++mem mod_JA to check
2 parents 3cb59e9 + 648c4f4 commit 410f405

5 files changed

+58
-46
lines changed

Lines changed: 19 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -1,39 +1,45 @@
11
---
2-
title: Introduction to C++ memory models
2+
title: Introduction to C++ Memory Models
33
weight: 2
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## What is a memory model?
9+
## What is a Memory Model?
1010

11-
A language’s memory model defines how operations on shared data interleave at runtime, providing rules on what reorderings are allowed by compilers and hardware. In C++, the memory model specifies how threads interact with shared variables, ensuring consistent behavior across different compilers and architectures. You can think of memory ordering in 4 broad categories.
11+
A programming language’s memory model defines how operations on shared data can interleave at runtime. It sets rules for how compilers and hardware might reorder these operations.
1212

13-
- **Source Code Order**: The exact sequence in which you write statements. This is the most intuitive view because it directly reflects how code appears to you.
13+
In C++, the memory model specifically defines how threads interact with shared variables, ensuring consistent behavior across different compilers and architectures.
14+
15+
You can think of memory ordering as falling into four broad categories:
16+
17+
1. **Source Code Order** - the exact sequence in which you write statements. This is the most intuitive view because it directly reflects how code appears to you.
18+
19+
Here is an example:
1420

1521
```output
1622
int x = 5; // A
17-
int z = x * 5 // B
18-
int y = 42 // C
23+
int z = x * 5; // B
24+
int y = 42; // C
1925
```
2026

21-
- **Program Order**: The logical sequence recognized by the compiler, which may rearrange or optimize instructions under certain constraints to create a program that takes fewer cycles. Although the statements may appear in a particular order in your source code, the compiler could restructure them if it deems it safe. For example, the pseudo assembly below reorders the source line instructions above.
27+
2. **Program Order** - the logical sequence that the compiler recognizes, and it might rearrange or optimize instructions under certain constraints to create a program that executes in fewer cycles. Although your source code lists statements in a particular order, the compiler can restructure them if it deems it safe. For example, the pseudo-assembly below reorders the source instructions:
2228

2329
```output
2430
LDR R1 #5 // A
2531
LDR R2 #42 // C
2632
MULT R3, #R1, #5 // B
2733
```
2834

29-
- **Execution Order**: How instructions are actually issued and executed by the hardware. Modern CPUs often employ techniques to improve instruction-level parallelism such as out-of-order execution and speculation for performance. For instance, on an Arm-based system, you might see instructions issued in different order during runtime. The subtle difference between program order and execution order is that program order refers to the sequence seen in the binary whereas execution is the order in which those instructions are actually issued and retired. Even though the instructions are listed in one order, the CPU might reorder their micro-operations as long as it respects dependencies.
35+
3. **Execution Order** - this is the order in which the hardware actually issues and executes instructions. Modern CPUs often employ techniques to improve instruction-level parallelism such as out-of-order execution and speculation for performance. For instance, on an Arm-based system, you might see instructions issued in different order during runtime. The subtle difference between program order and execution order is that program order refers to the sequence seen in the binary whereas execution is the order in which those instructions are actually issued and retired. Even though the instructions are listed in one order, the CPU might reorder their micro-operations as long as it respects dependencies.
3036

31-
- **Hardware Perceived Order**: This is the perspective observed by other devices in the system, which can differ if the hardware buffers writes or merges memory operations. Crucially, the hardware-perceived order can vary between CPU architectures, for example between x86 and Arm, and this should be considered when porting applications. An abstract diagram from the academic paper is shown below [Maranget et. al, 2012]. A write operation in one of the 5 threads in the pentagon below may propagate to the other threads in any order.
37+
4. **Hardware Perceived Order** - this is the perspective observed by other devices in the system, which can differ if the hardware buffers writes or merges memory operations. Crucially, the hardware-perceived order can vary between CPU architectures, for example between x86 and Arm, and this should be considered when porting applications.
3238

33-
![abstract_model](./multi-copy-atomic.png)
39+
## High-level differences between the Arm Memory Model and the x86 Memory Model
3440

35-
## High-level differences between the Arm memory model and the x86 memory model
41+
The memory models of Arm and x86 architectures differ in terms of ordering guarantees and required synchronizations.
3642

37-
The memory models of Arm and x86 architectures differ in terms of ordering guarantees and required synchronizations. x86 processors implement a relatively strong memory model, commonly referred to as Total Store Order (TSO). Under TSO, loads and stores appear to execute in program order, with only limited reordering permitted. This strong ordering means that software running on x86 generally relies on fewer memory barrier instructions, making it easier to reason about concurrency.
43+
x86 processors implement a relatively strong memory model, commonly referred to as Total Store Order (TSO). Under TSO, loads and stores appear to execute in program order, with only limited reordering permitted. This strong ordering means that software running on x86 generally relies on fewer memory barrier instructions, making it easier to reason about concurrency.
3844

39-
In contrast, Arm’s memory model is more relaxed, allowing greater reordering of memory operations to optimize performance and energy efficiency. This relaxed model provides less intuitive ordering guarantees, meaning that loads and stores may be observed out of order by other processors. This means that source code needs to correctly follow the language standard to ensure reliable behavior.
45+
In contrast, Arm’s memory model is more relaxed, allowing greater reordering of memory operations to optimize performance and energy efficiency. This relaxed model provides less intuitive ordering guarantees, meaning that loads and stores can be observed out of order by other processors. This means that source code needs to correctly follow the language standard to ensure reliable behavior.

content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/2.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: The C++ memory model and atomics
2+
title: The C++ Memory Model and Atomics
33
weight: 3
44

55
### FIXED, DO NOT MODIFY
@@ -8,9 +8,9 @@ layout: learningpathall
88

99
## The C++ memory model for single threads
1010

11-
For a long time, writing C++ programs on single-core systems was relatively straightforward. The compiler could reorder instructions however it wished, so long as the program’s observable behavior remained unchanged. This optimization freedom is commonly referred to as the “as-if” rule. Essentially, compilers can optimize away or move instructions around as if the code had not changed, provided they do not affect inputs, outputs, or volatile accesses.
11+
For a long time, writing C++ programs on single-core systems was straightforward. Compilers could reorder instructions freely, as long as the program’s observable behavior remained unchanged. This flexibility is commonly referred to as the “as-if” rule. Essentially, compilers could optimize away or move instructions around as if the code had not changed, provided the changes did not affect inputs, outputs, or volatile memory accesses.
1212

13-
The single-threaded world was simpler: you wrote code, the compiler made it faster (by safely reordering or eliminating instructions), and performance benefited. Over time, multi-core processors and multi-threaded applications became the norm. Suddenly, reordering instructions was not only about performance because it could change the meaning of programs with threads reading and writing shared data simultaneously.
13+
The single-threaded world was simpler: you wrote code, the compiler safely reordered or eliminated instructions to make it faster, and your program performed better. But as multi-core processors and multi-threaded applications became common, instruction reordering was not only about improving performance - it could actually change the meaning of programs, especially when multiple threads accessed shared data simultaneously.
1414

1515
### Expanding the memory model for multiple threads
1616

content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/3.md

Lines changed: 17 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Race condition example
2+
title: Walk through a Race condition example
33
weight: 4
44

55
### FIXED, DO NOT MODIFY
@@ -8,31 +8,35 @@ layout: learningpathall
88

99
## Example of a race condition when porting from x86 to Arm
1010

11-
Due to the differences in the hardware perceived ordering as explained in the earlier sections, source code written for x86 may behave differently when ported to Arm. To demonstrate this we will create a trivial example and run it both on an x86 and Arm cloud instance.
11+
Due to the differences in the hardware memory ordering, as explained in the earlier sections, source code written for x86 can behave differently when ported to Arm.
1212

13-
Start an Arm-based cloud instance. This example uses a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS, but other instances types are possible.
13+
To demonstrate this, this Learning Path walks you through a simple example that runs on both an x86 and an Arm cloud instance.
1414

15-
If you are new to cloud-based virtual machines, refer to [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).
15+
### Get Started
16+
17+
Start an Arm-based cloud instance. This example uses a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS, but you can use other instance types.
18+
19+
If you are new to cloud-based virtual machines, see [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).
1620

1721
First confirm you are using an Arm-based instance with the following command.
1822

1923
```bash
2024
uname -m
2125
```
22-
You should see the following output.
26+
You should see the following output:
2327

2428
```output
2529
aarch64
2630
```
2731

28-
Next, install the required software packages.
32+
Next, install the required software packages:
2933

3034
```bash
3135
sudo apt update
3236
sudo apt install g++ clang -y
3337
```
3438

35-
Use a text editor to copy and paste the following code snippet into a file named `relaxed_memory_ordering.cpp`.
39+
Use a text editor to copy and paste the following code snippet into a file named `relaxed_memory_ordering.cpp`:
3640

3741
```cpp
3842
#include <iostream>
@@ -85,31 +89,31 @@ int main() {
8589
}
8690
```
8791
88-
The code above is a small example of a data race condition. Thread A creates a node variable and assigns it the number 42. Thread B checks that the variable assigned to the Node is equal to 42. Both functions use the `memory_order_relaxed` model, which allows the possibility for thread B to read an uninitialized variable before it has been assigned the value 42 in thread A.
92+
The code above demonstrates a data race condition. Thread A creates a node variable and assigns it the value `42`. Thread B checks that the variable assigned to the Node equals `42`. Both threads use the `memory_order_relaxed` ordering, allowing thread B to potentially read an uninitialized variable before thread A assigns the value `42`.
8993
9094
Compile the program using the GNU compiler:
9195
9296
```bash
9397
g++ relaxed_memory_ordering.cpp -o relaxed_memory_ordering -O3
9498
```
9599

96-
Run the command below to run the binary 10 times. Multiple runs increases the chance of observing a race condition.
100+
Run the binary 10 times to increase the chance of observing a race condition:
97101

98102
```bash
99103
for i in {1..10}; do ./relaxed_memory_ordering; done;
100104
```
101105

102-
If you do not see a race condition, the animation below shows a race condition being triggered on the 3rd run.
106+
If you do not see a race condition, the animation below shows a race condition being triggered on the third run:
103107

104108
![Arm64-race-cond](./aarch64-race-condition.gif)
105109

106-
As the graphic above illustrates, a race condition is not a guarantee but a probability.
110+
As the graphic above illustrates, a race condition is probabilistic, and not guaranteed.
107111

108-
Unfortunately, in production workloads there may be a more subtle probability that may surface under specific workloads. This is the reason race conditions are difficult to spot.
112+
Subtle issues can surface under specific workloads, making them challenging to detect.
109113

110114
### Behavior on an x86 instance
111115

112-
Due to the more strong memory model associated with x86 processors, programs that do not adhere to the C++ standard may give programmers a false sense of security. To demonstrate this, create an connect to an AWS `t2.2xlarge` instance that uses the x86 architecture.
116+
Due to the stronger memory model in x86 processors, programs not adhering to the C++ standard might give programmers a false sense of security. To demonstrate this, create and connect to an AWS `t2.2xlarge` instance that uses the x86 architecture.
113117

114118
Running the following command I can observe the underlying hardware is an Intel Xeon E5-2686 Processor
115119

content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/4.md

Lines changed: 12 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -8,18 +8,20 @@ layout: learningpathall
88

99
## How can I detect infrequent race conditions?
1010

11-
ThreadSanitizer, commonly referred to as `TSan`, is a concurrency bug detection tool that identifies data races in multi-threaded programs. By instrumenting code at compile time, TSan dynamically tracks memory operations, monitoring lock usage and detecting inconsistencies in thread synchronization. When it finds a potential data race, it reports detailed information to aid debugging. TSan's overhead can be significant, but it provides valuable insights into concurrency issues often missed by static analysis.
11+
ThreadSanitizer (TSan) is a concurrency bug detection tool that identifies data races in multithreaded programs. By instrumenting code at compile time, `TSan` dynamically tracks memory operations, monitors lock usage, and detects inconsistencies in thread synchronization. When a potential data race is found, `TSan` provides detailed reports to help you debug.
1212

13-
TSan is available through both recent `clang` and `gcc` compilers.
13+
Although its runtime overhead can be significant, `TSan` provides valuable insights into concurrency issues often missed by static analysis tools.
1414

15-
Use the `clang++` compiler to compile the example and run the executable:
15+
`TSan` is available in recent versions of the `clang` and `gcc` compilers.
16+
17+
Compile and run the following example using the `clang++` compiler:
1618

1719
```bash
1820
clang++ relaxed_memory_ordering.cpp -o relaxed_memory_ordering -fsanitize=thread -fPIE -pie -g
1921
./relaxed_memory_ordering
2022
```
2123

22-
The output is similar to:
24+
The output will look similar to:
2325

2426
```output
2527
==================
@@ -32,16 +34,16 @@ SUMMARY: ThreadSanitizer: data race /home/ubuntu/src/relaxed_memory_ordering.cpp
3234
==================
3335
```
3436

35-
The output highlights a potential data race in the `threadB` function corresponding to the source code expression `n->x != 42`.
37+
This output highlights a potential data race in the `threadB` function, corresponding to the source code expression `n->x != 42`.
3638

3739
## Does TSan have any limitations?
3840

39-
Thread Sanitizer (TSan) is powerful for detecting data races but has notable drawbacks.
41+
While powerful, `TSan` has some notable drawbacks:
4042

41-
First, it only identifies concurrency issues at runtime, meaning any problematic code that isn’t exercised during testing goes unnoticed.
43+
* It identifies concurrency issues only at runtime, meaning code paths not exercised during testing remain unchecked.
4244

43-
Second, if race conditions exist in third-party binaries or libraries, TSan can’t instrument or fix them without access to their source code.
45+
* It cannot instrument or fix race conditions in third-party binaries or libraries without source code access.
4446

45-
Another major limitation is performance overhead: TSan can slow programs by 2 to 20x and requires extra memory, making it challenging for large-scale or real-time systems.
47+
* It introduces significant performance overhead, typically slowing programs by 2 to 20 times and requiring additional memory. This makes it challenging to use in large-scale or real-time systems.
4648

47-
For further information please refer to the [ThreadSanitizer documentation](https://github.com/google/sanitizers/wiki/threadsanitizercppmanual).
49+
For further information, see the [ThreadSanitizer documentation](https://github.com/google/sanitizers/wiki/threadsanitizercppmanual).

content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/_index.md

Lines changed: 7 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -10,9 +10,9 @@ minutes_to_complete: 45
1010
who_is_this_for: This is an advanced topic for C++ developers porting applications from x86 to Arm and optimizing performance.
1111

1212
learning_objectives:
13-
- Learn about the C++ memory model.
14-
- Learn about the differences between the Arm and x86 memory model.
15-
- Learn best practices for writing C++ on Arm to avoid race conditions.
13+
- Describe at a high level what a memory model does, and the types of memory ordering.
14+
- Describe the differences between the Arm and x86 memory model.
15+
- Employ best practices for writing C++ on Arm to avoid race conditions.
1616

1717
prerequisites:
1818
- Access to an x86 and Arm cloud instance (virtual machine).
@@ -27,19 +27,19 @@ armips:
2727
- Neoverse
2828
tools_software_languages:
2929
- C++
30-
- ThreadSanitizer (TSan)
30+
- TSan
31+
- Runbook
3132
operatingsystems:
3233
- Linux
33-
- Runbook
34-
34+
3535
further_reading:
3636
- resource:
3737
title: C++ Memory Order Reference Manual
3838
link: https://en.cppreference.com/w/cpp/atomic/memory_order
3939
type: documentation
4040
- resource:
4141
title: Thread Sanitizer Manual
42-
link: Phttps://github.com/google/sanitizers/wiki/threadsanitizercppmanual
42+
link: https://github.com/google/sanitizers/wiki/threadsanitizercppmanual
4343
type: documentation
4444

4545
