Commit bee9c86

Review C++ memory model Learning Path
1 parent c70471d commit bee9c86

5 files changed: +27 -29 lines changed

content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/1.md

Lines changed: 11 additions & 11 deletions
@@ -16,21 +16,21 @@ You can think of memory ordering as falling into four broad categories:

 1. **Source Code Order** - the exact sequence in which you write statements. This is the most intuitive view because it directly reflects how code appears to you.

-Here is an example:
+Here is an example:

-```output
-int x = 5; // A
-int z = x * 5; // B
-int y = 42; // C
-```
+```output
+int x = 5; // A
+int z = x * 5; // B
+int y = 42; // C
+```

 2. **Program Order** - the logical sequence that the compiler recognizes, and it might rearrange or optimize instructions under certain constraints to create a program that executes in fewer cycles. Although your source code lists statements in a particular order, the compiler can restructure them if it deems it safe. For example, the pseudo-assembly below reorders the source instructions:

-```output
-LDR R1 #5 // A
-LDR R2 #42 // C
-MULT R3, #R1, #5 // B
-```
+```output
+LDR R1 #5 // A
+LDR R2 #42 // C
+MULT R3, #R1, #5 // B
+```

 3. **Execution Order** - this is the order in which the hardware actually issues and executes instructions. Modern CPUs often employ techniques to improve instruction-level parallelism such as out-of-order execution and speculation for performance. For instance, on an Arm-based system, you might see instructions issued in different order during runtime. The subtle difference between program order and execution order is that program order refers to the sequence seen in the binary whereas execution is the order in which those instructions are actually issued and retired. Even though the instructions are listed in one order, the CPU might reorder their micro-operations as long as it respects dependencies.

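The reordering described in the 1.md hunk above can be sketched in C++. This is a minimal illustration, not part of the commit: the function names `source_order`, `reordered`, and `check_equivalence` are hypothetical, and the second function only mimics by hand one ordering (A, C, B) that a compiler might choose. Because statement C does not depend on A or B, both orderings produce the same values in single-threaded code.

```cpp
#include <cassert>
#include <tuple>

// Statements in the order they are written: A, then B, then C.
std::tuple<int, int, int> source_order() {
    int x = 5;      // A
    int z = x * 5;  // B: depends on A
    int y = 42;     // C: independent of A and B
    return std::make_tuple(x, y, z);
}

// Hand-written stand-in for one reordering a compiler could pick: A, C, B.
std::tuple<int, int, int> reordered() {
    int x = 5;      // A
    int y = 42;     // C moved ahead of B
    int z = x * 5;  // B still runs after A, so the dependency is respected
    return std::make_tuple(x, y, z);
}

// Both orderings are indistinguishable to a single thread.
void check_equivalence() {
    assert(source_order() == reordered());
}
```

Calling `check_equivalence()` from a test or from `main` passes, which is why a compiler is free to make this kind of change when only one thread observes the result.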
content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/2.md

Lines changed: 4 additions & 4 deletions
@@ -14,13 +14,13 @@ The single-threaded world was simpler: you wrote code, the compiler safely reord

 ### Expanding the memory model for multiple threads

-When multi threading programming gained traction, compilers and CPUs needed precise rules about what reordering is allowed. This is where the formalized C++ memory model, introduced in C++11, steps in. Prior to C++11, concurrency in C++ was partially specified and relied on platform-specific behavior. Now, the language standard includes well-defined semantics ensuring that concurrent code can rely on a set of guaranteed rules.
+When multi-threaded programming gained traction, compilers and CPUs needed precise rules about what reordering is allowed. This is where the formalized C++ memory model, introduced in C++11, steps in. Prior to C++11, concurrency in C++ was partially specified and relied on platform-specific behavior. Now, the language standard includes well-defined semantics ensuring that concurrent code can rely on a set of guaranteed rules.

-Under the new model, if a piece of data is shared between threads without proper synchronization, you can no longer assume it behaves like single-threaded code. Instead, operations on this shared data may be reordered unless you explicitly prevent it using atomic operations or other synchronization primitives such as mutexes. To ensure correctness, C++ provides an array of memory ordering options (such as `std::memory_order_relaxed`, `std::memory_order_acquire`, and `std::memory_order_release`) that govern how loads and stores can be observed in a multi-threaded environment. Details can be found on the C++ reference manual.
+Under the new model, if a piece of data is shared between threads without proper synchronization, you can no longer assume it behaves like single-threaded code. Instead, operations on this shared data may be reordered unless you explicitly prevent it using atomic operations or other synchronization primitives such as mutexes. To ensure correctness, C++ provides an array of memory ordering options (such as `std::memory_order_relaxed`, `std::memory_order_acquire`, and `std::memory_order_release`) that govern how loads and stores can be observed in a multi-threaded environment. Details can be found in the C++ reference manual.

 ## C++ atomic memory ordering

-In C++, `std::memory_order` atomic operations allow developers to specify how memory accesses, including regular, non-atomic memory accesses are ordered among atomic operation. Choosing the right memory order is crucial for balancing performance and correctness. Assume we have 2 atomic integers with initial values of 0:
+In C++, `std::memory_order` atomic operations allow developers to specify how memory accesses, including regular, non-atomic memory accesses are ordered among atomic operations. Choosing the right memory order is crucial for balancing performance and correctness. Assume we have 2 atomic integers with initial values of 0:

 ```cpp
 std::atomic<int> x{0};
@@ -64,5 +64,5 @@ while (atomic_load(ptr, memory_order_acquire) is null) { } // Acquire: wait unti

 Sequential consistency, `memory_order_seq_cst` is the strongest order and the default ordering if nothing is specified.

-There are several other memory ordering possibilities. For information on all possible memory ordering possibilities in the C++11 standard and their nuances, please refer to the [C++ reference](https://en.cppreference.com/w/cpp/atomic/memory_order).
+There are several other memory ordering possibilities. For information on all memory ordering possibilities in the C++11 standard and their nuances, please refer to the [C++ reference](https://en.cppreference.com/w/cpp/atomic/memory_order).

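The release and acquire orderings named in the 2.md text can be illustrated with a small message-passing sketch. It is not taken from the Learning Path: the function name `publish_and_read` and the variables `payload`, `ready`, and `seen` are assumptions chosen for this example. The producer writes plain data and then sets a flag with `std::memory_order_release`. The consumer spins on the flag with `std::memory_order_acquire`, which guarantees that it sees the earlier write to `payload` once it sees the flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: passes one value from a producer to a consumer thread.
int publish_and_read() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that guards the payload
    int seen = 0;

    std::thread producer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    std::thread consumer([&] {
        // Acquire: spin until the flag is set by the release store.
        while (!ready.load(std::memory_order_acquire)) { }
        seen = payload;  // ordered after the producer's write to payload
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Under the C++ memory model, `publish_and_read()` returns 42 on both x86 and Arm. If both operations on `ready` used `std::memory_order_relaxed` instead, the read of `payload` would be a data race and the result would no longer be guaranteed.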
content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/3.md

Lines changed: 7 additions & 5 deletions
@@ -10,15 +10,15 @@ layout: learningpathall

 Due to the differences in the hardware memory ordering, as explained in the earlier sections, source code written for x86 can behave differently when ported to Arm.

-To demonstrate this, this Learning Path walks you through a simple example that is run on both x86 and Arm cloud instance.
+To demonstrate this, this section walks you through a simple example that is run on both an x86 and an Arm cloud instance.

 ### Get Started

-Start an Arm-based cloud instance. This example uses a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS, but you can use other instances types.
+Start an Arm-based cloud instance. This example uses a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS, but you can use other instance types.

 If you are new to cloud-based virtual machines, see [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).

-First confirm you are using a Arm-based instance with the following command.
+First, confirm you are using a Arm-based instance with the following command.

 ```bash
 uname -m
@@ -115,12 +115,14 @@ Subtle issues can surface under specific workloads, making them challenging to d

 Due to the stronger memory model in x86 processors, programs not adhering to the C++ standard might give programmers a false sense of security. To demonstrate this, create an connect to an AWS `t2.2xlarge` instance that uses the x86 architecture.

-Running the following command I can observe the underlying hardware is a Intel Xeon E5-2686 Processor
+Running the command below you observe the underlying hardware is a Intel Xeon E5-2686 Processor.

 ```bash
 lscpu | grep -i "Model"
 ```

+Here is the output:
+
 ```output
 Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 Model: 79
@@ -199,7 +201,7 @@ int main() {

 ```

-Compile and run on the Arm-based machine:
+Compile and run the new code on the Arm-based machine:

 ```bash
 g++ correct_memory_ordering.cpp -o correct_memory_ordering -O3

content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/4.md

Lines changed: 4 additions & 4 deletions
@@ -8,11 +8,11 @@ layout: learningpathall

 ## How can I detect infrequent race conditions?

-ThreadSanitizer (TSan) is a concurrency bug detection tool that identifies data races in multithreaded programs. By instrumenting code at compile time, `TSan` dynamically tracks memory operations, monitors lock usage, and detects inconsistencies in thread synchronization. When a potential data race is found, `TSan` provides detailed reports to help you debug.
+ThreadSanitizer (TSan) is a concurrency bug detection tool that identifies data races in multithreaded programs. By instrumenting code at compile time, TSan dynamically tracks memory operations, monitors lock usage, and detects inconsistencies in thread synchronization. When a potential data race is found, TSan provides detailed reports to help you debug.

-Although its runtime overhead can be significant, `TSan` provides valuable insights into concurrency issues often missed by static analysis tools.
+Although its runtime overhead can be significant, TSan provides valuable insights into concurrency issues often missed by static analysis tools.

-`TSan` is available in recent versions of the `clang` and `gcc` compilers.
+TSan is available in recent versions of the `clang` and `gcc` compilers.

 Compile and run the following example using the `clang++` compiler:

@@ -38,7 +38,7 @@ This output highlights a potential data race in the `threadB` function, correspo

 ## Does TSan have any limitations?

-While powerful, `TSan` has some notable drawbacks:
+While powerful, TSan has some notable drawbacks:

 * It identifies concurrency issues only at runtime, meaning code paths not exercised during testing remain unchecked.

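As a companion to the TSan description in the 4.md hunks, here is a minimal sketch of race-free code that an instrumented build is expected to accept. It is not the example from the Learning Path: the function name `count_with_two_threads` is hypothetical. Two threads increment a shared `std::atomic<int>`. Replacing the atomic with a plain `int` would make the increments a data race, which is the class of bug TSan is designed to report.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads increment one shared counter.
int count_with_two_threads(int increments_per_thread) {
    // Atomic counter: concurrent increments are well defined.
    // With a plain `int` here, the two threads would race on the counter.
    std::atomic<int> counter{0};

    auto work = [&] {
        for (int i = 0; i < increments_per_thread; ++i) {
            // Relaxed is enough because only the final total matters.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();  // join() synchronizes with the end of each thread,
    b.join();  // so the load below sees every increment
    return counter.load();
}
```

To check a program that calls this function, one option is to build it with the `-fsanitize=thread` flag, for example `clang++ -fsanitize=thread -g counter.cpp -o counter`, where the file name `counter.cpp` is an assumption. Because TSan only checks code paths that actually run, the call must be exercised during the test run to be covered.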
content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/_index.md

Lines changed: 1 addition & 5 deletions
@@ -1,10 +1,6 @@
 ---
 title: Learn about the C++ memory model for porting applications to Arm

-draft: true
-cascade:
-draft: true
-
 minutes_to_complete: 45

 who_is_this_for: This is an advanced topic for C++ developers porting applications from x86 to Arm and optimizing performance.
@@ -15,7 +11,7 @@ learning_objectives:
 - Employ best practices for writing C++ on Arm to avoid race conditions.

 prerequisites:
-- Access to an x86 and Arm cloud instance (virtual machine).
+- Access to an x86 and an Arm cloud instance (virtual machine).
 - Proficiency in C++ programming.

 author: Kieran Hejmadi
