content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/1.md
Lines changed: 11 additions & 11 deletions
@@ -16,21 +16,21 @@ You can think of memory ordering as falling into four broad categories:

 1. **Source Code Order** - the exact sequence in which you write statements. This is the most intuitive view because it directly reflects how code appears to you.

-Here is an example:
+Here is an example:

-```output
-int x = 5; // A
-int z = x * 5; // B
-int y = 42; // C
-```
+```output
+int x = 5; // A
+int z = x * 5; // B
+int y = 42; // C
+```

 2. **Program Order** - the sequence of instructions that the compiler produces. Under certain constraints, the compiler might rearrange or optimize instructions to create a program that executes in fewer cycles. Although your source code lists statements in a particular order, the compiler can restructure them if it deems it safe. For example, the pseudo-assembly below reorders the source instructions:

-```output
-LDR R1 #5 // A
-LDR R2 #42 // C
-MULT R3, #R1, #5 // B
-```
+```output
+LDR R1 #5 // A
+LDR R2 #42 // C
+MULT R3, #R1, #5 // B
+```

 3. **Execution Order** - the order in which the hardware actually issues and executes instructions. Modern CPUs often use techniques such as out-of-order execution and speculation to improve instruction-level parallelism. For instance, on an Arm-based system, you might see instructions issued in a different order at runtime. The subtle difference between program order and execution order is that program order refers to the sequence seen in the binary, whereas execution order is the order in which those instructions are actually issued and executed. Even though the instructions are listed in one order, the CPU might reorder their micro-operations as long as it respects dependencies.
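The dependency rule behind these three orders can be illustrated with the A/B/C example from this hunk. The sketch below is a minimal single-threaded illustration under stated assumptions: `Values`, `compute_in_source_order`, `compute_in_reordered_order`, and `orders_agree` are hypothetical names, and the two functions only mimic the A, B, C and A, C, B sequences by hand rather than showing what a real compiler emits. Because C does not depend on A or B, moving it earlier cannot change the result in a single thread, while B must still come after A.

```cpp
#include <cassert>  // assert, used in the usage check below

// Hypothetical container for the three values in the A/B/C example.
struct Values {
    int x;
    int z;
    int y;
};

// Source code order: A, B, C.
Values compute_in_source_order() {
    Values v{};
    v.x = 5;        // A
    v.z = v.x * 5;  // B depends on A
    v.y = 42;       // C is independent of A and B
    return v;
}

// The order from the pseudo-assembly: A, C, B.
// Safe in a single thread because B still runs after A.
Values compute_in_reordered_order() {
    Values v{};
    v.x = 5;        // A
    v.y = 42;       // C moved earlier
    v.z = v.x * 5;  // B still after A
    return v;
}

// Returns true when both orders leave the same observable state.
bool orders_agree() {
    const Values a = compute_in_source_order();
    const Values b = compute_in_reordered_order();
    return a.x == b.x && a.z == b.z && a.y == b.y;
}
```

For example, `assert(orders_agree());` holds, and both functions leave `z` equal to 25.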
content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/2.md
Lines changed: 4 additions & 4 deletions
@@ -14,13 +14,13 @@ The single-threaded world was simpler: you wrote code, the compiler safely reord

 ### Expanding the memory model for multiple threads

-When multi threading programming gained traction, compilers and CPUs needed precise rules about what reordering is allowed. This is where the formalized C++ memory model, introduced in C++11, steps in. Prior to C++11, concurrency in C++ was partially specified and relied on platform-specific behavior. Now, the language standard includes well-defined semantics ensuring that concurrent code can rely on a set of guaranteed rules.
+When multi-threaded programming gained traction, compilers and CPUs needed precise rules about what reordering is allowed. This is where the formalized C++ memory model, introduced in C++11, steps in. Prior to C++11, the C++ standard did not define a threading model, so concurrent code relied on platform-specific behavior. Since C++11, the language standard includes well-defined semantics that concurrent code can rely on.

-Under the new model, if a piece of data is shared between threads without proper synchronization, you can no longer assume it behaves like single-threaded code. Instead, operations on this shared data may be reordered unless you explicitly prevent it using atomic operations or other synchronization primitives such as mutexes. To ensure correctness, C++ provides an array of memory ordering options (such as `std::memory_order_relaxed`, `std::memory_order_acquire`, and `std::memory_order_release`) that govern how loads and stores can be observed in a multi-threaded environment. Details can be found on the C++ reference manual.
+Under the new model, if a piece of data is shared between threads without proper synchronization, you can no longer assume it behaves like single-threaded code. Instead, operations on this shared data may be reordered unless you explicitly prevent it using atomic operations or other synchronization primitives such as mutexes. To ensure correctness, C++ provides a set of memory ordering options (such as `std::memory_order_relaxed`, `std::memory_order_acquire`, and `std::memory_order_release`) that govern how loads and stores can be observed in a multi-threaded environment. Details can be found in the C++ reference manual.

 ## C++ atomic memory ordering

-In C++, `std::memory_order` atomic operations allow developers to specify how memory accesses, including regular, non-atomic memory accesses are ordered among atomic operation. Choosing the right memory order is crucial for balancing performance and correctness. Assume we have 2 atomic integers with initial values of 0:
+In C++, the `std::memory_order` argument of an atomic operation lets developers specify how memory accesses, including regular non-atomic accesses, are ordered around that operation. Choosing the right memory order is crucial for balancing performance and correctness. Assume there are two atomic integers, both initialized to 0:

 ```cpp
 std::atomic<int> x{0};
@@ -64,5 +64,5 @@ while (atomic_load(ptr, memory_order_acquire) is null) { } // Acquire: wait unti

 Sequential consistency (`memory_order_seq_cst`) is the strongest ordering and the default if no ordering is specified.

-There are several other memory ordering possibilities. For information on all possible memory ordering possibilities in the C++11 standard and their nuances, please refer to the [C++ reference](https://en.cppreference.com/w/cpp/atomic/memory_order).
+There are several other memory orderings. For details on all memory orderings in the C++11 standard and their nuances, see the [C++ reference](https://en.cppreference.com/w/cpp/atomic/memory_order).
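The second hunk header quotes a pseudocode acquire loop (`while (atomic_load(ptr, memory_order_acquire) is null) { }`). A minimal runnable C++ version of that release/acquire publication pattern might look like the sketch below. `Payload` and `publish_and_read` are hypothetical names chosen for illustration. The producer writes ordinary non-atomic data and then publishes a pointer with `memory_order_release`. The consumer spins with `memory_order_acquire` until the pointer is non-null. The acquire load that reads the published pointer synchronizes with the release store, so the earlier plain write is guaranteed to be visible.

```cpp
#include <atomic>
#include <cassert>  // assert, used in the usage check below
#include <thread>

// Hypothetical payload written with ordinary, non-atomic stores.
struct Payload {
    int value = 0;
};

// Publishes a Payload from one thread and reads it from another.
// Returns the value the consumer observed.
int publish_and_read() {
    Payload payload;
    std::atomic<Payload*> ptr{nullptr};
    int observed = -1;

    std::thread producer([&] {
        payload.value = 42;                              // plain, non-atomic store
        ptr.store(&payload, std::memory_order_release);  // Release: publish the pointer
    });

    std::thread consumer([&] {
        Payload* p = nullptr;
        // Acquire: wait until the pointer has been published.
        while ((p = ptr.load(std::memory_order_acquire)) == nullptr) {
        }
        // The acquire load synchronizes with the release store,
        // so the write of 42 is guaranteed to be visible here.
        observed = p->value;
    });

    producer.join();
    consumer.join();  // join() makes `observed` safe to read below
    return observed;
}
```

`publish_and_read()` always returns 42. If both orderings were weakened to `std::memory_order_relaxed`, the read of `payload.value` would become a data race, which is undefined behavior. Omitting the ordering argument selects `memory_order_seq_cst`, which is also correct here but stronger than needed.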
content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/3.md
Lines changed: 7 additions & 5 deletions
@@ -10,15 +10,15 @@ layout: learningpathall

 Due to the differences in hardware memory ordering, as explained in the earlier sections, source code written for x86 can behave differently when ported to Arm.

-To demonstrate this, this Learning Path walks you through a simple example that is run on both x86 and Arm cloud instance.
+To demonstrate this, this section walks you through a simple example that you run on both an x86 and an Arm cloud instance.

 ### Get Started

-Start an Arm-based cloud instance. This example uses a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS, but you can use other instances types.
+Start an Arm-based cloud instance. This example uses a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS, but you can use other instance types.

 If you are new to cloud-based virtual machines, see [Get started with Servers and Cloud Computing](/learning-paths/servers-and-cloud-computing/intro/).

-First confirm you are using a Arm-based instance with the following command.
+First, confirm you are using an Arm-based instance with the following command:

 ```bash
 uname -m
@@ -115,12 +115,14 @@ Subtle issues can surface under specific workloads, making them challenging to d

 Due to the stronger memory model in x86 processors, programs that do not follow the C++ memory model can appear to work correctly on x86, giving programmers a false sense of security. To demonstrate this, create and connect to an AWS `t2.2xlarge` instance that uses the x86 architecture.

-Running the following command I can observe the underlying hardware is a Intel Xeon E5-2686 Processor
+Run the command below to confirm that the underlying hardware is an Intel Xeon E5-2686 processor:

 ```bash
 lscpu | grep -i "Model"
 ```

+Here is the output:
+
 ```output
 Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
 Model: 79
@@ -199,7 +201,7 @@ int main() {

 ```

-Compile and run on the Arm-based machine:
+Compile and run the new code on the Arm-based machine:
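The x86 versus Arm difference described in this file can be probed with a message-passing litmus test. The program used in this Learning Path is not shown in this diff, so the sketch below is an assumption rather than that program. `count_reorderings` is a hypothetical helper that repeats the test and counts how often the reader sees the flag set while the data is still 0. With `std::memory_order_relaxed`, the C++ memory model permits that outcome, and weakly ordered hardware such as Arm may produce it. x86 hardware does not reorder these two stores or these two loads, although the compiler may still reorder relaxed operations. With release/acquire ordering, the outcome is forbidden on every platform.

```cpp
#include <atomic>
#include <cassert>  // assert, used in the usage check below
#include <thread>

// Runs a message-passing litmus test `iterations` times and returns how
// often the reader saw flag == 1 together with data == 0.
int count_reorderings(int iterations, bool use_acquire_release) {
    const std::memory_order store_order =
        use_acquire_release ? std::memory_order_release : std::memory_order_relaxed;
    const std::memory_order load_order =
        use_acquire_release ? std::memory_order_acquire : std::memory_order_relaxed;

    int reordered = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> data{0};
        std::atomic<int> flag{0};
        int seen_flag = 0;
        int seen_data = 0;

        std::thread writer([&] {
            data.store(1, std::memory_order_relaxed);  // write the data first
            flag.store(1, store_order);                // then raise the flag
        });
        std::thread reader([&] {
            seen_flag = flag.load(load_order);                 // read the flag first
            seen_data = data.load(std::memory_order_relaxed);  // then the data
        });
        writer.join();
        reader.join();

        if (seen_flag == 1 && seen_data == 0) {
            ++reordered;  // the reader observed the flag before the data
        }
    }
    return reordered;
}
```

`count_reorderings(1000, true)` always returns 0. With `false`, a nonzero count is possible on Arm but not guaranteed in any given run, because starting two new threads per trial leaves only a narrow window for the reordering to show up.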
content/learning-paths/servers-and-cloud-computing/arm-cpp-memory-model/4.md
Lines changed: 4 additions & 4 deletions
@@ -8,11 +8,11 @@ layout: learningpathall

 ## How can I detect infrequent race conditions?

-ThreadSanitizer (TSan) is a concurrency bug detection tool that identifies data races in multithreaded programs. By instrumenting code at compile time, `TSan` dynamically tracks memory operations, monitors lock usage, and detects inconsistencies in thread synchronization. When a potential data race is found, `TSan` provides detailed reports to help you debug.
+ThreadSanitizer (TSan) is a concurrency bug detection tool that identifies data races in multithreaded programs. By instrumenting code at compile time, TSan dynamically tracks memory operations, monitors lock usage, and detects inconsistencies in thread synchronization. When a potential data race is found, TSan provides detailed reports to help you debug.

-Although its runtime overhead can be significant, `TSan` provides valuable insights into concurrency issues often missed by static analysis tools.
+Although its runtime overhead can be significant, TSan provides valuable insights into concurrency issues often missed by static analysis tools.

-`TSan` is available in recent versions of the `clang` and `gcc` compilers.
+TSan is available in recent versions of the `clang` and `gcc` compilers.

 Compile and run the following example using the `clang++` compiler:

@@ -38,7 +38,7 @@ This output highlights a potential data race in the `threadB` function, correspo

 ## Does TSan have any limitations?

-While powerful, `TSan` has some notable drawbacks:
+While powerful, TSan has some notable drawbacks:

 * It identifies concurrency issues only at runtime, meaning code paths not exercised during testing remain unchecked.
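As a sketch of the kind of code TSan checks, the hypothetical helper `count_with_two_threads` below has two threads increment a shared counter. Because the counter is a `std::atomic<int>`, the program is race-free, and TSan should report nothing when the program is built with `-fsanitize=thread` (for example, `clang++ -fsanitize=thread -g -O1`). Changing the counter to a plain `int` would introduce a data race on it. TSan would report that race only if both threads actually run the increments during the test, which is the runtime-only limitation noted in the bullet above.

```cpp
#include <atomic>
#include <cassert>  // assert, used in the usage check below
#include <thread>

// Two threads each add `increments_per_thread` to a shared counter.
// std::atomic makes every fetch_add indivisible, so no update is lost
// and ThreadSanitizer has no data race to report.
int count_with_two_threads(int increments_per_thread) {
    std::atomic<int> counter{0};

    auto work = [&counter, increments_per_thread] {
        for (int i = 0; i < increments_per_thread; ++i) {
            // Relaxed is enough: only the final total matters,
            // and the join() calls below make all increments visible.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load(std::memory_order_relaxed);
}
```

With 10,000 increments per thread, the function returns 20,000. `memory_order_relaxed` is sufficient here because the total is read only after both `join()` calls have completed.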