|
| 1 | +--- |
| 2 | +title: Example of Race Condition |
| 3 | +weight: 4 |
| 4 | + |
| 5 | +### FIXED, DO NOT MODIFY |
| 6 | +layout: learningpathall |
| 7 | +--- |
| 8 | + |
| 9 | +## Example of a Race Condition when porting from x86 to AArch64 |
| 10 | + |
| 11 | +Because hardware-perceived ordering differs between architectures, as explained in the earlier sections, source code written for x86 may behave differently when ported to Arm. To demonstrate this we will create a trivial example and run it on both an x86 and an Arm cloud instance. |
| 12 | + |
| 13 | +Start an Arm-based cloud instance. In this example I am using a `t4g.xlarge` AWS instance running Ubuntu 22.04 LTS. If you are new to using cloud-based virtual machines, please see our [getting started guide](https://learn.arm.com/learning-paths/servers-and-cloud-computing/intro/). |
| 14 | + |
| 15 | +First confirm you are using an Arm-based instance with the following command. |
| 16 | + |
| 17 | +```bash |
| 18 | +uname -m |
| 19 | +``` |
| 20 | +You should see the following output. |
| 21 | + |
| 22 | +```output |
| 23 | +aarch64 |
| 24 | +``` |
| 25 | + |
| 26 | +Next, we will install the prerequisite packages. |
| 27 | + |
| 28 | +```bash |
| 29 | +sudo apt update |
| 30 | +sudo apt install g++ clang |
| 31 | +``` |
| 32 | + |
| 33 | +Copy and paste the following code snippet into a file named `relaxed_memory_ordering.cpp`. |
| 34 | + |
| 35 | +```cpp |
| 36 | +#include <iostream> |
| 37 | +#include <atomic> |
| 38 | +#include <thread> |
| 39 | +#include <cassert> |
| 40 | +#include <chrono> |
| 41 | + |
| 42 | +struct Node { |
| 43 | + int x; |
| 44 | +}; |
| 45 | +std::atomic<Node*> node{nullptr}; |
| 46 | + |
| 47 | +void threadA() { |
| 48 | + auto n = new Node(); |
| 49 | + n->x = 42; |
| 50 | + node.store(n, std::memory_order_relaxed); |
| 51 | +} |
| 52 | + |
| 53 | +void threadB() { |
| 54 | + Node* n = nullptr; |
| 55 | + while ((n = node.load(std::memory_order_relaxed)) == nullptr) { |
| 56 | + std::this_thread::sleep_for(std::chrono::nanoseconds(50)); // Small sleep to improve scheduling |
| 57 | + } |
| 58 | + if (n->x != 42) { |
| 59 | + std::cerr << "Race condition detected: n->x = " << n->x << std::endl; |
| 60 | + std::terminate(); |
| 61 | + } |
| 62 | +} |
| 63 | + |
| 64 | +void runTest() { |
| 65 | + for (int i = 0; i < 100000; ++i) { // Run many iterations but eventually time out |
| 66 | + node.store(nullptr, std::memory_order_relaxed); |
| 67 | + std::thread t1(threadA); |
| 68 | + std::thread t2(threadB); |
| 69 | + std::thread t3(threadA); |
| 70 | + std::thread t4(threadA); |
| 71 | + t1.join(); |
| 72 | + t2.join(); |
| 73 | + t3.join(); |
| 74 | + t4.join(); |
| 75 | + delete node.load(); |
| 76 | + } |
| 77 | +} |
| 78 | + |
| 79 | +int main() { |
| 80 | + runTest(); |
| 81 | + std::cout << "No Race Condition Occurred in this run" << std::endl; |
| 82 | + return 0; |
| 83 | +} |
| 84 | +``` |
| 85 | + |
| 86 | +The code snippet above is a trivial example of a data race. Thread A allocates a `Node`, assigns the value 42 to its member `x`, and then publishes the pointer through the atomic `node`. Thread B, on the other hand, waits until the pointer is non-null and then checks that `x` is equal to 42. Both functions use `memory_order_relaxed`, which imposes no ordering between the write to `x` and the store of the pointer. This allows thread B to observe the pointer and read `x` before the value 42 assigned in thread A has become visible. Compile the code with the following command and run the resulting binary. |
| 87 | + |
| 88 | +```bash |
| 89 | +g++ relaxed_memory_ordering.cpp -o relaxed_memory_ordering -O3 |
| 90 | +``` |
| 91 | + |
| 92 | +```output |
| 93 | +./relaxed_memory_ordering |
| 94 | +... |
| 95 | +~ 5-30 second wait |
| 96 | +... |
| 97 | + Race condition detected: n->x = 42 |
| 98 | + terminate called without an active exception |
| 99 | + Aborted (core dumped) |
| 100 | +``` |
| 101 | + |
| 102 | +It is worth noting that the race condition is probabilistic: it only occurs when the threads interleave in a particular way. Our contrived example is designed to trigger it frequently. Unfortunately, in production workloads the probability may be far lower, so the failure may surface only rarely or under specific workloads. This is the reason race conditions are difficult to spot. |
| 103 | + |
| 104 | +### Behaviour on x86 instance |
| 105 | + |
| 106 | +Due to the stronger memory model of x86 processors, programs that do not adhere to the C++ standard may give programmers a false sense of security. To demonstrate this I connected to an AWS `t2.2xlarge` instance that uses the x86 architecture. |
| 107 | + |
| 108 | +Running the following command, I can observe that the underlying hardware is an Intel Xeon E5-2686 processor. |
| 109 | + |
| 110 | +```bash |
| 111 | +lscpu | grep -i "Model" |
| 112 | +``` |
| 113 | + |
| 114 | +```output |
| 115 | +Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz |
| 116 | +Model: 79 |
| 117 | +``` |
| 118 | +Following the instructions above and recompiling leads to no race condition being detected on this x86-based machine. |
| 119 | + |
| 120 | +```output |
| 121 | +./relaxed_memory_ordering |
| 122 | +No Race Condition Occurred in this run |
| 123 | +``` |
| 124 | + |
| 125 | + |
| 126 | +## Using correct memory ordering of Atomics |
| 127 | + |
| 128 | +As the example above shows, not adhering to the C++ standard can lead to a false sense of security when running on x86 platforms. To fix the race condition when porting, we need to use the correct memory ordering in each thread. The following snippet of C++ updates `threadA` to use `memory_order_release`, `threadB` to use `memory_order_acquire`, and the `runTest` function to use `memory_order_release` when resetting the `node` pointer. The release store guarantees that the earlier write to `x` is visible to any thread whose acquire load observes the stored pointer. |
| 129 | + |
| 130 | +Save the adjusted code snippet below into a file named `correct_memory_ordering.cpp`. |
| 131 | + |
| 132 | +```cpp |
| 133 | +#include <iostream> |
| 134 | +#include <atomic> |
| 135 | +#include <thread> |
| 136 | +#include <cassert> |
| 137 | +#include <chrono> |
| 138 | + |
| 139 | +struct Node { |
| 140 | + int x; |
| 141 | +}; |
| 142 | +std::atomic<Node*> node{nullptr}; |
| 143 | + |
| 144 | +void threadA() { |
| 145 | + auto n = new Node(); |
| 146 | + n->x = 42; |
| 147 | + node.store(n, std::memory_order_release); |
| 148 | +} |
| 149 | + |
| 150 | +void threadB() { |
| 151 | + Node* n = nullptr; |
| 152 | + while ((n = node.load(std::memory_order_acquire)) == nullptr) { |
| 153 | + std::this_thread::sleep_for(std::chrono::nanoseconds(50)); // Small sleep to improve scheduling |
| 154 | + } |
| 155 | + if (n->x != 42) { |
| 156 | + std::cerr << "Race condition detected: n->x = " << n->x << std::endl; |
| 157 | + std::terminate(); |
| 158 | + } |
| 159 | +} |
| 160 | + |
| 161 | +void runTest() { |
| 162 | + for (int i = 0; i < 100000; ++i) { // Run many iterations but eventually time out |
| 163 | + node.store(nullptr, std::memory_order_release); |
| 164 | + std::thread t1(threadA); |
| 165 | + std::thread t2(threadB); |
| 166 | + std::thread t3(threadA); |
| 167 | + std::thread t4(threadA); |
| 168 | + t1.join(); |
| 169 | + t2.join(); |
| 170 | + t3.join(); |
| 171 | + t4.join(); |
| 172 | + delete node.load(); |
| 173 | + } |
| 174 | +} |
| 175 | + |
| 176 | +int main() { |
| 177 | + runTest(); |
| 178 | + std::cout << "No Race Condition Occurred in this run" << std::endl; |
| 179 | + return 0; |
| 180 | +} |
| 181 | + |
| 182 | +``` |
| 183 | + |
| 184 | +Compile with the following command and run on an AArch64-based machine. |
| 185 | + |
| 186 | +```bash |
| 187 | +g++ correct_memory_ordering.cpp -o correct_memory_ordering -O3 |
| 188 | +``` |
| 189 | + |
| 190 | +```output |
| 191 | +./correct_memory_ordering |
| 192 | +No Race Condition Occurred in this run |
| 193 | +``` |
| 194 | + |