title: Data-Oriented Programming vs Object-Oriented Programming
tags: [C++, programming]
style: dark
color: danger
description: A practical introduction to the useful programming concept of DOP
---
## Introduction
**Memory alignment matters, and it matters even more at scale. As someone said, 1 ms can be the difference between getting frustrated waiting for Word to open, or not.**
In this post, we will experiment with how the alignment of attributes in classes/structures affects the computational cost of code (in C++) in terms of execution time.
The idea is that we will create many instances of several classes, each with multiple attributes and a method that updates or _does something_ with those attributes, and run it many times. We will evaluate and compare the execution time of each one. We will introduce concepts from DOP (Data-Oriented Programming) and put them into practice to see how they can help us in our daily life as ~~**_high-performance_**~~ programmers.
1. We will create a class ```Entity_OOP_Bad``` as any innocent subscriber to OOP would do.
2. We will rework the previous class with our knowledge of DOP and turn it into ```Entity_OOP_Good```.
3. We will further maximize the efficiency of our code in ```Entity_OOP_GoodWithFooPadding```.
* How will we evaluate performance? The assessment will be based purely on execution time, measured with ```chrono```, and CPU cycles, measured with ```__rdtsc``` from the ```x86intrin.h``` header, as sketched right below.
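To make the harness concrete before diving in, here is a minimal sketch of how ```chrono``` and ```__rdtsc``` can be combined (the ```Entity``` below is a placeholder with made-up members, not the post's actual class):

```cpp
#include <chrono>
#include <iostream>
#include <vector>
#include <x86intrin.h>  // __rdtsc

// Placeholder entity; the real layouts under test appear in the next sections.
struct Entity {
    double weight;
    float  speed;
    int    id;
    void update() { weight += speed * id; }  // "does something" with the attributes
};

int main() {
    std::vector<Entity> entities(100000, Entity{1.0, 2.0f, 3});

    unsigned long long start_cycles = __rdtsc();
    auto start = std::chrono::high_resolution_clock::now();

    for (int pass = 0; pass < 100; ++pass)
        for (auto& e : entities)
            e.update();

    auto end = std::chrono::high_resolution_clock::now();
    unsigned long long end_cycles = __rdtsc();

    std::chrono::duration<double> elapsed = end - start;
    std::cout << "CPU cycles: " << (end_cycles - start_cycles) << "\n";
    std::cout << "Execution time: " << elapsed.count() << " seconds\n";
}
```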
For the curious, this is my machine:
```bash
$ uname -a
Linux pop-os 6.9.3-76060903-generic #202405300957~1732141768~22.04~f2697e1 SMP PREEMPT_DYNAMIC Wed N x86_64 x86_64 x86_64 GNU/Linux
```

### ```Entity_OOP_Good```

An OOP class with proper attribute ordering, minimizing padding.
Here, padding is reduced by grouping similar types together.
The attributes are reordered from largest to smallest size (first double, then float, followed by int, char, and finally bool).
This minimizes the amount of padding required, making the structure more compact in memory.
On a more technical level, when performing operations on the attributes, the machine code addresses them at offsets from a base register such as rax (rax+4, rax+20, ...); if the attributes are properly ordered, those offsets stay compact and aligned, so the accesses are more efficient.
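As an illustration of what the reordering does to the memory layout (member names here are mine, chosen to match the type list above, not necessarily the post's):

```cpp
#include <cstdio>

// Unfavorable order: the compiler inserts padding after each small member
// so that the next, larger one lands on its alignment boundary.
struct Entity_OOP_Bad_Sketch {
    bool   alive;   // 1 byte + 7 bytes of padding (to align the double)
    double weight;  // 8 bytes
    char   tag;     // 1 byte + 3 bytes of padding (to align the int)
    int    id;      // 4 bytes
    float  speed;   // 4 bytes + 4 bytes of tail padding
};

// Same members, largest to smallest: the padding shrinks.
struct Entity_OOP_Good_Sketch {
    double weight;  // 8 bytes
    float  speed;   // 4 bytes
    int    id;      // 4 bytes
    char   tag;     // 1 byte
    bool   alive;   // 1 byte + 6 bytes of tail padding
};

int main() {
    std::printf("bad:  %zu bytes\n", sizeof(Entity_OOP_Bad_Sketch));  // 32 on x86-64
    std::printf("good: %zu bytes\n", sizeof(Entity_OOP_Good_Sketch)); // 24 on x86-64
}
```

With the reordered layout, the timed update loop prints: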
std::cout << "OOP (Good Order by DOP) CPU cycles: " << (end_cycles - start_cycles) << "\n";
163
+
std::cout << "OOP (Good Order by DOP) Execution time: " << elapsedOOPDOP.count() << " seconds\n";
```
```text
OOP (Good Order by DOP) CPU cycles: 15459546
OOP (Good Order by DOP) Execution time: 0.00575244 seconds
```
Again a better result. This indicates that we are not talking nonsense, and we can go even further; so far we have only been translating naive knowledge about CPU architecture into code...
> [!NOTE]
> With the command ```$ lscpu``` you can view information about the CPU, including the sizes it works with in each cycle (L1 and L2 cache sizes, the 64-bit data bus width, etc.), which tells you how to lay out a structure so that it avoids unnecessary gaps and is processed in the fewest possible cycles.
### ```Entity_OOP_GoodWithFooPadding```
Now we manually add the necessary padding to align the data with the 64-bit boundaries of our CPU's memory architecture:
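A minimal sketch of the idea, reusing the illustrative members from before (the ```foo_padding``` array is a hand-written stand-in for what the compiler would otherwise add implicitly):

```cpp
// Largest-to-smallest ordering plus explicit "foo" padding: the struct now
// fills whole 64-bit blocks by construction rather than by compiler default.
struct Entity_OOP_GoodWithFooPadding_Sketch {
    double weight;          // 8 bytes
    float  speed;           // 4 bytes
    int    id;              // 4 bytes
    char   tag;             // 1 byte
    bool   alive;           // 1 byte
    char   foo_padding[6];  // manual padding: total size is exactly 24 bytes
};

static_assert(sizeof(Entity_OOP_GoodWithFooPadding_Sketch) % 8 == 0,
              "entity should span whole 64-bit memory blocks");
```

Measuring again: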
std::cout << "OOP (Good Order by DOP and Foo Padding) CPU cycles: " << (end_cycles - start_cycles) << "\n";
222
+
std::cout << "OOP (Good Order by DOP and Foo Padding) Execution time: " << elapsedOOPDOP_GoodWithFooPadding.count() << " seconds\n";
```
```text
OOP (Good Order by DOP and Foo Padding) CPU cycles: 14294218
OOP (Good Order by DOP and Foo Padding) Execution time: 0.00531921 seconds
```
Even faster. We have found evidence for the presented hypothesis. Let's summarize the results:
```cpp
std::cout << "With DOP, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP.count()) * 1e3 << " ms faster\n";
std::cout << "With DOP and Foo Padding, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP_GoodWithFooPadding.count()) * 1e3 << " ms faster\n";
```
```text
With DOP, the processing is 0.931258 ms faster
With DOP and Foo Padding, the processing is 1.36449 ms faster
```
### Larger scale
One may wonder: what if this was more coincidental than causal, just a lucky run? We can repeat the experiment $$n$$ times to see if Gauss is on our side (does DOP really hold up, or not?).
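Here is a self-contained sketch of that repetition (simplified stand-in layouts and a reduced run count, so it is not the post's exact harness):

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Simplified stand-ins for the two layouts; only the member order differs.
struct Bad  { bool a; double w; char t; int i; float s;
              void update() { w += s * i; } };
struct Good { double w; float s; int i; char t; bool a;
              void update() { w += s * i; } };

// Time `passes` update sweeps over `count` entities; return elapsed seconds.
template <typename E>
double benchmark(int passes = 10, std::size_t count = 100000) {
    std::vector<E> entities(count);  // value-initialized, all members zero
    auto start = std::chrono::high_resolution_clock::now();
    for (int p = 0; p < passes; ++p)
        for (auto& e : entities)
            e.update();
    return std::chrono::duration<double>(
        std::chrono::high_resolution_clock::now() - start).count();
}

int main() {
    const int n = 100;  // the post runs 1000 iterations; kept smaller here
    int good_wins = 0;
    for (int run = 0; run < n; ++run)
        if (benchmark<Good>() < benchmark<Bad>())
            ++good_wins;
    std::cout << "Good layout was faster in " << good_wins
              << " of " << n << " runs\n";
}
```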
After running the test many (1000) times and checking which method was fastest in each run, the results mostly align with what we saw before: careful structuring of variables in memory enhances performance at both small and large scales, even with the optimizations that modern compilers add.

**GRAPH**
### Conclusion
Modern CPUs access memory in blocks (typically 8 bytes or more). If the data is properly aligned in memory, access is faster because it can load and store the data in a single memory cycle. If the data is not properly aligned, the CPU may have to perform more memory accesses, which introduces performance penalties due to the need to correct the alignment at runtime.
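These guarantees can be inspected directly; a small illustration (the values are typical for x86-64, not universal):

```cpp
#include <cstddef>   // offsetof
#include <iostream>

struct Mixed { char c; double d; };  // 7 padding bytes inserted after c

int main() {
    std::cout << "alignof(double):    " << alignof(double)    << "\n";  // typically 8
    std::cout << "sizeof(Mixed):      " << sizeof(Mixed)      << "\n";  // typically 16
    std::cout << "offsetof(Mixed, d): " << offsetof(Mixed, d) << "\n";  // typically 8
}
```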
One lesson learned is that programming often needs to be approached with a statistical mindset: structure your code in the way that is most probabilistically favorable under normal execution conditions. If a switch is likely to hit one specific case most of the time, place that case first. If you can do something at compile time at reasonable cost, do it there instead of at runtime. Reduce the number of calls by studying which cases are more probable in your problem; save work for the CPU, whose threads you can't really control deterministically.
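For example, C++20 lets you write those probabilities straight into the code (the handlers below are hypothetical, purely for illustration):

```cpp
// Hypothetical message handlers, for illustration only.
inline int fast_path() { return 1; }
inline int rare_path() { return -1; }

// C++20 branch hints: put the statistically common case first and mark it.
int handle(int msg_type) {
    switch (msg_type) {
        [[likely]]   case 0: return fast_path();  // expected most of the time
        [[unlikely]] case 1: return rare_path();
        default:             return 0;
    }
}

// And shift work to compile time when the inputs are known there.
constexpr int kTableSize = 1 << 10;  // evaluated by the compiler, not at runtime
```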
We painted the paradigms as "good" or "bad", but that goes no further than satire. Each one has its field of application, and it cannot be said that one is better than the other without establishing a particular framework, because their philosophies are different. They are not mere supplements to one another; rather, they reinforce each other, especially in the sense that DOP **reinforces OOP**.