Commit 3faef1c

Explanation post graph dop
1 parent 58e2740 commit 3faef1c

1 file changed: +62 -62 lines changed

_posts/2025-01-19-DOPvsOOP.md

Lines changed: 62 additions & 62 deletions
@@ -51,7 +51,7 @@ Vulnerability L1tf: Not affected
 ## Hands-on code
 Let us define the number of class instances (entities) we will create to run the test at scale. A large number works best, e.g.:
 ```cpp
-const int num_entities = 1000000;
+const int num_entities = 1000000;
 ```
 
 ### `Entity_OOP_Bad`
@@ -97,15 +97,15 @@ public:
 ```
 Now, let's test it:
 ```cpp
-std::chrono::duration<double> elapsedOOPBad;
-std::vector<Entity_OOP_Bad> entities(num_entities);
-auto start = std::chrono::high_resolution_clock::now();
-unsigned long long start_cycles = __rdtsc();
-for (auto& entity : entities) entity.modifyParams();
-unsigned long long end_cycles = __rdtsc();
-elapsedOOPBad = std::chrono::high_resolution_clock::now() - start;
-std::cout << "OOP (Bad Order) CPU cycles: " << (end_cycles - start_cycles) << "\n";
-std::cout << "OOP (Bad Order) Execution time: " << elapsedOOPBad.count() << " seconds\n";
+std::chrono::duration<double> elapsedOOPBad;
+std::vector<Entity_OOP_Bad> entities(num_entities);
+auto start = std::chrono::high_resolution_clock::now();
+unsigned long long start_cycles = __rdtsc();
+for (auto& entity : entities) entity.modifyParams();
+unsigned long long end_cycles = __rdtsc();
+elapsedOOPBad = std::chrono::high_resolution_clock::now() - start;
+std::cout << "OOP (Bad Order) CPU cycles: " << (end_cycles - start_cycles) << "\n";
+std::cout << "OOP (Bad Order) Execution time: " << elapsedOOPBad.count() << " seconds\n";
 ```
 ```text
 OOP (Bad Order) CPU cycles: 17961504
@@ -121,50 +121,50 @@ The attributes are reordered from largest to smallest size (first double, then f
 This minimizes the amount of padding required, making the structure more compact in memory.
 On a more technical level, when performing operations on the attributes, the machine code accesses them at offsets from a base register such as rax (rax+4, rax+20...); with properly ordered attributes these offsets are smaller, so the accesses are more efficient.
 ```cpp
-class Entity_OOP_Good {
-public:
-    struct atributes {
-        double dx, dy, dz; // 8 bytes each (24 bytes total)
-        float x, y, z; // 4 bytes each (12 bytes total)
-        int score; // 4 bytes
-        int score1; // 4 bytes
-        int score2; // 4 bytes
-        uint16_t something; // 2 bytes
-        uint16_t something1; // 2 bytes
-        uint16_t something2; // 2 bytes
-        char id; // 1 byte
-        bool active; // 1 byte
-        // _______
-        // 56 bytes total, alignment 8 bytes
-    };
-
-    atributes mAtributes;
-
-    void modifyParams(){
-        this->mAtributes.x = this->mAtributes.y = this->mAtributes.z = 0.0f;
-        this->mAtributes.dx = this->mAtributes.dy = this->mAtributes.dz = 0.1;
-        this->mAtributes.active = true;
-        this->mAtributes.id = 'A';
-        this->mAtributes.score = 100;
-        this->mAtributes.score1 = 100;
-        this->mAtributes.score2 = 100;
-        this->mAtributes.something *= 2;
-        this->mAtributes.something1 *= 2;
-        this->mAtributes.something2 *= 2;
-    }
-};
+class Entity_OOP_Good {
+public:
+    struct atributes {
+        double dx, dy, dz; // 8 bytes each (24 bytes total)
+        float x, y, z; // 4 bytes each (12 bytes total)
+        int score; // 4 bytes
+        int score1; // 4 bytes
+        int score2; // 4 bytes
+        uint16_t something; // 2 bytes
+        uint16_t something1; // 2 bytes
+        uint16_t something2; // 2 bytes
+        char id; // 1 byte
+        bool active; // 1 byte
+        // _______
+        // 56 bytes total, alignment 8 bytes
+    };
+
+    atributes mAtributes;
+
+    void modifyParams(){
+        this->mAtributes.x = this->mAtributes.y = this->mAtributes.z = 0.0f;
+        this->mAtributes.dx = this->mAtributes.dy = this->mAtributes.dz = 0.1;
+        this->mAtributes.active = true;
+        this->mAtributes.id = 'A';
+        this->mAtributes.score = 100;
+        this->mAtributes.score1 = 100;
+        this->mAtributes.score2 = 100;
+        this->mAtributes.something *= 2;
+        this->mAtributes.something1 *= 2;
+        this->mAtributes.something2 *= 2;
+    }
+};
 ```
 Now, let's test it:
 ```cpp
-std::chrono::duration<double> elapsedOOPDOP;
-std::vector<Entity_OOP_Good> entities(num_entities);
-auto start = std::chrono::high_resolution_clock::now();
-unsigned long long start_cycles = __rdtsc();
-for (auto& entity : entities) entity.modifyParams();
-unsigned long long end_cycles = __rdtsc();
-elapsedOOPDOP = std::chrono::high_resolution_clock::now() - start;
-std::cout << "OOP (Good Order by DOP) CPU cycles: " << (end_cycles - start_cycles) << "\n";
-std::cout << "OOP (Good Order by DOP) Execution time: " << elapsedOOPDOP.count() << " seconds\n";
+std::chrono::duration<double> elapsedOOPDOP;
+std::vector<Entity_OOP_Good> entities(num_entities);
+auto start = std::chrono::high_resolution_clock::now();
+unsigned long long start_cycles = __rdtsc();
+for (auto& entity : entities) entity.modifyParams();
+unsigned long long end_cycles = __rdtsc();
+elapsedOOPDOP = std::chrono::high_resolution_clock::now() - start;
+std::cout << "OOP (Good Order by DOP) CPU cycles: " << (end_cycles - start_cycles) << "\n";
+std::cout << "OOP (Good Order by DOP) Execution time: " << elapsedOOPDOP.count() << " seconds\n";
 ```
 ```text
 OOP (Good Order by DOP) CPU cycles: 15459546
@@ -215,15 +215,15 @@ public:
 ```
 Now, let's test it:
 ```cpp
-std::chrono::duration<double> elapsedOOPDOP_GoodWithFooPadding;
-std::vector<Entity_OOP_GoodWithFooPadding> entities(num_entities);
-auto start = std::chrono::high_resolution_clock::now();
-unsigned long long start_cycles = __rdtsc();
-for (auto& entity : entities) entity.modifyParams();
-unsigned long long end_cycles = __rdtsc();
-elapsedOOPDOP_GoodWithFooPadding = std::chrono::high_resolution_clock::now() - start;
-std::cout << "OOP (Good Order by DOP and Foo Padding) CPU cycles: " << (end_cycles - start_cycles) << "\n";
-std::cout << "OOP (Good Order by DOP and Foo Padding) Execution time: " << elapsedOOPDOP_GoodWithFooPadding.count() << " seconds\n";
+std::chrono::duration<double> elapsedOOPDOP_GoodWithFooPadding;
+std::vector<Entity_OOP_GoodWithFooPadding> entities(num_entities);
+auto start = std::chrono::high_resolution_clock::now();
+unsigned long long start_cycles = __rdtsc();
+for (auto& entity : entities) entity.modifyParams();
+unsigned long long end_cycles = __rdtsc();
+elapsedOOPDOP_GoodWithFooPadding = std::chrono::high_resolution_clock::now() - start;
+std::cout << "OOP (Good Order by DOP and Foo Padding) CPU cycles: " << (end_cycles - start_cycles) << "\n";
+std::cout << "OOP (Good Order by DOP and Foo Padding) Execution time: " << elapsedOOPDOP_GoodWithFooPadding.count() << " seconds\n";
 ```
 ```text
 OOP (Good Order by DOP and Foo Padding) CPU cycles: 14294218
@@ -233,8 +233,8 @@ OOP (Good Order by DOP and Foo Padding) Execution time: 0.00531921 seconds
 Even faster. We have found evidence supporting the presented hypothesis. Let's summarize the results:
 
 ```cpp
-std::cout << "With DOP, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP.count()) * 1e3 << " ms faster\n";
-std::cout << "With DOP and Foo Padding, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP_GoodWithFooPadding.count()) * 1e3 << " ms faster\n";
+std::cout << "With DOP, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP.count()) * 1e3 << " ms faster\n";
+std::cout << "With DOP and Foo Padding, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP_GoodWithFooPadding.count()) * 1e3 << " ms faster\n";
 ```
 ```text
 With DOP, the processing is 0.931258 ms faster
@@ -245,7 +245,7 @@ With DOP and Foo Padding, the processing is 1.36449 ms faster
 
 One may wonder: what if this was due to chance rather than causation? What if it was just a coincidence? We can run this $$n$$ times to see if Gauss is on our side (does DOP really work or not?).
 
-Graph results after running the test many (1000) times and analyzing which methods were the fastest:
+Below are the graph results after running the test many (1000) times and analyzing which methods were the fastest. There are three methods and three positions (gold, silver, and bronze), depending on which method was the fastest or slowest. The graphs show a summary of the positions in which each method ended up in all the iterations.
 
 ![podium_comparison_ms](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms.png)
 ![podium_comparison_ticks](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks.png)
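
The podium summary described in the added paragraph (how often each method finished gold, silver, or bronze across repeated runs) could be computed roughly as follows. This is only a minimal sketch under stated assumptions, not the code used for the post: the names `kMethods`, `PodiumTally`, and `tallyPodium` are hypothetical, the per-run timings are assumed to have already been collected into one vector per method, and ties are assumed to go to the method with the lower index.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical names: kMethods, PodiumTally and tallyPodium do not come from the post.
constexpr std::size_t kMethods = 3;  // bad order, good order, good order + padding

// tally[m][p] = number of runs in which method m finished in position p
// (p = 0 is gold/fastest, 1 is silver, 2 is bronze/slowest).
using PodiumTally = std::array<std::array<int, kMethods>, kMethods>;

// times[m][r] is the measured time (or cycle count) of method m in run r.
PodiumTally tallyPodium(const std::array<std::vector<double>, kMethods>& times) {
    const std::size_t runs = times[0].size();
    // Every method must have been measured in every run.
    for (const auto& series : times) assert(series.size() == runs);

    PodiumTally tally{};  // all counters start at zero
    for (std::size_t r = 0; r < runs; ++r) {
        // Rank the method indices by their time in this run, fastest first.
        std::array<std::size_t, kMethods> order{};
        std::iota(order.begin(), order.end(), std::size_t{0});
        // stable_sort keeps the lower method index first when two times are equal.
        std::stable_sort(order.begin(), order.end(),
                         [&](std::size_t a, std::size_t b) { return times[a][r] < times[b][r]; });
        for (std::size_t p = 0; p < kMethods; ++p) ++tally[order[p]][p];
    }
    return tally;
}
```

Calling `tallyPodium` once with the millisecond timings and once with the `__rdtsc()` cycle counts would give one table per graph; each row of a table sums to the number of runs.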
