Explanation post graph dop

agarnung · agarnung · commit 3faef1cfa0c5 · 2025-01-20T16:39:21.000+01:00
diff --git a/_posts/2025-01-19-DOPvsOOP.md b/_posts/2025-01-19-DOPvsOOP.md
@@ -51,7 +51,7 @@ Vulnerability L1tf:                   Not affected
 ## Hands-on code
 Let us define the number of class instances (entities) to create for the test. A large number works best, so that the effect shows at scale, e.g.:
 ```cpp
-    const int num_entities = 1000000;
+const int num_entities = 1000000;
 ```
 
 ### ```Entity_OOP_Bad```
@@ -97,15 +97,15 @@ public:
 ```
 Now, let's test it:
 ```cpp
-    std::chrono::duration<double> elapsedOOPDOP;
-    std::vector<Entity_OOP_Bad> entities(num_entities);
-    auto start = std::chrono::high_resolution_clock::now();
-    unsigned long long start_cycles = __rdtsc();
-    for (auto& entity : entities) entity.modifyParams();
-    unsigned long long end_cycles = __rdtsc();
-    elapsedOOPBad = std::chrono::high_resolution_clock::now() - start;
-    std::cout << "OOP (Bad Order) CPU cycles: " << (end_cycles - start_cycles) << "\n";
-    std::cout << "OOP (Bad Order) Execution time: " << elapsedOOPBad.count() << " seconds\n";
+std::chrono::duration<double> elapsedOOPBad;
+std::vector<Entity_OOP_Bad> entities(num_entities);
+auto start = std::chrono::high_resolution_clock::now();
+unsigned long long start_cycles = __rdtsc();
+for (auto& entity : entities) entity.modifyParams();
+unsigned long long end_cycles = __rdtsc();
+elapsedOOPBad = std::chrono::high_resolution_clock::now() - start;
+std::cout << "OOP (Bad Order) CPU cycles: " << (end_cycles - start_cycles) << "\n";
+std::cout << "OOP (Bad Order) Execution time: " << elapsedOOPBad.count() << " seconds\n";
 ```
 ```text
 OOP (Bad Order) CPU cycles: 17961504
@@ -121,50 +121,50 @@ The attributes are reordered from largest to smallest size (first double, then f
 This minimizes the amount of padding required, making the structure more compact in memory.
 On a more technical level, when operating on the attributes, the machine code accesses each one at a fixed offset from a base address held in a register such as rax (rax+4, rax+20...). With properly ordered attributes those offsets are packed more tightly, so the accesses are more efficient.
 ```cpp
-    class Entity_OOP_Good {
-    public:
-        struct atributes {
-            double dx, dy, dz;   // 8 bytes each (24 bytes total)
-            float x, y, z;       // 4 bytes each (12 bytes total)
-            int score;           // 4 bytes
-            int score1;          // 4 bytes
-            int score2;          // 4 bytes
-            uint16_t something;  // 2 bytes
-            uint16_t something1; // 2 bytes
-            uint16_t something2; // 2 bytes
-            char id;             // 1 byte
-            bool active;         // 1 byte
-                                 // _______
-                                 // 56 bytes total, alignment 8 bytes
-        };
-
-        atributes mAtributes;
-
-        void modifyParams(){
-            this->mAtributes.x = this->mAtributes.y = this->mAtributes.z = 0.0f;
-            this->mAtributes.dx = this->mAtributes.dy = this->mAtributes.dz = 0.1;
-            this->mAtributes.active = true;
-            this->mAtributes.id = 'A';
-            this->mAtributes.score = 100;
-            this->mAtributes.score1 = 100;
-            this->mAtributes.score2 = 100;
-            this->mAtributes.something *= 2;
-            this->mAtributes.something1 *= 2;
-            this->mAtributes.something2 *= 2;
-        }
+class Entity_OOP_Good {
+public:
+    struct atributes {
+        double dx, dy, dz;   // 8 bytes each (24 bytes total)
+        float x, y, z;       // 4 bytes each (12 bytes total)
+        int score;           // 4 bytes
+        int score1;          // 4 bytes
+        int score2;          // 4 bytes
+        uint16_t something;  // 2 bytes
+        uint16_t something1; // 2 bytes
+        uint16_t something2; // 2 bytes
+        char id;             // 1 byte
+        bool active;         // 1 byte
+                             // _______
+                             // 56 bytes total, alignment 8 bytes
     };
+
+    atributes mAtributes;
+
+    void modifyParams(){
+        this->mAtributes.x = this->mAtributes.y = this->mAtributes.z = 0.0f;
+        this->mAtributes.dx = this->mAtributes.dy = this->mAtributes.dz = 0.1;
+        this->mAtributes.active = true;
+        this->mAtributes.id = 'A';
+        this->mAtributes.score = 100;
+        this->mAtributes.score1 = 100;
+        this->mAtributes.score2 = 100;
+        this->mAtributes.something *= 2;
+        this->mAtributes.something1 *= 2;
+        this->mAtributes.something2 *= 2;
+    }
+};
 ```
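 
 The byte counts in the comments above can be checked rather than trusted. The following is a minimal sketch, assuming a typical 64-bit ABI (8-byte `double`, 4-byte `float` and `int`). `GoodLayout` copies the members of `atributes` in the same order, while `InterleavedLayout` is a hypothetical ordering invented here for contrast; it is not the post's `Entity_OOP_Bad`.
 
 ```cpp
 #include <cstddef>
 #include <cstdint>
 #include <iostream>
 
 // Same members, same order as the `atributes` struct above (largest first).
 struct GoodLayout {
     double dx, dy, dz;
     float x, y, z;
     int score, score1, score2;
     std::uint16_t something, something1, something2;
     char id;
     bool active;
 };
 
 // Hypothetical ordering (an assumption for illustration): small and large
 // members alternate, so the compiler has to insert padding between them.
 struct InterleavedLayout {
     char id;
     double dx;
     bool active;
     float x;
     std::uint16_t something;
     double dy;
     int score;
     float y;
     std::uint16_t something1;
     double dz;
     int score1;
     float z;
     std::uint16_t something2;
     int score2;
 };
 
 int main() {
     // sizeof includes all padding; alignof is the strictest member alignment.
     std::cout << "GoodLayout:        size " << sizeof(GoodLayout)
               << ", alignment " << alignof(GoodLayout) << "\n";
     std::cout << "InterleavedLayout: size " << sizeof(InterleavedLayout)
               << ", alignment " << alignof(InterleavedLayout) << "\n";
     // offsetof shows where a member starts inside the struct.
     std::cout << "GoodLayout::x starts at byte " << offsetof(GoodLayout, x) << "\n";
     return 0;
 }
 ```
 
 If the sizes printed on your platform differ from the comments, the ABI differs from the assumption above; the ordering argument still applies.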
 Now, let's test it:
 ```cpp
-    std::chrono::duration<double> elapsedOOPDOP;
-    std::vector<Entity_OOP_Good> entities(num_entities);
-    auto start = std::chrono::high_resolution_clock::now();
-    unsigned long long start_cycles = __rdtsc();
-    for (auto& entity : entities) entity.modifyParams();
-    unsigned long long end_cycles = __rdtsc();
-    elapsedOOPDOP = std::chrono::high_resolution_clock::now() - start;
-    std::cout << "OOP (Good Order by DOP) CPU cycles: " << (end_cycles - start_cycles) << "\n";
-    std::cout << "OOP (Good Order by DOP) Execution time: " << elapsedOOPDOP.count() << " seconds\n";
+std::chrono::duration<double> elapsedOOPDOP;
+std::vector<Entity_OOP_Good> entities(num_entities);
+auto start = std::chrono::high_resolution_clock::now();
+unsigned long long start_cycles = __rdtsc();
+for (auto& entity : entities) entity.modifyParams();
+unsigned long long end_cycles = __rdtsc();
+elapsedOOPDOP = std::chrono::high_resolution_clock::now() - start;
+std::cout << "OOP (Good Order by DOP) CPU cycles: " << (end_cycles - start_cycles) << "\n";
+std::cout << "OOP (Good Order by DOP) Execution time: " << elapsedOOPDOP.count() << " seconds\n";
 ```
 ```text
 OOP (Good Order by DOP) CPU cycles: 15459546
@@ -215,15 +215,15 @@ public:
 ```
 Now, let's test it:
 ```cpp
-    std::chrono::duration<double> elapsedOOPDOP_GoodWithFooPadding;
-    std::vector<Entity_OOP_GoodWithFooPadding> entities(num_entities);
-    auto start = std::chrono::high_resolution_clock::now();
-    unsigned long long start_cycles = __rdtsc();
-    for (auto& entity : entities) entity.modifyParams();
-    unsigned long long end_cycles = __rdtsc();
-    elapsedOOPDOP_GoodWithFooPadding = std::chrono::high_resolution_clock::now() - start;
-    std::cout << "OOP (Good Order by DOP and Foo Padding) CPU cycles: " << (end_cycles - start_cycles) << "\n";
-    std::cout << "OOP (Good Order by DOP and Foo Padding) Execution time: " << elapsedOOPDOP_GoodWithFooPadding.count() << " seconds\n";
+std::chrono::duration<double> elapsedOOPDOP_GoodWithFooPadding;
+std::vector<Entity_OOP_GoodWithFooPadding> entities(num_entities);
+auto start = std::chrono::high_resolution_clock::now();
+unsigned long long start_cycles = __rdtsc();
+for (auto& entity : entities) entity.modifyParams();
+unsigned long long end_cycles = __rdtsc();
+elapsedOOPDOP_GoodWithFooPadding = std::chrono::high_resolution_clock::now() - start;
+std::cout << "OOP (Good Order by DOP and Foo Padding) CPU cycles: " << (end_cycles - start_cycles) << "\n";
+std::cout << "OOP (Good Order by DOP and Foo Padding) Execution time: " << elapsedOOPDOP_GoodWithFooPadding.count() << " seconds\n";
 ```
 ```text
 OOP (Good Order by DOP and Foo Padding) CPU cycles: 14294218
@@ -233,8 +233,8 @@ OOP (Good Order by DOP and Foo Padding) Execution time: 0.00531921 seconds
 Even faster. We have found evidence supporting the presented hypothesis. Let's summarize the results:
 
 ```cpp
-    std::cout << "With DOP, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP.count()) * 1e3 << " ms faster\n";
-    std::cout << "With DOP and Foo Padding, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP_GoodWithFooPadding.count()) * 1e3 << " ms faster\n";
+std::cout << "With DOP, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP.count()) * 1e3 << " ms faster\n";
+std::cout << "With DOP and Foo Padding, the processing is " << (elapsedOOPBad.count() - elapsedOOPDOP_GoodWithFooPadding.count()) * 1e3 << " ms faster\n";
 ```
 ```text
 With DOP, the processing is 0.931258 ms faster
@@ -245,7 +245,7 @@ With DOP and Foo Padding, the processing is 1.36449 ms faster
 
 One may wonder: what if this was more casual than causal? What if it was just a coincidence? We can run this $$n$$ times to see if Gauss is on our side (does DOP really work or not?).
 
-Graph results after running the test many (1000) times and analyzing which methods were the fastest:
+Below are the results after running the test many (1000) times and recording which methods were the fastest. There are three methods and three positions (gold, silver, and bronze), ranked from fastest to slowest. The graphs summarize how often each method finished in each position across all iterations.
 
 ![podium_comparison_ms](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms.png)
 ![podium_comparison_ticks](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks.png)
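 
 The podium counting described above can be sketched as follows. This is only an illustration under assumptions: the helper `tally_podium`, the type aliases and the sample timings are hypothetical names and synthetic values invented here, not the code or measurements behind the graphs.
 
 ```cpp
 #include <algorithm>
 #include <array>
 #include <cstddef>
 #include <iostream>
 #include <numeric>
 #include <vector>
 
 constexpr std::size_t kMethods = 3;
 using RunTimes = std::array<double, kMethods>;                  // one time per method
 using Podium = std::array<std::array<int, kMethods>, kMethods>; // [method][position]
 
 // For every run, rank the methods by elapsed time (position 0 = gold = fastest)
 // and count how often each method lands in each position.
 Podium tally_podium(const std::vector<RunTimes>& runs) {
     Podium counts{};  // zero-initialized
     for (const RunTimes& run : runs) {
         std::array<std::size_t, kMethods> order;
         std::iota(order.begin(), order.end(), std::size_t{0});
         std::stable_sort(order.begin(), order.end(),
                          [&run](std::size_t a, std::size_t b) { return run[a] < run[b]; });
         for (std::size_t pos = 0; pos < kMethods; ++pos) {
             ++counts[order[pos]][pos];
         }
     }
     return counts;
 }
 
 int main() {
     const std::array<const char*, kMethods> names = {
         "OOP (Bad Order)", "OOP (Good Order by DOP)",
         "OOP (Good Order by DOP and Foo Padding)"};
     // Synthetic timings in seconds, made up for this example (not measurements).
     const std::vector<RunTimes> runs = {
         RunTimes{0.0067, 0.0058, 0.0053},
         RunTimes{0.0061, 0.0063, 0.0055},
         RunTimes{0.0070, 0.0052, 0.0054},
     };
     const Podium counts = tally_podium(runs);
     for (std::size_t m = 0; m < kMethods; ++m) {
         std::cout << names[m] << ": gold " << counts[m][0]
                   << ", silver " << counts[m][1]
                   << ", bronze " << counts[m][2] << "\n";
     }
     return 0;
 }
 ```
 
 In a real run, each `RunTimes` entry would hold the three elapsed times (or cycle counts) measured in one iteration of the benchmark.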