@@ -263,66 +263,85 @@ Mostly, the results align with what was experienced before; careful structuring
263263### Compiler customization
264264Let’s be more rigorous. In the following, we will enable some [compiler flags](https://caiorss.github.io/C-Cpp-Notes/compiler-flags-options.html) for the g++ (GCC) compiler and check whether the graphs change significantly. We are using ``` Qt 6.8.1 ``` and specifying the flags in the ``` .pro ``` file via the ``` QMAKE_CXXFLAGS ``` variable.
265265
266- 1. With no flags:
267- ![podium_comparison_ms_1](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_1.png)
268- ![podium_comparison_ticks_1](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_1.png)
266+ #### 1. With no flags:
267+ <div style="display: flex; justify-content: space-between;">
268+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_1.png" alt="podium_comparison_ms_1" style="flex: 1; max-width: 48%;">
269+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_1.png" alt="podium_comparison_ticks_1" style="flex: 1; max-width: 48%;">
270+ </div>
269271
270- 2. No optimization:
272+ #### 2. No optimization:
271273``` .pro
272274QMAKE_CXXFLAGS += -O0
273275```
274276Disables optimization: faster compilation and easier debugging.
275- ![podium_comparison_ms_2](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_2.png)
276- ![podium_comparison_ticks_2](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_2.png)
277+ <div style="display: flex; justify-content: space-between;">
278+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_2.png" alt="podium_comparison_ms_2" style="flex: 1; max-width: 48%;">
279+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_2.png" alt="podium_comparison_ticks_2" style="flex: 1; max-width: 48%;">
280+ </div>
277281
278- 3. O2 optimization:
282+ #### 3. O2 optimization:
279283``` .pro
280284QMAKE_CXXFLAGS += -O2
281285```
282286High level of optimization. Slower compilation, better suited to release builds.
283- ![podium_comparison_ms_3](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_3.png)
284- ![podium_comparison_ticks_3](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_3.png)
287+ <div style="display: flex; justify-content: space-between;">
288+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_3.png" alt="podium_comparison_ms_3" style="flex: 1; max-width: 48%;">
289+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_3.png" alt="podium_comparison_ticks_3" style="flex: 1; max-width: 48%;">
290+ </div>
285291
286- 4. O3 optimization:
292+ #### 4. O3 optimization:
287293``` .pro
288294QMAKE_CXXFLAGS += -O3
289295```
290296Highest standard level of optimization (the most aggressive of the numbered levels). Slower compilation, better suited to release builds.
291- ![podium_comparison_ms_4](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_4.png)
292- ![podium_comparison_ticks_4](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_4.png)
297+ <div style="display: flex; justify-content: space-between;">
298+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_4.png" alt="podium_comparison_ms_4" style="flex: 1; max-width: 48%;">
299+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_4.png" alt="podium_comparison_ticks_4" style="flex: 1; max-width: 48%;">
300+ </div>
293301
294- 5. No optimization, march native:
302+ #### 5. No optimization, march native:
295303``` .pro
296304QMAKE_CXXFLAGS += -march=native
297305```
298306Lets the compiler use every instruction-set extension supported by the CPU of the build machine.
299- ![podium_comparison_ms_5](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_5.png)
300- ![podium_comparison_ticks_5](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_5.png)
307+ <div style="display: flex; justify-content: space-between;">
308+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_5.png" alt="podium_comparison_ms_5" style="flex: 1; max-width: 48%;">
309+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_5.png" alt="podium_comparison_ticks_5" style="flex: 1; max-width: 48%;">
310+ </div>
301311
302- 6. O3 optimization, march native:
312+ #### 6. O3 optimization, march native:
303313``` .pro
304314QMAKE_CXXFLAGS += -O3 -march=native
305315```
306- ![podium_comparison_ms_6](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_6.png)
307- ![podium_comparison_ticks_6](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_6.png)
316+ <div style="display: flex; justify-content: space-between;">
317+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_6.png" alt="podium_comparison_ms_6" style="flex: 1; max-width: 48%;">
318+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_6.png" alt="podium_comparison_ticks_6" style="flex: 1; max-width: 48%;">
319+ </div>
308320
309- 7. Vectorizing:
321+ #### 7. Vectorizing:
310322``` .pro
311323QMAKE_CXXFLAGS += -ftree-vectorize -mavx -mavx2 -msse4.2
312324```
313325Enables loop auto-vectorization with SIMD instructions (SSE4.2, AVX, and AVX2).
314- ![podium_comparison_ms_7](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_7.png)
315- ![podium_comparison_ticks_7](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_7.png)
326+ <div style="display: flex; justify-content: space-between;">
327+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_7.png" alt="podium_comparison_ms_7" style="flex: 1; max-width: 48%;">
328+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_7.png" alt="podium_comparison_ticks_7" style="flex: 1; max-width: 48%;">
329+ </div>
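
To make the vectorizing case more concrete, here is a minimal sketch of the kind of loop GCC's auto-vectorizer can usually handle: contiguous data, a simple trip count, and no dependency between iterations. The function name ``` scale_add ``` and its signature are hypothetical and not taken from the benchmark code. Note that GCC documents most optimizations as disabled when no ``` -O ``` level is active, even if individual optimization flags are passed, so whether this loop is actually vectorized depends on the optimization level your build configuration ends up with.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical kernel for illustration: out[i] = a[i] * k + b[i].
// Each iteration is independent and walks contiguous memory, which is
// the pattern the auto-vectorizer can typically turn into SIMD code.
void scale_add(const std::vector<float>& a,
               const std::vector<float>& b,
               std::vector<float>& out,
               float k)
{
    // Process only the range that is valid for all three vectors.
    const std::size_t n = std::min({a.size(), b.size(), out.size()});
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * k + b[i];
    }
}
```

Adding GCC's ``` -fopt-info-vec ``` flag to ``` QMAKE_CXXFLAGS ``` makes the compiler report which loops it vectorized, which is a simple way to check that the flags had an effect before comparing graphs.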
316330
317- 8. All for one and one for all:
331+ #### 8. All for one and one for all:
318332``` .pro
319333QMAKE_CXXFLAGS += -O3 -march=native -funroll-loops -fomit-frame-pointer -finline-functions -ftree-vectorize -mavx -mavx2 -msse4.2
320334```
321335``` -funroll-loops ``` : Unrolls loops whose iteration count can be determined, which can reduce loop overhead at the cost of larger code.
336+
322337``` -fomit-frame-pointer ``` : Omits the frame pointer in functions that do not need one, freeing a register for general use.
338+
323339``` -finline-functions ``` : Considers all functions for inlining, not only those declared ``` inline ```; the compiler still decides case by case whether inlining is worthwhile.
324- ![podium_comparison_ms_8](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_8.png)
325- ![podium_comparison_ticks_8](../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_8.png)
340+
341+ <div style="display: flex; justify-content: space-between;">
342+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ms_8.png" alt="podium_comparison_ms_8" style="flex: 1; max-width: 48%;">
343+ <img src="../assets/blog_images/2025-01-19-DOPvsOOP/podium_comparison_ticks_8.png" alt="podium_comparison_ticks_8" style="flex: 1; max-width: 48%;">
344+ </div>
326345
327346### Conclusion
328347Modern CPUs access memory in blocks (typically 8 bytes or more). If the data is properly aligned in memory, access is faster because the CPU can load or store it in a single memory access. If the data is misaligned, the CPU may need several accesses, which carries a performance penalty because the misalignment has to be handled at runtime.
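
The effect of alignment on layout can be inspected directly in C++. The following is a minimal sketch with two hypothetical structs (``` Padded ``` and ``` Reordered ```, not taken from the benchmark code) that hold the same members in a different order, plus an assumed helper ``` is_aligned ``` that checks an address against an alignment. The compiler inserts padding so that each member starts at an address suited to its type, so member order can change the size of the struct.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: small members surround a large one, so the
// compiler pads after `flag` and after `tag` to keep `value` aligned.
struct Padded {
    std::uint8_t  flag;
    std::uint64_t value;
    std::uint8_t  tag;
};

// Same members, largest first: the small members share one padded tail.
struct Reordered {
    std::uint64_t value;
    std::uint8_t  flag;
    std::uint8_t  tag;
};

// Reordering never makes this struct larger; on common 64-bit targets
// it is strictly smaller, but exact sizes are platform dependent.
static_assert(sizeof(Reordered) <= sizeof(Padded),
              "largest-first ordering should not increase the size");

// Assumed helper: true if `ptr` is a multiple of `alignment` bytes.
inline bool is_aligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Printing ``` sizeof(Padded) ```, ``` sizeof(Reordered) ```, and ``` offsetof(Padded, value) ``` on your own machine shows how much padding the first layout carries; the numbers vary by platform and ABI, so none are stated here.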