@@ -751,300 +751,3 @@ GitHub: github.com/simdjson/simdjson
751751Thank you!
752752
753753---
754-
755- # BONUS: Assembly Deep Dive
756-
757- Want to see the actual machine code?
758-
759- Let's look under the hood! 🔧
760-
761- ---
762-
763- # The Shocking Truth: Instruction Counts
764-
765- <div style =" display : flex ; align-items : center ; gap : 30px " >
766- <div style =" flex : 1.5 " >
767-
768- ![Instruction Count Analysis](images/bonus_chart1_instructions.png)
769-
770- </div >
771- <div style =" flex : 1 " >
772-
773- ### The Numbers:
774- - **Manual:** 1,635 instructions
775- - **Reflection:** 648 instructions
776- - **Reduction:** 2.5x fewer instructions!
777-
778- ### You Write:
779- - **Manual:** 70+ lines of C++
780- - **Reflection:** 1 line!
781-
782- [Try it yourself →](https://godbolt.org/z/94jPx6bEb)
783-
784- </div >
785- </div >
786-
787- ---
788-
789- # Field Names: The Power of Compile-Time Constants
790-
791- <div style =" display : flex ; gap : 20px " >
792- <div style =" flex : 1 " >
793-
794- ### Manual: Byte-by-byte
795- ``` asm
796- mov byte ptr [rdx], 34 ; '"'
797- mov byte ptr [rdx+1], 109 ; 'm'
798- mov byte ptr [rdx+2], 97 ; 'a'
799- mov byte ptr [rdx+3], 107 ; 'k'
800- mov byte ptr [rdx+4], 101 ; 'e'
801- mov byte ptr [rdx+5], 34 ; '"'
802- mov byte ptr [rdx+6], 58 ; ':'
803- ; ... plus bounds checks
804- ```
805- **50+ instructions per field name**
806-
807- </div >
808- <div style =" flex : 1 " >
809-
810- ### Reflection: 64-bit constant
811- ``` asm
812- movabs rax, 0x223A22656B616D22
813- ; "make":" packed little-endian into a single value!
814-
815- mov qword ptr [rdx], rax
816- ; Store 8 bytes at once!
817- ```
818- **2 instructions per field name**
819-
820- </div >
821- </div >
822-
823- ![Field Name Encoding](images/bonus_chart3_fields.png)
824-
825- *Source: compiler_explorer_instruction_comparison.asm - 25x fewer instructions for field operations*
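
The packing idea can be sketched in plain C++17, without reflection. This is only an illustration: `pack_le`, `kMakeKey`, and `write_make_key` are hypothetical names, not simdjson APIs, and the single-store result assumes a little-endian target.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper (not part of simdjson): packs up to 8 characters into a
// 64-bit integer so that a little-endian store writes them in source order.
constexpr std::uint64_t pack_le(std::string_view s) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < s.size() && i < 8; ++i) {
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return value;
}

// The 8-byte key prefix "make":" is evaluated entirely at compile time.
constexpr std::uint64_t kMakeKey = pack_le("\"make\":\"");
static_assert(kMakeKey == 0x223A22656B616D22ULL);

// Copies the 8 packed bytes with one fixed-size memcpy, which optimizing
// compilers typically lower to a single 64-bit store.
// Assumption: little-endian target, and `out` has room for 8 bytes.
inline void write_make_key(char* out) {
    std::memcpy(out, &kMakeKey, sizeof kMakeKey);
}
```

Calling `write_make_key(buffer)` on a little-endian machine leaves `"make":"` in the first 8 bytes of `buffer`. Keys longer than 8 bytes would need several such stores.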
826-
827- ---
828-
829- # Branch Prediction: The Hidden Performance Killer
830-
831- <div style =" columns : 2 ; column-gap : 40px " >
832-
833- ### Manual: 311 branches! 😱
834- ``` asm
835- cmp al, 34 ; quote?
836- je .LBB0_19 ; branch!
837- cmp al, 92 ; backslash?
838- je .LBB0_27 ; branch!
839- cmp al, 10 ; newline?
840- je .LBB0_35 ; branch!
841- cmp al, 13 ; return?
842- je .LBB0_42 ; branch!
843- ; ... 300+ more conditions
844- ```
845-
846- **Problem:** Each mispredicted branch can stall the CPU pipeline
847-
848- <div style =" break-before : column " ></div >
849-
850- ### Reflection: 20 branches 🎯
851- ``` asm
852- call simdjson::to_json_string
853- ; Most logic inside optimized
854- ; library with straight-line
855- ; SIMD code
856- ```
857-
858- **Benefit:**
859- - 15x fewer misprediction opportunities
860- - Better CPU pipeline utilization
861- - Predictable control flow
862-
863- </div >
864-
865- *Measured from assembly: 311 je/jne/jb/ja instructions vs 20*
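
One common way to cut a compare-and-branch chain like the one on the left is a lookup table. The sketch below is a swapped-in illustration of that general technique, not simdjson's implementation (which the slide describes as SIMD code); `make_escape_table`, `kNeedsEscape`, and `count_escapes` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Builds a 256-entry table at compile time. An entry is true when the byte
// must be escaped in a JSON string: '"', '\\', or a control char below 0x20.
constexpr std::array<bool, 256> make_escape_table() {
    std::array<bool, 256> table{};
    for (std::size_t i = 0; i < 0x20; ++i) table[i] = true;
    table[static_cast<unsigned char>('"')] = true;
    table[static_cast<unsigned char>('\\')] = true;
    return table;
}

inline constexpr std::array<bool, 256> kNeedsEscape = make_escape_table();

// Counts the bytes that need escaping. The loop body is one table load and
// one add, with no per-character comparison chain.
inline std::size_t count_escapes(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) n += kNeedsEscape[static_cast<unsigned char>(c)];
    return n;
}
```

A serializer could use such a count to take a fast path (bulk copy) when it is zero. Whether that is faster in practice depends on the data and the target CPU.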
866-
867- ---
868-
869- # Memory Allocation: Death by a Thousand Cuts
870-
871- <style scoped >
872- table {
873- font-size : 0.9em ;
874- }
875- </style >
876-
877- | Operation | Manual | Reflection | Impact |
878- | -----------| --------| ------------| --------|
879- | String appends | 40 | 5 | 8x fewer |
880- | Memory reallocations | 235 | 1 | **235x fewer!** |
881- | Escape checks | 600+ | (inside lib) | Bulk SIMD |
882-
883- ### Manual: Growing pain
884- ``` cpp
885- std::string json = "{";            // alloc 1
886- json += "\"make\":\"";             // realloc 2
887- json += car.make;                  // realloc 3
888- json += "\",\"model\":\"";         // realloc 4
889- // ... 231 more reallocations!
890- ```
891-
892- ### Reflection: Pre-sized perfection
893- ``` cpp
894- return simdjson::to_json(car); // 1 allocation, perfectly sized!
895- ```
896-
897- *Source: Assembly analysis of compiler_explorer_instruction_comparison.asm*
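
To see the effect of pre-sizing on your own machine, a rough sketch like the following can help. It treats a change in `capacity()` as a proxy for a heap reallocation. `count_capacity_changes` is a hypothetical helper, and the count without `reserve` depends on the standard library's growth policy and small-string optimization, so no specific numbers are claimed here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Appends every piece to a std::string and counts how often capacity()
// changes along the way. With reserve_first set, the total size is computed
// up front and reserved once, so the appends never need to grow the buffer.
inline std::size_t count_capacity_changes(const std::vector<std::string_view>& pieces,
                                          bool reserve_first) {
    std::string out;
    if (reserve_first) {
        std::size_t total = 0;
        for (std::string_view p : pieces) total += p.size();
        out.reserve(total);
    }
    std::size_t changes = 0;
    std::size_t last = out.capacity();  // baseline, taken after the optional reserve
    for (std::string_view p : pieces) {
        out.append(p.data(), p.size());
        if (out.capacity() != last) {
            ++changes;
            last = out.capacity();
        }
    }
    return changes;
}
```

With `reserve_first == true` the function returns 0, because `reserve` guarantees enough capacity for all appends. Without it, the result is at least 1 once the output outgrows the small-string buffer.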
898-
899- ---
900-
901- # Real Code Comparison
902-
903- ## What developers write (Manual):
904- ``` cpp
905- std::string serialize_manual(const Car& car) {
906-     std::string json = "{";
907-     json += "\"make\":\"";
908-     for (char c : car.make) {
909-         switch (c) {
910-             case '"':  json += "\\\""; break;
911-             case '\\': json += "\\\\"; break;
912-             case '\n': json += "\\n";  break;
913-             // ... more escape cases
914-             default:   json += c;
915-         }
916-     }
917-     json += "\",\"model\":\"";
918-     // ... 70+ more lines of similar code
919- }
920- ```
921-
922- ## What developers write (Reflection):
923- ```cpp
924- std::string serialize_reflection(const Car& car) {
925- return simdjson::to_json(car); // That's it!
926- }
927- ```
928-
929- Try both: https://godbolt.org/z/1n539e7cq
930-
931- ---
932-
933- # Branch Complexity Analysis
934-
935- ![Branch Complexity](images/bonus_chart2_branches.png)
936-
937- ### What the Numbers Mean:
938- - **Manual:** 311 conditional branches in assembly
939- - **Reflection:** 20 conditional branches in assembly
940- - **Impact:** Fewer branches = fewer potential mispredictions
941- - **Note:** Actual performance depends on data patterns
942-
943- ---
944-
945-
946-
947- # How Reflection Optimizes
948-
949- ## Compile-Time Field Discovery
950- ``` cpp
951- template for (constexpr auto member :
952-               std::meta::nonstatic_data_members_of(^^Car)) {
953-     // Field names known at compile time!
954-     // Compiler generates optimal code for each field
955- }
956- ```
957-
958- ## Result: Pre-computed Constants
959- - Field names → 64-bit integers
960- - String lengths → compile-time constants
961- - Escape checks for field names → eliminated at runtime
962- - Buffer sizes → calculated at compile time
963-
964- ---
965-
966- # Escape Processing: Different Approaches
967-
968- ## Manual: Character-by-character checking
969- ```cpp
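970- for (char c : str) {
971-     if (c == '"') output += "\\\"";
972-     else if (c == '\\') output += "\\\\";
973-     else if (static_cast<unsigned char>(c) < 0x20) {
974-         // Control character: emit \u00XX (the cast keeps UTF-8 bytes >= 0x80 out of this branch)
975-         snprintf(buf, 7, "\\u%04x", unsigned(static_cast<unsigned char>(c)));
976-         output += buf;
977-     }
978-     // ... more checks
979- }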
979- }
980- ```
981-
982- ## Reflection: Library handles escaping
983- - Escaping logic encapsulated in simdjson
984- - Implementation may use SIMD for bulk processing
985- - Details hidden inside `simdjson::to_json_string`
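
For reference, the manual loop above can be completed into a small self-contained function. This is a sketch of the character-by-character approach only: `escape_json` is a hypothetical name, and it is not how simdjson escapes strings internally.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: returns `in` with JSON string escaping applied,
// checking one character at a time.
inline std::string escape_json(std::string_view in) {
    std::string out;
    out.reserve(in.size());  // lower bound; escapes may still grow the string
    for (char c : in) {
        // Work on the unsigned value so UTF-8 bytes (>= 0x80) are not
        // mistaken for control characters where char is signed.
        const unsigned char u = static_cast<unsigned char>(c);
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (u < 0x20) {
                    char buf[7];  // "\uXXXX" plus the terminating NUL
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(u));
                    out += buf;
                } else {
                    out += c;
                }
        }
    }
    return out;
}
```

Every input byte passes through the `switch`, which is where the long compare-and-branch chains in the manual assembly come from.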
986-
987- ---
988-
989- # Try It Yourself!
990-
991- ## Compiler Explorer Links:
992-
993- 1. **Basic Comparison** (Manual vs Reflection):
994- https://godbolt.org/z/1n539e7cq
995-
996- 2. **Reflection-Only Serialization**:
997- https://godbolt.org/z/94jPx6bEb
998-
999- 3. **Full simdjson Integration** (requires reflection support):
1000- ``` bash
1001- clang++ -std=c++26 -freflection \
1002- -fexpansion-statements -O3
1003- ```
1004-
1005- ## What to Look For:
1006- - Search for `movabs` instructions with large immediate constants
1007- - Count the `je/jne/jb/ja` branch instructions
1008- - Look at the size of each function
1009- - Notice the `.rodata` section with pre-computed strings
1010-
1011- ---
1012-
1013- # Why This Matters for Real Applications
1014-
1015- ## Benefits Compound:
1016- 1. Fewer instructions → Better I-cache usage
1017- 2. Fewer branches → Better speculation
1018- 3. Compile-time strings → Better D-cache usage
1019- 4. SIMD-ready layout → Vectorization opportunities
1020-
1021- ---
1022-
1023- # Key Takeaways from Assembly Analysis
1024-
1025- 1. **Reflection generates highly optimized code**
1026-    - Consistently applies optimizations
1027-    - Eliminates manual boilerplate
1028-    - Reduces opportunity for errors
1029-
1030- 2. **Compile-time is powerful**
1031-    - Field names become constants
1032-    - No runtime string building
1033-    - Pre-computed buffer sizes
1034-
1035- 3. **Modern C++ delivers on its promises**
1036-    - Zero-overhead abstraction is real
1037-    - Better performance AND better ergonomics
1038-
1039- 4. **simdjson + reflection = excellent match**
1040-    - Compile-time structure analysis
1041-    - Optimized library implementation
1042-    - Significant reduction in code complexity
1043-
1044- ---
1045-
1046- # End of Bonus Section
1047-
1048- Return to main presentation or explore the code yourself!
1049-
1050- Remember: The assembly doesn't lie! 🚀