@@ -1175,8 +1175,7 @@ Welcome to the future of C++ serialization! 🚀
1175 1175 - The authors of P2996 for making compile-time reflection a reality
1176 1176
1177 1177 **Compiler Implementation Teams**
1178- - EDG team for the experimental implementation
1179- - Contributors to experimental GCC and Clang branches
1178+ - Everyone who implemented P2996 and made it publicly available.
1180 1179 - Early adopters testing and providing feedback
1181 1180
1182 1181 **Compiler Explorer Team**
@@ -1197,3 +1196,302 @@ Daniel Lemire and Francisco Geiman Thiesen
1197 1196 GitHub: github.com/simdjson/simdjson
1198 1197
1199 1198 Thank you!
1199+
1200+ ---
1201+
1202+ # BONUS: Assembly Deep Dive
1203+
1204+ Want to see the actual machine code?
1205+
1206+ Let's look under the hood! 🔧
1207+
1208+ ---
1209+
1210+ # The Shocking Truth: Instruction Counts
1211+
1212+ <div style="display: flex; align-items: center; gap: 30px">
1213+ <div style="flex: 1.5">
1214+
1215+ ![Instruction Count Analysis](images/bonus_chart1_instructions.png)
1216+
1217+ </div>
1218+ <div style="flex: 1">
1219+
1220+ ### The Numbers:
1221+ - **Manual:** 1,635 instructions
1222+ - **Reflection:** 648 instructions
1223+ - **Reduction:** about 2.5x fewer instructions!
1224+
1225+ ### You Write:
1226+ - **Manual:** 70+ lines of C++
1227+ - **Reflection:** 1 line!
1228+
1229+ [Try it yourself →](https://godbolt.org/z/94jPx6bEb)
1230+
1231+ </div>
1232+ </div>
1233+
1234+ ---
1235+
1236+ # Field Names: The Power of Compile-Time Constants
1237+
1238+ <div style="display: flex; gap: 20px">
1239+ <div style="flex: 1">
1240+
1241+ ### Manual: Byte-by-byte
1242+ ```asm
1243+ mov byte ptr [rdx], 34 ; '"'
1244+ mov byte ptr [rdx+1], 109 ; 'm'
1245+ mov byte ptr [rdx+2], 97 ; 'a'
1246+ mov byte ptr [rdx+3], 107 ; 'k'
1247+ mov byte ptr [rdx+4], 101 ; 'e'
1248+ mov byte ptr [rdx+5], 34 ; '"'
1249+ mov byte ptr [rdx+6], 58 ; ':'
1250+ ; ... plus bounds checks
1251+ ```
1252+ **50+ instructions per field name**
1253+
1254+ </div>
1255+ <div style="flex: 1">
1256+
1257+ ### Reflection: 64-bit constant
1258+ ```asm
1259+ movabs rax, 0x223A22656B616D22
1260+ ; "make":" packed little-endian into a single value!
1261+
1262+ mov qword ptr [rdx], rax
1263+ ; Store 8 bytes at once!
1264+ ```
1265+ **2 instructions per field name**
1266+
1267+ </div>
1268+ </div>
1269+
1270+ ![Field Name Encoding](images/bonus_chart3_fields.png)
1271+
1272+ *Source: compiler_explorer_instruction_comparison.asm - 25x fewer instructions for field operations*
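
One way to see where such a constant can come from: the sketch below packs the eight characters `"make":"` into a 64-bit integer at compile time and writes them with one copy. The names `pack_le`, `kMakeKey`, and `write_make_key` are hypothetical and are not simdjson's API, and the byte order assumes a little-endian target such as x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Pack up to 8 characters into a 64-bit value, first character in the
// lowest byte, so a little-endian store writes them in source order.
constexpr std::uint64_t pack_le(std::string_view s) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < s.size() && i < 8; ++i) {
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return v;
}

// The 8-character prefix "make":" evaluated entirely at compile time.
constexpr std::uint64_t kMakeKey = pack_le("\"make\":\"");
static_assert(kMakeKey == 0x223A22656B616D22ULL, "unexpected packing");

// Copy the packed key into dst; dst must have room for 8 bytes.
// Returns the number of bytes written.
inline std::size_t write_make_key(char* dst) {
    assert(dst != nullptr);
    std::memcpy(dst, &kMakeKey, sizeof kMakeKey);
    return sizeof kMakeKey;
}
```

An optimizing compiler will typically lower the `memcpy` to a `movabs` plus one 8-byte `mov`, though the exact output depends on the compiler and flags.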
1273+
1274+ ---
1275+
1276+ # Branch Prediction: The Hidden Performance Killer
1277+
1278+ <div style="columns: 2; column-gap: 40px">
1279+
1280+ ### Manual: 311 branches! 😱
1281+ ``` asm
1282+ cmp al, 34 ; quote?
1283+ je .LBB0_19 ; branch!
1284+ cmp al, 92 ; backslash?
1285+ je .LBB0_27 ; branch!
1286+ cmp al, 10 ; newline?
1287+ je .LBB0_35 ; branch!
1288+ cmp al, 13 ; return?
1289+ je .LBB0_42 ; branch!
1290+ ; ... 300+ more conditions
1291+ ```
1292+
1293+ **Problem:** Each branch is a potential misprediction, and a misprediction stalls the CPU pipeline
1294+
1295+ <div style="break-before: column"></div>
1296+
1297+ ### Reflection: 20 branches 🎯
1298+ ``` asm
1299+ call simdjson::to_json_string
1300+ ; Most logic inside optimized
1301+ ; library with straight-line
1302+ ; SIMD code
1303+ ```
1304+
1305+ **Benefit:**
1306+ - About 15x fewer misprediction opportunities
1307+ - Better CPU pipeline utilization
1308+ - Predictable control flow
1309+
1310+ </div>
1311+
1312+ *Measured from assembly: 311 je/jne/jb/ja instructions vs 20*
1313+
1314+ ---
1315+
1316+ # Memory Allocation: Death by a Thousand Cuts
1317+
1318+ <style scoped>
1319+ table {
1320+   font-size: 0.9em;
1321+ }
1322+ </style>
1323+
1324+ | Operation | Manual | Reflection | Impact |
1325+ |-----------|--------|------------|--------|
1326+ | String appends | 40 | 5 | 8x fewer |
1327+ | Memory reallocations | 235 | 1 | **235x fewer!** |
1328+ | Escape checks | 600+ | (inside lib) | Bulk SIMD |
1329+
1330+ ### Manual: Growing pain
1331+ ```cpp
1332+ std::string json = "{"; // alloc 1
1333+ json += "\"make\":\""; // realloc 2
1334+ json += car.make; // realloc 3
1335+ json += "\",\"model\":\""; // realloc 4
1336+ // ... 231 more reallocations!
1337+ ```
1338+
1339+ ### Reflection: Pre-sized perfection
1340+ ``` cpp
1341+ return simdjson::to_json(car); // 1 allocation, perfectly sized!
1342+ ```
1343+
1344+ *Source: Assembly analysis of compiler_explorer_instruction_comparison.asm*
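
To illustrate the pre-sizing idea in plain C++, here is a minimal sketch that computes the output length first and reserves it once. `CarSketch` and `serialize_reserved` are hypothetical names, this is not how simdjson sizes its buffers, and the sketch assumes the field values contain nothing that needs escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical two-field struct standing in for the slides' Car.
struct CarSketch {
    std::string make;
    std::string model;
};

// Build the JSON with a single up-front reservation.
// Assumes the field values contain nothing that needs escaping.
inline std::string serialize_reserved(const CarSketch& car) {
    constexpr std::string_view kOpen = "{\"make\":\"";
    constexpr std::string_view kMid = "\",\"model\":\"";
    constexpr std::string_view kClose = "\"}";

    const std::size_t total = kOpen.size() + car.make.size() + kMid.size() +
                              car.model.size() + kClose.size();

    std::string json;
    json.reserve(total);  // at most one heap allocation for the whole output
    assert(json.capacity() >= total);

    json += kOpen;
    json += car.make;
    json += kMid;
    json += car.model;
    json += kClose;
    return json;
}
```

Because the literal pieces are `constexpr`, only the two value lengths are read at run time before the reservation.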
1345+
1346+ ---
1347+
1348+ # Real Code Comparison
1349+
1350+ ## What developers write (Manual):
1351+ ```cpp
1352+ std::string serialize_manual(const Car& car) {
1353+   std::string json = "{";
1354+   json += "\"make\":\"";
1355+   for (char c : car.make) {
1356+     switch (c) {
1357+       case '"': json += "\\\""; break;
1358+       case '\\': json += "\\\\"; break;
1359+       case '\n': json += "\\n"; break;
1360+       // ... more escape cases
1361+       default: json += c;
1362+     }
1363+   }
1364+   json += "\",\"model\":\"";
1365+   // ... 70+ more lines of similar code
1366+ }
1367+ ```
1368+
1369+ ## What developers write (Reflection):
1370+ ```cpp
1371+ std::string serialize_reflection(const Car& car) {
1372+ return simdjson::to_json(car); // That's it!
1373+ }
1374+ ```
1375+
1376+ Try both: https://godbolt.org/z/1n539e7cq
1377+
1378+ ---
1379+
1380+ # Branch Complexity Analysis
1381+
1382+ ![Branch Complexity](images/bonus_chart2_branches.png)
1383+
1384+ ### What the Numbers Mean:
1385+ - **Manual:** 311 conditional branches in assembly
1386+ - **Reflection:** 20 conditional branches in assembly
1387+ - **Impact:** Fewer branches = fewer potential mispredictions
1388+ - **Note:** Actual performance depends on data patterns
1389+
1390+ ---
1391+
1392+
1393+
1394+ # How Reflection Optimizes
1395+
1396+ ## Compile-Time Field Discovery
1397+ ```cpp
1398+ template for (constexpr auto member :
1399+     std::meta::nonstatic_data_members_of(^^Car)) {
1400+   // Field names known at compile time!
1401+   // Compiler generates optimal code for each field
1402+ }
1403+ ```
1404+
1405+ ## Result: Pre-computed Constants
1406+ - Field names → 64-bit integers
1407+ - String lengths → compile-time constants
1408+ - Field-name escaping → resolved at compile time
1409+ - Buffer sizes → calculated at compile time
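
For readers without a P2996 compiler, the same per-field unrolling can be approximated in C++17 with a hand-written field table and a fold expression. This is only an emulation under stated assumptions: `DemoCar`, `DemoField`, `kDemoCarFields`, and `to_json_sketch` are hypothetical names, the table is exactly what reflection would generate for you, and values are assumed to need no escaping.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <tuple>

struct DemoCar {
    std::string make;
    std::string model;
};

// One table entry: a field name plus a pointer to the member it names.
struct DemoField {
    std::string_view name;
    std::string DemoCar::*ptr;
};

// Hand-written stand-in for the list reflection would produce automatically.
inline constexpr auto kDemoCarFields = std::make_tuple(
    DemoField{"make", &DemoCar::make},
    DemoField{"model", &DemoCar::model});

// Visit every field with a fold expression, so the per-field code is
// expanded at compile time. Assumes values need no escaping.
inline std::string to_json_sketch(const DemoCar& car) {
    std::string out = "{";
    bool first = true;
    auto emit = [&](const DemoField& f) {
        assert(f.ptr != nullptr);  // guard against an empty table entry
        if (!first) out += ',';
        first = false;
        out += '"';
        out += f.name;
        out += "\":\"";
        out += car.*(f.ptr);
        out += '"';
    };
    std::apply([&](const auto&... f) { (emit(f), ...); }, kDemoCarFields);
    out += '}';
    return out;
}
```

With reflection, the table disappears and adding a member to the struct updates the output automatically.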
1410+
1411+ ---
1412+
1413+ # Escape Processing: Different Approaches
1414+
1415+ ## Manual: Character-by-character checking
1416+ ```cpp
1417+ for (char c : str) {
1418+   if (c == '"') output += "\\\"";
1419+   else if (c == '\\') output += "\\\\";
1420+   else if (static_cast<unsigned char>(c) < 0x20) {
1421+     // Control character: emit a \u00XX escape
1422+     snprintf(buf, 7, "\\u%04x", c);
1423+     output += buf;
1424+   }
1425+   // ... more checks
1426+ }
1427+ ```
1428+
1429+ ## Reflection: Library handles escaping
1430+ - Escaping logic encapsulated in simdjson
1431+ - Implementation may use SIMD for bulk processing
1432+ - Details hidden inside `simdjson::to_json_string`
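
As a rough scalar illustration of bulk processing, the sketch below copies runs of clean characters in one append and escapes only the characters that need it. It is not simdjson's implementation (which is not shown in these slides); `needs_escape` and `append_escaped` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// True when c must be escaped inside a JSON string:
// a quote, a backslash, or a control character below 0x20.
constexpr bool needs_escape(unsigned char c) {
    return c == '"' || c == '\\' || c < 0x20;
}

// Append `in` to `out`, copying runs of clean characters in bulk and
// escaping only the characters that require it.
inline void append_escaped(std::string& out, std::string_view in) {
    std::size_t run_start = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (!needs_escape(c)) continue;
        out += in.substr(run_start, i - run_start);  // flush the clean run
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default: {
                char buf[7];  // six characters of \u00XX plus the terminator
                const int n = std::snprintf(buf, sizeof buf, "\\u%04x",
                                            static_cast<unsigned>(c));
                assert(n == 6);
                out.append(buf, static_cast<std::size_t>(n));
            }
        }
        run_start = i + 1;
    }
    out += in.substr(run_start);  // flush the tail
}
```

A SIMD version would replace the per-character `needs_escape` test with a vector comparison over 16 or more bytes at a time, keeping the same flush-then-escape structure.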
1433+
1434+ ---
1435+
1436+ # Try It Yourself!
1437+
1438+ ## Compiler Explorer Links:
1439+
1440+ 1. **Basic Comparison** (Manual vs Reflection):
1441+    https://godbolt.org/z/1n539e7cq
1442+
1443+ 2. **Reflection-Only Serialization**:
1444+    https://godbolt.org/z/94jPx6bEb
1445+
1446+ 3. **Full simdjson Integration** (requires reflection support):
1447+    ```bash
1448+    clang++ -std=c++26 -freflection \
1449+      -fexpansion-statements -O3
1450+    ```
1451+
1452+ ## What to Look For:
1453+ - Search for `movabs` instructions with large numbers
1454+ - Count the `je/jne/jb/ja` branch instructions
1455+ - Look at the size of each function
1456+ - Notice the `.rodata` section with pre-computed strings
1457+
1458+ ---
1459+
1460+ # Why This Matters for Real Applications
1461+
1462+ ## Benefits Compound:
1463+ 1. Fewer instructions → Better I-cache usage
1464+ 2. Fewer branches → Better speculation
1465+ 3. Compile-time strings → Better D-cache usage
1466+ 4. SIMD-ready layout → Vectorization opportunities
1467+
1468+ ---
1469+
1470+ # Key Takeaways from Assembly Analysis
1471+
1472+ 1. **Reflection generates highly optimized code**
1473+    - Consistently applies optimizations
1474+    - Eliminates manual boilerplate
1475+    - Reduces opportunity for errors
1476+
1477+ 2. **Compile-time is powerful**
1478+    - Field names become constants
1479+    - No runtime string building
1480+    - Pre-computed buffer sizes
1481+
1482+ 3. **Modern C++ delivers on its promises**
1483+    - Zero-overhead abstraction is real
1484+    - Better performance AND better ergonomics
1485+
1486+ 4. **simdjson + reflection = excellent match**
1487+    - Compile-time structure analysis
1488+    - Optimized library implementation
1489+    - Significant reduction in code complexity
1490+
1491+ ---
1492+
1493+ # End of Bonus Section
1494+
1495+ Return to main presentation or explore the code yourself!
1496+
1497+ Remember: The assembly doesn't lie! 🚀