Commit 9faa85f
Adding bonus section at the end.
1 parent c1c8c44 commit 9faa85f

8 files changed, +4827 -2 lines changed

cppcon2025/compiler_explorer_serialization_reflection_based.asm

Lines changed: 4310 additions & 0 deletions
Large diffs are not rendered by default.

cppcon2025/cppcon_2025_slides.md

Lines changed: 300 additions & 2 deletions
@@ -1175,8 +1175,7 @@ Welcome to the future of C++ serialization! 🚀
11751175
- The authors of P2996 for making compile-time reflection a reality
11761176

11771177
**Compiler Implementation Teams**
1178-
- EDG team for the experimental implementation
1179-
- Contributors to experimental GCC and Clang branches
1178+
- Everyone who implemented P2996 and made it publicly available.
11801179
- Early adopters testing and providing feedback
11811180

11821181
**Compiler Explorer Team**
@@ -1197,3 +1196,302 @@ Daniel Lemire and Francisco Geiman Thiesen
11971196
GitHub: github.com/simdjson/simdjson
11981197

11991198
Thank you!
1199+
1200+
---
1201+
1202+
# BONUS: Assembly Deep Dive
1203+
1204+
Want to see the actual machine code?
1205+
1206+
Let's look under the hood! 🔧
1207+
1208+
---
1209+
1210+
# The Shocking Truth: Instruction Counts
1211+
1212+
<div style="display: flex; align-items: center; gap: 30px">
1213+
<div style="flex: 1.5">
1214+
1215+
![Instruction Count Analysis](images/bonus_chart1_instructions.png)
1216+
1217+
</div>
1218+
<div style="flex: 1">
1219+
1220+
### The Numbers:
1221+
- **Manual:** 1,635 instructions
1222+
- **Reflection:** 648 instructions
1223+
- **Reduction:** about 2.5x fewer instructions!
1224+
1225+
### You Write:
1226+
- **Manual:** 70+ lines of C++
1227+
- **Reflection:** 1 line!
1228+
1229+
[Try it yourself →](https://godbolt.org/z/94jPx6bEb)
1230+
1231+
</div>
1232+
</div>
1233+
1234+
---
1235+
1236+
# Field Names: The Power of Compile-Time Constants
1237+
1238+
<div style="display: flex; gap: 20px">
1239+
<div style="flex: 1">
1240+
1241+
### Manual: Byte-by-byte
1242+
```asm
1243+
mov byte ptr [rdx], 34 ; '"'
1244+
mov byte ptr [rdx+1], 109 ; 'm'
1245+
mov byte ptr [rdx+2], 97 ; 'a'
1246+
mov byte ptr [rdx+3], 107 ; 'k'
1247+
mov byte ptr [rdx+4], 101 ; 'e'
1248+
mov byte ptr [rdx+5], 34 ; '"'
1249+
mov byte ptr [rdx+6], 58 ; ':'
1250+
; ... plus bounds checks
1251+
```
1252+
**50+ instructions per field name**
1253+
1254+
</div>
1255+
<div style="flex: 1">
1256+
1257+
### Reflection: 64-bit constant
1258+
```asm
1259+
movabs rax, 0x223A22656B616D22
1260+
; "make":" as single value!
1261+
1262+
mov qword ptr [rdx], rax
1263+
; Store 8 bytes at once!
1264+
```
1265+
**2 instructions per field name**
1266+
1267+
</div>
1268+
</div>
1269+
1270+
![Field Name Encoding](images/bonus_chart3_fields.png)
1271+
1272+
*Source: compiler_explorer_instruction_comparison.asm - 25x fewer instructions for field operations*
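
The single-store trick above can be reproduced in plain C++ without reflection. The following is a minimal sketch, not the code simdjson generates: `pack_le` and `write_make_key` are hypothetical helper names, and the in-memory byte order assumes a little-endian target such as x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper: pack up to 8 characters into a 64-bit integer,
// with the first character in the least significant byte.
constexpr std::uint64_t pack_le(std::string_view s) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < s.size() && i < 8; ++i) {
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return value;
}

// The 8-character key "make":" (including its quotes) as one constant.
constexpr std::uint64_t make_key = pack_le("\"make\":\"");
static_assert(make_key == 0x223A22656B616D22ULL, "key is packed at compile time");

// Hypothetical writer: one fixed-size copy, which optimizing compilers
// typically lower to a single 8-byte store. On a little-endian target the
// bytes land in memory in the order " m a k e " : ".
inline char* write_make_key(char* out) {
    assert(out != nullptr);
    std::memcpy(out, &make_key, sizeof make_key);
    return out + sizeof make_key;
}
```

Because the key is a constant expression, the compiler is free to fold it into an immediate operand, which is the pattern to look for in the `movabs` listing.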
1273+
1274+
---
1275+
1276+
# Branch Prediction: The Hidden Performance Killer
1277+
1278+
<div style="columns: 2; column-gap: 40px">
1279+
1280+
### Manual: 311 branches! 😱
1281+
```asm
1282+
cmp al, 34 ; quote?
1283+
je .LBB0_19 ; branch!
1284+
cmp al, 92 ; backslash?
1285+
je .LBB0_27 ; branch!
1286+
cmp al, 10 ; newline?
1287+
je .LBB0_35 ; branch!
1288+
cmp al, 13 ; return?
1289+
je .LBB0_42 ; branch!
1290+
; ... 300+ more conditions
1291+
```
1292+
1293+
**Problem:** Every conditional branch can be mispredicted, and each misprediction can stall the CPU pipeline
1294+
1295+
<div style="break-before: column"></div>
1296+
1297+
### Reflection: 20 branches 🎯
1298+
```asm
1299+
call simdjson::to_json_string
1300+
; Most logic inside optimized
1301+
; library with straight-line
1302+
; SIMD code
1303+
```
1304+
1305+
**Benefit:**
1306+
- About 15x fewer conditional branches to mispredict (311 vs 20)
1307+
- Better CPU pipeline utilization
1308+
- Predictable control flow
1309+
1310+
</div>
1311+
1312+
*Measured from assembly: 311 je/jne/jb/ja instructions vs 20*
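
One common way to cut per-character branches is a lookup table. The sketch below is an illustration under stated assumptions, not simdjson's implementation: `make_escape_table`, `needs_escape`, and `any_needs_escape` are hypothetical names, and the table only covers the characters JSON requires escaping (control characters, the quote, and the backslash).

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Build a 256-entry table at compile time: true where a byte must be escaped.
constexpr std::array<bool, 256> make_escape_table() {
    std::array<bool, 256> table{};
    for (std::size_t i = 0; i < 0x20; ++i) table[i] = true;  // control characters
    table['"'] = true;
    table['\\'] = true;
    return table;
}

constexpr auto escape_table = make_escape_table();

// One table load replaces the chain of cmp/je pairs.
inline bool needs_escape(char c) {
    return escape_table[static_cast<unsigned char>(c)];
}

// Scan a whole string; the OR accumulation keeps the loop body free of
// data-dependent branches.
inline bool any_needs_escape(std::string_view s) {
    bool found = false;
    for (char c : s) found |= needs_escape(c);
    return found;
}
```

A string that passes `any_needs_escape` with `false` can be copied to the output in one go, so the slow path only runs when escaping is actually needed.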
1313+
1314+
---
1315+
1316+
# Memory Allocation: Death by a Thousand Cuts
1317+
1318+
<style scoped>
1319+
table {
1320+
font-size: 0.9em;
1321+
}
1322+
</style>
1323+
1324+
| Operation | Manual | Reflection | Impact |
1325+
|-----------|--------|------------|--------|
1326+
| String appends | 40 | 5 | 8x fewer |
1327+
| Memory reallocations | 235 | 1 | **235x fewer!** |
1328+
| Escape checks | 600+ | (inside lib) | Bulk SIMD |
1329+
1330+
### Manual: Growing pain
1331+
```cpp
1332+
std::string json = "{"; // alloc 1
1333+
json += "\"make\":\""; // realloc 2
1334+
json += car.make; // realloc 3
1335+
json += "\",\"model\":\""; // realloc 4
1336+
// ... 231 more potential reallocations (each += may grow the buffer)!
1337+
```
1338+
1339+
### Reflection: Pre-sized perfection
1340+
```cpp
1341+
return simdjson::to_json(car); // 1 allocation, perfectly sized!
1342+
```
1343+
1344+
*Source: Assembly analysis of compiler_explorer_instruction_comparison.asm*
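
The pre-sizing idea can be sketched by hand: add up the lengths first, reserve once, then append. This is an assumption-laden illustration rather than the library's code: `DemoCar` and `serialize_presized` are hypothetical, and escaping is deliberately left out to keep the size computation exact.

```cpp
#include <string>
#include <string_view>

// Hypothetical two-field struct used only for this sketch.
struct DemoCar {
    std::string make;
    std::string model;
};

// Reserve the exact output size up front so the appends never reallocate.
// Assumes the field values need no escaping.
inline std::string serialize_presized(const DemoCar& car) {
    constexpr std::string_view open  = "{\"make\":\"";
    constexpr std::string_view mid   = "\",\"model\":\"";
    constexpr std::string_view close = "\"}";

    std::string json;
    json.reserve(open.size() + car.make.size() + mid.size() +
                 car.model.size() + close.size());
    json.append(open);
    json.append(car.make);
    json.append(mid);
    json.append(car.model);
    json.append(close);
    return json;
}
```

The key lengths are compile-time constants here, so only the two value lengths have to be read at run time before the single allocation.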
1345+
1346+
---
1347+
1348+
# Real Code Comparison
1349+
1350+
## What developers write (Manual):
1351+
```cpp
1352+
std::string serialize_manual(const Car& car) {
1353+
std::string json = "{";
1354+
json += "\"make\":\"";
1355+
for (char c : car.make) {
1356+
switch(c) {
1357+
case '"': json += "\\\""; break;
1358+
case '\\': json += "\\\\"; break;
1359+
case '\n': json += "\\n"; break;
1360+
// ... more escape cases
1361+
default: json += c;
1362+
}
1363+
}
1364+
json += "\",\"model\":\"";
1365+
// ... 70+ more lines of similar code
1366+
}
1367+
```
1368+
1369+
## What developers write (Reflection):
1370+
```cpp
1371+
std::string serialize_reflection(const Car& car) {
1372+
return simdjson::to_json(car); // That's it!
1373+
}
1374+
```
1375+
1376+
Try both: https://godbolt.org/z/1n539e7cq
1377+
1378+
---
1379+
1380+
# Branch Complexity Analysis
1381+
1382+
![Branch Complexity](images/bonus_chart2_branches.png)
1383+
1384+
### What the Numbers Mean:
1385+
- **Manual:** 311 conditional branches in assembly
1386+
- **Reflection:** 20 conditional branches in assembly
1387+
- **Impact:** Fewer branches = fewer potential mispredictions
1388+
- **Note:** Actual performance depends on data patterns
1389+
1390+
---
1391+
1392+
1393+
1394+
# How Reflection Optimizes
1395+
1396+
## Compile-Time Field Discovery
1397+
```cpp
1398+
template for (constexpr auto member : std::define_static_array(
1399+
std::meta::nonstatic_data_members_of(^^Car, std::meta::access_context::unchecked()))) {
1400+
// Field names known at compile time!
1401+
// Compiler generates optimal code for each field
1402+
}
1403+
```
1404+
1405+
## Result: Pre-computed Constants
1406+
- Field names → 64-bit integers
1407+
- String lengths → compile-time constants
1408+
- Escape handling for field names → resolved at compile time (values are still escaped at run time)
1409+
- Buffer sizes → calculated at compile time
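
For compilers without reflection, the effect of compile-time field discovery can be approximated with a hand-written field table. The sketch below is only a stand-in for what `nonstatic_data_members_of` provides automatically: `SketchCar`, `Field`, `sketch_car_fields`, and `serialize_fields` are hypothetical names, and value escaping is omitted.

```cpp
#include <string>
#include <string_view>
#include <tuple>

// Hypothetical struct for this sketch.
struct SketchCar {
    std::string make;
    std::string model;
};

// A pre-quoted key plus a pointer to the member it describes.
template <class T>
struct Field {
    std::string_view key;      // e.g. "make":" with quotes and colon included
    std::string T::* member;
};

// Hand-written field list; with reflection the compiler would produce this.
constexpr auto sketch_car_fields = std::make_tuple(
    Field<SketchCar>{"\"make\":\"", &SketchCar::make},
    Field<SketchCar>{"\"model\":\"", &SketchCar::model});

// Visit every field at compile time via a fold expression.
inline std::string serialize_fields(const SketchCar& car) {
    std::string json = "{";
    bool first = true;
    auto emit = [&](const auto& field) {
        if (!first) json += ',';
        first = false;
        json += field.key;            // key is a compile-time constant
        json += car.*(field.member);  // value is read at run time
        json += '"';
    };
    std::apply([&](const auto&... field) { (emit(field), ...); }, sketch_car_fields);
    json += '}';
    return json;
}
```

The table has to be kept in sync with the struct by hand, which is exactly the maintenance burden that reflection removes.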
1410+
1411+
---
1412+
1413+
# Escape Processing: Different Approaches
1414+
1415+
## Manual: Character-by-character checking
1416+
```cpp
1417+
for (char c : str) {
1418+
if (c == '"') output += "\\\"";
1419+
else if (c == '\\') output += "\\\\";
1420+
else if (static_cast<unsigned char>(c) < 0x20) { // cast so bytes >= 0x80 are not treated as negative
1421+
// Unicode escape sequence
1422+
snprintf(buf, 7, "\\u%04x", c);
1423+
output += buf;
1424+
}
1425+
// ... more checks
1426+
}
1427+
```
1428+
1429+
## Reflection: Library handles escaping
1430+
- Escaping logic encapsulated in simdjson
1431+
- Implementation may use SIMD for bulk processing
1432+
- Details hidden inside `simdjson::to_json_string`
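
A scalar middle ground between the two approaches is to copy clean runs in bulk and only slow down at characters that need escaping. This is a minimal sketch of that idea, not the simdjson implementation: `escape_json` and `needs_escape` are hypothetical names, and no SIMD is used.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// True for bytes JSON requires escaping: control characters, quote, backslash.
inline bool needs_escape(unsigned char c) {
    return c < 0x20 || c == '"' || c == '\\';
}

// Escape a string by appending whole runs of clean bytes at once.
inline std::string escape_json(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    std::size_t run_start = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (!needs_escape(c)) continue;
        out.append(in.substr(run_start, i - run_start));  // flush the clean run
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default: {
                char buf[7];  // backslash, 'u', four hex digits, terminator
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            }
        }
        run_start = i + 1;
    }
    out.append(in.substr(run_start));  // flush the tail
    return out;
}
```

Bytes at or above 0x80 pass through untouched, so UTF-8 input stays intact.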
1433+
1434+
---
1435+
1436+
# Try It Yourself!
1437+
1438+
## Compiler Explorer Links:
1439+
1440+
1. **Basic Comparison** (Manual vs Reflection):
1441+
https://godbolt.org/z/1n539e7cq
1442+
1443+
2. **Reflection-Only Serialization**:
1444+
https://godbolt.org/z/94jPx6bEb
1445+
1446+
3. **Full simdjson Integration** (requires reflection support):
1447+
```bash
1448+
clang++ -std=c++26 -freflection \
1449+
-fexpansion-statements -O3
1450+
```
1451+
1452+
## What to Look For:
1453+
- Search for `movabs` instructions with large numbers
1454+
- Count the `je/jne/jb/ja` branch instructions
1455+
- Look at the size of each function
1456+
- Notice the `.rodata` section with pre-computed strings
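
Counting branches by eye gets tedious, so a small tool helps. The sketch below assumes Intel-syntax output pasted from Compiler Explorer with one instruction per line; `count_branches` is a hypothetical helper and it only counts the four mnemonics listed above.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Count lines whose first token is one of je, jne, jb, ja.
inline std::size_t count_branches(const std::string& asm_text) {
    static constexpr std::array<std::string_view, 4> mnemonics = {"je", "jne", "jb", "ja"};
    std::istringstream lines(asm_text);
    std::string line;
    std::size_t count = 0;
    while (std::getline(lines, line)) {
        std::istringstream tokens(line);
        std::string first;
        if (!(tokens >> first)) continue;  // skip blank lines
        for (std::string_view m : mnemonics) {
            if (std::string_view(first) == m) {
                ++count;
                break;
            }
        }
    }
    return count;
}
```

Unconditional jumps (`jmp`), labels, and other conditional jumps such as `jg` are ignored, so extend the list if the listing uses them.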
1457+
1458+
---
1459+
1460+
# Why This Matters for Real Applications
1461+
1462+
## Benefits Compound:
1463+
1. Fewer instructions → Better I-cache usage
1464+
2. Fewer branches → Better speculation
1465+
3. Compile-time strings → Better D-cache usage
1466+
4. SIMD-ready layout → Vectorization opportunities
1467+
1468+
---
1469+
1470+
# Key Takeaways from Assembly Analysis
1471+
1472+
1. **Reflection generates highly optimized code**
1473+
- Consistently applies optimizations
1474+
- Eliminates manual boilerplate
1475+
- Reduces opportunity for errors
1476+
1477+
2. **Compile-time is powerful**
1478+
- Field names become constants
1479+
- No runtime string building
1480+
- Pre-computed buffer sizes
1481+
1482+
3. **Modern C++ delivers on its promises**
1483+
- Zero-overhead abstraction is real
1484+
- Better performance AND better ergonomics
1485+
1486+
4. **simdjson + reflection = excellent match**
1487+
- Compile-time structure analysis
1488+
- Optimized library implementation
1489+
- Significant reduction in code complexity
1490+
1491+
---
1492+
1493+
# End of Bonus Section
1494+
1495+
Return to main presentation or explore the code yourself!
1496+
1497+
Remember: The assembly doesn't lie! 🚀

cppcon2025/images/bonus_chart1_instructions.png


cppcon2025/images/bonus_chart2_branches.png


cppcon2025/images/bonus_chart3_fields.png

