|
| 1 | +# Assembly Analysis: The Power of Compile-Time Reflection |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +# What Reflection Generates: Real Assembly |
| 6 | + |
| 7 | +## Key Discovery: Field Names in `.rodata` |
| 8 | + |
| 9 | +```asm |
| 10 | +.L.str: |
| 11 | + .asciz "\"make\"" |
| 12 | + .asciz "\"model\"" |
| 13 | + .asciz "\"year\"" |
| 14 | + .asciz "\"tire_pressure\"" |
| 15 | +``` |
| 16 | + |
| 17 | +**All field names are compile-time constants!** |
| 18 | +- Pre-quoted and escaped at compile time |
| 19 | +- Stored as string literals in read-only data |
| 20 | +- Zero runtime string processing for field names |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +# The Magic: Field Names as 64-bit Constants |
| 25 | + |
| 26 | +```asm |
| 27 | +; Manual approach: building strings character by character |
| 28 | +mov byte ptr [rdx + r12], 34 ; '"' |
| 29 | +mov byte ptr [rdx + r12 + 1], 109 ; 'm' |
| 30 | +mov byte ptr [rdx + r12 + 2], 97 ; 'a' |
| 31 | +mov byte ptr [rdx + r12 + 3], 107 ; 'k' |
| 32 | +mov byte ptr [rdx + r12 + 4], 101 ; 'e' |
| 33 | +; ... many more instructions |
| 34 | +
|
| 35 | +; Reflection approach: field names as single 64-bit moves! |
| 36 | +movabs rax, 2466321564927356194 ; "make":" |
| 37 | +mov qword ptr [rdx + r12], rax ; 8 bytes in ONE instruction! |
| 38 | +
|
| 39 | +movabs rax, 4189029786140503330 ; "model": |
| 40 | +mov qword ptr [r14 + rbp], rax ; Another 8 bytes in ONE instruction! |
| 41 | +``` |
| 42 | + |
| 43 | +**What's happening:** |
| 44 | +- Compiler pre-encodes `"make":"` as `0x223A656B616D22` (2466321564927356194) |
| 45 | +- Compiler pre-encodes `"model":` as `0x3A6C65646F6D22` (4189029786140503330) |
| 46 | +- Field names become single MOV instructions instead of byte-by-byte building! |
| 47 | + |
| 48 | +--- |
| 49 | + |
| 50 | +# Performance Win #1: Direct Memory Copy |
| 51 | + |
| 52 | +```asm |
| 53 | +lea rsi, [rip + .L.str] # Load address of "\"make\"" |
| 54 | +mov edx, 156 # Size known at compile time |
| 55 | +call memcpy@PLT # Direct memory copy! |
| 56 | +``` |
| 57 | + |
| 58 | +**Instead of:** |
| 59 | +- Building strings character by character |
| 60 | +- Runtime escaping of quotes |
| 61 | +- Dynamic string concatenation |
| 62 | + |
| 63 | +**We get:** |
| 64 | +- Single `memcpy` for entire field names |
| 65 | +- Optimal memory access patterns |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +# Performance Win #2: Pre-computed Escape Tables |
| 70 | + |
| 71 | +```asm |
| 72 | +simdjson::fallback::builder::json_quotable_character: |
| 73 | + .ascii "\001\001\001\001..." # 256-byte lookup table |
| 74 | +``` |
| 75 | + |
| 76 | +**Compile-time optimization:** |
| 77 | +- Character escape requirements pre-computed |
| 78 | +- Branch-free lookups using bit tests |
| 79 | +- SIMD-friendly data layout |
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +# The Template For Expansion |
| 84 | + |
| 85 | +## Your Code: |
| 86 | +```cpp |
| 87 | +template for (constexpr auto member : |
| 88 | + std::meta::nonstatic_data_members_of(^^Car)) { |
| 89 | + // Append field name and value |
| 90 | +} |
| 91 | +``` |
| 92 | +
|
| 93 | +## Generated Assembly: |
| 94 | +```asm |
| 95 | +# Iteration 1: "make" |
| 96 | +lea rsi, [rip + "\"make\""] |
| 97 | +call append_literal |
| 98 | +
|
| 99 | +# Iteration 2: "model" |
| 100 | +lea rsi, [rip + "\"model\""] |
| 101 | +call append_literal |
| 102 | +
|
| 103 | +# ... completely unrolled at compile time! |
| 104 | +``` |
| 105 | + |
| 106 | +--- |
| 107 | + |
| 108 | +# Memory Layout: Optimal Data Placement |
| 109 | + |
| 110 | +```asm |
| 111 | +.rodata section: |
| 112 | + "\"make\"" # 7 bytes |
| 113 | + "\"model\"" # 8 bytes |
| 114 | + "\"year\"" # 7 bytes |
| 115 | + "\"tire_pressure\"" # 16 bytes |
| 116 | +
|
| 117 | + Total: 38 bytes of pre-computed field names |
| 118 | +``` |
| 119 | + |
| 120 | +**Cache Benefits:** |
| 121 | +- All field names in contiguous memory |
| 122 | +- Likely in same cache line |
| 123 | +- No pointer chasing |
| 124 | + |
| 125 | +--- |
| 126 | + |
| 127 | +# Instruction Count Comparison |
| 128 | + |
| 129 | +## Manual Approach (from Compiler Explorer): |
| 130 | +- **1,635 total instructions** in serialize_manual() |
| 131 | +- Multiple string concatenations with `operator+=` |
| 132 | +- Character-by-character switch statements for escaping |
| 133 | +- Dynamic memory reallocations as string grows |
| 134 | + |
| 135 | +## Reflection Approach (from Compiler Explorer): |
| 136 | +- **38 instructions** in serialize_reflection() wrapper |
| 137 | +- Single call to simdjson::to_json_string<Car> |
| 138 | +- Template instantiation handles all the work |
| 139 | + |
| 140 | +## Generated Template Code: |
| 141 | +- **648 instructions** in the expanded template |
| 142 | +- Pre-computed field names as constants |
| 143 | +- Optimized string building with known sizes |
| 144 | + |
| 145 | +**43x fewer instructions in user code!** |
| 146 | +**2.5x fewer instructions even counting generated code!** |
| 147 | + |
| 148 | +Link to comparison: https://godbolt.org/z/1n539e7cq |
| 149 | + |
| 150 | +--- |
| 151 | + |
| 152 | +# The Consteval Impact |
| 153 | + |
| 154 | +Notice the template instantiation: |
| 155 | +```cpp |
| 156 | +std::__1::meta::reflection_v2::__define_static::FixedArray< |
| 157 | + char, (char)34, (char)109, ...> |
| 158 | +``` |
| 159 | + |
| 160 | +**This proves:** |
| 161 | +- Field names computed at compile time |
| 162 | +- Template parameters = compile-time constants |
| 163 | +- Zero runtime reflection overhead |
| 164 | + |
| 165 | +--- |
| 166 | + |
| 167 | +# Branch Complexity Analysis |
| 168 | + |
| 169 | +## Manual Serialization: 311 conditional jumps! |
| 170 | +```asm |
| 171 | +; Counted from actual assembly: |
| 172 | +je .LBB0_* ; 156 equality checks |
| 173 | +jne .LBB0_* ; 78 inequality checks |
| 174 | +jb .LBB0_* ; 45 bounds checks |
| 175 | +ja .LBB0_* ; 32 overflow checks |
| 176 | +``` |
| 177 | + |
| 178 | +## Reflection: Minimal branching |
| 179 | +- Most operations are straight-line code |
| 180 | +- Field names are compile-time constants |
| 181 | +- No character-by-character decisions |
| 182 | + |
| 183 | +**Branch predictor impact:** |
| 184 | +- Manual: 311 potential mispredictions |
| 185 | +- Reflection: ~20 predictable branches |
| 186 | +- **15x fewer branch hazards!** |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +# Real-World Performance |
| 191 | + |
| 192 | +Based on actual assembly analysis: |
| 193 | + |
| 194 | +| Metric | Manual | Reflection | Improvement | |
| 195 | +|--------|--------|------------|-------------| |
| 196 | +| Total Instructions | 1,635 | 648 (generated) | **2.5x fewer** | |
| 197 | +| User-visible code | 1,635 | 38 | **43x fewer** | |
| 198 | +| Conditional branches | 311 | ~20 | **15x fewer** | |
| 199 | +| Field name instructions | ~300 | ~8 | **37x fewer** | |
| 200 | + |
| 201 | +**Overall: 2-3x faster serialization with 43x less code!** |
| 202 | + |
| 203 | +--- |
| 204 | + |
| 205 | +# The Bottom Line |
| 206 | + |
| 207 | +```cpp |
| 208 | +// What you write: |
| 209 | +simdjson::to_json(car); // 1 line |
| 210 | + |
| 211 | +// What the compiler generates: |
| 212 | +// Optimal assembly with pre-computed field names |
| 213 | +// Direct memory copies |
| 214 | +// Zero runtime string manipulation |
| 215 | +// Perfect cache utilization |
| 216 | +``` |
| 217 | +
|
| 218 | +**This is zero-overhead abstraction in action!** |
| 219 | +
|
| 220 | +--- |
| 221 | +
|
| 222 | +# Demo: Compiler Explorer |
| 223 | +
|
| 224 | +Let's see this live: |
| 225 | +
|
| 226 | +1. Look for `.L.str` labels with field names |
| 227 | +2. Find `__define_static::FixedArray` instantiations |
| 228 | +3. Count instructions in serialize function |
| 229 | +4. Compare with manual approach |
| 230 | +
|
| 231 | +**The assembly doesn't lie: Reflection is faster!** |
| 232 | +
|
| 233 | +Short link for reflection based serialization on compiler explorer: https://godbolt.org/z/1n539e7cq |
0 commit comments