Commit 8f1ee34

Adding slides on the ablation study
1 parent 69529a5 commit 8f1ee34

1 file changed: +222 -32 lines changed

cppcon2025/cppcon_2025_slides.md

Lines changed: 222 additions & 32 deletions
@@ -89,7 +89,7 @@ JSON can be *slow*. E.g., 20 MB/s.
# Superscalar vs. SIMD execution

| processor       | year | arithmetic logic units | SIMD units (count × bits) | simdjson |
|-----------------|------|------------------------|---------------------------|----------|
| Apple M*        | 2019 | 6+                     | $4 \times 128$            | 🥉       |
| Intel Lion Cove | 2024 | 6                      | $4 \times 256$            | 🥈🥈     |
| AMD Zen 5       | 2024 | 6                      | $4 \times 512$            | 🥇🥇🥇   |
@@ -794,7 +794,7 @@ using SumFunc = float (*)(const float *, size_t);

---

# Set up a reassignable implementation

@@ -890,61 +890,251 @@ _mm512_cmple_epu8_mask(word, _mm512_set1_epi8(31));

---

# Current JSON Serialization Landscape

**How fast can popular libraries serialize JSON (Twitter dataset)?**

```
nlohmann::json: ██ 242 MB/s
RapidJSON:      █████ 497 MB/s
Serde (Rust):   █████████████ 1,343 MB/s
yyjson:         ████████████████████ 2,074 MB/s

                0    500  1000 1500 2000 2500 MB/s
```

---

# How fast are we? ...

```
nlohmann::json: ██ 242 MB/s
RapidJSON:      █████ 497 MB/s
Serde (Rust):   █████████████ 1,343 MB/s
yyjson:         ████████████████████ 2,074 MB/s
simdjson:       ██████████████████████████████████ 3,435 MB/s ⭐

                0    500  1000 1500 2000 2500 3000 3500 MB/s
```

**3.4 GB/s** on the Twitter benchmark. That's:

- **14x faster** than nlohmann::json
- **2.6x faster** than Rust's Serde
- **66% faster** than hand-optimized yyjson

**How did we achieve this? Let's find out...**

---

# Ablation Study: How We Achieved 3.4 GB/s

**What is Ablation?**

Borrowed from neuroscience: systematically remove parts of a system to learn what each part contributes.

**Our Approach:**

1. **Baseline**: all optimizations enabled (3,435 MB/s)
2. **Disable one optimization** at a time
3. **Measure the performance impact**
4. **Calculate the contribution**: `(Baseline - Disabled) / Disabled`, i.e. `Baseline / Disabled - 1`

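Step 4 and the tables that follow use three related ratios, and they are easy to mix up. Below is a small sketch with hypothetical helper names (not part of simdjson). `contribution` is the slide's formula and `speedup` matches the tables' Speedup column. `impact` is our reading of their Impact column, (Disabled - Baseline) / Baseline, which we inferred from the published numbers.

```cpp
#include <cassert>

// Throughput (MB/s) with everything enabled, and with one optimization disabled.
struct AblationRun {
  double baseline;
  double disabled;
};

// Step 4 on the slide: (Baseline - Disabled) / Disabled, i.e. speedup - 1.
inline double contribution(AblationRun r) {
  assert(r.disabled > 0.0);
  return (r.baseline - r.disabled) / r.disabled;
}

// How much faster the optimization makes the serializer: Baseline / Disabled.
inline double speedup(AblationRun r) {
  assert(r.disabled > 0.0);
  return r.baseline / r.disabled;
}

// Relative throughput change when the optimization is turned off.
// Negative values mean the build without it is slower.
inline double impact(AblationRun r) {
  assert(r.baseline > 0.0);
  return (r.disabled - r.baseline) / r.baseline;
}

// Example (Twitter, consteval): AblationRun{3231, 1624}
//   contribution ~0.99, speedup ~1.99, impact ~-0.50
```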
---

# Five Key Optimizations

1. **Consteval**: compile-time processing of field names
2. **SIMD String Escaping**: vectorized character checks
3. **Fast Integer Conversion**: optimized number serialization
4. **Branch Prediction Hints**: mark rare branches so the compiler lays out the common path first
5. **Buffer Growth Strategy**: exponential growth, so the buffer is reallocated only a few times

---

# Optimization #1: Consteval
## The Power of Compile-Time

**The Insight:** JSON field names are known at compile time!

**Traditional (Runtime):**
```cpp
// Every serialization call:
write_string("username"); // quote & escape at runtime
write_string("level");    // quote & escape again!
```

**With Consteval (Compile-Time):**
```cpp
constexpr auto username_key = "\"username\":"; // pre-computed!
b.append_literal(username_key);                // just a memcpy
```

---

# Consteval Performance Impact

| Dataset | Baseline | No Consteval | Impact (disabled vs. baseline) | **Speedup** (baseline / disabled) |
|---------|----------|--------------|--------------------------------|-----------------------------------|
| Twitter | 3,231 MB/s | 1,624 MB/s | -50% | **1.99x** |
| CITM | 2,341 MB/s | 883 MB/s | -62% | **2.65x** |

**Twitter Example (100 tweets):**

- 100 tweets × 15 fields = **1,500 field names**
- Without: 1,500 runtime escape operations
- With: **0 runtime escape operations** (the pre-escaped keys are just copied)

**Result: 2-2.6x faster serialization!**

---

# Optimization #2: SIMD String Escaping

**The Problem:** JSON requires escaping `"`, `\`, and control characters (bytes below 0x20)

**Traditional (1 byte at a time):**
```cpp
for (char c : str) {
  auto u = static_cast<unsigned char>(c); // plain char may be signed
  if (u == '"' || u == '\\' || u < 0x20)
    return true;                          // needs escaping
}
return false;
```

**SIMD (16 bytes at once):**
```cpp
__m128i chunk = load_16_bytes(str);
__m128i needs_escape = check_all_conditions_parallel(chunk);
if (_mm_movemask_epi8(needs_escape) == 0) // one mask bit per byte
  return false; // Fast path: nothing in these 16 bytes needs escaping
```

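The slide's `load_16_bytes` and `check_all_conditions_parallel` can be written with a handful of SSE2 intrinsics. The runnable sketch below assumes an x86-64 target, where SSE2 is always available, and falls back to the scalar loop elsewhere. The function names are hypothetical, and this is not simdjson's implementation, which may use wider registers. SSE2 has no unsigned byte less-than, so the control-character test uses `min(c, 0x1F) == c` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Scalar reference: does any byte need escaping inside a JSON string?
inline bool needs_escape_scalar(std::string_view s) {
  for (char ch : s) {
    const auto c = static_cast<unsigned char>(ch);
    if (c == '"' || c == '\\' || c < 0x20) return true;
  }
  return false;
}

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics

// Checks 16 bytes per iteration, then finishes the tail with the scalar loop.
inline bool needs_escape_simd(std::string_view s) {
  const __m128i quote = _mm_set1_epi8('"');
  const __m128i backslash = _mm_set1_epi8('\\');
  const __m128i max_ctrl = _mm_set1_epi8(0x1F);
  std::size_t i = 0;
  for (; i + 16 <= s.size(); i += 16) {
    const __m128i chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(s.data() + i));
    const __m128i is_quote = _mm_cmpeq_epi8(chunk, quote);
    const __m128i is_backslash = _mm_cmpeq_epi8(chunk, backslash);
    // Unsigned c <= 0x1F  <=>  min(c, 0x1F) == c
    const __m128i is_ctrl = _mm_cmpeq_epi8(_mm_min_epu8(chunk, max_ctrl), chunk);
    const __m128i any =
        _mm_or_si128(_mm_or_si128(is_quote, is_backslash), is_ctrl);
    if (_mm_movemask_epi8(any) != 0) return true;  // some byte needs escaping
  }
  assert(s.size() - i < 16);  // fewer than 16 bytes left for the scalar tail
  return needs_escape_scalar(s.substr(i));
}
#else
// Non-x86 targets: this sketch simply uses the scalar loop.
inline bool needs_escape_simd(std::string_view s) { return needs_escape_scalar(s); }
#endif
```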
---

# SIMD Escaping Performance Impact

| Dataset | Baseline | No SIMD | Impact (disabled vs. baseline) | **Speedup** (baseline / disabled) |
|---------|----------|---------|--------------------------------|-----------------------------------|
| Twitter | 3,231 MB/s | 2,245 MB/s | -31% | **1.44x** |
| CITM | 2,341 MB/s | 2,273 MB/s | -3% | **1.03x** |

**Why does the impact differ?**

- **Twitter**: long text fields (tweets, descriptions) → big win
- **CITM**: mostly numbers → small impact

---

# Optimization #3: Fast Integer Conversion

**Traditional:**
```cpp
std::to_string(value); // one digit at a time: a / 10 and a % 10 per digit
```

**Optimized:**
```cpp
fast_itoa(value, buffer); // 50% fewer divisions + lookup tables
```

| Dataset | Baseline | No Fast Digits | Impact (disabled vs. baseline) | **Speedup** (baseline / disabled) |
|---------|----------|----------------|--------------------------------|-----------------------------------|
| Twitter | 3,231 MB/s | 3,041 MB/s | -6% | **1.06x** |
| CITM | 2,341 MB/s | 1,841 MB/s | -21% | **1.27x** |

**CITM has over 10,000 integers per document!**

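The slide does not show `fast_itoa`. One common way to get fewer divisions plus a lookup table is to emit two digits per step from a table of the pairs "00" to "99". This halves the number of divide-by-constant steps compared with producing one digit at a time. The sketch handles unsigned 64-bit values and uses hypothetical names (`kDigitPairs`, `write_u64`); it is not necessarily what simdjson does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// "00", "01", ..., "99" packed into 200 chars, built at compile time.
constexpr std::array<char, 200> make_digit_pairs() {
  std::array<char, 200> t{};
  for (int i = 0; i < 100; ++i) {
    t[2 * i] = static_cast<char>('0' + i / 10);
    t[2 * i + 1] = static_cast<char>('0' + i % 10);
  }
  return t;
}
inline constexpr std::array<char, 200> kDigitPairs = make_digit_pairs();

// Writes the decimal form of v into buf (room for 20 chars, no '\0' added).
// Returns the number of characters written.
inline std::size_t write_u64(std::uint64_t v, char* buf) {
  assert(buf != nullptr);
  char tmp[20];  // UINT64_MAX has 20 digits
  char* p = tmp + sizeof(tmp);
  while (v >= 100) {  // two digits per step: half as many division steps
    const auto pair = static_cast<std::size_t>(v % 100);
    v /= 100;
    p -= 2;
    std::memcpy(p, kDigitPairs.data() + 2 * pair, 2);
  }
  if (v >= 10) {  // final two digits
    p -= 2;
    std::memcpy(p, kDigitPairs.data() + 2 * v, 2);
  } else {  // final single digit (also covers v == 0)
    *--p = static_cast<char>('0' + v);
  }
  const auto n = static_cast<std::size_t>(tmp + sizeof(tmp) - p);
  std::memcpy(buf, p, n);
  return n;
}
```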
---

# Optimizations #4 & #5: Branch Hints & Buffer Growth

**Branch Prediction:**
```cpp
if (UNLIKELY(buffer_full)) { // tells the compiler this branch is rare
  grow_buffer();
}
// the compiler lays out this common path as the fall-through
```

**Buffer Growth:**

- Linear (+1 KB each time): ~1,000 allocations to reach 1 MB
- Exponential (doubling from 1 KB): ~10 allocations to reach 1 MB

| Datasets | Impact (both optimizations) | Speedup |
|----------|-----------------------------|---------|
| Twitter & CITM | ~2% | 1.02x |

**Small but free!**

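Both ideas fit in a few lines. The sketch below is a hypothetical append-only buffer (`OutBuf` is not simdjson's type). It uses C++20's standard `[[unlikely]]` attribute where the slide uses an `UNLIKELY` macro; such macros are commonly built on GCC/Clang's `__builtin_expect`. When the buffer is full, it doubles its capacity, starting at 1 KB. Appending 1 KB at a time until the buffer holds 1 MB triggers 11 allocations (1 KB, 2 KB, ..., 1 MB), in line with the slide's estimate. Growing by a fixed 1 KB would need about 1,000.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <utility>

// Append-only output buffer with geometric (doubling) growth.
class OutBuf {
 public:
  void append(std::string_view s) {
    if (s.empty()) return;
    if (size_ + s.size() > cap_) [[unlikely]] {  // rare: buffer is full
      grow(size_ + s.size());
    }
    std::memcpy(data_.get() + size_, s.data(), s.size());  // hot path
    size_ += s.size();
  }

  std::string_view view() const { return {data_.get(), size_}; }
  std::size_t allocations() const { return allocations_; }

 private:
  void grow(std::size_t needed) {
    std::size_t new_cap = cap_ != 0 ? cap_ : 1024;  // start at 1 KB
    while (new_cap < needed) new_cap *= 2;         // double until it fits
    auto fresh = std::make_unique<char[]>(new_cap);
    if (size_ != 0) std::memcpy(fresh.get(), data_.get(), size_);
    data_ = std::move(fresh);
    cap_ = new_cap;
    ++allocations_;
    assert(cap_ >= needed);
  }

  std::unique_ptr<char[]> data_;
  std::size_t size_ = 0;
  std::size_t cap_ = 0;
  std::size_t allocations_ = 0;
};
```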
---

# Combined Performance Impact

**All Optimizations Together:**

| Optimization | Twitter Contribution | CITM Contribution |
|--------------|----------------------|-------------------|
| **Consteval** | +99% (1.99x) | +165% (2.65x) |
| **SIMD Escaping** | +44% (1.44x) | +3% (1.03x) |
| **Fast Digits** | +6% (1.06x) | +27% (1.27x) |
| **Branch Hints** | +1.5% (1.015x) | +1.5% (1.015x) |
| **Buffer Growth** | +0.8% (1.008x) | +0.8% (1.008x) |
| **TOTAL** | **~2.95x faster** | **~3.5x faster** |

**From no optimizations to all optimizations:**

- Twitter: ~1,100 MB/s → 3,231 MB/s
- CITM: ~670 MB/s → 2,341 MB/s

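If the optimizations acted independently, their individual speedups would simply multiply. The sketch below (hypothetical helper; speedup values copied from the table) does that arithmetic and gives about 3.1x for Twitter and about 3.5x for CITM. The CITM product matches the table's total. For Twitter it lands above the ~2.95x total, which may indicate that some savings overlap. Each speedup was measured by disabling one optimization while all the others stayed on.

```cpp
#include <cstddef>

// Multiplies per-optimization speedups (Baseline / Disabled).
// Assumes the optimizations are independent, which an ablation does not guarantee.
constexpr double compound(const double* speedups, std::size_t n) {
  double total = 1.0;
  for (std::size_t i = 0; i < n; ++i) total *= speedups[i];
  return total;
}

// Order: consteval, SIMD escaping, fast digits, branch hints, buffer growth.
constexpr double kTwitter[] = {1.99, 1.44, 1.06, 1.015, 1.008};
constexpr double kCitm[] = {2.65, 1.03, 1.27, 1.015, 1.008};

constexpr double twitter_product = compound(kTwitter, 5);  // ~3.11
constexpr double citm_product = compound(kCitm, 5);        // ~3.55
```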
---

# Library Performance Comparison

**Twitter Dataset (631 KB):**
```
simdjson (reflection): ████████████████████████ 3,435 MB/s ⭐
yyjson:                ██████████████ 2,074 MB/s
Serde (Rust):          █████████ 1,343 MB/s
RapidJSON:             ███ 497 MB/s
nlohmann::json:        ██ 242 MB/s
```

**simdjson is the fastest serializer in this comparison!**

---

# Real-World Impact

**API Server Example:**

- 10 million API responses/day
- Average response: ~5 KB of JSON
- Total: 50 GB of JSON serialized per day

**Serialization Time per Day (at the Twitter-dataset throughputs):**
```
nlohmann::json: 210 seconds (3.5 minutes)
RapidJSON:      102 seconds (1.7 minutes)
Serde (Rust):   38 seconds
yyjson:         24 seconds
simdjson:       14.5 seconds ⭐
```

**Time saved: 195 seconds per day vs. nlohmann::json (93% reduction)**

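Each daily time is the data volume divided by the library's Twitter-benchmark throughput. Below is a small sketch of that arithmetic, assuming 50 GB means 50,000 MB (decimal units); `seconds_per_day` is a hypothetical helper. With decimal units the results are about 207, 101, 37, 24, and 14.6 seconds, within a few percent of the slide's rounded figures.

```cpp
#include <cassert>

// Seconds needed to serialize `megabytes_per_day` at `mb_per_second`.
inline double seconds_per_day(double megabytes_per_day, double mb_per_second) {
  assert(mb_per_second > 0.0);
  return megabytes_per_day / mb_per_second;
}

// 10 million responses x ~5 KB = 50,000,000 KB = 50,000 MB (decimal units).
constexpr double kDailyMegabytes = 10'000'000 * 5.0 / 1000.0;
```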
---

# Key Technical Insights

1. **Compile-time optimizations can be awesome**
   - Consteval: 2-2.6x speedup alone
   - C++26 reflection makes field names available at compile time

2. **SIMD Everywhere**
   - Not just for parsing anymore
   - String escaping benefits hugely (1.44x on Twitter)

3. **Avoid Hidden Costs**
   - Hidden allocations: `std::to_string()`
   - Hidden divisions: `log10(value)`
   - Hidden mispredictions: rare conditions such as a full buffer

4. **Every Optimization Matters**
   - Small gains compound into huge improvements

---

# Conclusion
