@@ -89,7 +89,7 @@ JSON can be *slow*. E.g., 20 MB/s.
 # Superscalar vs. SIMD execution

 | processor | year | arithmetic logic units | SIMD units | simdjson |
-|-------------------|----------|------------------------|------------------|----------|
+|-------------------|----------|------------------------|------------------|----------|
 | Apple M* | 2019 | 6+ | $4 \times 128$ | 🥉 |
 | Intel Lion Cove | 2024 | 6 | $4 \times 256$ | 🥈🥈 |
 | AMD Zen 5 | 2024 | 6 | $4 \times 512$ | 🥇🥇🥇 |
@@ -794,7 +794,7 @@ using SumFunc = float (*)(const float *, size_t);

 ---

-# Setup a reassignable implementation
+# Set up a reassignable implementation


 ``` cpp
@@ -890,61 +890,251 @@ _mm512_cmple_epu8_mask(word, _mm512_set1_epi8(31));

 ---

-# Compile-time string escaping
+# Current JSON Serialization Landscape

-- Often the 'keys' are known at compile time.
+**How fast can popular libraries serialize JSON?**

+```
+nlohmann::json:  ██ 242 MB/s
+RapidJSON:       █████ 497 MB/s
+Serde (Rust):    █████████████ 1,343 MB/s
+yyjson:          ████████████████████ 2,074 MB/s

-``` cpp
-struct Player {
-  std::string username;
-  int level;
-  double health;
-  std::vector<std::string> inventory;
-};
+0      500     1000    1500    2000    2500  MB/s
+```
+
+---
+
+# How fast are we? ...
+
+```
+nlohmann::json:  ██ 242 MB/s
+RapidJSON:       █████ 497 MB/s
+Serde (Rust):    █████████████ 1,343 MB/s
+yyjson:          ████████████████████ 2,074 MB/s
+simdjson:        ██████████████████████████████████ 3,435 MB/s ⭐
+
+0      500     1000    1500    2000    2500    3000    3500  MB/s
 ```

-- Keys are: `username`, `level`, `health`, `inventory`.
+**3.4 GB/s** on the Twitter benchmark. That is:
+- **14x faster** than nlohmann
+- **2.5x faster** than Rust's Serde
+- **66% faster** than hand-optimized yyjson
+
+**How did we achieve this? Let's find out...**
+
+---
+
+# Ablation Study: How We Achieved 3.4 GB/s
+
+**What is Ablation?**
+From neuroscience: systematically remove parts of a system to see what each part contributes.
+
+**Our Approach:**
+1. **Baseline**: all optimizations enabled (3,435 MB/s)
+2. **Disable one optimization** at a time
+3. **Measure the performance impact**
+4. **Calculate the contribution**: `(Baseline - Disabled) / Disabled` (worked example below)
+
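+For example, disabling consteval on the Twitter benchmark (numbers from the ablation tables that follow) gives $(3231 - 1624) / 1624 \approx 0.99$, i.e. a +99% contribution, or a 1.99x speedup.
+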
+---
+
+# Five Key Optimizations
+
+1. **Consteval**: compile-time field-name processing
+2. **SIMD String Escaping**: vectorized character checks
+3. **Fast Integer Conversion**: optimized number serialization
+4. **Branch Prediction Hints**: CPU pipeline optimization
+5. **Buffer Growth Strategy**: smart memory allocation


 ---

-# Escape at compile time.
+# Optimization #1: Consteval
+## The Power of Compile-Time

+**The Insight:** JSON field names are known at compile time!
+
+**Traditional (Runtime):**
 ``` cpp
-[:expand(std::meta::nonstatic_data_members_of(...)] {
-  constexpr auto key =
-    std::define_static_string(consteval_to_quoted_escaped(
-      std::meta::identifier_of(dm)));
-  b.append_raw(key);
-  b.append(':');
-  // ...
-};
+// Every serialization call:
+write_string("\"username\"");  // Quote & escape at runtime
+write_string("\"level\"");     // Quote & escape again!
+```
+
+**With Consteval (Compile-Time):**
+```cpp
+constexpr auto username_key = "\"username\":";  // Pre-computed!
+b.append_literal(username_key);                 // Just a memcpy!
 ```
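+
+As a rough sketch of the idea (not simdjson's actual implementation; `quoted_key` is a hypothetical helper), a `consteval` function can build the quoted key once, at compile time:
+
+```cpp
+#include <array>
+#include <cstddef>
+
+// Hypothetical sketch: build "\"name\":" at compile time.
+// (Assumes the field name itself needs no escaping.)
+template <std::size_t N>
+consteval std::array<char, N + 2> quoted_key(const char (&name)[N]) {
+    std::array<char, N + 2> out{};   // (N-1) name chars + 2 quotes + ':'
+    std::size_t i = 0;
+    out[i++] = '"';
+    for (std::size_t j = 0; j + 1 < N; ++j) out[i++] = name[j];  // skip trailing '\0'
+    out[i++] = '"';
+    out[i++] = ':';
+    return out;
+}
+
+constexpr auto username_key = quoted_key("username");  // {'"','u',...,'"',':'} baked into the binary
+```
+
+At runtime, appending the key is then just a `memcpy` of `username_key.data()`, which is what the `append_literal` call above stands for.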

 ---

-# Otherwise tricky to do
+# Consteval Performance Impact
+
+| Dataset | Baseline | No Consteval | Impact | **Speedup** |
+|---------|----------|--------------|--------|-------------|
+| Twitter | 3,231 MB/s | 1,624 MB/s | -50% | **1.99x** |
+| CITM | 2,341 MB/s | 883 MB/s | -62% | **2.65x** |
+
+**Twitter Example (100 tweets):**
+- 100 tweets × 15 fields = **1,500 field names**
+- Without consteval: 1,500 runtime escape operations
+- With consteval: **0 runtime operations**

-- Outside metaprogramming, lots of values are compile-time constants
-- But processing it at compile time is not always easy/convenient.
+**Result: 2-2.6x faster serialization!**

 ---

-# Example: `g` returns 1
+# Optimization #2: SIMD String Escaping

+**The Problem:** JSON requires escaping `"`, `\`, and control characters
+
+**Traditional (1 byte at a time):**
 ``` cpp
-constexpr int convert(const char *x) {
-  if (std::is_constant_evaluated()) { return 0; }
-  return 1;
+for (char c : str) {
+  if (c == '"' || c == '\\' || (unsigned char)c < 0x20)
+    return true;
 }
+```
+
+**SIMD (16 bytes at once):**
+``` cpp
+__m128i chunk = load_16_bytes(str);
+__m128i needs_escape = check_all_conditions_parallel(chunk);
+if (!needs_escape)
+  return false;  // Fast path!
+```
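+
+In real SSE2 intrinsics, that check might look roughly like the sketch below (an illustration, not simdjson's actual code; it assumes `str` points at 16 readable bytes):
+
+```cpp
+#include <emmintrin.h>   // SSE2
+
+// Does this 16-byte chunk contain '"', '\\', or a control character (< 0x20)?
+bool chunk_needs_escape(const char *str) {
+    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i *>(str));
+    __m128i quote = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('"'));
+    __m128i slash = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('\\'));
+    // Unsigned "byte <= 0x1F" test: a byte is a control char iff min(byte, 0x1F) == byte.
+    __m128i ctrl  = _mm_cmpeq_epi8(_mm_min_epu8(chunk, _mm_set1_epi8(0x1F)), chunk);
+    __m128i bad   = _mm_or_si128(_mm_or_si128(quote, slash), ctrl);
+    return _mm_movemask_epi8(bad) != 0;   // 0 means the whole chunk can be copied verbatim
+}
+```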
+
+---
+
+# SIMD Escaping Performance Impact
+
+| Dataset | Baseline | No SIMD | Impact | **Speedup** |
+|---------|----------|---------|--------|-------------|
+| Twitter | 3,231 MB/s | 2,245 MB/s | -31% | **1.44x** |
+| CITM | 2,341 MB/s | 2,273 MB/s | -3% | **1.03x** |
+
+**Why the different impact?**
+- **Twitter**: long text fields (tweets, descriptions) → big win
+- **CITM**: mostly numbers → small impact
+
+---
+
+# Optimization #3: Fast Integer Conversion

-int g() {
-  constexpr char key[] = "name";
-  auto x = convert(key);
-  return x;
+**Traditional:**
+``` cpp
+std::to_string(value);  // 2 divisions per digit!
+```
+
+**Optimized:**
+```cpp
+fast_itoa(value, buffer);  // 50% fewer divisions + lookup tables
+```
+
+| Dataset | Baseline | No Fast Digits | Impact | **Speedup** |
+|---------|----------|----------------|--------|-------------|
+| Twitter | 3,231 MB/s | 3,041 MB/s | -6% | **1.06x** |
+| CITM | 2,341 MB/s | 1,841 MB/s | -21% | **1.27x** |
+
+**CITM has ~10,000+ integers per document!**
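+
+The two-digits-per-division idea, roughly (a sketch with a hypothetical `fast_itoa`, not the library's exact routine):
+
+```cpp
+#include <cstdint>
+#include <cstring>
+
+// 100 two-character entries: "00", "01", ..., "99".
+static const char kDigits[] =
+    "00010203040506070809101112131415161718192021222324"
+    "25262728293031323334353637383940414243444546474849"
+    "50515253545556575859606162636465666768697071727374"
+    "75767778798081828384858687888990919293949596979899";
+
+// Writes `value` into `buf` (no heap allocation); returns the number of characters.
+int fast_itoa(std::uint32_t value, char *buf) {
+    char tmp[10];
+    int pos = 10;
+    while (value >= 100) {
+        std::uint32_t d = (value % 100) * 2;   // one divide/modulo pair covers two digits
+        value /= 100;
+        tmp[--pos] = kDigits[d + 1];
+        tmp[--pos] = kDigits[d];
+    }
+    if (value >= 10) {
+        std::uint32_t d = value * 2;
+        tmp[--pos] = kDigits[d + 1];
+        tmp[--pos] = kDigits[d];
+    } else {
+        tmp[--pos] = static_cast<char>('0' + value);
+    }
+    int len = 10 - pos;
+    std::memcpy(buf, tmp + pos, static_cast<std::size_t>(len));
+    return len;
+}
+```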
+
+---
+
+# Optimizations #4 & #5: Branch Hints & Buffer Growth
+
+**Branch Prediction:**
+``` cpp
+if (UNLIKELY(buffer_full)) {  // CPU knows this is rare
+    grow_buffer();
 }
+// CPU optimizes for the fall-through path
 ```

+**Buffer Growth:**
+- Linear growth: ~1,000 allocations to reach 1 MB
+- Exponential growth: ~10 allocations to reach 1 MB
+
+| Both Optimizations | Impact | Speedup |
+|--------------------|--------|---------|
+| Twitter & CITM | ~2% | 1.02x |
+
+**Small but free!**
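+
+A sketch of how both ideas are commonly written (hypothetical `out_buffer` type, not simdjson's actual string builder):
+
+```cpp
+#include <cstddef>
+
+// UNLIKELY is usually a thin macro over a compiler hint; C++20 also offers [[unlikely]].
+#if defined(__GNUC__) || defined(__clang__)
+#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
+#else
+#  define UNLIKELY(x) (x)
+#endif
+
+struct out_buffer {
+    char *data = nullptr;
+    std::size_t size = 0, capacity = 0;
+
+    void append(char c) {
+        if (UNLIKELY(size == capacity)) grow(size + 1);  // rare, cold path
+        data[size++] = c;                                // hot path the CPU predicts
+    }
+
+    void grow(std::size_t needed) {
+        // Exponential (geometric) growth: ~log2(N) allocations instead of ~N.
+        std::size_t new_cap = capacity ? capacity * 2 : 64;
+        if (new_cap < needed) new_cap = needed;
+        char *bigger = new char[new_cap];
+        for (std::size_t i = 0; i < size; ++i) bigger[i] = data[i];
+        delete[] data;
+        data = bigger;
+        capacity = new_cap;
+    }
+};
+```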
+
+---
+
+# Combined Performance Impact
+
+**All Optimizations Together:**
+
+| Optimization | Twitter Contribution | CITM Contribution |
+|--------------|----------------------|-------------------|
+| **Consteval** | +99% (1.99x) | +165% (2.65x) |
+| **SIMD Escaping** | +44% (1.44x) | +3% (1.03x) |
+| **Fast Digits** | +6% (1.06x) | +27% (1.27x) |
+| **Branch Hints** | +1.5% | +1.5% |
+| **Buffer Growth** | +0.8% | +0.8% |
+| **TOTAL** | **~2.95x faster** | **~3.5x faster** |
+
+**From Baseline to Optimized:**
+- Twitter: ~1,100 MB/s → 3,231 MB/s
+- CITM: ~670 MB/s → 2,341 MB/s
+
+---
+
+# Library Performance Comparison
+
+**Twitter Dataset (631 KB):**
+```
+simdjson (reflection): ████████████████████████ 3,435 MB/s ⭐
+yyjson:                ██████████████ 2,074 MB/s
+Serde (Rust):          █████████ 1,343 MB/s
+RapidJSON:             ███ 497 MB/s
+nlohmann::json:        ██ 242 MB/s
+```
+
+**simdjson achieves the fastest serialization throughput of the libraries measured!**
+
+---
+
+# Real-World Impact
+
+**API Server Example:**
+- 10 million API responses/day
+- Average response: ~5 KB of JSON
+- Total: ~50 GB of JSON serialized per day
+
+**Serialization time per day:**
+```
+nlohmann::json: 210 seconds (3.5 minutes)
+RapidJSON:      102 seconds (1.7 minutes)
+Serde (Rust):    38 seconds
+yyjson:          24 seconds
+simdjson:      14.5 seconds ⭐
+```
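+
+These times follow directly from the throughputs above: 50,000 MB ÷ 242 MB/s ≈ 207 s for nlohmann::json, versus 50,000 MB ÷ 3,435 MB/s ≈ 14.6 s for simdjson.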
+
+**Time saved vs nlohmann: ~195 seconds per day (a 93% reduction)**
+
+---
+
+# Key Technical Insights
+
+1. **Compile-time optimizations can be awesome**
+   - Consteval: a 2-2.6x speedup on its own
+   - C++26 reflection enables unprecedented optimization
+
+2. **SIMD everywhere**
+   - Not just for parsing anymore
+   - String operations benefit hugely
+
+3. **Avoid hidden costs**
+   - Hidden allocations: `std::to_string()` (see the `std::to_chars` sketch below)
+   - Hidden divisions: `log10(value)`
+   - Hidden mispredictions: rarely-taken conditions
+
+4. **Every optimization matters**
+   - Small gains compound into huge improvements
+
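+For example, C++17's `std::to_chars` avoids the hidden heap allocation of `std::to_string` by writing into a caller-provided buffer:
+
+```cpp
+#include <charconv>   // std::to_chars
+#include <cstdio>
+
+int main() {
+    char buf[16];
+    int value = 12345;
+    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, value);  // no allocation
+    std::printf("%.*s\n", static_cast<int>(end - buf), buf);       // prints 12345
+}
+```
+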
 ---

 # Conclusion