@@ -958,13 +958,13 @@ _mm512_cmple_epu8_mask(word, _mm512_set1_epi8(31));

 ---

-# Ablation Study: How We Achieved 3.4 GB/s
+# Ablation Study: How We Achieved 3.2 GB/s

 **What is Ablation?**
 From neuroscience: systematically remove parts to understand function

-**Our Approach:**
-1. **Baseline**: All optimizations enabled (3,435 MB/s)
+**Our Approach (Apple Silicon M2):**
+1. **Baseline**: All optimizations enabled (3,211 MB/s)
 2. **Disable one optimization** at a time
 3. **Measure performance impact**
 4. **Calculate contribution**: `(Baseline - Disabled) / Disabled`
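
As a worked check of the contribution formula in step 4, here is a minimal sketch. The helper names are hypothetical and not part of the benchmarked code. The `impact_percent` definition is an assumption inferred from the "Impact" columns in the tables of this diff (loss relative to the baseline), since the slides only state the contribution formula.

```cpp
#include <cmath>

// All throughputs are in MB/s.
// `baseline` = all optimizations enabled, `disabled` = one optimization turned off.

// Speedup column: how many times faster the baseline is than the ablated build.
inline double speedup(double baseline, double disabled) {
    return baseline / disabled;
}

// Contribution, as given in step 4: (Baseline - Disabled) / Disabled, in percent.
inline double contribution_percent(double baseline, double disabled) {
    return (baseline - disabled) / disabled * 100.0;
}

// Impact column (assumed definition): throughput lost relative to the baseline.
inline double impact_percent(double baseline, double disabled) {
    return (disabled - baseline) / baseline * 100.0;
}
```

With the updated Twitter consteval row (3,211 MB/s vs. 1,607 MB/s), `speedup` gives about 1.998, `contribution_percent` about 99.8, and `impact_percent` about -49.95, which round to the 2.00x, +100%, and -50% reported in the tables.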
@@ -1001,12 +1001,12 @@ b.append_literal(username_key); // Just memcpy!

 ---

-# Consteval Performance Impact
+# Consteval Performance Impact (Apple Silicon)

 | Dataset | Baseline | No Consteval | Impact | **Speedup** |
 |---------|----------|--------------|--------|-------------|
-| Twitter | 3,231 MB/s | 1,624 MB/s | -50% | **1.99x** |
-| CITM | 2,341 MB/s | 883 MB/s | -62% | **2.65x** |
+| Twitter | 3,211 MB/s | 1,607 MB/s | -50% | **2.00x** |
+| CITM | 2,360 MB/s | 978 MB/s | -59% | **2.41x** |

 **Twitter Example (100 tweets):**
 - 100 tweets × 15 fields = **1,500 field names**
@@ -1039,12 +1039,12 @@ if (!needs_escape)

 ---

-# SIMD Escaping Performance Impact
+# SIMD Escaping Performance Impact (Apple Silicon)

 | Dataset | Baseline | No SIMD | Impact | **Speedup** |
 |---------|----------|---------|--------|-------------|
-| Twitter | 3,231 MB/s | 2,245 MB/s | -31% | **1.44x** |
-| CITM | 2,341 MB/s | 2,273 MB/s | -3% | **1.03x** |
+| Twitter | 3,211 MB/s | 2,269 MB/s | -29% | **1.42x** |
+| CITM | 2,360 MB/s | 2,259 MB/s | -4% | **1.04x** |

 **Why Different Impact?**
 - **Twitter**: Long text fields (tweets, descriptions) → Big win
@@ -1066,8 +1066,8 @@ fast_digit_count(value); // Bit operations + lookup table

 | Dataset | Baseline | No Fast Digits | **Speedup** |
 |---------|----------|----------------|-------------|
-| Twitter | 3,231 MB/s | 3,041 MB/s | **1.06x** |
-| CITM | 2,341 MB/s | 1,841 MB/s | **1.27x** |
+| Twitter | 3,211 MB/s | 3,035 MB/s | **1.06x** |
+| CITM | 2,360 MB/s | 1,767 MB/s | **1.34x** |

 **CITM has ~10,000+ integers!**
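
For readers unfamiliar with the "bit operations + lookup table" idea mentioned in the hunk context, here is a minimal sketch of one well-known way to count decimal digits without a division loop. It is not taken from the benchmarked code: the name `fast_digit_count_sketch`, the 32-bit input type, and the use of the GCC/Clang builtin `__builtin_clz` are all assumptions made for illustration (C++20's `std::bit_width` could replace the builtin).

```cpp
#include <cstdint>

// Illustrative only: counts the decimal digits of a 32-bit unsigned value.
// Assumes GCC or Clang (for __builtin_clz) and a 32-bit unsigned int.
inline int fast_digit_count_sketch(std::uint32_t x) {
    // pow10[t] is 10^t, except pow10[0] == 0 so that x == 0 reports one digit.
    static constexpr std::uint32_t pow10[] = {
        0u,       10u,       100u,       1000u,       10000u,
        100000u,  1000000u,  10000000u,  100000000u,  1000000000u};

    // Number of significant bits; (x | 1) avoids the undefined clz(0) case.
    const int bits = 32 - __builtin_clz(x | 1u);

    // 1233 / 4096 approximates log10(2), so t == floor(log10(2^bits)).
    const int t = (bits * 1233) >> 12;

    // x has either t or t + 1 digits; one table comparison decides which.
    return t + 1 - (x < pow10[t] ? 1 : 0);
}
```

For example, `fast_digit_count_sketch(99)` returns 2 and `fast_digit_count_sketch(100)` returns 3. The point of the design is that an integer-heavy dataset pays one count-leading-zeros instruction, one multiply, and one compare per number instead of a loop of divisions by 10.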
@@ -1089,7 +1089,7 @@ if (UNLIKELY(buffer_full)) { // CPU knows this is rare

 | Both Optimizations | Impact | Speedup |
 |--------------------|--------|---------|
-| Twitter & CITM | ~2% | 1.02x |
+| Twitter & CITM | ~1% | 1.01x |

 **Small but free!**
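
The two optimizations measured together here, branch hints and buffer growth, can be sketched as follows. This is an assumption-laden illustration, not the project's code: the `UNLIKELY` macro body (built on the GCC/Clang builtin `__builtin_expect`), the `OutputBuffer` class, and the capacity-doubling policy are hypothetical stand-ins for whatever the real builder does.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical definition of the UNLIKELY hint; falls back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

// Hypothetical output buffer, for illustration only.
class OutputBuffer {
public:
    explicit OutputBuffer(std::size_t initial_capacity = 64)
        : data_(initial_capacity ? initial_capacity : 1), size_(0) {}

    void append(const char* src, std::size_t len) {
        // The hint tells the compiler to lay out the common (no-growth) path first.
        if (UNLIKELY(size_ + len > data_.size())) {
            grow(size_ + len);
        }
        std::memcpy(data_.data() + size_, src, len);
        size_ += len;
    }

    std::string str() const { return std::string(data_.data(), size_); }
    std::size_t capacity() const { return data_.size(); }

private:
    // Doubling keeps the number of reallocations logarithmic in the output size.
    void grow(std::size_t needed) {
        std::size_t cap = data_.size();
        while (cap < needed) cap *= 2;
        data_.resize(cap);
    }

    std::vector<char> data_;
    std::size_t size_;
};
```

Starting from a capacity of 4, appending `"hello"` (5 bytes) grows the buffer once to 8, and appending 7 more bytes grows it to 16. Most appends never enter the growth branch, which is consistent with the small measured effect of the hint.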
@@ -1101,16 +1101,16 @@ if (UNLIKELY(buffer_full)) { // CPU knows this is rare

 | Optimization | Twitter Contribution | CITM Contribution |
 |--------------|----------------------|-------------------|
-| **Consteval** | +99% (1.99x) | +165% (2.65x) |
-| **SIMD Escaping** | +44% (1.44x) | +3% (1.03x) |
-| **Fast Digits** | +6% (1.06x) | +27% (1.27x) |
-| **Branch Hints** | +1.5% | +1.5% |
-| **Buffer Growth** | +0.8% | +0.8% |
-| **TOTAL** | **~2.95x faster** | **~3.5x faster** |
+| **Consteval** | +100% (2.00x) | +141% (2.41x) |
+| **SIMD Escaping** | +42% (1.42x) | +4% (1.04x) |
+| **Fast Digits** | +6% (1.06x) | +34% (1.34x) |
+| **Branch Hints** | +1% | +5% |
+| **Buffer Growth** | -0.4% | +2% |
+| **TOTAL** | **~2.9x faster** | **~3.4x faster** |

 **From Baseline to Optimized:**
-- Twitter: ~1,100 MB/s → 3,231 MB/s
-- CITM: ~670 MB/s → 2,341 MB/s
+- Twitter: ~1,100 MB/s → 3,211 MB/s
+- CITM: ~700 MB/s → 2,360 MB/s

 ---
