Commit 568adb3

deploy: 508fa70
1 parent 8c1cd4c commit 568adb3

4 files changed

+60
-60
lines changed

cppcon2025/cppcon_2025_slides.html

Lines changed: 40 additions & 40 deletions
@@ -811,12 +811,12 @@ <h1 id="how-fast-are-we">How fast are we? ...</h1>
 <p><strong>3.4 GB/s</strong> - 14x faster than nlohmann, 2.5x faster than Serde!</p>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="69" data-paginate="true" data-theme="default" lang="C" data-marpit-pagination="69" style="--paginate:true;--theme:default;" data-marpit-pagination-total="81">
-<h1 id="ablation-study-how-we-achieved-34-gbs">Ablation Study: How We Achieved 3.4 GB/s</h1>
+<h1 id="ablation-study-how-we-achieved-32-gbs">Ablation Study: How We Achieved 3.2 GB/s</h1>
 <p><strong>What is Ablation?</strong><br />
 From neuroscience: systematically remove parts to understand function</p>
-<p><strong>Our Approach:</strong></p>
+<p><strong>Our Approach (Apple Silicon M2):</strong></p>
 <ol>
-<li><strong>Baseline</strong>: All optimizations enabled (3,435 MB/s)</li>
+<li><strong>Baseline</strong>: All optimizations enabled (3,211 MB/s)</li>
 <li><strong>Disable one optimization</strong> at a time</li>
 <li><strong>Measure performance impact</strong></li>
 <li><strong>Calculate contribution</strong>: <code>(Baseline - Disabled) / Disabled</code></li>
@@ -847,7 +847,7 @@ <h2 id="the-power-of-compile-time">The Power of Compile-Time</h2>
 </code></pre>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="72" data-paginate="true" data-theme="default" lang="C" data-marpit-pagination="72" style="--paginate:true;--theme:default;" data-marpit-pagination-total="81">
-<h1 id="consteval-performance-impact">Consteval Performance Impact</h1>
+<h1 id="consteval-performance-impact-apple-silicon">Consteval Performance Impact (Apple Silicon)</h1>
 <table>
 <thead>
 <tr>
@@ -861,17 +861,17 @@ <h1 id="consteval-performance-impact">Consteval Performance Impact</h1>
 <tbody>
 <tr>
 <td>Twitter</td>
-<td>3,231 MB/s</td>
-<td>1,624 MB/s</td>
+<td>3,211 MB/s</td>
+<td>1,607 MB/s</td>
 <td>-50%</td>
-<td><strong>1.99x</strong></td>
+<td><strong>2.00x</strong></td>
 </tr>
 <tr>
 <td>CITM</td>
-<td>2,341 MB/s</td>
-<td>883 MB/s</td>
-<td>-62%</td>
-<td><strong>2.65x</strong></td>
+<td>2,360 MB/s</td>
+<td>978 MB/s</td>
+<td>-59%</td>
+<td><strong>2.41x</strong></td>
 </tr>
 </tbody>
 </table>
@@ -900,7 +900,7 @@ <h1 id="optimization-2-simd-string-escaping">Optimization #2: SIMD String Escapi
 </code></pre>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="74" data-paginate="true" data-theme="default" lang="C" data-marpit-pagination="74" style="--paginate:true;--theme:default;" data-marpit-pagination-total="81">
-<h1 id="simd-escaping-performance-impact">SIMD Escaping Performance Impact</h1>
+<h1 id="simd-escaping-performance-impact-apple-silicon">SIMD Escaping Performance Impact (Apple Silicon)</h1>
 <table>
 <thead>
 <tr>
@@ -914,17 +914,17 @@ <h1 id="simd-escaping-performance-impact">SIMD Escaping Performance Impact</h1>
 <tbody>
 <tr>
 <td>Twitter</td>
-<td>3,231 MB/s</td>
-<td>2,245 MB/s</td>
-<td>-31%</td>
-<td><strong>1.44x</strong></td>
+<td>3,211 MB/s</td>
+<td>2,269 MB/s</td>
+<td>-29%</td>
+<td><strong>1.42x</strong></td>
 </tr>
 <tr>
 <td>CITM</td>
-<td>2,341 MB/s</td>
-<td>2,273 MB/s</td>
-<td>-3%</td>
-<td><strong>1.03x</strong></td>
+<td>2,360 MB/s</td>
+<td>2,259 MB/s</td>
+<td>-4%</td>
+<td><strong>1.04x</strong></td>
 </tr>
 </tbody>
 </table>
@@ -954,15 +954,15 @@ <h1 id="optimization-3-fast-digit-counting">Optimization #3: Fast Digit Counting
 <tbody>
 <tr>
 <td>Twitter</td>
-<td>3,231 MB/s</td>
-<td>3,041 MB/s</td>
+<td>3,211 MB/s</td>
+<td>3,035 MB/s</td>
 <td><strong>1.06x</strong></td>
 </tr>
 <tr>
 <td>CITM</td>
-<td>2,341 MB/s</td>
-<td>1,841 MB/s</td>
-<td><strong>1.27x</strong></td>
+<td>2,360 MB/s</td>
+<td>1,767 MB/s</td>
+<td><strong>1.34x</strong></td>
 </tr>
 </tbody>
 </table>
@@ -992,8 +992,8 @@ <h1 id="optimizations-4--5-branch-hints--buffer-growth">Optimizations #4 &amp; #
 <tbody>
 <tr>
 <td>Twitter &amp; CITM</td>
-<td>~2%</td>
-<td>1.02x</td>
+<td>~1%</td>
+<td>1.01x</td>
 </tr>
 </tbody>
 </table>
@@ -1013,40 +1013,40 @@ <h1 id="combined-performance-impact">Combined Performance Impact</h1>
 <tbody>
 <tr>
 <td><strong>Consteval</strong></td>
-<td>+99% (1.99x)</td>
-<td>+165% (2.65x)</td>
+<td>+100% (2.00x)</td>
+<td>+141% (2.41x)</td>
 </tr>
 <tr>
 <td><strong>SIMD Escaping</strong></td>
-<td>+44% (1.44x)</td>
-<td>+3% (1.03x)</td>
+<td>+42% (1.42x)</td>
+<td>+4% (1.04x)</td>
 </tr>
 <tr>
 <td><strong>Fast Digits</strong></td>
 <td>+6% (1.06x)</td>
-<td>+27% (1.27x)</td>
+<td>+34% (1.34x)</td>
 </tr>
 <tr>
 <td><strong>Branch Hints</strong></td>
-<td>+1.5%</td>
-<td>+1.5%</td>
+<td>+1%</td>
+<td>+5%</td>
 </tr>
 <tr>
 <td><strong>Buffer Growth</strong></td>
-<td>+0.8%</td>
-<td>+0.8%</td>
+<td>-0.4%</td>
+<td>+2%</td>
 </tr>
 <tr>
 <td><strong>TOTAL</strong></td>
-<td><strong>~2.95x faster</strong></td>
-<td><strong>~3.5x faster</strong></td>
+<td><strong>~2.9x faster</strong></td>
+<td><strong>~3.4x faster</strong></td>
 </tr>
 </tbody>
 </table>
 <p><strong>From Baseline to Optimized:</strong></p>
 <ul>
-<li>Twitter: ~1,100 MB/s → 3,231 MB/s</li>
-<li>CITM: ~670 MB/s → 2,341 MB/s</li>
+<li>Twitter: ~1,100 MB/s → 3,211 MB/s</li>
+<li>CITM: ~700 MB/s → 2,360 MB/s</li>
 </ul>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="78" data-paginate="true" data-theme="default" lang="C" data-marpit-pagination="78" style="--paginate:true;--theme:default;" data-marpit-pagination-total="81">

cppcon2025/cppcon_2025_slides.md

Lines changed: 20 additions & 20 deletions
@@ -958,13 +958,13 @@ _mm512_cmple_epu8_mask(word, _mm512_set1_epi8(31));
 
 ---
 
-# Ablation Study: How We Achieved 3.4 GB/s
+# Ablation Study: How We Achieved 3.2 GB/s
 
 **What is Ablation?**
 From neuroscience: systematically remove parts to understand function
 
-**Our Approach:**
-1. **Baseline**: All optimizations enabled (3,435 MB/s)
+**Our Approach (Apple Silicon M2):**
+1. **Baseline**: All optimizations enabled (3,211 MB/s)
 2. **Disable one optimization** at a time
 3. **Measure performance impact**
 4. **Calculate contribution**: `(Baseline - Disabled) / Disabled`
@@ -1001,12 +1001,12 @@ b.append_literal(username_key); // Just memcpy!
 
 ---
 
-# Consteval Performance Impact
+# Consteval Performance Impact (Apple Silicon)
 
 | Dataset | Baseline | No Consteval | Impact | **Speedup** |
 |---------|----------|--------------|--------|-------------|
-| Twitter | 3,231 MB/s | 1,624 MB/s | -50% | **1.99x** |
-| CITM | 2,341 MB/s | 883 MB/s | -62% | **2.65x** |
+| Twitter | 3,211 MB/s | 1,607 MB/s | -50% | **2.00x** |
+| CITM | 2,360 MB/s | 978 MB/s | -59% | **2.41x** |
 
 **Twitter Example (100 tweets):**
 - 100 tweets × 15 fields = **1,500 field names**
@@ -1039,12 +1039,12 @@ if (!needs_escape)
 
 ---
 
-# SIMD Escaping Performance Impact
+# SIMD Escaping Performance Impact (Apple Silicon)
 
 | Dataset | Baseline | No SIMD | Impact | **Speedup** |
 |---------|----------|---------|--------|-------------|
-| Twitter | 3,231 MB/s | 2,245 MB/s | -31% | **1.44x** |
-| CITM | 2,341 MB/s | 2,273 MB/s | -3% | **1.03x** |
+| Twitter | 3,211 MB/s | 2,269 MB/s | -29% | **1.42x** |
+| CITM | 2,360 MB/s | 2,259 MB/s | -4% | **1.04x** |
 
 **Why Different Impact?**
 - **Twitter**: Long text fields (tweets, descriptions) → Big win
@@ -1066,8 +1066,8 @@ fast_digit_count(value); // Bit operations + lookup table
 
 | Dataset | Baseline | No Fast Digits | **Speedup** |
 |---------|----------|----------------|-------------|
-| Twitter | 3,231 MB/s | 3,041 MB/s | **1.06x** |
-| CITM | 2,341 MB/s | 1,841 MB/s | **1.27x** |
+| Twitter | 3,211 MB/s | 3,035 MB/s | **1.06x** |
+| CITM | 2,360 MB/s | 1,767 MB/s | **1.34x** |
 
 **CITM has ~10,000+ integers!**
 
@@ -1089,7 +1089,7 @@ if (UNLIKELY(buffer_full)) { // CPU knows this is rare
 
 | Both Optimizations | Impact | Speedup |
 |-------------------|--------|---------|
-| Twitter & CITM | ~2% | 1.02x |
+| Twitter & CITM | ~1% | 1.01x |
 
 **Small but free!**
 
@@ -1101,16 +1101,16 @@ if (UNLIKELY(buffer_full)) { // CPU knows this is rare
 
 | Optimization | Twitter Contribution | CITM Contribution |
 |--------------|---------------------|-------------------|
-| **Consteval** | +99% (1.99x) | +165% (2.65x) |
-| **SIMD Escaping** | +44% (1.44x) | +3% (1.03x) |
-| **Fast Digits** | +6% (1.06x) | +27% (1.27x) |
-| **Branch Hints** | +1.5% | +1.5% |
-| **Buffer Growth** | +0.8% | +0.8% |
-| **TOTAL** | **~2.95x faster** | **~3.5x faster** |
+| **Consteval** | +100% (2.00x) | +141% (2.41x) |
+| **SIMD Escaping** | +42% (1.42x) | +4% (1.04x) |
+| **Fast Digits** | +6% (1.06x) | +34% (1.34x) |
+| **Branch Hints** | +1% | +5% |
+| **Buffer Growth** | -0.4% | +2% |
+| **TOTAL** | **~2.9x faster** | **~3.4x faster** |
 
 **From Baseline to Optimized:**
-- Twitter: ~1,100 MB/s → 3,231 MB/s
-- CITM: ~670 MB/s → 2,341 MB/s
+- Twitter: ~1,100 MB/s → 3,211 MB/s
+- CITM: ~700 MB/s → 2,360 MB/s
 
 ---
 
cppcon2025/cppcon_2025_slides.pdf

99 Bytes
Binary file not shown.

cppcon2025/presentation.pdf

0 Bytes
Binary file not shown.
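
Note on the updated numbers: the slides define contribution as `(Baseline - Disabled) / Disabled` but do not spell out how the Impact and Speedup columns are computed. The sketch below is one way to re-derive the revised table rows from the raw throughput figures in this commit. The struct and function names are hypothetical, and the Impact formula `(Disabled - Baseline) / Baseline` is an assumption that happens to reproduce the published values; it is not taken from the slide source.

```cpp
#include <cassert>
#include <cmath>

// One ablation measurement (hypothetical type, for illustration only):
// throughput with every optimization on vs. with one optimization disabled.
struct Ablation {
    double baseline_mbs;  // all optimizations enabled, in MB/s
    double disabled_mbs;  // one optimization disabled, in MB/s
};

// Speedup column: how many times faster the baseline is than the ablated build.
inline double speedup(const Ablation& a) {
    assert(a.disabled_mbs > 0.0);
    return a.baseline_mbs / a.disabled_mbs;
}

// Contribution, as defined on the slide: (Baseline - Disabled) / Disabled,
// expressed as a percentage. Equivalent to (speedup - 1) * 100.
inline double contribution_pct(const Ablation& a) {
    assert(a.disabled_mbs > 0.0);
    return (a.baseline_mbs - a.disabled_mbs) / a.disabled_mbs * 100.0;
}

// Impact column (ASSUMED formula): throughput lost relative to the baseline
// when the optimization is turned off, as a negative percentage.
inline double impact_pct(const Ablation& a) {
    assert(a.baseline_mbs > 0.0);
    return (a.disabled_mbs - a.baseline_mbs) / a.baseline_mbs * 100.0;
}
```

For the updated CITM consteval row (2,360 MB/s vs 978 MB/s), rounding these results gives a 2.41x speedup, a +141% contribution, and a -59% impact, in line with the new table values.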
