@@ -78,21 +78,21 @@ Source: Gwen (Chen) Shapira
7878---
7979
8080
81- ## SIMD
81+ ## SIMD (Single Instruction, Multiple Data)
8282
83- - Stands for Single instruction, multiple data
8483- Allows us to process 16 bytes or more with one instruction
8584- Supported on all modern CPUs (phone, laptop)
85+ - <Add a bullet point for language support voted on C++26>
8686
8787---
8888
89- # Superscalar vs. SIMD execution
89+ # Not all processors are equal
9090
91- | processor | year | arithmetic logic units | SIMD units | simdjson |
92- | -----------------| ---------| ---------------------------| ----------------| ---------- |
93- | Apple M* | 2019 | 6+ | $4 \times 128$ | 🥉 |
94- | Intel Lion Cove | 2024 | 6 | $4 \times 256$ | 🥈🥈 |
95- | AMD Zen 5 | 2024 | 6 | $4 \times 512$ | 🥇🥇🥇 |
91+ | processor | year | arithmetic logic units | SIMD units |
92+ | -----------------| ---------| ---------------------------| ----------------|
93+ | Apple M* | 2019 | 6+ | $4 \times 128$ |
94+ | Intel Lion Cove | 2024 | 6 | $4 \times 256$ |
95+ | AMD Zen 5 | 2024 | 6 | $4 \times 512$ |
9696
9797---
9898
@@ -110,8 +110,8 @@ Source: Gwen (Chen) Shapira
110110
111111- First scan identifies the structural characters and the start of all strings at about 10 GB/s using SIMD instructions.
112112- Validates Unicode (UTF-8) at 30 GB/s.
113- - Rest of parsing relies on index.
114- - Allows fast skipping.
113+ - Rest of parsing relies on the generated index.
114+ - Allows fast skipping (only parse what we need).
115115
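The "first scan" idea in the bullets above can be illustrated with a deliberately simplified scalar sketch. This is not simdjson's implementation, which works on wide SIMD registers; the function name `build_structural_index` is hypothetical and exists only for this example. It records the offset of every structural character and of each opening quote, so that later stages can jump between offsets instead of re-reading the text.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Scalar illustration only (hypothetical helper, not part of simdjson).
// Returns the offsets of structural characters ({ } [ ] : ,) and of the
// opening quote of each string. Characters inside strings are ignored.
std::vector<std::size_t> build_structural_index(std::string_view json) {
  std::vector<std::size_t> index;
  bool in_string = false;
  for (std::size_t i = 0; i < json.size(); ++i) {
    const char c = json[i];
    if (in_string) {
      if (c == '\\') {
        ++i;  // skip the escaped character that follows the backslash
      } else if (c == '"') {
        in_string = false;  // closing quote ends the string
      }
      continue;
    }
    switch (c) {
      case '"':
        in_string = true;
        index.push_back(i);  // start of a string
        break;
      case '{': case '}': case '[': case ']': case ':': case ',':
        index.push_back(i);  // structural character
        break;
      default:
        break;  // whitespace, digits, literals: not indexed
    }
  }
  return index;
}
```

With such an index in hand, skipping a value amounts to advancing through offsets until the matching closing bracket, which is one way to read "only parse what we need".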
116116---
117117
@@ -137,67 +137,17 @@ The simdjson library is found in...
137137
138138<img src =" images/nodejs.jpg " width =" 20% " >
139139
140-
141- ---
142-
143- # Conventional JSON parsing (DOM)
144-
145- Start with JSON.
146- ``` json
147- {"name" :" Scooby" , "age" : 3 , "friends" :[" Fred" , " Daphne" , " Velma" ]}
148- ```
149-
150- Parses (everything) to Document-Object-Model:
151- <img src =" images/dom.svg " />
152-
153- Copies to user data structure.
154-
155-
156- ---
157-
158- # Limitations of conventional parsing
159-
160- - Tends to parse everything at once even when not needed.
161- - Requires an intermediate data structure (DOM).
162- - Can't specialize (e.g., treat ` "123" ` as a number)
163-
164-
165- --
166-
167- # On-Demand
168-
169- Can load a multi-kilobyte file and only parse a narrow segment from a fast index.
170-
171- ``` cpp
172- #include < iostream>
173- #include " simdjson.h"
174- using namespace simdjson ;
175- int main (void) {
176- ondemand::parser parser;
177- padded_string json = padded_string::load("twitter.json");
178- ondemand::document tweets = parser.iterate(json);
179- std::cout << uint64_t(tweets[ "search_metadata"] [ "count" ] ) << " results." << std::endl;
180- }
181- ```
182-
183-
184- ---
185-
186- # Automate the serialization/deserialization process.
187-
188-
189- <img src="images/tofrom.svg" width="100%">
190-
191140---
192141
193142# The Problem
194143
195144Imagine you're building a game server that needs to persist player data.
196145
197146
198-
199147<img src =" images/player.svg " width =" 60% " >
200148
149+
150+
201151---
202152
203153You start simple:
@@ -234,23 +184,6 @@ fmt::format(
234184);
235185```
236186
237- ---
238-
239- # With a library (JSON for Modern C++)
240-
241- Or you might use a library.
242-
243- ``` cpp
244- std::string to_json (Player& p) {
245- return nlohmann::json{{"username", p.username},
246- {"level", p.level},
247- {"health", p.health},
248- {"inventory", p.inventory}}
249- .dump();
250- }
251- ```
252-
253-
254187---
255188
256189# Manual Deserialization (simdjson)
@@ -267,19 +200,6 @@ for (auto item : arr) {
267200}
268201```
269202
270- ---
271-
272- # The Pain Points
273-
274- This manual approach has several problems:
275-
276- 1 . ** Repetition** : Every field needs to be handled twice (serialize + deserialize)
277- 2 . ** Maintenance Nightmare** : Add a new field? Update both functions!
278- 3 . ** Error-Prone** : Typos in field names, forgotten fields, type mismatches
279- 4 . ** Boilerplate Explosion** : 30+ lines for a simple 4-field struct
280- 5 . ** Performance** : You may fall into performance traps
281-
282-
283203---
284204
285205# When Your Game Grows...
@@ -299,11 +219,29 @@ struct Player {
299219 std::vector< std::string > inventory;
300220 std::map<std::string, Equipment> equipped; // New!
301221 std::vector<Achievement > achievements; // New!
302- std::optional< std::string > guild_name; // New!
222+ std::optional< std::string > guild_name; // New!
303223};
304224```
305225
306- **Suddenly you need to write hundreds of lines of serialization code! 😱**
226+ ---
227+
228+ <img src="images/happy_programmer.jpg">
229+
230+ ---
231+
232+ # The Pain Points
233+
234+ This manual approach has several problems:
235+
236+ 1. **Maintenance Nightmare**: Add a new field? Update both functions!
237+ 2. **Error-Prone**: Typos in field names, forgotten fields, type mismatches
238+
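Both problems can be seen in a small hand-written serializer. The sketch below assumes a four-field `Player` with `int` values for `level` and `health` (the field types are an assumption made for this example), and it skips string escaping for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed layout for illustration; the real struct may use other types.
struct Player {
  std::string username;
  int level;
  int health;
  std::vector<std::string> inventory;
};

// Hand-written serializer: every field name is repeated as a string
// literal, so a typo or a forgotten field is not caught by the compiler.
// Simplification: quotes and control characters are not escaped.
std::string to_json(const Player& p) {
  std::string out = "{\"username\":\"" + p.username + "\"";
  out += ",\"level\":" + std::to_string(p.level);
  out += ",\"health\":" + std::to_string(p.health);
  out += ",\"inventory\":[";
  for (std::size_t i = 0; i < p.inventory.size(); ++i) {
    if (i != 0) out += ',';  // separator between array elements
    out += "\"" + p.inventory[i] + "\"";
  }
  out += "]}";
  return out;
}
```

Adding a field to `Player` means touching this function and its deserializing counterpart by hand, and nothing warns you if one of them is missed.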
239+ ---
240+
241+ # Automate the serialization/deserialization process.
242+
243+
244+ <img src="images/tofrom.svg" width="100%">
307245
308246---
309247