@@ -148,11 +148,11 @@ transition: slide-up
148148| | | |
149149| ----------------- | ----------------- | ----------------- |
150150| | **Business app** | **Enterprise data‑lake** |
151- | <span style =" color : #00007F " >Focus </span > | <span v-mark =" { at: 0 , color: 'red', type: 'underline' } " >Reliability</span > over coverage | <span v-mark =" { at: 0 , color: 'green', type: 'underline' } " >Coverage & speed</span > over accuracy |
151+ | <span style =" color : #00007F " >Success Criteria </span > | <span v-mark =" { at: 1 , color: 'red', type: 'underline' } " >Reliability</span > over coverage | <span v-mark =" { at: 1 , color: 'green', type: 'underline' } " >Coverage & speed</span > over accuracy |
152152| <span style =" color : #00007F " >Typical User</span > | Business user | Data analyst, developer, PM |
153153| <span style =" color : #00007F " >Tables</span > | 10‑50 | 1000+ |
154- | <span style =" color : #00007F " >Accuracy</span > | 95 %+ required | 70‑80 % acceptable |
155- | <span style =" color : #00007F " >Consistency</span > | Very important | Not so important |
154+ | <span style =" color : #00007F " >Accuracy</span > | <span v-mark =" { at: 1, color: 'red', type: 'circle' } " >95 %+</span > required | 70‑80 % acceptable |
155+ | <span style =" color : #00007F " >Consistency</span > | <span v-mark =" { at: 1, color: 'red', type: 'underline' } " >Very important</span > | Not so important |
156156
157157<div style =" padding-top : 3em ; text-align : center ;" >
158158 <b>More Use Cases</b>: 🔸 Client-facing app 🔸 Internal business logic
@@ -255,17 +255,141 @@ flowchart LR
255255transition: slide-up
256256---
257257
258- # 🔄 Consistency
258+ # 🔄 Inconsistency: Same Q, Different Answers
259+ ##
259260
261+ <div style =" font-size : 0.7em ;" >
262+ User Question: <b >"Last year sales, by quarter"</b >
263+ </div >
264+
265+ <div grid =" ~ cols-3 gap-4 " style =" font-size : 0.7em ;" >
266+
267+ <div >
268+
269+ **Variant A**
270+
271+ ``` sql
272+ SELECT
273+ EXTRACT(QUARTER FROM order_date) AS quarter,
274+ SUM(total) AS sales
275+ FROM orders
276+ WHERE EXTRACT(YEAR FROM order_date) = 2023
277+ GROUP BY quarter;
278+ ```
279+
280+ | quarter | sales |
281+ | ---------| --------|
282+ | 1 | 12,500 |
283+ | 2 | 10,400 |
284+
285+ </div >
286+
287+ <div >
288+
289+ **Variant B**
290+
291+ ``` sql
292+ SELECT
293+ CONCAT('Q', EXTRACT(QUARTER FROM order_date)) AS Quarter,
294+ ROUND(SUM(total), 0) AS TotalSales
295+ FROM orders
296+ WHERE YEAR(order_date) = YEAR(CURDATE()) - 1
297+ GROUP BY Quarter;
298+ ```
299+
300+ | <span v-mark =" { at: 1, color: 'red', type: 'underline' } " >Quarter</span > | <span v-mark =" { at: 1, color: 'red', type: 'underline' } " >TotalSales</span > |
301+ | ---------| ------------|
302+ | <span v-mark =" { at: 1, color: 'red', type: 'underline' } " >Q1</span > | 12,500 |
303+ | <span v-mark =" { at: 1, color: 'red', type: 'underline' } " >Q2</span > | 10,400 |
304+
305+ </div >
306+
307+ <div >
308+
309+
310+ **Variant C**
311+
312+ ``` sql
313+ SELECT
314+ CONCAT(EXTRACT(YEAR FROM order_date), '-Q', EXTRACT(QUARTER FROM order_date)) AS period,
315+ FORMAT(SUM(total), 2) AS total_amount
316+ FROM orders
317+ WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
318+ GROUP BY period;
319+ ```
320+
321+ | <span v-mark =" { at: 1, color: 'orange', type: 'underline' } " >period</span > | <span v-mark =" { at: 1, color: 'orange', type: 'underline' } " >total_amount</span > |
322+ | ----------| --------------|
323+ | <span v-mark =" { at: 1, color: 'orange', type: 'underline' } " >2023-Q1</span > | <span v-mark =" { at: 1, color: 'orange', type: 'underline' } " >12,500.00</span > |
324+ | <span v-mark =" { at: 1, color: 'orange', type: 'underline' } " >2023-Q2</span > | <span v-mark =" { at: 1, color: 'orange', type: 'underline' } " >10,400.00</span > |
325+
326+ </div >
327+
328+ </div >
329+
330+ <style >
331+ pre {
332+ font-size : 0.6em !important ;
333+ }
334+ </style >
260335
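Editor's sketch for the inconsistency slide above: the three variants return the same figures in three different shapes, so a naive comparison of raw outputs reports a mismatch. The Python below is a minimal illustration, not part of the deck; the rows are copied from the slide tables, and `normalize_row` and `normalize` are hypothetical helper names.

```python
from decimal import Decimal

# Result rows copied from the three variants shown on the slide.
variant_a = [(1, 12500), (2, 10400)]
variant_b = [("Q1", 12500), ("Q2", 10400)]
variant_c = [("2023-Q1", "12,500.00"), ("2023-Q2", "10,400.00")]


def normalize_row(row):
    """Reduce a (quarter label, amount) row to (quarter number, Decimal amount)."""
    label, amount = row
    # Keep what follows the last "Q": 1, "Q1" and "2023-Q1" all become 1.
    quarter = int(str(label).upper().rsplit("Q", 1)[-1])
    # Drop thousands separators before parsing: "12,500.00" -> Decimal("12500.00").
    value = Decimal(str(amount).replace(",", ""))
    return quarter, value


def normalize(rows):
    """Normalize every row and sort, so row order does not matter."""
    return sorted(normalize_row(r) for r in rows)


raw_equal = variant_a == variant_b == variant_c
normalized_equal = normalize(variant_a) == normalize(variant_b) == normalize(variant_c)

print("raw outputs identical:", raw_equal)                # False
print("normalized outputs identical:", normalized_equal)  # True
```

The values agree once labels and number formats are normalized, yet a business user or a downstream parser still sees three different tables. That is the consistency problem the slide illustrates.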
261336---
262337transition: slide-up
263338---
264339
265- # 🔍 Eval-driven development
266- Iterative process
340+ # Accuracy
341+
342+ - timeframe
343+ - all sales vs all my sales
344+ - sales: net vs gross, booked vs paid
345+
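Editor's sketch for the "timeframe" bullet above: a phrase like "last year" can map to more than one date range, and each reading yields a different but defensible answer. The two readings below are assumptions chosen for illustration (the slide does not list them), and the function names and the sample date are hypothetical.

```python
from datetime import date, timedelta


def previous_calendar_year(today):
    """Read 'last year' as the previous calendar year (Jan 1 to Dec 31)."""
    return date(today.year - 1, 1, 1), date(today.year - 1, 12, 31)


def trailing_365_days(today):
    """Read 'last year' as the 365 days ending yesterday (inclusive range)."""
    end = today - timedelta(days=1)
    return end - timedelta(days=364), end


today = date(2024, 3, 15)  # hypothetical "now", not taken from the slides
print(previous_calendar_year(today))  # 2023-01-01 .. 2023-12-31
print(trailing_365_days(today))       # 2023-03-16 .. 2024-03-14
```

Both ranges are reasonable answers to the same question, so the intended reading has to be pinned down in the success criteria or the prompt before accuracy can be measured.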
346+ ---
347+ transition: slide-up
348+ ---
349+
350+ # Observations
351+ Across multiple projects
352+
353+ <div style =" padding-top : 0.5em " >
354+
355+ 🔹 **Large context windows** are not helping much
356+
357+ <div style =" padding-left : 1.3em ; font-style : italic ;" >
358+ More information = more noise, requires more thinking = more mistakes
359+ </div >
360+
361+ 🔹 **Fine-tuning** helps a little, but requires a lot of time and resources
362+
363+ <div style =" padding-left : 1.3em ; font-style : italic ;" >
364+ Models already know SQL, teaching them new knowledge is hard
365+ </div >
366+
367+ 🔹 <span v-mark =" { at: 1, color: 'red', type: 'underline' } " >Critical</span >: **Fast Experimentation**
368+
369+ <div style =" padding-left : 1.3em ; font-style : italic ;" >
370+ Fast feedback loop allows rapid development
371+ </div >
372+
373+ 🔹 <span v-mark =" { at: 1, color: 'red', type: 'underline' } " >Critical</span >: **Evaluations**
374+
375+ <div style =" padding-left : 1.3em ; font-style : italic ;" >
376+ Evaluations are the only way to know if the solution is doing what it's supposed to
377+ </div >
378+
379+ </div >
380+
381+ ---
382+ transition: slide-up
383+ ---
384+
385+ # 🔎 Eval-driven development
386+ ##
387+
388+ - Experimentation: this is the way
389+ - Iterative process
390+
391+ <br />
267392
268- .
269393
270394``` mermaid {scale: 0.8}
271395graph LR
@@ -307,36 +431,50 @@ graph LR
307431transition: slide-up
308432---
309433
310- # Observations
311- Across multiple projects
312-
313- <div style =" padding-top : 0.5em " >
434+ # ⚙️ Development Workflow
435+ ##
314436
315- 🔹 **Large context windows** are not helping much
437+ - Start with a goal
438+ - Reverse engineer evals
439+ - Experiment iterations
440+ - Benchmark at the end
316441
317- <div style =" padding-left : 1.3em ; font-style : italic ;" >
318- More information = more noise, requires more thinking = more mistakes
319- </div >
320-
321- 🔹 **Fine-tuning** helps a little, but requires a lot of time and resources
322-
323- <div style =" padding-left : 1.3em ; font-style : italic ;" >
324- Models already know SQL, teaching them new knowledge is hard
325- </div >
442+ ``` mermaid {scale: 0.9}
443+ graph LR
444+ style DG fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
445+ style DP fill:#FFFBE6,stroke:#FFE58F,stroke-width:2px,rx:10,ry:10
446+ style RE fill:#F0F5FF,stroke:#ADC6FF,stroke-width:2px,rx:10,ry:10
447+ style EV fill:#FFFBE6,stroke:#FFE58F,stroke-width:2px,rx:10,ry:10
448+ style AN fill:#FFF1F0,stroke:#FFCCC7,stroke-width:2px,rx:10,ry:10
449+ style IM fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
450+ style BM fill:#F0F5FF,stroke:#ADC6FF,stroke-width:2px,rx:10,ry:10
451+ style PR fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
452+
453+ DG([Define the Goal]) --> DP([POC]);
454+ DP --> RE(["Evals v1"]);
455+
456+ subgraph IT [Iterate]
457+ direction LR
458+ EV([Evaluate]) --> AN([Analyze]);
459+ AN --> IM([Improve]);
460+ IM --> EV;
461+ end
326462
327- 🔹 <span v-mark =" { at: 0, color: 'red', type: 'underline' } " >Critical</span >: **Fast Experimentation**
463+ RE --> IT;
464+ EV --> BM([Benchmark]);
465+ BM --> PR([Production]);
466+ ```
467+ <!-- Describes the overall development lifecycle -->
328468
329- <div style =" padding-left : 1.3em ; font-style : italic ;" >
330- Fast feedback loop allows rapid development
331- </div >
469+ ---
332470
333- 🔹 <span v-mark =" { at: 0, color: 'red', type: 'underline' } " >Critical</span >: **Evaluations**
471+ # How to Evaluate
334472
335- <div style =" padding-left : 1.3em ; font-style : italic ;" >
336- Evaluations are the only way to know if the solution is doing what it's supposed to
337- </ div >
473+ - LLM-as-a-judge
474+ - Query mock DB
475+ - Advanced: https://arxiv.org/abs/2312.10321
338476
339- </div >
477+ <!-- - -->
340478
341479---
342480
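Editor's sketch for the "Query mock DB" bullet on the "How to Evaluate" slide: one way to score generated SQL is to run it and a reference query against a small mock database and compare the result sets. This is a minimal illustration under stated assumptions, not the deck's own tooling. It uses Python's built-in `sqlite3` (so the queries use SQLite date functions rather than the MySQL-style functions on the slides), and the table contents, query strings, and helper names (`make_mock_db`, `run`, `same_result`) are hypothetical.

```python
import sqlite3


def make_mock_db():
    """Build an in-memory mock database with a tiny orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            ("2023-01-15", 7500.0),
            ("2023-02-20", 5000.0),
            ("2023-05-03", 10400.0),
            ("2022-11-30", 999.0),  # outside the target year
        ],
    )
    return conn


def run(conn, sql):
    """Execute a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall())


def same_result(conn, candidate_sql, reference_sql):
    """True when both queries return the same rows; invalid SQL counts as a failure."""
    try:
        return run(conn, candidate_sql) == run(conn, reference_sql)
    except sqlite3.Error:
        return False


REFERENCE = """
SELECT (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS quarter,
       SUM(total) AS sales
FROM orders
WHERE strftime('%Y', order_date) = '2023'
GROUP BY quarter
"""

# Written differently from the reference, but returns the same rows.
CANDIDATE_OK = """
SELECT (CAST(substr(order_date, 6, 2) AS INTEGER) + 2) / 3 AS q,
       SUM(total)
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY q
"""

# Missing the year filter, so the 2022 order leaks into the result.
CANDIDATE_BAD = """
SELECT (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS quarter,
       SUM(total) AS sales
FROM orders
GROUP BY quarter
"""

conn = make_mock_db()
print(same_result(conn, CANDIDATE_OK, REFERENCE))   # True
print(same_result(conn, CANDIDATE_BAD, REFERENCE))  # False
```

Comparing rows only checks values, so column names and formatting differences slip through. A stricter check could also compare `cursor.description` when consistency of the output schema matters.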
@@ -365,18 +503,35 @@ Across multiple projects
365503
366504# Takeaways
367505
368- 1. Pinpoint *your* success criteria first
369- <!-- 2. Use Mini‑RAG – fewer, better examples
370- 3. Split tasks so the model thinks less
371- 4. Invest in evals early; they pay daily
372- 5. Model swap is easy *after* #4 -->
506+ 1. 💡 Pinpoint *your* success criteria first
507+ 2. ⚙️ Develop POC
508+ 3. 🔎 Build evals
509+ 4. 🚀 Iterate
510+ 5. 📈 Production with confidence (benchmarks)
511+
512+ ---
513+
514+ # 📚 Resources
515+ ## &nbsp;
516+
517+ - [Multinear Site](https://multinear.com)
518+ - [Multinear Platform](https://github.com/multinear/multinear)
519+ - [Uber Text-to-SQL](https://www.uber.com/en-GB/blog/query-gpt/)
520+ - [LinkedIn Text-to-SQL](https://www.linkedin.com/blog/engineering/ai/practical-text-to-sql-for-data-analytics)
521+ - [Eugene Yan on evals](https://eugeneyan.com/tag/eval/)
522+ - [Hamel Husain on evals](https://hamel.dev)
523+ - [Lenny Rachitsky episode on evals](https://x.com/lennysan/status/1909636749103599729)
373524
374525---
375526
376527## What's next?
377528
378- - Grab the starter kit ➜ ...
379- - Register for the deep‑dive workshop
529+ - Register for the deep-dive workshop
380530- Connect with us on LinkedIn / X
381531
532+ Use Multinear
533+
534+ <img src =" ./assets/multinear.png " style =" width : 20em " ></img >
535+
382536Thanks!
537+