Commit 10168f0

Updates
1 parent 233cad1 commit 10168f0

2 files changed

+192
-37
lines changed

assets/multinear.png

252 KB

slides.md

Lines changed: 192 additions & 37 deletions
@@ -148,11 +148,11 @@ transition: slide-up
148148
| | | |
149149
| ----------------- | ----------------- | ----------------- |
150150
| | **Business app** | **Enterprise data‑lake** |
151-
| <span style="color: #00007F">Focus</span> | <span v-mark="{ at: 0, color: 'red', type: 'underline' }">Reliability</span> over coverage | <span v-mark="{ at: 0, color: 'green', type: 'underline' }">Coverage & speed</span> over accuracy |
151+
| <span style="color: #00007F">Success Criteria</span> | <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Reliability</span> over coverage | <span v-mark="{ at: 1, color: 'green', type: 'underline' }">Coverage & speed</span> over accuracy |
152152
| <span style="color: #00007F">Typical User</span> | Business user | Data analyst, developer, PM |
153153
| <span style="color: #00007F">Tables</span> | 10‑50 | 1000+ |
154-
| <span style="color: #00007F">Accuracy</span> | 95 %+ required | 70‑80 % acceptable |
155-
| <span style="color: #00007F">Consistency</span> | Very important | Not so important |
154+
| <span style="color: #00007F">Accuracy</span> | <span v-mark="{ at: 1, color: 'red', type: 'circle' }">95 %+</span> required | 70‑80 % acceptable |
155+
| <span style="color: #00007F">Consistency</span> | <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Very important</span> | Not so important |
156156

157157
<div style="padding-top: 3em; text-align: center;">
158158
<b>More Use Cases</b>: 🔸 Client-facing app 🔸 Internal business logic
@@ -255,17 +255,141 @@ flowchart LR
255255
transition: slide-up
256256
---
257257

258-
# 🔄 Consistency
258+
# 🔄 Inconsistency: Same Q, Different Answers
259+
##
259260

261+
<div style="font-size: 0.7em;">
262+
User Question: <b>"Last year sales, by quarter"</b>
263+
</div>
264+
265+
<div grid="~ cols-3 gap-4" style="font-size: 0.7em;">
266+
267+
<div>
268+
269+
**Variant A**
270+
271+
```sql
272+
SELECT
273+
EXTRACT(QUARTER FROM order_date) AS quarter,
274+
SUM(total) AS sales
275+
FROM orders
276+
WHERE EXTRACT(YEAR FROM order_date) = 2023
277+
GROUP BY quarter;
278+
```
279+
280+
| quarter | sales |
281+
|---------|--------|
282+
| 1 | 12,500 |
283+
| 2 | 10,400 |
284+
285+
</div>
286+
287+
<div>
288+
289+
**Variant B**
290+
291+
```sql
292+
SELECT
293+
CONCAT('Q', EXTRACT(QUARTER FROM order_date)) AS Quarter,
294+
ROUND(SUM(total), 0) AS TotalSales
295+
FROM orders
296+
WHERE YEAR(order_date) = YEAR(CURDATE()) - 1
297+
GROUP BY Quarter;
298+
```
299+
300+
| <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Quarter</span> | <span v-mark="{ at: 1, color: 'red', type: 'underline' }">TotalSales</span> |
301+
|---------|------------|
302+
| <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Q1</span> | 12,500 |
303+
| <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Q2</span> | 10,400 |
304+
305+
</div>
306+
307+
<div>
308+
309+
310+
**Variant C**
311+
312+
```sql
313+
SELECT
314+
CONCAT(EXTRACT(YEAR FROM order_date), '-Q', EXTRACT(QUARTER FROM order_date)) AS period,
315+
FORMAT(SUM(total), 2) AS total_amount
316+
FROM orders
317+
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
318+
GROUP BY period;
319+
```
320+
321+
| <span v-mark="{ at: 1, color: 'orange', type: 'underline' }">period</span> | <span v-mark="{ at: 1, color: 'orange', type: 'underline' }">total_amount</span> |
322+
|----------|--------------|
323+
| <span v-mark="{ at: 1, color: 'orange', type: 'underline' }">2023-Q1</span> | <span v-mark="{ at: 1, color: 'orange', type: 'underline' }">12,500.00</span> |
324+
| <span v-mark="{ at: 1, color: 'orange', type: 'underline' }">2023-Q2</span> | <span v-mark="{ at: 1, color: 'orange', type: 'underline' }">10,400.00</span> |
325+
326+
</div>
327+
328+
</div>
329+
330+
<style>
331+
pre {
332+
font-size: 0.6em !important;
333+
}
334+
</style>
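
The three variants on this slide return the same numbers in different shapes: different column names, quarter labels, and number formatting. A minimal sketch of how a check could confirm that, assuming the rows have already been fetched as `(label, amount)` pairs; the helper names `normalize_row` and `normalize` are hypothetical and not part of the deck:

```python
import re


def normalize_row(label, amount):
    """Map one (quarter label, sales amount) row to a canonical (int, float) pair."""
    # Take the last run of digits, so 1, "Q1" and "2023-Q1" all yield quarter 1.
    quarter = int(re.findall(r"\d+", str(label))[-1])
    # Drop thousands separators, so "12,500.00" and 12500 compare equal.
    sales = float(str(amount).replace(",", ""))
    return quarter, sales


def normalize(rows):
    """Normalize every row and sort, so row order does not matter."""
    return sorted(normalize_row(label, amount) for label, amount in rows)


# Result rows copied from the three tables on the slide.
variant_a = [(1, 12500), (2, 10400)]
variant_b = [("Q1", 12500), ("Q2", 10400)]
variant_c = [("2023-Q1", "12,500.00"), ("2023-Q2", "10,400.00")]

same_data = normalize(variant_a) == normalize(variant_b) == normalize(variant_c)
print(same_data)
```

A check like this separates presentation drift (labels, formatting) from real disagreement in the numbers. It deliberately ignores column names, which a stricter consistency check would also compare.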
260335

261336
---
262337
transition: slide-up
263338
---
264339

265-
# 🔍 Eval-driven development
266-
Iterative process
340+
# Accuracy
341+
342+
- timeframe
343+
- all sales vs all my sales
344+
- sales: net vs gross, booked vs paid
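
The timeframe bullet shows up in the earlier variants: two of them hard-code 2023, while one computes the year relative to the current date. A small sketch of pinning "last year" to an explicit reference date, so an eval gives the same answer whenever it runs; the function name `last_year_range` is an assumption made for illustration:

```python
from datetime import date


def last_year_range(today):
    """Return the half-open range [start, end) covering the calendar year before `today`."""
    year = today.year - 1
    return date(year, 1, 1), date(year + 1, 1, 1)


# Passing the reference date explicitly keeps the result reproducible.
start, end = last_year_range(date(2024, 4, 15))
print(start.isoformat(), end.isoformat())
```

A half-open range also avoids dropping rows timestamped during the last day of the year, which a `BETWEEN ... AND '2023-12-31'` filter can do when the column stores a time component.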
345+
346+
---
347+
transition: slide-up
348+
---
349+
350+
# Observations
351+
Across multiple projects
352+
353+
<div style="padding-top: 0.5em">
354+
355+
🔹 **Large context windows** are not helping much
356+
357+
<div style="padding-left: 1.3em; font-style: italic;">
358+
More information = more noise, requires more thinking = more mistakes
359+
</div>
360+
361+
🔹 **Fine-tuning** helps a little, but requires a lot of time and resources
362+
363+
<div style="padding-left: 1.3em; font-style: italic;">
364+
Models already know SQL, teaching them new knowledge is hard
365+
</div>
366+
367+
🔹 <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Critical</span>: **Fast Experimentation**
368+
369+
<div style="padding-left: 1.3em; font-style: italic;">
370+
Fast feedback loop allows rapid development
371+
</div>
372+
373+
🔹 <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Critical</span>: **Evaluations**
374+
375+
<div style="padding-left: 1.3em; font-style: italic;">
376+
Evaluations are the only way to know if the solution is doing what it's supposed to
377+
</div>
378+
379+
</div>
380+
381+
---
382+
transition: slide-up
383+
---
384+
385+
# 🔎 Eval-driven development
386+
##
387+
388+
- Experimentation: this is the way
389+
- Iterative process
390+
391+
<br/>
267392

268-
.
269393

270394
```mermaid {scale: 0.8}
271395
graph LR
@@ -307,36 +431,50 @@ graph LR
307431
transition: slide-up
308432
---
309433

310-
# Observations
311-
Across multiple projects
312-
313-
<div style="padding-top: 0.5em">
434+
# ⚙️ Development Workflow
435+
##
314436

315-
🔹 **Large context windows** are not helping much
437+
- Start with a goal
438+
- Reverse engineer evals
439+
- Experiment iterations
440+
- Benchmark at the end
316441

317-
<div style="padding-left: 1.3em; font-style: italic;">
318-
More information = more noise, requires more thinking = more mistakes
319-
</div>
320-
321-
🔹 **Fine-tuning** helps a little, but requires a lot of time and resources
322-
323-
<div style="padding-left: 1.3em; font-style: italic;">
324-
Models already know SQL, teaching them new knowledge is hard
325-
</div>
442+
```mermaid {scale: 0.9}
443+
graph LR
444+
style DG fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
445+
style DP fill:#FFFBE6,stroke:#FFE58F,stroke-width:2px,rx:10,ry:10
446+
style RE fill:#F0F5FF,stroke:#ADC6FF,stroke-width:2px,rx:10,ry:10
447+
style EV fill:#FFFBE6,stroke:#FFE58F,stroke-width:2px,rx:10,ry:10
448+
style AN fill:#FFF1F0,stroke:#FFCCC7,stroke-width:2px,rx:10,ry:10
449+
style IM fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
450+
style BM fill:#F0F5FF,stroke:#ADC6FF,stroke-width:2px,rx:10,ry:10
451+
style PR fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
452+
453+
DG([Define the Goal]) --> DP([POC]);
454+
DP --> RE(["Evals v1"]);
455+
456+
subgraph IT [Iterate]
457+
direction LR
458+
EV([Evaluate]) --> AN([Analyze]);
459+
AN --> IM([Improve]);
460+
IM --> EV;
461+
end
326462
327-
🔹 <span v-mark="{ at: 0, color: 'red', type: 'underline' }">Critical</span>: **Fast Experimentation**
463+
RE --> IT;
464+
EV --> BM([Benchmark]);
465+
BM --> PR([Production]);
466+
```
467+
<!-- Describes the overall development lifecycle -->
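
The Evaluate → Analyze → Improve loop in the diagram can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the deck's implementation: `system` is any callable that answers a question, each eval case pairs a question with a check function, and `improve` is a hypothetical hook standing in for whatever prompt or pipeline change is made between rounds. The default target of 0.95 mirrors the "95 %+" accuracy figure from the comparison table.

```python
def pass_rate(system, cases):
    """Evaluate: fraction of eval cases whose check accepts the system's output."""
    passed = sum(1 for question, check in cases if check(system(question)))
    return passed / len(cases)


def iterate(system, improve, cases, target=0.95, max_rounds=10):
    """Repeat evaluate -> analyze -> improve until the target pass rate is reached."""
    history = []
    for _ in range(max_rounds):
        score = pass_rate(system, cases)
        history.append(score)
        if score >= target:
            break
        # Analyze: collect the questions that still fail.
        failures = [q for q, check in cases if not check(system(q))]
        # Improve: hypothetical hook that returns a revised system.
        system = improve(system, failures)
    return system, history


# Toy usage: the "system" should upper-case its input.
toy_cases = [("a", lambda out: out == "A"), ("b", lambda out: out == "B")]
first_attempt = lambda question: "A"  # only correct for the first case
toy_improve = lambda system, failures: (lambda question: question.upper())

final_system, scores = iterate(first_attempt, toy_improve, toy_cases)
print(scores)
```

Keeping the score history makes it easy to see whether each round helped, which is the point of a fast feedback loop.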
328468

329-
<div style="padding-left: 1.3em; font-style: italic;">
330-
Fast feedback loop allows rapid development
331-
</div>
469+
---
332470

333-
🔹 <span v-mark="{ at: 0, color: 'red', type: 'underline' }">Critical</span>: **Evaluations**
471+
# How to Evaluate
334472

335-
<div style="padding-left: 1.3em; font-style: italic;">
336-
Evaluations are the only way to know if the solution is doing what it's supposed to
337-
</div>
473+
- LLM-as-a-judge
474+
- Query mock DB
475+
- Advanced: https://arxiv.org/abs/2312.10321
338476

339-
</div>
477+
<!-- - -->
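
The "Query mock DB" option can be sketched as an execution-based check: run the generated SQL and a hand-written reference query against a small in-memory database and compare the rows. The sketch below uses Python's built-in `sqlite3`. The table layout, sample rows, and the names `make_mock_db` and `results_match` are assumptions for illustration, and the SQL is written in SQLite's dialect rather than the dialect shown on the earlier slides.

```python
import sqlite3


def make_mock_db():
    """Build a tiny in-memory orders table with rows inside and outside 2023."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            ("2022-12-30", 999.0),
            ("2023-01-15", 7500.0),
            ("2023-03-02", 5000.0),
            ("2023-05-20", 10400.0),
            ("2024-01-05", 123.0),
        ],
    )
    return conn


# Hand-written reference answer for "sales by quarter in 2023".
REFERENCE_SQL = """
SELECT (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS quarter,
       SUM(total) AS sales
FROM orders
WHERE strftime('%Y', order_date) = '2023'
GROUP BY quarter
"""


def results_match(conn, candidate_sql, reference_sql=REFERENCE_SQL):
    """Return True when the candidate query yields the same rows as the reference."""
    try:
        candidate = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to run counts as a failed eval
    expected = conn.execute(reference_sql).fetchall()
    return sorted(candidate) == sorted(expected)


conn = make_mock_db()
good_sql = """
SELECT (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS q, SUM(total)
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY q
"""
missing_filter_sql = """
SELECT (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS q, SUM(total)
FROM orders
GROUP BY q
"""
print(results_match(conn, good_sql), results_match(conn, missing_filter_sql))
```

Comparing rows rather than SQL text lets differently written but equivalent queries pass, while a query that forgets the year filter fails because rows from other years change the totals.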
340478

341479
---
342480

@@ -365,18 +503,35 @@ Across multiple projects
365503

366504
# Takeaways
367505

368-
1. Pinpoint *your* success criteria first
369-
<!-- 2. Use Mini‑RAG – fewer, better examples
370-
3. Split tasks so the model thinks less
371-
4. Invest in evals early; they pay daily
372-
5. Model swap is easy *after* #4 -->
506+
1. 💡 Pinpoint *your* success criteria first
507+
2. ⚙️ Develop POC
508+
3. 🔎 Build evals
509+
4. 🚀 Iterate
510+
5. 📈 Production with confidence (benchmarks)
511+
512+
---
513+
514+
# 📚 Resources
515+
## &nbsp;
516+
517+
- [Multinear Site](https://multinear.com)
518+
- [Multinear Platform](https://github.com/multinear/multinear)
519+
- [Uber Text-to-SQL](https://www.uber.com/en-GB/blog/query-gpt/)
520+
- [LinkedIn Text-to-SQL](https://www.linkedin.com/blog/engineering/ai/practical-text-to-sql-for-data-analytics)
521+
- [Eugene Yan on evals](https://eugeneyan.com/tag/eval/)
522+
- [Hamel Husain on evals](https://hamel.dev)
523+
- [Lenny Rachitsky episode on evals](https://x.com/lennysan/status/1909636749103599729)
373524

374525
---
375526

376527
## What's next?
377528

378-
- Grab the starter kit ➜ ...
379-
- Register for the deep‑dive workshop
529+
- Register for the deep-dive workshop
380530
- Connect with us on LinkedIn / X
381531

532+
Use Multinear
533+
534+
<img src="./assets/multinear.png" style="width: 20em"></img>
535+
382536
Thanks!
537+
