Commit c9566a1

Updates
1 parent 10168f0 commit c9566a1

1 file changed: +123 -21 lines changed

slides.md
@@ -56,7 +56,7 @@ transition: slide-left
 # 💡 Meet the Team
 
 <div grid="~ cols-2 gap-4">
-<div style="padding-top: 1em">
+<div style="padding-top: 2em">
 
 👤 **Dima Kuchin**
 
@@ -67,7 +67,7 @@ transition: slide-left
 - [LinkedIn @kuchin](https://www.linkedin.com/in/kuchin), [X @kuchin](https://x.com/kuchin)
 
 </div>
-<div style="padding-top: 1em">
+<div style="padding-top: 2em">
 
 👤 **Asaf Bord**
 
@@ -166,9 +166,11 @@ transition: slide-up
 
 # 📊 Business App Text-to-SQL
 
-**Role**: A business analyst replacement.
+## Role: A business analyst replacement
 
-**Goal**: Reliability.
+**Primary Goal**: Reliability
+
+<div style="padding-top: 0.1em;">
 
 A combination of:
 
@@ -179,6 +181,8 @@ A combination of:
 - 📊 **Presentation**: Choose between tables and charts, picking the best visualization method
 - 🔒 **Guardrails**: Enforce access control, prevent prompt injection
 
+</div>
+
 <div style="padding-top: 2em; text-align: center;">
 ⭐ Ensuring analyst-level trust & quality ⭐
 </div>
@@ -193,6 +197,8 @@ transition: fade-out
 
 Decompose the solution into smaller steps.
 
+<br><br>
+
 ```mermaid
 flowchart LR
 style A fill:#E6F7FF,stroke:#91D5FF,stroke-width:2px
@@ -207,7 +213,7 @@ flowchart LR
 A{{Question}} --> B2(Security);
 B1 --> C(((Build Query)));
 B2 --> C(((Build Query)));
-B2 -.-> F(Stop);
+B2 -- "&nbsp;Alert ⚠️&nbsp;" --> F(((Stop)));
 C --> D(Execute);
 D --> E(Presentation);
 B1 -.-> A;
@@ -245,7 +251,7 @@ flowchart LR
 
 B1 --> C;
 B2 --> C;
-B2 -.-> F(Stop);
+B2 -- "&nbsp;Alert ⚠️&nbsp;" --> F(((Stop)));
 C --> D(Execute);
 D --> E(Presentation);
 B1 -.-> A;
@@ -347,10 +353,10 @@ transition: slide-up
 transition: slide-up
 ---
 
-# Observations
+# 👀 Observations
 Across multiple projects
 
-<div style="padding-top: 0.5em">
+<div style="font-size: 0.85em;">
 
 🔹 **Large context windows** are not helping much
 
@@ -364,6 +370,12 @@ Across multiple projects
 Models already know SQL, teaching them new knowledge is hard
 </div>
 
+🔹 <span v-mark="{ at: 1, color: 'green', type: 'underline' }">**Best approach**</span>: a combination of instructions and examples
+
+<div style="padding-left: 1.3em; font-style: italic;">
+Instructions are the <b>what</b>, examples are the <b>how</b>
+</div>
+
 🔹 <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Critical</span>: **Fast Experimentation**
 
 <div style="padding-left: 1.3em; font-style: italic;">
@@ -382,13 +394,19 @@ Across multiple projects
 transition: slide-up
 ---
 
+# 🧩 Examples
+
+---
+transition: slide-up
+---
+
 # 🔎 Eval-driven development
 ##
 
 - Experimentation: this is the way
 - Iterative process
 
-<br/>
+<br><br>
 
 
 ```mermaid {scale: 0.8}
@@ -439,6 +457,8 @@ transition: slide-up
 - Experiment iterations
 - Benchmark at the end
 
+<br>
+
 ```mermaid {scale: 0.9}
 graph LR
 style DG fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
@@ -468,13 +488,55 @@ graph LR
 
 ---
 
-# How to Evaluate
+# ⚖️ How to Evaluate
+## LLM-as-a-judge
 
-- LLM-as-a-judge
-- Query mock DB
-- Advanced: https://arxiv.org/abs/2312.10321
+<div style="font-size: 0.8em; padding-top: 1em;">
 
-<!-- - -->
+**LLM Prompt:**
+
+```markdown
+Given the following
+Database schema: <DB_SCHEMA>
+User question: <USER_QUESTION>
+Expected SQL query: <EXPECTED_SQL>
+Generated SQL query: <GENERATED_SQL>
+
+Do the "Generated SQL query" and the "Expected SQL query" produce
+the same results for the given user question and DB schema?
+```
+
+**Expected Output:**
+
+- **Answer:** True / False
+- **Reasoning:** Brief explanation comparing the queries
+
+</div>
+
+<style>
+pre {
+font-size: 0.8em !important;
+}
+</style>
+
+---
+
+# ⚖️ How to Evaluate
+## Query mock DB
+
+<br>
+
+- Generate mock database that matches the schema
+- Run Text-to-SQL on the user question, get new SQL query
+- Run both expected SQL and generated SQL at the mock DB
+- Compare results ✅
+
+---
+
+# ⚖️ How to Evaluate
+## Advanced
+
+https://arxiv.org/abs/2312.10321
 
 ---
 
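
Note on the "Query mock DB" slide added in this hunk: its four steps (generate a mock database, produce a new SQL query, run both queries, compare results) could be sketched roughly as below. This is only an illustration, not the authors' implementation. It assumes SQLite as the mock database, and the `orders` table, the sample queries, and the helper names `run_query` and `same_results` are all hypothetical. Rows are compared as a multiset, which is a design choice that ignores row order.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run a query and return its rows as a multiset (row order is ignored)."""
    return Counter(conn.execute(sql).fetchall())


def same_results(conn, expected_sql, generated_sql):
    """Return True when both queries yield the same rows on the mock DB."""
    try:
        generated = run_query(conn, generated_sql)
    except sqlite3.Error:
        # Assumption: generated SQL that fails to execute counts as a mismatch.
        return False
    return generated == run_query(conn, expected_sql)


# Hypothetical mock database matching an assumed schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount)
    VALUES ('acme', 120.0), ('acme', 80.0), ('globex', 50.0);
    """
)

expected = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
# Different text, same result set (different aggregate function and ordering).
equivalent = (
    "SELECT customer, TOTAL(amount) FROM orders "
    "GROUP BY customer ORDER BY customer DESC"
)
# Plausible-looking but wrong: MAX instead of SUM.
wrong = "SELECT customer, MAX(amount) FROM orders GROUP BY customer"

print(same_results(conn, expected, equivalent))
print(same_results(conn, expected, wrong))
```

If a question's answer depends on ordering (for example a "top 5" query), comparing ordered lists instead of a `Counter` would be the stricter choice.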

@@ -501,18 +563,53 @@ graph LR
 
 ---
 
+# 🤖 AI Agents
+##
+**Agents are simple**
+
+<div style="text-align: center; margin-top: -3em">
+
+```mermaid {scale: 0.85}
+graph TB
+style GQ fill:#E6F7FF,stroke:#91D5FF,stroke-width:2px,rx:10,ry:10
+style GS fill:#FFFBE6,stroke:#FFE58F,stroke-width:2px,rx:10,ry:10
+style ES fill:#F0F5FF,stroke:#ADC6FF,stroke-width:2px,rx:10,ry:10
+style AR fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
+style IM fill:#FCF4E0,stroke:#FFD666,stroke-width:2px,rx:10,ry:10
+style Stop fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
+
+GQ{{Question}} --> Loop;
+
+subgraph Loop [Agent Loop]
+direction LR
+GS(Generate SQL) --> ES([Execute SQL]);
+ES --> AR([Analyze Results]);
+AR -- "&nbsp;Unhappy 🚫&nbsp;" --> IM([Improve]);
+IM --> GS;
+end
+
+AR -- "&nbsp;Good ✅&nbsp;" --> Stop(((Done)));
+```
+
+</div>
+
+---
+
 # Takeaways
 
-1. 💡 Pinpoint *your* success criteria first
-2. ⚙️ Develop POC
-3. 🔎 Build evals
-4. 🚀 Iterate
-5. 📈 Production with confidence (benchmarks)
+<br>
+
+1. ⭐ Pinpoint *your* success criteria first
+2. 🤔 Make LLM think less, not more
+3. 🔎 Reverse engineer evals
+4. 📈 Experiment, iterate, benchmark
+5. 🚀 Production with confidence
 
 ---
 
 # 📚 Resources
-## &nbsp;
+
+<br>
 
 - [Multinear Site](https://multinear.com)
 - [Multinear Platform](https://github.com/multinear/multinear)
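
Note on the "AI Agents" slide added in this hunk: the Agent Loop diagram (Generate SQL, Execute SQL, Analyze Results, Improve, Done) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code. The `generate_sql` and `analyze` callables stand in for LLM calls and are hypothetical, as are the `users` table and the `max_iterations` cap, which the diagram does not show.

```python
import sqlite3


def agent_loop(question, conn, generate_sql, analyze, max_iterations=3):
    """Generate -> execute -> analyze, feeding problems back until the result is good."""
    feedback = None
    for _ in range(max_iterations):
        # "Generate SQL" on the first pass, "Improve" once feedback exists.
        sql = generate_sql(question, feedback)
        try:
            rows = conn.execute(sql).fetchall()  # "Execute SQL"
        except sqlite3.Error as exc:
            feedback = f"Query failed: {exc}"
            continue
        # "Analyze Results": None means good, a string describes what is wrong.
        feedback = analyze(question, sql, rows)
        if feedback is None:
            return sql, rows  # "Done"
    raise RuntimeError(f"No acceptable query after {max_iterations} attempts")


# Hypothetical stand-ins for the LLM-backed steps.
def fake_generate(question, feedback):
    # First attempt uses a wrong table name; the retry corrects it.
    return "SELECT COUNT(*) FROM user" if feedback is None else "SELECT COUNT(*) FROM users"


def fake_analyze(question, sql, rows):
    return None if rows else "Empty result"


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE users (name TEXT); INSERT INTO users VALUES ('ann'), ('bob');"
)

sql, rows = agent_loop("How many users are there?", conn, fake_generate, fake_analyze)
print(sql, rows)
```

Capping the number of iterations keeps a persistently "unhappy" analysis step from looping forever; the right limit would depend on latency and cost budgets.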
@@ -526,8 +623,13 @@ graph LR
 
 ## What's next?
 
+<br>
+
 - Register for the deep-dive workshop
-- Connect with us on LinkedIn / X
+- Follow us on LinkedIn / X
+- Subscribe to the newsletter
+
+<br>
 
 Use Multinear
 