@@ -56,7 +56,7 @@ transition: slide-left
 # π‘ Meet the Team
 
 <div grid="~ cols-2 gap-4">
-<div style="padding-top: 1em">
+<div style="padding-top: 2em">
 
 π€ **Dima Kuchin**
 
@@ -67,7 +67,7 @@ transition: slide-left
 - [LinkedIn @kuchin](https://www.linkedin.com/in/kuchin), [X @kuchin](https://x.com/kuchin)
 
 </div>
-<div style="padding-top: 1em">
+<div style="padding-top: 2em">
 
 π€ **Asaf Bord**
 
@@ -166,9 +166,11 @@ transition: slide-up
 
 # π Business App Text-to-SQL
 
-**Role**: A business analyst replacement.
+## Role: A business analyst replacement
 
-**Goal**: Reliability.
+**Primary Goal**: Reliability
+
+<div style="padding-top: 0.1em;">
 
 A combination of:
 
@@ -179,6 +181,8 @@ A combination of:
 - π **Presentation**: Choose between tables and charts, picking the best visualization method
 - π **Guardrails**: Enforce access control, prevent prompt injection
 
+</div>
+
 <div style="padding-top: 2em; text-align: center;">
   β Ensuring analyst-level trust & quality β
 </div>
@@ -193,6 +197,8 @@ transition: fade-out
 
 Decompose the solution into smaller steps.
 
+<br><br>
+
 ```mermaid
 flowchart LR
   style A fill:#E6F7FF,stroke:#91D5FF,stroke-width:2px
@@ -207,7 +213,7 @@ flowchart LR
   A{{Question}} --> B2(Security);
   B1 --> C(((Build Query)));
   B2 --> C(((Build Query)));
-  B2 -.-> F(Stop);
+  B2 -- "Alert ⚠️" --> F(((Stop)));
   C --> D(Execute);
   D --> E(Presentation);
   B1 -.-> A;
@@ -245,7 +251,7 @@ flowchart LR
 
   B1 --> C;
   B2 --> C;
-  B2 -.-> F(Stop);
+  B2 -- "Alert ⚠️" --> F(((Stop)));
   C --> D(Execute);
   D --> E(Presentation);
   B1 -.-> A;
@@ -347,10 +353,10 @@ transition: slide-up
 transition: slide-up
 ---
 
-# Observations
+# π Observations
 Across multiple projects
 
-<div style="padding-top: 0.5em">
+<div style="font-size: 0.85em;">
 
 🔹 **Large context windows** are not helping much
 
@@ -364,6 +370,12 @@ Across multiple projects
   Models already know SQL, teaching them new knowledge is hard
 </div>
 
+🔹 <span v-mark="{ at: 1, color: 'green', type: 'underline' }">**Best approach**</span>: a combination of instructions and examples
+
+<div style="padding-left: 1.3em; font-style: italic;">
+  Instructions are the <b>what</b>, examples are the <b>how</b>
+</div>
+
 🔹 <span v-mark="{ at: 1, color: 'red', type: 'underline' }">Critical</span>: **Fast Experimentation**
 
 <div style="padding-left: 1.3em; font-style: italic;">
@@ -382,13 +394,19 @@ Across multiple projects
 transition: slide-up
 ---
 
+# 🧩 Examples
+
+---
+transition: slide-up
+---
+
 # π Eval-driven development
 ##
 
 - Experimentation: this is the way
 - Iterative process
 
-<br />
+<br><br>
 
 
 ```mermaid {scale: 0.8}
@@ -439,6 +457,8 @@ transition: slide-up
 - Experiment iterations
 - Benchmark at the end
 
+<br>
+
 ```mermaid {scale: 0.9}
 graph LR
   style DG fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
@@ -468,13 +488,55 @@ graph LR
 
 ---
 
-# How to Evaluate
+# βοΈ How to Evaluate
+## LLM-as-a-judge
 
-- LLM-as-a-judge
-- Query mock DB
-- Advanced: https://arxiv.org/abs/2312.10321
+<div style="font-size: 0.8em; padding-top: 1em;">
 
-<!-- - -->
+**LLM Prompt:**
+
+```markdown
+Given the following
+Database schema: <DB_SCHEMA>
+User question: <USER_QUESTION>
+Expected SQL query: <EXPECTED_SQL>
+Generated SQL query: <GENERATED_SQL>
+
+Do the "Generated SQL query" and the "Expected SQL query" produce
+the same results for the given user question and DB schema?
+```
+
+**Expected Output:**
+
+- **Answer:** True / False
+- **Reasoning:** Brief explanation comparing the queries
+
+</div>
+
+<style>
+pre {
+  font-size: 0.8em !important;
+}
+</style>
+
+---
+
+# βοΈ How to Evaluate
+## Query mock DB
+
+<br>
+
+- Generate mock database that matches the schema
+- Run Text-to-SQL on the user question, get new SQL query
+- Run both expected SQL and generated SQL at the mock DB
+- Compare results β
+
+---
+
+# βοΈ How to Evaluate
+## Advanced
+
+https://arxiv.org/abs/2312.10321
 
 ---
 
@@ -501,18 +563,53 @@ graph LR
 
 ---
 
+# π€ AI Agents
+##
+**Agents are simple**
+
+<div style="text-align: center; margin-top: -3em">
+
+```mermaid {scale: 0.85}
+graph TB
+  style GQ fill:#E6F7FF,stroke:#91D5FF,stroke-width:2px,rx:10,ry:10
+  style GS fill:#FFFBE6,stroke:#FFE58F,stroke-width:2px,rx:10,ry:10
+  style ES fill:#F0F5FF,stroke:#ADC6FF,stroke-width:2px,rx:10,ry:10
+  style AR fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
+  style IM fill:#FCF4E0,stroke:#FFD666,stroke-width:2px,rx:10,ry:10
+  style Stop fill:#F6FFED,stroke:#B7EB8F,stroke-width:2px,rx:10,ry:10
+
+  GQ{{Question}} --> Loop;
+
+  subgraph Loop [Agent Loop]
+    direction LR
+    GS(Generate SQL) --> ES([Execute SQL]);
+    ES --> AR([Analyze Results]);
+    AR -- "Unhappy π«" --> IM([Improve]);
+    IM --> GS;
+  end
+
+  AR -- "Good ✅" --> Stop(((Done)));
+```
+
+</div>
+
+---
+
 # Takeaways
 
-1. π‘ Pinpoint *your* success criteria first
-2. βοΈ Develop POC
-3. π Build evals
-4. π Iterate
-5. π Production with confidence (benchmarks)
+<br>
+
+1. β Pinpoint *your* success criteria first
+2. π€ Make LLM think less, not more
+3. π Reverse engineer evals
+4. π Experiment, iterate, benchmark
+5. π Production with confidence
 
 ---
 
 # π Resources
-## &nbsp;
+
+<br>
 
 - [Multinear Site](https://multinear.com)
 - [Multinear Platform](https://github.com/multinear/multinear)
@@ -526,8 +623,13 @@ graph LR
 
 ## What's next?
 
+<br>
+
 - Register for the deep-dive workshop
-- Connect with us on LinkedIn / X
+- Follow us on LinkedIn / X
+- Subscribe to the newsletter
+
+<br>
 
 Use Multinear
 
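The "LLM-as-a-judge" slide added in this diff defines a judge prompt and an expected output of a True/False answer plus brief reasoning. Below is a minimal Python sketch of filling that prompt and parsing the verdict. It assumes `call_llm` is a hypothetical function that sends a prompt to a judge model and returns its reply text (no specific provider API is implied). The trailing "Reply in the form" lines are also an assumption, added so the reply can be parsed mechanically. `judge` and `Verdict` are illustrative names, not part of Multinear or any library.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Judge prompt from the "LLM-as-a-judge" slide. The final "Reply in the form"
# lines are an added assumption so the reply can be parsed reliably.
JUDGE_PROMPT = """Given the following
Database schema: {db_schema}
User question: {user_question}
Expected SQL query: {expected_sql}
Generated SQL query: {generated_sql}

Do the "Generated SQL query" and the "Expected SQL query" produce
the same results for the given user question and DB schema?

Reply in the form:
Answer: True or False
Reasoning: <brief explanation comparing the queries>"""


@dataclass
class Verdict:
    same_results: bool
    reasoning: str


def judge(call_llm: Callable[[str], str], db_schema: str, user_question: str,
          expected_sql: str, generated_sql: str) -> Verdict:
    """Ask a judge model whether two SQL queries produce the same results.

    `call_llm` is a hypothetical stand-in for your model client: it takes the
    prompt text and returns the model's reply text.
    """
    prompt = JUDGE_PROMPT.format(
        db_schema=db_schema,
        user_question=user_question,
        expected_sql=expected_sql,
        generated_sql=generated_sql,
    )
    reply = call_llm(prompt)
    answer = re.search(r"Answer:\s*(True|False)", reply, re.IGNORECASE)
    if answer is None:
        # Treat an unparseable reply as an error rather than guessing a verdict.
        raise ValueError(f"judge reply has no True/False answer: {reply!r}")
    reasoning = re.search(r"Reasoning:\s*(.*)", reply, re.IGNORECASE | re.DOTALL)
    return Verdict(
        same_results=answer.group(1).lower() == "true",
        reasoning=reasoning.group(1).strip() if reasoning else "",
    )


# Usage with a fake model that always agrees (replace with a real client).
def fake_llm(prompt: str) -> str:
    return "Answer: True\nReasoning: Both queries sum amount per customer."


verdict = judge(
    fake_llm,
    db_schema="CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);",
    user_question="How much has each customer spent?",
    expected_sql="SELECT customer, SUM(amount) FROM orders GROUP BY customer",
    generated_sql="SELECT customer, SUM(amount) AS total FROM orders GROUP BY 1",
)
print(verdict.same_results, verdict.reasoning)
```

Raising on an unparseable reply keeps format failures visible in the eval run instead of silently counting them as passes or failures.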
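The "Query mock DB" steps listed in this diff (generate a mock database that matches the schema, run both the expected and the generated SQL on it, compare results) can be sketched with Python's built-in `sqlite3`. This is a minimal sketch under stated assumptions: SQLite stands in for the real database engine, the schema and mock rows are supplied as SQL scripts (generating them is out of scope here), and `results_match` is a hypothetical helper name, not part of Multinear or any library.

```python
import sqlite3
from collections import Counter


def results_match(schema_sql: str, mock_data_sql: str, expected_sql: str,
                  generated_sql: str, ordered: bool = False) -> bool:
    """Run expected and generated SQL on a throwaway in-memory mock DB and
    compare their result rows.

    Rows are compared as a multiset (order-insensitive) unless `ordered` is
    True, e.g. when the user question asks for a sorted result.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)     # create tables matching the schema
        conn.executescript(mock_data_sql)  # fill them with mock rows
        expected = conn.execute(expected_sql).fetchall()
        try:
            generated = conn.execute(generated_sql).fetchall()
        except sqlite3.Error:
            return False  # generated SQL that does not even run counts as a failure
        if ordered:
            return expected == generated
        return Counter(expected) == Counter(generated)
    finally:
        conn.close()


# Usage with a toy schema and mock rows (illustrative data, not from the slides).
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);"
MOCK_DATA = """
INSERT INTO orders (customer, amount) VALUES
    ('acme', 120.0), ('acme', 80.0), ('globex', 50.0);
"""
EXPECTED = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
GENERATED = "SELECT customer, SUM(amount) AS total FROM orders GROUP BY 1 ORDER BY customer"

print(results_match(SCHEMA, MOCK_DATA, EXPECTED, GENERATED))  # True
```

Column names and aliases are ignored because rows are compared as plain tuples. A match on one mock dataset is evidence, not proof of equivalence: if no mock row had `amount <= 60`, a generated query with a stray `WHERE amount > 60` filter would still pass, so the mock data should include edge-case rows.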