Commit 5fee8c8

Update Clay agent evaluation notes and multi-agent workflow details
1 parent 65427f2 commit 5fee8c8

3 files changed: 113 additions (+), 91 deletions (−)

journals/2025_02_22.md

Lines changed: 0 additions & 1 deletion
@@ -14,7 +14,6 @@
 - [[AI/LLM/Technique/LLM System Eval]] term [[Grading Party]]
 - [[Person/Dan Mason]] #langgraph with #MCP
 - [[AI/ES/25/ws/4/Multi-Agent Workflows with MCP]]
--
 - #Filed
 - [[CLI/Tool/ffmpeg]]
 - [[CLI/Tool/yt-dlp]]

pages/AI___ES___25___ws___3___How Clay Performs Agent Evaluation.md

Lines changed: 112 additions & 89 deletions
@@ -184,56 +184,49 @@ tags:: [[AI/Agent]], [[LangChain]], [[Workshop]], [[Tutorial]]
 - "Talking: Sydney"
 - ## Expected trajectory can be ordered
 - ![image.png](../assets/image_1740255660446_0.png)
-- **v1 Trajectory (Red Box)**
-- `get_calendar(time="5pm")`
-- `get_calendar(time="7pm")` *(green)*
-- `get_calendar(time="6pm")` *(red)*
-- `schedule_meeting(time="7pm")`
-- **Expected Trajectory (Green Box)**
-- `get_calendar(time="5pm")`
-- `get_calendar(time="6pm")` *(red)*
-- `get_calendar(time="7pm")`
-- `schedule_meeting(time="7pm")`
-- **Key takeaway:**
-- The v1 trajectory executes `get_calendar(time="7pm")` before `get_calendar(time="6pm")`, which is incorrect compared to the expected ordering.
-- **LangChain logo present at the bottom-left corner**
-- **Slide number:** 41
-- ## Setting up Agentic Evals
+- **v1 Trajectory (Red Box)**
+- `get_calendar(time="5pm")`
+- `get_calendar(time="7pm")` *(green)*
+- `get_calendar(time="6pm")` *(red)*
+- `schedule_meeting(time="7pm")`
+- **Expected Trajectory (Green Box)**
+- `get_calendar(time="5pm")`
+- `get_calendar(time="6pm")` *(red)*
+- `get_calendar(time="7pm")`
+- `schedule_meeting(time="7pm")`
+- **Key takeaway:**
+- The v1 trajectory executes `get_calendar(time="7pm")` before `get_calendar(time="6pm")`, which is incorrect compared to the expected ordering.
+- **LangChain logo present at the bottom-left corner**
+- **Slide number:** 41
+- ## **Multi-turn conversations can be tested individually, or in series**
 - ![image.png](../assets/image_1740256078101_0.png)
-- **Multi-turn conversations can be tested individually, or in series**
 - **Comparison of Testing Approaches:**
 - **One Conversation (Green Box)**
 - Human: A
-- AI: B
+- AI: B
 - Human: C
-- AI: D
+- AI: D
 - Human: E
-- AI: F
+- AI: F
 - **Each Turn can be tested individually (Red Boxes)**
 - **First box:**
 - Human: A
-- AI: ?
+- AI: ?
 - **Second box:**
 - Human: A
-- AI: B
+- AI: B
 - Human: C
-- AI: ?
+- AI: ?
 - **Third box:**
 - Human: A
-- AI: B
+- AI: B
 - Human: C
-- AI: D
+- AI: D
 - Human: E
-- AI: ?
-- **LangChain logo present at the bottom-left corner**
+- AI: ?
 - **Slide number:** 46
-- **Zoom overlay visible at the top**
-- "This meeting is being recorded."
-- "Not hearing anything? Turn up volume"
-- "Talking: Sydney"
-- ## Improving your Application
+- ## **Regression Testing: Does performance actually improve with a change?**
 - ![image.png](../assets/image_1740256094705_0.png)
-- **Regression Testing: Does performance actually improve with a change?**
 - **Diagram Structure:**
 - **Application Versions**
 - v1 → Dataset A (red arrow)
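The ordered-trajectory comparison on slide 41 can be sketched in a few lines. This is a hypothetical helper, not LangSmith's trajectory-evaluator API: it walks two ordered tool-call lists and reports the first step where the agent diverges from the expected order.

```python
# Hypothetical helper (not the LangSmith API): compare an observed agent
# trajectory against the expected, ordered list of tool calls.
def first_divergence(expected, observed):
    """Index of the first step where the trajectories differ; None if the
    observed trajectory matches the expected order exactly."""
    for i, (exp, obs) in enumerate(zip(expected, observed)):
        if exp != obs:
            return i
    # One trajectory is a strict prefix of the other.
    if len(expected) != len(observed):
        return min(len(expected), len(observed))
    return None

expected = [
    'get_calendar(time="5pm")',
    'get_calendar(time="6pm")',
    'get_calendar(time="7pm")',
    'schedule_meeting(time="7pm")',
]
v1 = [
    'get_calendar(time="5pm")',
    'get_calendar(time="7pm")',  # checked 7pm before 6pm, as on the slide
    'get_calendar(time="6pm")',
    'schedule_meeting(time="7pm")',
]
print(first_divergence(expected, v1))  # → 1
```

An exact-order evaluator like this is strict; a looser variant could compare the two lists as sets when call order does not matter.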
@@ -253,11 +246,7 @@ tags:: [[AI/Agent]], [[LangChain]], [[Workshop]], [[Tutorial]]
 - v2: 10s
 - v3: 20s
 - v4: 5s
-- **LangChain logo present at the bottom-left corner**
 - **Slide number:** 52
-- **Zoom overlay visible at the top**
-- "This meeting is being recorded."
-- "Not hearing anything? Turn up volume"
 - ## [[Person/Ratch Sujithan]]
 - ### Clay
 - "data marketplace for go-to-market"
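The regression-testing slide above (does v2 actually beat v1 on the same dataset?) reduces to comparing pass rates across application versions. A minimal sketch, where the grader, dataset, and app functions are all illustrative stand-ins rather than any framework's API:

```python
# Hypothetical sketch (all names illustrative): run the same dataset against
# two application versions and flag the change if the pass rate drops.
def pass_rate(app, dataset, grader):
    graded = [grader(example, app(example["input"])) for example in dataset]
    return sum(graded) / len(graded)

def is_regression(old_app, new_app, dataset, grader):
    return pass_rate(new_app, dataset, grader) < pass_rate(old_app, dataset, grader)

# Toy dataset and grader: the "app" is expected to double its input.
dataset = [{"input": x, "expected": x * 2} for x in range(4)]
grader = lambda example, output: output == example["expected"]
v1 = lambda x: x * 2 if x < 3 else 0  # v1 fails the last case
v2 = lambda x: x * 2                  # v2 passes every case
print(is_regression(v1, v2, dataset, grader))  # → False
```

Holding the dataset fixed across versions is the point of the slide: only then does a pass-rate delta measure the change rather than the data.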
@@ -266,68 +255,102 @@ tags:: [[AI/Agent]], [[LangChain]], [[Workshop]], [[Tutorial]]
 - it can augment your dataset
 - go to clay.com and see what they have to offer
 - trusted by 300k gtm teams - openai, airbnb, [[Anthropic]], [[CursorAI]], notion, dropbox
-- ### demo that took 5-7 min to shut up
+- ### demo that took 5-7 min to create
 - ## LangChain Demo: LinkedIn Profile Finder
 - **Workspace:** Langchain demo
 - ![image.png](../assets/image_1740256185555_0.png)
-- **Data Table View:**
-- **Columns:**
-- "Find stargazers on" (GitHub user list)
-- "Num Contributions"
-- "Repos Contributed"
-- "LinkedIn Profile Find"
-- "Response"
-- **Example Users:**
-- `allsayar` (160+ contributions)
-- `JohnShahawy` (3481 contributions, `backstage`, `cotmaker`)
-- `davidtsong`, `salomartin`, `sjwithmore`, etc.
-- **Model:** Claygent -> Argon
-- **Configuration:**
-- **Prompt:**
-"Given a person's full name, GitHub username, and GitHub profile link, find their LinkedIn profile URL. Follow these steps to ensure accuracy:"
-- **Inputs:**
-- Name (`T Name`)
-- GitHub Username (`T Username`)
-- GitHub URL (`URL`)
-- **Steps to Execute:**
-1. Check the GitHub Profile Directly
-- **Additional UI Elements:**
-- "Compare models" button
-- "Save" button
-- **Zoom overlay visible at the top:**
-- "This meeting is being recorded."
-- "Not hearing anything? Turn up volume"
-- "Talking: Sydney"
-- **MacOS Dock visible at the bottom**
-- use "stargazers" on github integration
-- put langchain github url in there
-- get back stargazers in a spreadsheet
-- extract usernames and github urls
-- use github integration to get names and contributions and number of repos they created
-- i'm a recruiter to find out about them and find their backgrounds
-- there's a linkedin profile finder
-- now you can extract person from linkedin profile
-- now you have country profile, email
-- write personalized messages to them
-- very small snapshot of what clay can do
+- cgpt
+- **Data Table View:**
+- **Columns:**
+- "Find stargazers on" (GitHub user list)
+- "Num Contributions"
+- "Repos Contributed"
+- "LinkedIn Profile Find"
+- "Response"
+- **Example Users:**
+- `allsayar` (160+ contributions)
+- `JohnShahawy` (3481 contributions, `backstage`, `cotmaker`)
+- `davidtsong`, `salomartin`, `sjwithmore`, etc.
+- **Model:** Claygent -> Argon
+- **Configuration:**
+- **Prompt:**
+"Given a person's full name, GitHub username, and GitHub profile link, find their LinkedIn profile URL. Follow these steps to ensure accuracy:"
+- **Inputs:**
+- Name (`T Name`)
+- GitHub Username (`T Username`)
+- GitHub URL (`URL`)
+- **Steps to Execute:**
+1. Check the GitHub Profile Directly
+- **Additional UI Elements:**
+- "Compare models" button
+- "Save" button
+- **Zoom overlay visible at the top:**
+- "This meeting is being recorded."
+- "Not hearing anything? Turn up volume"
+- "Talking: Sydney"
+- my #notes
+- use "stargazers" on github integration
+- put langchain github url in there
+- get back stargazers in a spreadsheet
+- extract usernames and github urls
+- use github integration to get names and contributions and number of repos they created
+- i'm a recruiter to find out about them and find their backgrounds
+- there's a linkedin profile finder
+- now you can extract person from linkedin profile
+- now you have country profile, email
+- write personalized messages to them
+- very small snapshot of what clay can do
 - ### how we think of evals
 - ### Logs to Action
 - logs here could be json logs
 - analysis
 - clustering
 - regression
-- ## Evals at Clay
+- ## **Evals: Steering AI Strategy**
 - **Evals: Steering AI Strategy**
 - ![image.png](../assets/image_1740257703536_0.png)
-- **Key Points:**
-- Data-driven feedback loop
-- Continuous Feedback
-- Customer-Centric Insights
-- Strategic Alignment
-- Evolutionary Foundation
-- **Clay logo present at the bottom-left corner**
-- **Zoom overlay visible at the top**
-- "This meeting is being recorded."
-- "Not hearing anything? Turn up volume"
-- "Talking: Sydney"
+- **Key Points:**
+- Data-driven feedback loop
+- Continuous Feedback
+- Customer-Centric Insights
+- Strategic Alignment
+- Evolutionary Foundation
+- ## Core Elements of Our Evaluation Framework
+- **Development Evaluations**
+- Validate Functionality
+- Test Integration
+- Support New Use Cases
+- **Observability Evaluations**
+- Monitor Usage Patterns
+- Performance Analysis
+- Driving Strategic Decisions
+- **Key Insight:**
+- Observability insights continuously refine development evaluations in a feedback loop.
+- ## Evals at Clay - **Evaluation Pipeline Overview**
+- image here
+- tbd
+- **Production / Observability Evals**
+- **Tools Used:**
+- Segment
+- LangSmith
+- **Process:**
+- Logs are collected and processed in LangSmith
+- Structured events analyzed (Eval_IDs assigned)
+- Pattern analysis and broader insights derived from multiple evaluations
+- Insights linked to Linear for tracking action items
+- Example: Sprint 1 - Implement Use Case 1
+- Example: Sprint 2 - Implement Use Case 2
+- **Development Evals**
+- **Process:**
+- New functionality added and simulated in development
+- Update test suite:
+- Test Case 1 → Test Use Case 1
+- Test Case 2 → Test Use Case 2
+- Test Case N → Test Use Case N
+- Run CI Pipelines:
+- Smoke Test CI
+- Integration Test CI
+- **GitHub Actions handles CI runs**
+- **Final step: Deploy! 🚀**
+-
 -
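The "each turn can be tested individually" slide earlier in this file can be sketched as splitting one logged multi-turn conversation into per-turn test cases, each carrying the prior turns as context and the real AI reply as the expected answer. The function and data shapes here are illustrative, not any framework's API.

```python
# Hypothetical sketch: turn one conversation into per-turn test cases,
# mirroring the red boxes on slide 46 (Human: A / AI: ?, then A-B-C / AI: ?, ...).
def split_into_turn_tests(conversation):
    """conversation: list of (role, text) tuples alternating human/ai.
    Returns one (history, expected_ai_reply) pair per AI turn."""
    cases = []
    for i, (role, text) in enumerate(conversation):
        if role == "ai":
            cases.append((conversation[:i], text))
    return cases

convo = [
    ("human", "A"), ("ai", "B"),
    ("human", "C"), ("ai", "D"),
    ("human", "E"), ("ai", "F"),
]
cases = split_into_turn_tests(convo)
print(len(cases))  # → 3, one red box per AI turn
```

Testing turns individually isolates failures to a single reply; testing the conversation in series additionally checks that earlier model replies do not derail later ones.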

pages/AI___ES___25___ws___4___Multi-Agent Workflows with MCP.md

Lines changed: 1 addition & 1 deletion
@@ -26,4 +26,4 @@ tags:: [[Anthropic/MCP]], #langgraph, Agentic
 - #langgraph is the easiest to explain of all the frameworks #Quote -- [[Person/Dan Mason]]
 - it is #Expensive at least it gan me
 - it is model agnostic
--
+- [[GitHub/CoPilot]] is at the bottom of the pile but
