# Agent Architecture Diagrams - Context Engineering Course

**Complete architectural documentation for all fully functional agents in the Context Engineering course.**

---

## 📋 Table of Contents

1. [Overview](#overview)
2. [Course Advisor Agent v1](#course-advisor-agent-v1)
3. [Course Advisor Agent v2](#course-advisor-agent-v2)
4. [Course Advisor Agent v3](#course-advisor-agent-v3)
5. [ClassAgent (Reference)](#classagent-reference-implementation)
6. [AugmentedClassAgent](#augmentedclassagent-extended-reference)
7. [Evolution Summary](#evolution-summary)
8. [Key Differences](#key-differences-between-versions)

---

## Overview

This document provides detailed architecture diagrams and specifications for all fully functional agents developed throughout the Context Engineering course. The course follows a **progressive enhancement approach** where a single agent (Redis University Course Advisor) evolves across multiple notebooks, with each version adding new capabilities.

### Agent Versions

| Version | Notebook | Tools | Key Features | Status |
|---------|----------|-------|--------------|--------|
| **v1** | 02_building_course_advisor_agent.ipynb | 3 | Base agent with dual memory | ✅ Complete |
| **v2** | 03_agent_with_memory_compression.ipynb | 3 | + Memory compression | ✅ Complete |
| **v3** | 04_semantic_tool_selection.ipynb | 5 | + Semantic routing | ✅ Complete |
| **ClassAgent** | reference-agent/agent.py | 2 | Production reference | ✅ Complete |
| **AugmentedClassAgent** | reference-agent/augmented_agent.py | 4 | Extension pattern | ✅ Complete |

---

## Course Advisor Agent v1

**Location**: `notebooks/section-4-integrating-tools-and-agents/02_building_course_advisor_agent.ipynb`

**Purpose**: First complete production agent demonstrating LangGraph orchestration, dual-memory architecture, and tool calling.

### Architecture Diagram

See rendered Mermaid diagram: **Course Advisor Agent v1 - Architecture**

### Component Breakdown

#### 1. **LangGraph Orchestration Layer**

**Workflow**: `START → Load Memory → Agent → [Tools → Agent]* → Save Memory → END`

**Nodes** (4 total):
- `load_memory` - Load working memory from Agent Memory Server
- `agent` - LLM reasoning with tool calling (OpenAI GPT-4)
- `tools` - Execute selected tool
- `save_memory` - Save to working memory, auto-extract to long-term

**Decision Logic** (see the wiring sketch after this list):
- If the agent response contains tool calls → route to the `tools` node
- If there are no tool calls → route to the `save_memory` node
- Tools always loop back to `agent` for the final response
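
To make the routing concrete, here is a minimal LangGraph wiring sketch. Node bodies are stubbed out and only the `messages` field of the state is shown; the notebook's actual implementations differ.

```python
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # Only `messages` shown here; the full schema is listed under
    # "Memory Architecture" below.
    messages: Annotated[list, add_messages]


def load_memory(state: AgentState) -> dict:
    return {}  # stub: would fetch working memory from Agent Memory Server


def agent_node(state: AgentState) -> dict:
    return {}  # stub: would call the LLM with tools bound


def tool_node(state: AgentState) -> dict:
    return {}  # stub: would execute the requested tool


def save_memory(state: AgentState) -> dict:
    return {}  # stub: would persist working memory


def route_after_agent(state: AgentState) -> str:
    """Route to tools if the last AI message requested any, else save memory."""
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "save_memory"


workflow = StateGraph(AgentState)
workflow.add_node("load_memory", load_memory)
workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)
workflow.add_node("save_memory", save_memory)

workflow.add_edge(START, "load_memory")
workflow.add_edge("load_memory", "agent")
workflow.add_conditional_edges(
    "agent", route_after_agent, {"tools": "tools", "save_memory": "save_memory"}
)
workflow.add_edge("tools", "agent")  # tools always loop back to the agent
workflow.add_edge("save_memory", END)

graph = workflow.compile()
```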

#### 2. **Tool Inventory** (3 Tools)

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| **search_courses** | Semantic course search using vector embeddings | query: str, limit: int | Course list with descriptions |
| **search_memories** | Search long-term memory for user facts | query: str, limit: int | Relevant memories |
| **store_memory** | Save important information to long-term memory | text: str, type: str, topics: List[str] | Confirmation |
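
As an illustration of the tool-calling shape, here is a self-contained sketch of `search_courses`. The real notebook backs this with CourseManager and Redis vector search; the tiny in-memory catalog and keyword match below are stand-ins.

```python
from dataclasses import dataclass

from langchain_core.tools import tool


@dataclass
class Course:
    id: str
    title: str
    description: str


# Stand-in catalog; the real agent queries Redis with vector embeddings.
CATALOG = [
    Course("RU101", "Introduction to Redis", "Keys, data structures, and basic commands."),
    Course("RU202", "Redis Streams", "Stream processing and consumer groups."),
]


@tool
def search_courses(query: str, limit: int = 5) -> str:
    """Semantic course search (naive keyword match stands in for embeddings)."""
    q = query.lower()
    hits = [c for c in CATALOG if q in c.title.lower() or q in c.description.lower()]
    return "\n".join(
        f"{c.id}: {c.title} - {c.description}" for c in hits[:limit]
    ) or "No courses found."
```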

#### 3. **Memory Architecture** (Dual-Memory System)

**Working Memory** (Session-scoped):
- Conversation history for the current session
- Loaded at the start of each turn
- Saved at the end of each turn
- Enables conversation continuity and grounding

**Long-term Memory** (Cross-session):
- Persistent facts about the student
- Preferences, goals, completed courses
- Searchable via semantic vector search
- Accessible via the `search_memories` tool

**Graph State** (Turn-scoped):
- `messages`: List[BaseMessage] - Conversation messages
- `student_id`: str - Student identifier
- `session_id`: str - Session identifier
- `context`: Dict - Retrieved context from long-term memory
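
Expressed as a LangGraph state schema, the four fields above might look like this (a sketch; names and types follow the list above):

```python
from typing import Annotated, Dict, List, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], add_messages]  # conversation messages
    student_id: str                                       # student identifier
    session_id: str                                       # session identifier
    context: Dict                                         # retrieved long-term context
```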

**Memory Extraction**:
- Strategy: **Discrete** (default)
- Automatically extracts individual facts from conversations
- Stores them in long-term memory for future retrieval

#### 4. **Data Flow**

```
User Query

Load Working Memory (conversation history)

Agent Node (LLM reasoning with 3 tools)

Tool Call? → Yes → Execute Tool → Back to Agent
           → No  → Save Memory → Response
```

#### 5. **Storage Backends**

- **Redis** (Port 6379): Vector search for courses, memory storage
- **Agent Memory Server** (Port 8088): Memory management, extraction engine

#### 6. **External Integrations**

- **OpenAI API**: GPT-4 for the LLM, embeddings for vector search
- **CourseManager**: Course search, vector embeddings, course data

### Context Engineering Techniques

- ✅ System context (role, instructions)
- ✅ User context (student profile)
- ✅ Conversation context (working memory)
- ✅ Retrieved context (RAG via search_courses)
- ✅ Memory-based context (long-term memory)

### Key Learning Objectives

1. Build stateful agents with LangGraph StateGraph
2. Implement dual-memory architecture (working + long-term)
3. Create and integrate multiple tools
4. Design conversation flow control
5. Integrate Agent Memory Server with LangGraph

---

## Course Advisor Agent v2

**Location**: `notebooks/section-4-integrating-tools-and-agents/03_agent_with_memory_compression.ipynb`

**Purpose**: Enhanced version of v1 that adds working memory compression for long conversations.

### Architecture Diagram

See rendered Mermaid diagram: **Course Advisor Agent v2 - Architecture with Memory Compression**

### What's New in v2

#### 🆕 Memory Compression Layer

**Problem**: Unbounded conversation growth leads to:
- Exceeded token limits (128K for GPT-4o)
- High API costs (token usage grows quadratically when the full history is resent each turn)
- Increased latency
- Context rot (LLMs struggle with very long contexts)

**Solution**: Three compression strategies

### Compression Strategies

#### 1. **Truncation** (Fast, Simple)

**How it works**: Keep only the most recent N messages that fit within the token budget (see the sketch after this list)

**Pros**:
- ✅ Fast (no LLM calls)
- ✅ Predictable token usage
- ✅ Simple implementation

**Cons**:
- ❌ Loses all old context
- ❌ No intelligence in selection

**Use when**:
- Speed is critical (real-time chat)
- Recent context is all that matters
- Cost-sensitive (no LLM calls)
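
A minimal sketch of the truncation strategy. Token counts are approximated as characters/4; a real implementation would use a tokenizer such as tiktoken.

```python
def truncate_messages(messages: list[dict], max_tokens: int = 1000) -> list[dict]:
    """Keep the newest messages whose combined estimated size fits the budget."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):                    # walk from newest to oldest
        cost = len(msg.get("content", "")) // 4 + 1   # rough token estimate
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))                       # restore chronological order
```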

#### 2. **Priority-Based** (Balanced)

**How it works**: Score messages by importance and keep the highest-scoring ones (see the sketch after this list)

**Scoring factors**:
- System messages (high priority)
- Tool calls and results (high priority)
- User questions (medium priority)
- Recency (recent messages scored higher)
- Keywords (domain-specific importance)

**Pros**:
- ✅ Preserves important context
- ✅ No LLM calls (fast)
- ✅ Balanced approach

**Cons**:
- ❌ Requires good scoring logic
- ❌ May lose temporal flow

**Use when**:
- You need a balance between speed and quality
- Important context is scattered throughout the conversation
- No LLM calls are allowed (cost/latency constraints)
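
A sketch of priority-based compression using the factors listed above. The weights and keywords are illustrative, not the notebook's.

```python
def score_message(msg: dict, index: int, total: int) -> float:
    """Score one message; higher means more worth keeping."""
    score = 0.0
    if msg["role"] == "system":
        score += 10.0                                  # system messages: high
    if msg["role"] == "tool" or msg.get("tool_calls"):
        score += 5.0                                   # tool calls/results: high
    if msg["role"] == "user":
        score += 2.0                                   # user questions: medium
    score += 3.0 * (index + 1) / total                 # recency bonus
    keywords = ("prerequisite", "course", "goal")      # domain-specific terms
    if any(kw in msg.get("content", "").lower() for kw in keywords):
        score += 1.0
    return score


def compress_by_priority(messages: list[dict], keep: int = 20) -> list[dict]:
    """Keep the `keep` highest-scoring messages, in their original order."""
    ranked = sorted(
        range(len(messages)),
        key=lambda i: score_message(messages[i], i, len(messages)),
        reverse=True,
    )
    return [messages[i] for i in sorted(ranked[:keep])]
```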

#### 3. **Summarization** (High Quality)

**How it works**: An LLM creates an intelligent summary of the old messages while recent ones are kept verbatim (see the sketch after this list)

**Process**:
1. Split the conversation into old (to summarize) and recent (to keep)
2. Format the old messages for summarization
3. The LLM generates a comprehensive summary
4. Return summary + recent messages

**Pros**:
- ✅ Preserves meaning
- ✅ High-quality compression
- ✅ Intelligent context preservation

**Cons**:
- ❌ Slower (requires an LLM call)
- ❌ Costs tokens
- ❌ Additional latency

**Use when**:
- Quality is critical
- Conversations are long (30+ turns)
- You can afford the LLM call latency
- Comprehensive context is needed
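
A sketch of the summarization strategy with a LangChain chat model. The stack, model choice, and prompt are assumptions for illustration.

```python
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice


def compress_by_summary(messages: list[BaseMessage], keep_recent: int = 10) -> list[BaseMessage]:
    """Summarize everything but the last `keep_recent` messages."""
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    if not old:
        return messages                     # nothing worth summarizing yet
    transcript = "\n".join(f"{m.type}: {m.content}" for m in old)
    summary = llm.invoke(
        "Summarize this conversation, preserving facts, preferences, "
        "and decisions:\n\n" + transcript
    ).content
    return [SystemMessage(content=f"Conversation summary: {summary}")] + recent
```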

### Enhanced Components

**Graph State** (v2 additions):
- `compression_stats`: Dict - Compression metrics and statistics

**Agent Memory Server** (v2 configuration):
- `WINDOW_SIZE` environment variable for auto-compression
- Automatic compression when the threshold is exceeded
- Background processing (async workers)

### Compression Comparison

| Strategy | Messages | Tokens | Savings | Quality | Speed |
|----------|----------|--------|---------|---------|-------|
| Original | 60 | 1,500 | 0% | N/A | N/A |
| Truncation | 20 | 1,000 | 33% | Low | Fast |
| Priority-Based | 22 | 980 | 35% | Medium | Fast |
| Summarization | 5 | 850 | 43% | High | Slow |

### Key Learning Objectives

1. Implement working memory compression strategies
2. Understand token cost management
3. Apply compression in production agents
4. Balance context preservation vs token efficiency

---

## Course Advisor Agent v3

**Location**: `notebooks/section-4-integrating-tools-and-agents/04_semantic_tool_selection.ipynb`

**Purpose**: Scales the agent from 3 to 5 tools, using semantic tool selection to reduce token costs.

### Architecture Diagram

See rendered Mermaid diagram: **Course Advisor Agent v3 - Architecture with Semantic Tool Selection**

### What's New in v3

#### 🆕 Expanded Tool Set (5 Tools)

**New Tools** (2 added):

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| **check_prerequisites** | Check course requirements | course_id: str | Prerequisites list, eligibility status |
| **compare_courses** | Compare courses side-by-side | course_ids: List[str] | Comparison table |

**All Tools** (5 total):
1. search_courses (from v1)
2. search_memories (from v1)
3. store_memory (from v1)
4. check_prerequisites (NEW)
5. compare_courses (NEW)

#### 🆕 Semantic Tool Selection Layer

**Problem**: Token cost scales linearly with the number of tools
- 3 tools = ~1,200 tokens
- 5 tools = ~2,200 tokens (an 83% increase)
- 10 tools = ~4,000 tokens
- Sending all tools on every request is wasteful

**Solution**: RedisVL Semantic Router for intelligent tool selection

### Semantic Router Architecture

**How it works**:
1. Define **Routes** for each tool with reference examples
2. The router automatically creates a Redis vector index
3. Generates embeddings for all route references
4. For each query, embed the query and find the top-k most similar routes
5. Send only the selected tools to the LLM

**Route Structure**:
```python
Route(
    name="check_prerequisites",
    references=[
        "Check course prerequisites",
        "Verify readiness for a course",
        "Understand course requirements",
        ...
    ],
    metadata={"category": "course_planning"},
    distance_threshold=0.3
)
```

**Selection Process**:
```
User Query: "What are the prerequisites for RU202?"

Embed Query → [0.23, -0.45, 0.67, ...]

Compare to Route Embeddings:
  check_prerequisites: similarity = 0.92 ✅
  search_courses:      similarity = 0.45
  compare_courses:     similarity = 0.38
  search_memories:     similarity = 0.12
  store_memory:        similarity = 0.08

Select Top-k (k=3):
  → check_prerequisites
  → search_courses
  → compare_courses

Send only these 3 tools to LLM (instead of all 5)
```
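
In code, the routing step might look like the following hedged sketch using RedisVL's `SemanticRouter`. Here `tool_routes` stands for the list of five `Route` objects defined as above, and `TOOL_REGISTRY` and `llm` are assumed to exist; the connection URL is also an assumption.

```python
from redisvl.extensions.router import SemanticRouter

# Build the router; this creates the Redis vector index and embeds every
# route reference string.
router = SemanticRouter(
    name="tool-router",
    routes=tool_routes,                  # the five Route objects shown above
    redis_url="redis://localhost:6379",  # assumed local Redis
    overwrite=True,
)

# For each user query: embed it, take the top-k closest routes, and bind
# only those tools to the LLM.
matches = router.route_many("What are the prerequisites for RU202?", max_k=3)
selected_tools = [TOOL_REGISTRY[m.name] for m in matches]  # hypothetical name→tool map
llm_with_tools = llm.bind_tools(selected_tools)
```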

### Tool Selection Strategies Comparison

| Strategy | Description | Pros | Cons | Use When |
|----------|-------------|------|------|----------|
| **Static** | Send all tools | Simple | Wasteful tokens | ≤3 tools |
| **Pre-filtered** | Keyword matching | Fast | Brittle, manual rules | 4-7 tools |
| **Semantic** | Embedding similarity | Robust, scalable | Requires embeddings | 8+ tools |

### Token Cost Optimization

**Without Semantic Selection**:
- 5 tools sent every time = 2,200 tokens per query
- 100 queries = 220,000 tokens
- Cost: ~$0.55 (at $0.0025/1K tokens)

**With Semantic Selection** (top-3):
- 3 tools sent per query = ~1,000 tokens per query
- 100 queries = 100,000 tokens
- Cost: ~$0.25 (at $0.0025/1K tokens)
- **Savings: 55% reduction in token costs**
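
The same arithmetic as a quick sanity check (price per 1K input tokens held constant at $0.0025):

```python
queries = 100
all_tools_tokens, topk_tokens = 2_200, 1_000   # tokens per query, from above
price_per_1k = 0.0025                          # $ per 1K tokens

cost_all = queries * all_tools_tokens / 1_000 * price_per_1k   # -> $0.55
cost_topk = queries * topk_tokens / 1_000 * price_per_1k       # -> $0.25
print(f"Savings: {1 - cost_topk / cost_all:.0%}")              # -> Savings: 55%
```
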
### Enhanced Components

**Graph State** (v3 additions):
- `selected_tools`: List - Tools selected by the semantic router
- `tool_selection_method`: str - Selection strategy used

**Redis** (v3 additions):
- Tool route index for semantic routing
- Route embeddings stored in Redis

**OpenAI** (v3 usage):
- Route embeddings generation
- Query embeddings for routing

### Key Learning Objectives

1. Implement semantic tool selection at scale
2. Use RedisVL Semantic Router for production routing
3. Understand token cost scaling with tools
4. Design scalable tool architectures

---