# Agent Architecture Diagrams - Context Engineering Course

**Complete architectural documentation for all fully functional agents in the Context Engineering course.**

---

## 📋 Table of Contents

1. [Overview](#overview)
2. [Course Advisor Agent v1](#course-advisor-agent-v1)
3. [Course Advisor Agent v2](#course-advisor-agent-v2)
4. [Course Advisor Agent v3](#course-advisor-agent-v3)
5. [ClassAgent (Reference)](#classagent-reference-implementation)
6. [AugmentedClassAgent](#augmentedclassagent-extended-reference)
7. [Evolution Summary](#evolution-summary)
8. [Key Differences](#key-differences-between-versions)

---

## Overview

This document provides detailed architecture diagrams and specifications for all fully functional agents developed throughout the Context Engineering course. The course follows a **progressive enhancement approach** where a single agent (Redis University Course Advisor) evolves across multiple notebooks, with each version adding new capabilities.

### Agent Versions

| Version | Notebook | Tools | Key Features | Status |
|---------|----------|-------|--------------|--------|
| **v1** | 02_building_course_advisor_agent.ipynb | 3 | Base agent with dual-memory | ✅ Complete |
| **v2** | 03_agent_with_memory_compression.ipynb | 3 | + Memory compression | ✅ Complete |
| **v3** | 04_semantic_tool_selection.ipynb | 5 | + Semantic routing | ✅ Complete |
| **ClassAgent** | reference-agent/agent.py | 2 | Production reference | ✅ Complete |
| **AugmentedClassAgent** | reference-agent/augmented_agent.py | 4 | Extension pattern | ✅ Complete |

---

## Course Advisor Agent v1

**Location**: `notebooks/section-4-integrating-tools-and-agents/02_building_course_advisor_agent.ipynb`

**Purpose**: First complete production agent demonstrating LangGraph orchestration, dual-memory architecture, and tool calling.

### Architecture Diagram

See rendered Mermaid diagram: **Course Advisor Agent v1 - Architecture**

### Component Breakdown

#### 1. **LangGraph Orchestration Layer**

**Workflow**: `START → Load Memory → Agent → [Tools → Agent]* → Save Memory → END`

**Nodes** (4 total):
- `load_memory` - Load working memory from Agent Memory Server
- `agent` - LLM reasoning with tool calling (OpenAI GPT-4)
- `tools` - Execute selected tool
- `save_memory` - Save to working memory, auto-extract to long-term

**Decision Logic**:
- If agent response contains tool calls → route to `tools` node
- If no tool calls → route to `save_memory` node
- Tools always loop back to `agent` for final response
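The decision logic above can be sketched as a plain conditional-edge function. This is a dependency-free illustration: in the notebook this function is wired into a LangGraph `StateGraph`, and messages are LangChain `BaseMessage` objects rather than the plain dicts assumed here.

```python
def route_after_agent(state: dict) -> str:
    """Conditional edge: route to `tools` if the last message requested
    a tool call, otherwise finish the turn via `save_memory`."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return "save_memory"

# The tools node always loops back to the agent, so the graph edges are:
#   agent --(tool calls)----> tools --> agent
#   agent --(no tool calls)-> save_memory --> END
```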

#### 2. **Tool Inventory** (3 Tools)

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| **search_courses** | Semantic course search using vector embeddings | query: str, limit: int | Course list with descriptions |
| **search_memories** | Search long-term memory for user facts | query: str, limit: int | Relevant memories |
| **store_memory** | Save important information to long-term memory | text: str, type: str, topics: List[str] | Confirmation |
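As an illustration of the table above, here is a dependency-free sketch of the `search_courses` signature. In the notebooks it is implemented as a LangChain tool backed by `CourseManager`'s vector search; the substring-matching body and the tiny catalog here are hypothetical stand-ins.

```python
def search_courses(query: str, limit: int = 5) -> list[dict]:
    """Semantic course search using vector embeddings.

    Returns up to `limit` courses with descriptions. In the real agent
    this embeds `query` and runs a k-NN search over the Redis course index;
    here we fake it with substring matching over a toy catalog.
    """
    catalog = [
        {"id": "RU101", "description": "Introduction to Redis data structures"},
        {"id": "RU202", "description": "Redis Streams"},
    ]
    matches = [c for c in catalog if query.lower() in c["description"].lower()]
    return matches[:limit]
```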

#### 3. **Memory Architecture** (Dual-Memory System)

**Working Memory** (Session-scoped):
- Conversation history for current session
- Loaded at start of each turn
- Saved at end of each turn
- Enables conversation continuity and grounding

**Long-term Memory** (Cross-session):
- Persistent facts about the student
- Preferences, goals, completed courses
- Searchable via semantic vector search
- Accessible via `search_memories` tool

**Graph State** (Turn-scoped):
- `messages`: List[BaseMessage] - Conversation messages
- `student_id`: str - Student identifier
- `session_id`: str - Session identifier
- `context`: Dict - Retrieved context from long-term memory
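The graph state can be written down as a `TypedDict` with exactly the fields listed above. This is a sketch consistent with the field list; the notebook's actual class name may differ, and there `messages` holds LangChain `BaseMessage` objects rather than arbitrary values.

```python
from typing import Any, TypedDict

class AgentState(TypedDict):
    messages: list[Any]   # conversation messages for this turn
    student_id: str       # student identifier
    session_id: str       # session identifier
    context: dict         # retrieved context from long-term memory

# Example of a freshly initialized turn state:
state: AgentState = {
    "messages": [],
    "student_id": "student-123",
    "session_id": "session-abc",
    "context": {},
}
```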

**Memory Extraction**:
- Strategy: **Discrete** (default)
- Automatically extracts individual facts from conversations
- Stores in long-term memory for future retrieval

#### 4. **Data Flow**

```
User Query
  ↓
Load Working Memory (conversation history)
  ↓
Agent Node (LLM reasoning with 3 tools)
  ↓
Tool Call? → Yes → Execute Tool → Back to Agent
           → No  → Save Memory → Response
```

#### 5. **Storage Backends**

- **Redis** (Port 6379): Vector search for courses, memory storage
- **Agent Memory Server** (Port 8088): Memory management, extraction engine

#### 6. **External Integrations**

- **OpenAI API**: GPT-4 for LLM, embeddings for vector search
- **CourseManager**: Course search, vector embeddings, course data

### Context Engineering Techniques

- ✅ System context (role, instructions)
- ✅ User context (student profile)
- ✅ Conversation context (working memory)
- ✅ Retrieved context (RAG via search_courses)
- ✅ Memory-based context (long-term memory)

### Key Learning Objectives

1. Build stateful agents with LangGraph StateGraph
2. Implement dual-memory architecture (working + long-term)
3. Create and integrate multiple tools
4. Design conversation flow control
5. Integrate Agent Memory Server with LangGraph

---

## Course Advisor Agent v2

**Location**: `notebooks/section-4-integrating-tools-and-agents/03_agent_with_memory_compression.ipynb`

**Purpose**: Enhanced version of v1 that adds working memory compression for long conversations.

### Architecture Diagram

See rendered Mermaid diagram: **Course Advisor Agent v2 - Architecture with Memory Compression**

### What's New in v2

#### 🆕 Memory Compression Layer

**Problem**: Unbounded conversation growth leads to:
- Token limits exceeded (128K for GPT-4o)
- High API costs (quadratic growth)
- Increased latency
- Context rot (LLMs struggle with very long contexts)

**Solution**: Three compression strategies

### Compression Strategies

#### 1. **Truncation** (Fast, Simple)

**How it works**: Keep only the most recent N messages within the token budget

**Pros**:
- ✅ Fast (no LLM calls)
- ✅ Predictable token usage
- ✅ Simple implementation

**Cons**:
- ❌ Loses all old context
- ❌ No intelligence in selection

**Use when**:
- Speed is critical (real-time chat)
- Recent context is all that matters
- Cost-sensitive (no LLM calls)
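A minimal sketch of the truncation strategy. The `count_tokens` helper here is a crude whitespace approximation standing in for a real tokenizer (e.g. tiktoken); the notebook's implementation details may differ.

```python
def count_tokens(message: dict) -> int:
    # Crude stand-in for a real tokenizer such as tiktoken.
    return len(message["content"].split())

def truncate(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep only the most recent messages that fit in the token budget."""
    kept: list[dict] = []
    total = 0
    for message in reversed(messages):   # walk newest-first
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break                        # budget exhausted: drop everything older
        kept.append(message)
        total += cost
    return list(reversed(kept))          # restore chronological order
```

Because no LLM is involved, the cost is a single pass over the message list, which is why this strategy suits real-time chat.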

#### 2. **Priority-Based** (Balanced)

**How it works**: Score messages by importance, keep highest-scoring ones

**Scoring factors**:
- System messages (high priority)
- Tool calls and results (high priority)
- User questions (medium priority)
- Recency (recent messages scored higher)
- Keywords (domain-specific importance)

**Pros**:
- ✅ Preserves important context
- ✅ No LLM calls (fast)
- ✅ Balanced approach

**Cons**:
- ❌ Requires good scoring logic
- ❌ May lose temporal flow

**Use when**:
- Need balance between speed and quality
- Important context scattered throughout conversation
- No LLM calls allowed (cost/latency constraints)
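The scoring factors above can be sketched as follows. The weights and the keyword set are illustrative choices, not values from the course code; note how re-sorting the survivors chronologically mitigates (but does not eliminate) the loss of temporal flow mentioned in the cons.

```python
def score_message(message: dict, index: int, total: int,
                  keywords: frozenset[str] = frozenset({"course", "prerequisite"})) -> float:
    """Higher score = more important to keep. Weights are illustrative."""
    score = 0.0
    if message.get("role") == "system":
        score += 10.0                              # system messages: high priority
    if message.get("tool_calls") or message.get("role") == "tool":
        score += 5.0                               # tool calls/results: high priority
    if message.get("role") == "user":
        score += 3.0                               # user questions: medium priority
    score += 2.0 * (index + 1) / total             # recency: later messages score higher
    content = message.get("content", "").lower()
    score += sum(1.0 for kw in keywords if kw in content)  # domain keywords
    return score

def compress_by_priority(messages: list[dict], keep: int) -> list[dict]:
    total = len(messages)
    ranked = sorted(enumerate(messages),
                    key=lambda pair: score_message(pair[1], pair[0], total),
                    reverse=True)[:keep]
    # Re-sort the survivors chronologically to limit loss of temporal flow.
    return [m for _, m in sorted(ranked, key=lambda pair: pair[0])]
```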

#### 3. **Summarization** (High Quality)

**How it works**: An LLM creates intelligent summaries of old messages, while recent messages are kept verbatim

**Process**:
1. Split conversation into old messages (to summarize) and recent messages (to keep)
2. Format old messages for summarization
3. LLM generates comprehensive summary
4. Return summary + recent messages

**Pros**:
- ✅ Preserves meaning
- ✅ High-quality compression
- ✅ Intelligent context preservation

**Cons**:
- ❌ Slower (requires LLM call)
- ❌ Costs tokens
- ❌ Additional latency

**Use when**:
- Quality is critical
- Long conversations (30+ turns)
- Can afford LLM call latency
- Comprehensive context needed
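The four-step process above can be sketched with a pluggable `summarize` callable standing in for the LLM call. The interface is hypothetical, not the notebook's exact code; in production `summarize` would wrap a chat-completion request.

```python
from typing import Callable

def compress_by_summarization(messages: list[dict], keep_recent: int,
                              summarize: Callable[[str], str]) -> list[dict]:
    """Summarize old messages with an LLM; keep recent ones verbatim."""
    if len(messages) <= keep_recent:
        return messages                              # nothing to compress
    # Step 1: split into old (to summarize) and recent (to keep).
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Step 2: format the old messages for the summarization prompt.
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    # Step 3: the LLM generates a comprehensive summary.
    summary = summarize(transcript)
    # Step 4: return the summary plus the recent messages.
    return [{"role": "system", "content": f"Conversation summary: {summary}"}] + recent
```

Injecting the summary as a system message is one common convention; the Agent Memory Server may represent it differently.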

### Enhanced Components

**Graph State** (v2 additions):
- `compression_stats`: Dict - Compression metrics and statistics

**Agent Memory Server** (v2 configuration):
- `WINDOW_SIZE` environment variable for auto-compression
- Automatic compression when threshold exceeded
- Background processing (async workers)

### Compression Comparison

| Strategy | Messages | Tokens | Savings | Quality | Speed |
|----------|----------|--------|---------|---------|-------|
| Original | 60 | 1,500 | 0% | N/A | N/A |
| Truncation | 20 | 1,000 | 33% | Low | Fast |
| Priority-Based | 22 | 980 | 35% | Medium | Fast |
| Summarization | 5 | 850 | 43% | High | Slow |

### Key Learning Objectives

1. Implement working memory compression strategies
2. Understand token cost management
3. Apply compression in production agents
4. Balance context preservation vs token efficiency

---

## Course Advisor Agent v3

**Location**: `notebooks/section-4-integrating-tools-and-agents/04_semantic_tool_selection.ipynb`

**Purpose**: Scales the agent from 3 to 5 tools, using semantic tool selection to keep token costs down.

### Architecture Diagram

See rendered Mermaid diagram: **Course Advisor Agent v3 - Architecture with Semantic Tool Selection**

### What's New in v3

#### 🆕 Expanded Tool Set (5 Tools)

**New Tools** (2 added):

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| **check_prerequisites** | Check course requirements | course_id: str | Prerequisites list, eligibility status |
| **compare_courses** | Compare courses side-by-side | course_ids: List[str] | Comparison table |

**All Tools** (5 total):
1. search_courses (from v1)
2. search_memories (from v1)
3. store_memory (from v1)
4. check_prerequisites (NEW)
5. compare_courses (NEW)

#### 🆕 Semantic Tool Selection Layer

**Problem**: Token cost scales linearly with the number of tools
- 3 tools = ~1,200 tokens
- 5 tools = ~2,200 tokens (83% increase)
- 10 tools = ~4,000 tokens
- Sending all tools every time is wasteful

**Solution**: RedisVL Semantic Router for intelligent tool selection

### Semantic Router Architecture

**How it works**:
1. Define **Routes** for each tool with reference examples
2. The router automatically creates a Redis vector index
3. Embeddings are generated for all route references
4. For each query, embed the query and find the top-k most similar routes
5. Send only the selected tools to the LLM

**Route Structure**:
```python
Route(
    name="check_prerequisites",
    references=[
        "Check course prerequisites",
        "Verify readiness for a course",
        "Understand course requirements",
        ...
    ],
    metadata={"category": "course_planning"},
    distance_threshold=0.3
)
```

**Selection Process**:
```
User Query: "What are the prerequisites for RU202?"
  ↓
Embed Query → [0.23, -0.45, 0.67, ...]
  ↓
Compare to Route Embeddings:
  check_prerequisites: similarity = 0.92 ✅
  search_courses:      similarity = 0.45
  compare_courses:     similarity = 0.38
  search_memories:     similarity = 0.12
  store_memory:        similarity = 0.08
  ↓
Select Top-k (k=3):
  → check_prerequisites
  → search_courses
  → compare_courses
  ↓
Send only these 3 tools to the LLM (instead of all 5)
```
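The top-k step above can be illustrated with plain cosine similarity. This is a dependency-free sketch of the mechanism only: the notebook uses RedisVL's `SemanticRouter`, which stores route embeddings in a Redis vector index and computes similarity server-side, rather than in Python memory as here.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_tools(query_vec: list[float],
                 route_vecs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k tool names whose route embeddings best match the query."""
    ranked = sorted(route_vecs,
                    key=lambda name: cosine(query_vec, route_vecs[name]),
                    reverse=True)
    return ranked[:k]
```

Only the selected subset would then be bound to the LLM for that turn.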

### Tool Selection Strategies Comparison

| Strategy | Description | Pros | Cons | Use When |
|----------|-------------|------|------|----------|
| **Static** | Send all tools | Simple | Wasteful tokens | ≤3 tools |
| **Pre-filtered** | Keyword matching | Fast | Brittle, manual rules | 4-7 tools |
| **Semantic** | Embedding similarity | Robust, scalable | Requires embeddings | 8+ tools |

### Token Cost Optimization

**Without Semantic Selection**:
- 5 tools sent every time = 2,200 tokens per query
- 100 queries = 220,000 tokens
- Cost: ~$0.55 (at $0.0025/1K tokens)

**With Semantic Selection** (top-3):
- 3 tools sent per query = ~1,000 tokens per query
- 100 queries = 100,000 tokens
- Cost: ~$0.25 (at $0.0025/1K tokens)
- **Savings: 55% reduction in token costs**
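The arithmetic above, spelled out (the token counts and per-1K price are the figures from this section; actual OpenAI pricing varies by model):

```python
PRICE_PER_1K_TOKENS = 0.0025
QUERIES = 100

static_tokens = 2_200 * QUERIES    # all 5 tools sent on every query
semantic_tokens = 1_000 * QUERIES  # only the top-3 tools sent per query

static_cost = static_tokens / 1_000 * PRICE_PER_1K_TOKENS      # $0.55
semantic_cost = semantic_tokens / 1_000 * PRICE_PER_1K_TOKENS  # $0.25
savings = 1 - semantic_tokens / static_tokens                  # ~55% fewer tokens
```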

### Enhanced Components

**Graph State** (v3 additions):
- `selected_tools`: List - Tools selected by semantic router
- `tool_selection_method`: str - Selection strategy used

**Redis** (v3 additions):
- Tool route index for semantic routing
- Route embeddings stored in Redis

**OpenAI** (v3 usage):
- Route embeddings generation
- Query embeddings for routing

### Key Learning Objectives

1. Implement semantic tool selection at scale
2. Use RedisVL Semantic Router for production routing
3. Understand token cost scaling with tools
4. Design scalable tool architectures

---
