# Software Engineering Agent

Autonomous coding agent demonstrating the **agent + tools + memory** pattern used by GitHub Copilot, Cursor, and Claude Code.

## Core Concepts

**Agent capabilities emerge from three components:**

- **Tools**: File operations, code execution, memory, metacognition
- **Prompts**: Structured workflow encoding best practices
- **Memory**: Persistent knowledge across sessions
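A minimal sketch of how the three components might combine into one loop. The `run_agent` function, the tool-call dictionary format, and `memory.summary()` are illustrative assumptions, not the actual API of `agent.py`:

```python
import json

def run_agent(llm, tools, memory, task, max_iterations=30):
    """Drive the loop: the LLM picks a tool, we execute it, and the
    observation is fed back until the model signals completion."""
    history = [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": f"Task: {task}\nMemory:\n{memory.summary()}"},
    ]
    for _ in range(max_iterations):
        action = llm(history)  # e.g. {"tool": "write_file", "args": {...}}
        if action["tool"] == "task_complete":
            return action["args"]
        result = tools[action["tool"]](**action["args"])  # execute the chosen tool
        history.append({"role": "tool", "content": json.dumps(result)})
    return {"status": "max_iterations_reached"}
```

The iteration cap is the same safety valve described under Configuration below: without it, a confused model can loop forever.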
## Engineering Patterns

- **Five-phase workflow**: Memory check → Planning → Execution → Learning → Completion
- **Surgical edits**: `str_replace` mode for precise changes
- **Markdown tracking**: Checkboxes in `/memories/current_task.md`
- **Explicit evaluation**: `TaskStatusTool` prevents premature termination
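The markdown-tracking pattern can be sketched with a couple of helpers; the function names and exact checkbox format are hypothetical, not taken from `agent.py`:

```python
import re

def mark_done(plan_text, step):
    """Flip '- [ ] step' to '- [x] step' in a markdown plan."""
    pattern = re.compile(r"- \[ \] " + re.escape(step))
    return pattern.sub(lambda m: f"- [x] {step}", plan_text)

def remaining_steps(plan_text):
    """List the steps whose checkboxes are still unchecked."""
    return re.findall(r"- \[ \] (.+)", plan_text)
```

Because the plan lives in a plain markdown file, both the agent and a human reviewer can read progress at a glance.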
## Quick Start

```bash
# Set credentials
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-key"

# Run example
cd examples/agents/swe_agent
python agent.py
```
## What It Does

Runs three tasks showing memory usage:

- **Task 1**: Create calculator module + tests
- **Task 2**: Add power function (reuses testing patterns from Task 1)
- **Task 3**: Add documentation (applies learned conventions)
## Agent Architecture

- **File Tools**: `read_file`, `write_file` (3 modes), `list_directory`, `grep_search`
- **Execution**: `python_repl`, `bash_execute`
- **Memory**: `view`, `create`, `search`, `append`, `str_replace`
- **Metacognition**: `ThinkTool`, `TaskStatusTool`
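The `str_replace` edit mode deserves a sketch, since it is what makes edits "surgical": the old snippet must match exactly once, so the change is unambiguous. This helper is an illustrative assumption, not the tool's real implementation:

```python
def str_replace(source, old, new):
    """Replace `old` with `new` only if `old` occurs exactly once;
    otherwise refuse, because the edit would be ambiguous."""
    count = source.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return source.replace(old, new)
```

Rejecting zero or multiple matches forces the model to quote enough surrounding context to pin down the edit site.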
**Memory structure**:

```
agent_memory/
├── patterns/           # Reusable solutions
├── decisions/          # Dated decision logs
├── current_task.md     # Active plan with checkboxes
└── project_context.md  # High-level understanding
```
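Bootstrapping that layout is a few lines of `pathlib`; the `init_memory` helper is a sketch under the directory names shown above, not code from the example:

```python
from pathlib import Path

def init_memory(base):
    """Create the memory directory layout: two subfolders for
    accumulated knowledge plus the two top-level markdown files."""
    base = Path(base)
    for sub in ("patterns", "decisions"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    for name in ("current_task.md", "project_context.md"):
        (base / name).touch()
    return base
```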
## Configuration

**Iteration limits** (in `Agent`):

- Simple scripts: 10-20
- Multi-file projects: 30-50 (default)
- Complex refactoring: 50-100

**Bash timeout** (in `create_coding_tools`):

- 30s: Quick tests
- 60s: Test suites (default)
- 120s+: Large builds
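One way these knobs could be grouped, shown as a plain dataclass; the field names are assumptions, and the actual `Agent`/`create_coding_tools` signatures in `agent.py` may differ:

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    max_iterations: int = 30  # 10-20 simple scripts, 50-100 complex refactors
    bash_timeout_s: int = 60  # 30s quick tests, 120s+ large builds
```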
## Files

- `agent.py` - Main agent setup and example tasks
- `scratch/agent_workspace/` - Generated code (isolated)
- `scratch/agent_memory/` - Persistent memory

## Key Insights

**Tools are prerequisites, not guarantees.** Output quality still depends on the LLM's capabilities, prompt guidance, and feedback from execution.

**Prompts are software.** They encode workflows, best practices, and completion criteria; iterate on and test them like any other code.

**Memory enables learning.** Patterns accumulate, mistakes are recorded, and decisions are justified, so the agent improves over time.

**Completion needs explicit criteria.** Clear requirements + `TaskStatusTool` + passing tests = reliable task completion.