Towards AI’s Agent Architecture Cheat Sheet - Decision-Ready
By Louis-Francois Bouchard (LinkedIn, X, YouTube, other AI resources here)
Goal: decide Workflow vs Single Agent vs Multi-Agent from real constraints, then implement reliably.
Pro Tip: Check the full workflow vs. agents Webinar for free here: https://www.youtube.com/watch?v=iOpLKJYOvXs
(and subscribe to the channel to learn more! 😉)
Workflow -> Single Agent + Tools -> Multi-Agent
Each step to the right increases cost (~4× to ~15× more tokens), latency, and debugging complexity. Stay as far left as you can while still solving the problem.
| The Autonomy Test • You control steps + order -> Workflow • Model decides what to do next (tool choice + next action) -> Agent |
|---|
| • Tools are capabilities (API calls, DB queries, validators, browsers). • Single LLM calls and workflows can use user-defined tools under specific (“hard-coded”) conditions. • An agent is the decision-maker that chooses tools and decides the next step. • One model calling 10 APIs is one agent with 10 tools, not multi-agent. |
|---|
When a Workflow Fits
- Steps are known and stable (same order most runs).
- Predictability matters: easy unit tests per step, clear traces, deterministic gates.
- Cost/latency matters: fewer thinking loops, fewer tool decisions.
| Default recommendation: If you can define the steps upfront, build a workflow first. Agents add overhead without adding capability. |
|---|
Example: A support ticket pipeline where every request goes through classify → route → draft → policy check → send, in the same order every time.
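A minimal sketch of that pipeline (Python, assuming a hypothetical `call_llm` helper that wraps whatever model client you use): the step order is hard-coded, so each stage can be unit-tested and traced on its own.

```python
# Hypothetical fixed pipeline: same steps, same order, every run.
# `call_llm` is a stand-in for your model client (OpenAI, Anthropic, etc.).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def classify(ticket: str) -> str:
    return call_llm(f"Classify this support ticket (billing/tech/other):\n{ticket}")

def route(category: str) -> str:
    # Deterministic routing: plain code, no model call needed.
    return {"billing": "billing_queue", "tech": "tech_queue"}.get(category, "general_queue")

def draft_reply(ticket: str, queue: str) -> str:
    return call_llm(f"Draft a reply for the '{queue}' team to this ticket:\n{ticket}")

def policy_check(reply: str) -> bool:
    # Hard, deterministic gate (length limits, banned phrases, required fields).
    return len(reply) < 2000 and "guarantee" not in reply.lower()

def handle_ticket(ticket: str) -> str:
    category = classify(ticket)
    queue = route(category)
    reply = draft_reply(ticket, queue)
    if not policy_check(reply):
        raise ValueError("Reply failed policy check; flag for human review")
    return reply  # send() would go here
```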
When a Single Agent + Tools Fits
- Tightly coupled tasks: steps are interdependent and mostly sequential (global context matters end-to-end).
- Path changes based on findings: retries, fallbacks, clarification, partial data.
- Tool load:
- ≤10–20 preloaded tools → reliable selection
- >20 tools → selection degrades unless tools are dynamically discovered (file-based catalogs such as Anthropic “Skills” or Cursor’s dynamic context discovery with MCP, tool RAG, late MCP connection)
- Low overhead desired: no inter-agent messaging/orchestration; fewer moving parts.
Example: An agent that writes code, runs it, inspects errors, fixes the code, and retries until tests pass.
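A minimal sketch of that loop, assuming hypothetical `generate_fix` and `run_tests` helpers and a pytest-based test suite: the model decides the next edit, while the test runner stays a deterministic tool with a hard retry cap.

```python
import subprocess

# Hypothetical single-agent loop: the model decides the next edit,
# the test runner is a plain tool that returns ground-truth feedback.

def generate_fix(code: str, error_output: str) -> str:
    # Stand-in for a model call that rewrites the code given the failure.
    raise NotImplementedError("plug in your model client here")

def run_tests(code: str) -> tuple[bool, str]:
    # Write the candidate code to disk and run the test suite (assumes pytest).
    with open("candidate.py", "w") as f:
        f.write(code)
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def agent_loop(code: str, max_attempts: int = 5) -> str:
    for _ in range(max_attempts):
        passed, output = run_tests(code)
        if passed:
            return code
        code = generate_fix(code, output)  # the model chooses the next action
    raise RuntimeError("Gave up after max_attempts; escalate to a human")
```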
When Multi-Agent Fits
- True parallelism: independent subtasks must run simultaneously for throughput/speed.
- Context/tool overload: >200K tokens or >20 preloaded tools → first try dynamic tool discovery; split by domain only if reasoning must be isolated.
- Distinct competencies: phases need fundamentally different reasoning modes (e.g., exploratory research vs deterministic writing).
- Modularity/external integration: integrate third-party agent systems or reusable components.
- Hard separation: security boundaries, compliance isolation, sensitive data handling.
Example: One agent independently researches a topic in depth, while another agent focuses solely on writing a structured technical article from the research agent's output.
Prefer orchestrator -> worker or sequential handoff with explicit artifacts/contracts.
Avoid everyone talking to everyone (information silos + coordination failures).
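A minimal sketch of an orchestrator -> worker handoff, assuming hypothetical `research_agent` and `writer_agent` stubs: the only thing that crosses the boundary is an explicit artifact with a fixed schema, not a shared conversation.

```python
from dataclasses import dataclass, field

# Hypothetical orchestrator -> worker handoff. The only thing crossing the
# boundary is an explicit artifact with a fixed schema (the "contract").

@dataclass
class ResearchBrief:
    topic: str
    findings: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)

def research_agent(topic: str) -> ResearchBrief:
    # Stand-in for an exploratory agent with search/browse tools.
    raise NotImplementedError("plug in your research agent here")

def writer_agent(brief: ResearchBrief) -> str:
    # Stand-in for a writing agent that only ever sees the brief,
    # never the researcher's full context or tool traces.
    raise NotImplementedError("plug in your writing agent here")

def orchestrate(topic: str) -> str:
    brief = research_agent(topic)   # phase 1: exploratory research
    if not brief.findings:
        raise ValueError("Empty brief; stop before wasting writer tokens")
    return writer_agent(brief)      # phase 2: structured writing
```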
When Multi-Agent Does NOT Fit
- Tasks requiring all agents to share the same context
- Many dependencies between agents requiring real-time coordination
- Most coding tasks (fewer parallelizable subtasks than research)
Core Design Patterns
- Thin agent, heavy tools: agent plans/decides; tools execute mechanics + deterministic constraints.
- Validation loops: generate -> validate -> fix with actionable feedback (hard checks first, then “reflection”, similar to the LLM-as-judge used in evals); see the sketch after this list.
- Observability is mandatory: trace prompts, tools, token costs, outputs; auto-evals.
- Having a human-in-the-loop (HITL) is a design choice: place human checkpoints on key artifacts (plans, research notes, drafts, before irreversible actions).
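A minimal sketch of the generate -> validate -> fix loop above, assuming hypothetical `generate` and `llm_judge` stubs: cheap deterministic checks run first, the LLM judge only runs once they pass, and retries are capped.

```python
# Hypothetical generate -> validate -> fix loop: hard checks first,
# LLM-as-judge second, with a retry limit and actionable feedback.

def generate(prompt: str, feedback: str = "") -> str:
    raise NotImplementedError("plug in your model client here")

def hard_checks(draft: str) -> list[str]:
    # Deterministic, cheap, and always run first.
    errors = []
    if len(draft) > 4000:
        errors.append("Too long: keep it under 4000 characters.")
    if "TODO" in draft:
        errors.append("Unresolved TODO markers remain.")
    return errors

def llm_judge(draft: str) -> list[str]:
    # Stand-in for a reflection/judge call that returns concrete issues.
    raise NotImplementedError("plug in your judge model here")

def generate_with_validation(prompt: str, max_retries: int = 3) -> str:
    feedback = ""
    for _ in range(max_retries):
        draft = generate(prompt, feedback)
        issues = hard_checks(draft) or llm_judge(draft)  # judge only if hard checks pass
        if not issues:
            return draft
        feedback = "Fix these issues:\n" + "\n".join(issues)
    raise RuntimeError("Validation kept failing; route to a human reviewer")
```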
Build Checklist
- Clarify scope: deliverable (prototype vs production vs handoff), demo cadence, documentation, who maintains it.
- Map task shape: sequential vs branching (parallel executions); exploratory (flexible) vs deterministic (fixed); single domain vs multiple domains (e.g. requiring various levels of reasoning).
- Pick the minimum architecture: workflow first, then single agent, multi-agent only for true constraints.
- Design tools: group by domain; enforce deterministic rules in code; return structured outputs + actionable errors (see the sketch after this checklist).
- Choose orchestration: a simple loop if the task is stateless; a graph or framework only if you need to persist state, branch execution paths, or pause and resume long-running work (e.g., long workflows, human-in-the-loop pauses, or resumable failures).
- Pick models by step difficulty: strong models for planning/judgment; cheaper models for narrow execution/cleanup.
- Document decisions: record what you chose and why (prevents rework when stakeholders change).
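A minimal sketch of the tool-design point above, with a hypothetical `lookup_order` tool and made-up data: deterministic rules live in the tool itself, and both success and failure come back as structured payloads with actionable hints the agent can act on.

```python
import json

# Hypothetical tool with made-up data: it enforces deterministic rules itself
# and returns structured output plus actionable errors instead of raw exceptions.

ORDERS = {"A-1001": {"status": "shipped", "eta_days": 2}}  # stand-in data store

def lookup_order(order_id: str) -> str:
    if not order_id.startswith("A-"):
        return json.dumps({
            "ok": False,
            "error": "invalid_order_id",
            "hint": "Order ids look like 'A-1234'; ask the user to double-check.",
        })
    order = ORDERS.get(order_id)
    if order is None:
        return json.dumps({
            "ok": False,
            "error": "not_found",
            "hint": f"No order {order_id}; offer to search by customer email instead.",
        })
    return json.dumps({"ok": True, "order_id": order_id, **order})

print(lookup_order("A-1001"))  # structured success payload
print(lookup_order("B-9999"))  # actionable error the agent can recover from
```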
| Shortcut: If you are uncertain, start with a workflow or a single agent with a small toolset (AI frameworks are great for exploratory phases); then instrument, and only move right once traces show a real production constraint. |
|---|
| Q | Question | Answer patterns / signals | Build this |
|---|---|---|---|
| 1 | Task shape: sequential vs branching? | Sequential/linear steps -> stable order. Branching/looping -> conditional paths, retries, uncertainty. | Sequential or simple branching, looping, and conditional paths -> Workflow (pipeline or graph). Advanced branching and conditions -> Agent. |
| 2 | Reasoning: exploratory or deterministic? | Deterministic: rules/constraints; known checks; strict formats. Exploratory: search/pivot, unknown next step. | Deterministic -> Workflow + tools. Exploratory -> Agent + tools (single or multi). |
| 3 | How many tool types? (tool-load) | ≤5–10 tools -> easy selection. ~10–20 -> manageable if well scoped. >20 heterogeneous tools -> selection degrades. | ≤10–20 -> Single agent + tools. >20 -> Split by domain (multi-agent or separate MCP/tool servers). |
| 4 | Can it decompose cleanly into distinct competencies? | Yes: phases with different reasoning modes (e.g., research vs writing). No: tightly coupled/global context. | Yes -> Multi-agent with explicit handoffs. No -> Single agent or workflow. |
| Q | Question | If you answer... | Do this |
|---|---|---|---|
| 5 | Need internal/proprietary data? (RAG decision) | Yes: external data needed at generation time. No: APIs/tools + model knowledge suffice. | Yes -> Build retrieval or MCP tools over internal sources. No -> Keep it simple. |
| 6 | Need persistent state? (memory decision) | Yes: session memory, histories, intermediate artifacts must survive across steps or runs. No: all state fits in the prompt or request lifecycle. | Yes -> Use external state (DB, vector store, object storage). No -> Stateless prompts or in-memory context. |
| 7 | Need durable execution? (workflow/framework decision) | Yes: long-running jobs, pause/resume, retries, branching recovery, human-in-the-loop. No: short-lived, synchronous execution. | Yes -> LangGraph (or durable workflow engine + persisted state). No -> Simple loop or script. |
| 8 | Do outputs require quality gates? (validation loops) | Hard constraints (length/syntax/fields) or high-stakes correctness. | Add generate-validate-fix loops; hard checks first; LLM judge second; retry limits. |
| 9 | How much HITL is needed? (approval steps) Note: HITL is more often linked to UI/UX process design, not AI specifically! | High HITL: humans must approve plans/research/drafts before proceeding. | High -> insert review checkpoints + editing UI. Low -> automate more; fewer gates. |
| 10 | Latency tolerance? | Tight latency: real-time UX. Loose latency: offline/batch acceptable. | Tight -> fewer hops, smaller models, more workflows, batching, caching. Loose -> deeper loops, more evaluation, multi-agent if needed. |
| 11 | What eval data exists? | Have labeled examples vs none. | Have evals -> build automated eval harness + compare prompts/models. No evals -> human ratings (optionally bootstrap with synthetic data if you have seed labeled examples) before over-engineering. |
| 12 | How will you do observability? | Need traces of prompts, tools, costs, outputs; debugging via timelines. | Use an observability tool (e.g., Opik) or structured logs per run; no observability = flying blind. |
| 13 | Budget per task? | Low budget vs higher budget for quality. | Low -> cheaper models, fewer tool calls, caching, workflows. Higher -> stronger reasoning models + more validation/reflection. |
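As a toy illustration, the first four questions can be encoded directly in code; the thresholds mirror the table above, and the class and field names are made up.

```python
from dataclasses import dataclass

# Toy sketch of questions 1-4 as code. Thresholds mirror the table above;
# the class and field names are made up for illustration.

@dataclass
class TaskProfile:
    branching: bool        # Q1: does the path change based on findings?
    exploratory: bool      # Q2: exploratory reasoning vs deterministic rules?
    tool_count: int        # Q3: how many preloaded tools?
    distinct_phases: bool  # Q4: cleanly separable competencies?

def recommend(task: TaskProfile) -> str:
    if not task.branching and not task.exploratory:
        return "Workflow (pipeline or graph) + tools"
    if task.tool_count > 20 or task.distinct_phases:
        return "Multi-agent (or split tools/MCP servers by domain)"
    return "Single agent + tools"

print(recommend(TaskProfile(branching=False, exploratory=False,
                            tool_count=6, distinct_phases=False)))
# -> Workflow (pipeline or graph) + tools
```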
This decision framework (and the engineering patterns behind it) is exactly what we teach in the Towards AI courses.
Learn more in our free Agents webinar: https://www.youtube.com/watch?v=iOpLKJYOvXs
Learn more in our full course: academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=1f9b29