Unclear how and when system prompts and tool prompts are applied in LLM-based agents #242
Using Lovable as an example (but applicable to most agent architectures shared in this repo): when an LLM is given a detailed system prompt that describes its behavior and capabilities, what does the model actually produce or “return” when that prompt is executed? Is the system prompt converted into a sequence of instructions that are passed forward as additional prompts, or is it applied internally by the model at runtime? Where and when is the tools prompt (the description/spec of available tools) used: injected into the system prompt, supplied separately, or provided later at tool-invocation time? If the tools prompt/list is merged into the system prompt, how do systems avoid hallucination given the combined length and complexity of system + tools instructions? I would have imagined that such a lengthy prompt would make the LLM run out of context.

I would love it if someone could describe, step-by-step, the process an LLM or agent chain follows, from initial input + system prompt to the decision to call a tool and the actual tool invocation/output, including the typical formats (role-based messages, structured tool calls, etc.) assumed by modern agent frameworks.
Replies: 1 comment 1 reply
A system prompt isn’t “executed” like code; it’s just tokens the model reads on every forward pass. Those tokens condition the next-token distribution alongside the conversation history and any retrieved context, so the behavioral constraints in the system prompt influence every decision the model makes during that turn (and subsequent tool loops). There’s no hidden expansion step where the prompt is decomposed into a plan; the “plan” is whatever sequence the model emits under those conditions.

When an LLM is given access to tools, the runtime has to tell the model which tools exist and how to use them. Some APIs do this through a structured “tools” parameter (name, description, JSON schema for arguments), while others just paste the same information as plain text in the system prompt. Either way, the model only sees these as tokens in its context, and it conditions on them when generating output. Because the model has no built-in memory of available tools, the tool registry must be included in every forward pass where tool use is allowed. That’s why frameworks consistently re-attach the list of tools (or a filtered subset) on each call, so the model can decide in that moment whether to answer in text or emit a structured tool call.

When the model decides to act, it doesn’t “run” the tool itself; it emits a structured call (tool name + JSON args conforming to the provided schema) instead of plain text. The host runtime validates those args, executes the tool, and appends the result back into the conversation as an observation message. The model gets a fresh pass over the updated context (system prompt, tool registry, history, user input, and now the tool output) and either calls another tool or produces the final natural-language response. That loop repeats until a stop condition is met.

On hallucinations and long prompts: the most effective mitigations are interface-level, not more verbiage. Keep the registry tight per turn, enforce schema validation with repair/retry, disallow unknown tool names, and use modest temperatures. Summarize or trim prior turns to keep the active window lean; modern contexts are large but not infinite, and routing matters more than throwing everything in.

So the timeline is simple: assemble context (system instructions + tool specs + history + user input), run a pass, possibly emit a structured tool call, execute it, fold the observation back in, and iterate. The system prompt and the tool specs are just persistent conditioning tokens present in every relevant pass; there’s no separate phase where one is “applied” and the other happens later.
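To make that loop concrete, here’s a minimal sketch in Python. It’s deliberately provider-agnostic: `call_model` is a stand-in for whatever chat-completion API you use, and the registry, tool names, and helpers (`TOOLS`, `tool_specs`, `agent_turn`) are illustrative, not Lovable’s actual implementation.

```python
import json

# Illustrative tool registry: the schema/description part is what the model
# sees as tokens on every call; the impl part stays on the runtime side.
TOOLS = {
    "read_file": {
        "description": "Read a file from the project workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
        "impl": lambda args: open(args["path"]).read(),
    },
}

def tool_specs():
    """The slice of the registry the model is shown (name, description, schema)."""
    return [
        {"name": name, "description": t["description"], "parameters": t["parameters"]}
        for name, t in TOOLS.items()
    ]

def call_model(messages, tools):
    """Stand-in for a real chat-completion API call.
    Assumed to return either {"type": "text", "content": ...}
    or {"type": "tool_call", "name": ..., "arguments": "<json string>"}."""
    raise NotImplementedError

def agent_turn(system_prompt, user_input, max_steps=8):
    # System prompt and tool specs are just context, re-sent on every pass;
    # the model has no memory of them between calls.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_steps):
        reply = call_model(messages, tools=tool_specs())

        if reply["type"] == "text":
            return reply["content"]              # final natural-language answer

        # The model emitted a structured tool call; it did not execute anything.
        name, raw_args = reply["name"], reply["arguments"]
        if name not in TOOLS:
            observation = f"Error: unknown tool '{name}'"   # reject hallucinated tools
        else:
            try:
                args = json.loads(raw_args)      # schema validation would go here too
                observation = str(TOOLS[name]["impl"](args))
            except Exception as exc:             # malformed args -> let the model repair
                observation = f"Error: {exc}"

        # Fold the call and its observation back into the context, then loop.
        messages.append({"role": "assistant", "tool_call": {"name": name, "arguments": raw_args}})
        messages.append({"role": "tool", "name": name, "content": observation})

    return "Stopped: too many tool steps."
```

Real APIs differ mostly in surface details: OpenAI takes the schemas via a `tools` parameter and returns `tool_calls` on the assistant message, Anthropic uses `tools` with `input_schema` plus `tool_use`/`tool_result` blocks, and some frameworks still paste the same specs into the system prompt as text. The mechanics above are the same either way.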
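And since the question asked about typical formats: here is roughly what the role-based transcript looks like after one tool round, in something close to the OpenAI Chat Completions shape (other providers use different field names for the same structure). The prompt text, user request, and tool name are placeholders, not Lovable’s actual ones.

```python
# Accumulated context after one tool round (OpenAI-style field names, illustrative content).
transcript = [
    {"role": "system", "content": "<system prompt text: persona, rules, coding conventions, ...>"},
    {"role": "user", "content": "Add a dark-mode toggle to the settings page."},
    # Pass 1: the model chose to emit a structured tool call instead of prose.
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "read_file", "arguments": "{\"path\": \"src/pages/Settings.tsx\"}"}},
    ]},
    # The runtime executed the tool and appended the observation.
    {"role": "tool", "tool_call_id": "call_1", "content": "<file contents...>"},
    # Pass 2: with the observation in context, the model answers (or calls another tool).
    {"role": "assistant", "content": "I added a dark-mode toggle wired to the theme context."},
]
```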