Unclear how and when system prompts and tool prompts are applied in LLM-based agents #242
Using Lovable as an example (but applicable to most agent architectures shared in this repo): when an LLM is given a detailed system prompt that describes its behavior and capabilities, what does the model actually produce or “return” when that prompt is executed? Is the system prompt converted into a sequence of instructions that are passed forward as additional prompts, or is it applied internally by the model at runtime? Where and when is the tools prompt (the description/spec of available tools) used: injected into the system prompt, supplied separately, or provided later at tool-invocation time? If the tools prompt/list is merged into the system prompt, how do systems avoid hallucination given the combined length and complexity of system + tools instructions? I would have imagined that such a lengthy prompt would make the LLM run out of context.

I would love it if someone could describe, step-by-step, the process an LLM or agent chain follows, from initial input + system prompt to the decision to call a tool and the actual tool invocation/output, including the typical formats (role-based messages, structured tool calls, etc.) assumed by modern agent frameworks.
Replies: 1 comment 1 reply
A system prompt isn’t “executed” like code; it’s just tokens the model reads on every forward pass. Those tokens condition the next-token distribution alongside the conversation history and any retrieved context, so the behavioral constraints in the system prompt influence every decision the model makes during that turn (and subsequent tool loops). There’s no hidden expansion step where the prompt is decomposed into a plan; the “plan” is whatever sequence the model emits under those conditions.

When an LLM is given access to tools, the runtime has to tell the model which tools exist and how to use them. Some APIs do this through a structured “tools” parameter (name, description, JSON schema for arguments), while others just paste the same information as plain text in the system prompt. Either way, the model only sees these as tokens in its context, and it conditions on them when generating output. Because the model has no built-in memory of available tools, the tool registry must be included in every forward pass where tool use is allowed. That’s why frameworks consistently re-attach the list of tools (or a filtered subset) on each call, so the model can decide in that moment whether to answer in text or emit a structured tool call.

When the model decides to act, it doesn’t “run” the tool itself; it emits a structured call (tool name + JSON args conforming to the provided schema) instead of plain text. The host runtime validates those args, executes the tool, and appends the result back into the conversation as an observation message. The model gets a fresh pass over the updated context (system prompt, tool registry, history, user input, and now the tool output) and either calls another tool or produces the final natural-language response. That loop repeats until a stop condition is met.

On hallucinations and long prompts: the most effective mitigations are interface-level, not more verbiage. Keep the registry tight per turn, enforce schema validation with repair/retry, disallow unknown tool names, and use modest temperatures. Summarize or trim prior turns to keep the active window lean; modern contexts are large but not infinite, and routing matters more than throwing everything in.

So the timeline is simple: assemble context (system instructions + tool specs + history + user input), run a pass, possibly emit a structured tool call, execute it, fold the observation back in, and iterate. The system prompt and the tool specs are just persistent conditioning tokens present in every relevant pass; there’s no separate phase where one is “applied” and the other happens later.
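To make that loop concrete, here’s a minimal sketch in Python. It’s deliberately provider-agnostic: `call_model` is a stand-in for whatever chat-completion API you use, and the registry, tool names, and helpers (`TOOLS`, `tool_specs`, `agent_turn`) are illustrative, not Lovable’s actual implementation.

```python
import json

# Illustrative tool registry: the schema/description part is what the model
# sees as tokens on every call; the impl part stays on the runtime side.
TOOLS = {
    "read_file": {
        "description": "Read a file from the project workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
        "impl": lambda args: open(args["path"]).read(),
    },
}

def tool_specs():
    """The slice of the registry the model is shown (name, description, schema)."""
    return [
        {"name": name, "description": t["description"], "parameters": t["parameters"]}
        for name, t in TOOLS.items()
    ]

def call_model(messages, tools):
    """Stand-in for a real chat-completion API call.
    Assumed to return either {"type": "text", "content": ...}
    or {"type": "tool_call", "name": ..., "arguments": "<json string>"}."""
    raise NotImplementedError

def agent_turn(system_prompt, user_input, max_steps=8):
    # System prompt and tool specs are just context, re-sent on every pass;
    # the model has no memory of them between calls.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_steps):
        reply = call_model(messages, tools=tool_specs())

        if reply["type"] == "text":
            return reply["content"]              # final natural-language answer

        # The model emitted a structured tool call; it did not execute anything.
        name, raw_args = reply["name"], reply["arguments"]
        if name not in TOOLS:
            observation = f"Error: unknown tool '{name}'"   # reject hallucinated tools
        else:
            try:
                args = json.loads(raw_args)      # schema validation would go here too
                observation = str(TOOLS[name]["impl"](args))
            except Exception as exc:             # malformed args -> let the model repair
                observation = f"Error: {exc}"

        # Fold the call and its observation back into the context, then loop.
        messages.append({"role": "assistant", "tool_call": {"name": name, "arguments": raw_args}})
        messages.append({"role": "tool", "name": name, "content": observation})

    return "Stopped: too many tool steps."
```

Real APIs differ mostly in surface details: OpenAI takes the schemas via a `tools` parameter and returns `tool_calls` on the assistant message, Anthropic uses `tools` with `input_schema` plus `tool_use`/`tool_result` blocks, and some frameworks still paste the same specs into the system prompt as text. The mechanics above are the same either way.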
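And since the question asked about typical formats: here is roughly what the role-based transcript looks like after one tool round, in something close to the OpenAI Chat Completions shape (other providers use different field names for the same structure). The prompt text, user request, and tool name are placeholders, not Lovable’s actual ones.

```python
# Accumulated context after one tool round (OpenAI-style field names, illustrative content).
transcript = [
    {"role": "system", "content": "<system prompt text: persona, rules, coding conventions, ...>"},
    {"role": "user", "content": "Add a dark-mode toggle to the settings page."},
    # Pass 1: the model chose to emit a structured tool call instead of prose.
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "read_file", "arguments": "{\"path\": \"src/pages/Settings.tsx\"}"}},
    ]},
    # The runtime executed the tool and appended the observation.
    {"role": "tool", "tool_call_id": "call_1", "content": "<file contents...>"},
    # Pass 2: with the observation in context, the model answers (or calls another tool).
    {"role": "assistant", "content": "I added a dark-mode toggle wired to the theme context."},
]
```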