RAG Agent stop tokens for multimode behaviour? #8411

jboero · 2024-07-10T14:31:31Z

Hi, has anyone put thought into handling different stop tokens for multimodal models and RAG agent processes? There already seems to be a framework for is_control_token(), but not much is being done with it: the loop just does a continue instead of handling the different types of control tokens.
https://github.com/ggerganov/llama.cpp/blob/6b2a849d1f43d46b82d2f9c08c3275137b528784/src/llama.cpp#L15732

For example:

<URL>: pause completion and fetch the URL into context before continuing.
<SEARCH>: pause completion and attempt a web search.
<CALC>: pause completion to let math be calculated via something like bc.
<COMPILE>: pause completion to try compiling code identified in Markdown tags.
<SCRIPT>: pause completion to run a script or containerized process.
<IMG>: pause completion to generate an image (if supported by model).
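To make the idea concrete, here is a minimal sketch of a host-side dispatcher for tokens like these. It is not llama.cpp code and does not use its API: `run_with_tools`, `calc`, and the `<TAG> ... </TAG>` span convention (an opening control token, an argument, then a closing token) are all hypothetical. Only <CALC> is implemented, using Python's ast module as a safe stand-in for bc; <URL>, <SEARCH>, etc. would be registered the same way. Tool use is off by default, matching the security concern below.

```python
import ast
import operator
from typing import Callable, Dict, Iterable, List, Optional

# Arithmetic operators the hypothetical <CALC> tool is allowed to use.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def _eval(node: ast.AST) -> float:
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def calc(expr: str) -> str:
    """Hypothetical <CALC> handler: a safe stand-in for piping to bc."""
    return str(_eval(ast.parse(expr.strip(), mode="eval")))


def run_with_tools(pieces: Iterable[str],
                   handlers: Dict[str, Callable[[str], str]],
                   enabled: bool = False) -> str:
    """Assemble decoded text, pausing at <TAG> ... </TAG> spans.

    `pieces` stands in for decoded tokens from the sampler. Tool use is
    off by default; when disabled, control tokens pass through as text.
    """
    out: List[str] = []
    active: Optional[str] = None  # tool whose argument is being captured
    arg: List[str] = []
    for piece in pieces:
        is_tag = piece.startswith("<") and piece.endswith(">")
        name = piece[1:-1] if is_tag else None
        if active is None and enabled and name in handlers:
            active, arg = name, []  # pause: start capturing the argument
        elif active is not None and piece == f"</{active}>":
            out.append(handlers[active]("".join(arg)))  # run tool, splice result
            active = None  # resume normal generation
        elif active is not None:
            arg.append(piece)
        else:
            out.append(piece)
    if active is not None:  # unterminated span: keep it verbatim
        out.append(f"<{active}>" + "".join(arg))
    return "".join(out)
```

For example, `run_with_tools(["2+3 = ", "<CALC>", "2+3", "</CALC>"], {"CALC": calc}, enabled=True)` yields "2+3 = 5". This only splices text; a real integration inside the generation loop would presumably also need to tokenize the tool result and feed it back into the model's context before sampling resumes.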

Recently I've seen a lot more models supporting agents or external tooling, Arcee Agent for example.
https://www.marktechpost.com/2024/07/06/arcee-ai-introduces-arcee-agent-a-cutting-edge-7b-parameter-language-model-specifically-designed-for-function-calling-and-tool-use/

Given the obvious security implications, this should be disabled by default, with proper warnings to admins, but the possibilities are interesting and could support more GPT-4o-like behaviour.

onestardao · 2025-07-30T12:37:54Z

This is a fascinating direction. What you're describing touches several fragile zones in multi-agent RAG setups:

When a control token (like <CALC> or <IMG>) is emitted, the model often either keeps going naively or stalls entirely, depending on how the token is injected and how attention is routed.
If the handling logic isn't modularized per token, you risk downstream effects: one malformed multimodal pause can break subsequent agents or derail the reasoning path.
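One way to read the per-token modularization point is to wrap each tool handler so that a failure becomes a short marker in the context rather than an exception that aborts the whole generation loop. The sketch below assumes each control token maps to its own Python handler; `isolate` and the bracketed failure-marker format are hypothetical.

```python
from typing import Callable


def isolate(name: str, handler: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap one tool handler so its failure stays local to that token.

    Instead of letting an exception escape into the generation loop, the
    wrapper returns a short marker that the model (or the next agent) can
    see and route around.
    """
    def wrapped(arg: str) -> str:
        try:
            return handler(arg)
        except Exception as exc:  # deliberately broad: contain any tool error
            return f"[{name} failed: {type(exc).__name__}]"
    return wrapped
```

For example, `isolate("CALC", lambda s: str(int(s) * 2))("abc")` returns "[CALC failed: ValueError]" instead of raising. Surfacing the failure as text keeps later tool calls and agents running, at the cost of relying on the model to notice and react to the marker.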

From experience, I’ve seen these fall into what I call:

No.6 Logic Collapse — agent handoffs break internal reasoning state
No.13 Multi-Agent Chaos — overlapping intentions, no proper fallback or protocol
(Sometimes) No.10 Creative Freeze — model halts instead of bluffing, but looks inert

I’ve been exploring ways to counter this with a layered attention protocol, a kind of “Drunk Transformer” approach in which each head gets its own identity, an entropy push, and suppression of illegal routes.

Happy to share more if you’re diving deeper into this zone.

1 reply

jboero Jul 30, 2025
Author

Yes, this makes good sense. I'm not looking for AGI, where universal agents are developed automatically on the fly; just basic control tokens to break out of the transformer for math or to hand off to other models. I think the GGML codebase has moved ahead on this a bit since I opened this discussion a year ago: there's already image support, and a few agent projects are growing.

onestardao · 2025-07-30T16:11:53Z

Love the direction, especially the image routing part. If you're already seeing multiple agent projects growing, you might find this full failure-mode breakdown map useful:

🔗 https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

It covers Logic Collapse, Multi-Agent Chaos, Creative Freeze, and a few weird edge cases where control tokens silently kill the reasoning flow.

I'd be super curious to hear which modes you're hitting most often in your builds. I'm actively tuning the fallback recovery patterns (especially for vision agent chains).

0 replies
