Do LLMs actually want to be retrieved? Or are we just forcing them to fake memory? #19539
onestardao asked this question in Q&A · Unanswered
Hey folks — I’ve been spending a lot of time trying to get RAG stacks to feel... well, natural.
But the more I build, the more it feels like I’m forcing the model to pretend it remembers stuff — when in reality it never asked to remember anything at all.
Like, we're injecting these retrieval chunks mid-convo, praying they make sense...
but it often just feels like it’s hallucinating politely.
“Thank you for the irrelevant context, I’ll now proceed to make up something nice about it.”
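For concreteness, here's roughly the pattern I'm describing, as a minimal plain-Python sketch of context stuffing. Every name and string below is made up for illustration, not any particular library's API:

```python
# Minimal sketch of "injecting retrieval chunks mid-convo":
# retrieved text is pasted ahead of the user's question and the model
# is asked to behave as if it had remembered it all along.

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Chunks arrive ranked by vector similarity, relevant or not,
    # and get flattened into the prompt as anonymous context.
    context = "\n\n".join(f"[chunk {i}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    chunks = [
        "The warranty covers two years.",
        "An unrelated paragraph about shipping times.",
    ]
    print(build_prompt("How long is the warranty?", chunks))
```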
Is this an alignment issue? Or just the wrong retrieval paradigm?
I get that retrieval is powerful.
But what if the whole “indexed chunk + vector store” model is fundamentally misaligned with how LLMs actually process the flow of a conversation?
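By “indexed chunk + vector store” I mean the standard flow, something like the LlamaIndex quickstart. I'm writing this from memory, so treat the exact imports, paths, and defaults as approximate:

```python
# Roughly the canonical LlamaIndex flow: load docs, chunk + embed them into
# a vector index, then answer queries by stuffing the top-k chunks into the prompt.
# Assumes the default embedding/LLM configuration (e.g. an OpenAI key in the env).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()   # "./data" is a placeholder path
index = VectorStoreIndex.from_documents(documents)        # chunks land in a vector store

query_engine = index.as_query_engine()
print(query_engine.query("What does the doc say about X?"))  # placeholder question
```

Five lines to a working demo, which is exactly what makes it easy to forget that, from the model's side, all of this is just more text appearing in its context window.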
Are there alternatives being explored here — or ways to make it feel more like real cognition and less like context spam?
Would love to hear from folks actually shipping things with LlamaIndex — what pain have you run into?
And is it just me, or do the elegant demos start to fall apart when you push them past toy scale?
No links, no plug, no pitch — just trying to think through the shape of the problem.