Do LLMs actually want to be retrieved? Or are we just forcing them to fake memory? #19539
onestardao asked this question in Q&A · Unanswered
Hey folks — I’ve been spending a lot of time trying to get RAG stacks to feel... well, natural.
But the more I build, the more it feels like I’m forcing the model to pretend it remembers stuff — when in reality it never asked to remember anything at all.
Like, we're injecting these retrieval chunks mid-convo, praying they make sense...
but it often just feels like it’s hallucinating politely.
“Thank you for the irrelevant context, I’ll now proceed to make up something nice about it.”
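For concreteness, here's roughly the pattern I'm describing, as a minimal plain-Python sketch of context stuffing. Every name and string below is made up for illustration, not any particular library's API:

```python
# Minimal sketch of "injecting retrieval chunks mid-convo":
# retrieved text is pasted ahead of the user's question and the model
# is asked to behave as if it had remembered it all along.

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Chunks arrive ranked by vector similarity, relevant or not,
    # and get flattened into the prompt as anonymous context.
    context = "\n\n".join(f"[chunk {i}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    chunks = [
        "The warranty covers two years.",
        "An unrelated paragraph about shipping times.",
    ]
    print(build_prompt("How long is the warranty?", chunks))
```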
Is this an alignment issue? Or just the wrong retrieval paradigm?
I get that retrieval is powerful.
But what if the whole “indexed chunk + vector store” model is fundamentally misaligned with how LLMs actually process the flow of a conversation?
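By “indexed chunk + vector store” I mean the standard flow, something like the LlamaIndex quickstart. I'm writing this from memory, so treat the exact imports, paths, and defaults as approximate:

```python
# Roughly the canonical LlamaIndex flow: load docs, chunk + embed them into
# a vector index, then answer queries by stuffing the top-k chunks into the prompt.
# Assumes the default embedding/LLM configuration (e.g. an OpenAI key in the env).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()   # "./data" is a placeholder path
index = VectorStoreIndex.from_documents(documents)        # chunks land in a vector store

query_engine = index.as_query_engine()
print(query_engine.query("What does the doc say about X?"))  # placeholder question
```

Five lines to a working demo, which is exactly what makes it easy to forget that, from the model's side, all of this is just more text appearing in its context window.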
Are there alternatives being explored here — or ways to make it feel more like real cognition and less like context spam?
Would love to hear from folks actually shipping things with LlamaIndex — what pain have you run into?
And is it just me, or do the elegant demos start to fall apart when you push them past toy scale?
No links, no plug, no pitch — just trying to think through the shape of the problem.