Replies: 1 comment
-
This issue hits the core of what makes many RAG pipelines feel broken in real-world use. We ran into the same pain point and solved it with a hybrid approach: we lock the chunk order to the original document flow, unless a clear semantic cross-reference demands reordering (handled by a reasoning engine), and then apply a "drunk mode" vector pass that doesn't just rank by relevance but reconstructs the narrative structure as a whole (by modeling a ΔS = 0.5 alignment layer between meaning and position). This avoids the classic "I asked for a summary, but got scattered trivia" trap. We are currently preparing a GPT-5 benchmark.
-
The request: order retrieved chunks by their position in the original document instead of by relevance. This would make requests like "summarize this document" possible, despite the common claim that RAG is not suited for overarching queries.
Consider the following: if you set the chunk size × count high enough that the retrieved chunks cover a significant portion of the original document (say, half of it), then the relevance by which the vector search selects those chunks becomes less important; the model will receive a large enough portion of the text anyway to produce a summary or answer similar questions. The only thing that currently makes this unfeasible is that the order of these chunks is jumbled by that "relevance" ranking, making it impossible to ask questions that involve, for example, the chronology of the document's events.
You could say that at this point I might as well just feed the model the entire text as a file, but nope: even halving the text saves a significant number of tokens, not to mention that LocalDocs is tokenized once, while a document uploaded directly to the chat has to be tokenized every time.
I'm sure it can't be too difficult to simply skip the relevance ordering. If this is low priority, can you at least point me to the place in the code where I could try to tweak it locally?
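For anyone who wants to prototype the idea outside the app, here is a minimal sketch in Python (the `Chunk` fields and `build_context` function are hypothetical, not GPT4All's actual internals): keep the relevance-based retrieval for *selecting* chunks, then re-sort the selected chunks by their position in the source document before concatenating them into the prompt, so the model sees them in reading order.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str     # which source document the chunk came from
    position: int   # chunk index (or character offset) within that document
    score: float    # similarity score assigned by the vector search
    text: str

def build_context(retrieved: list[Chunk]) -> str:
    """Assemble the prompt context from retrieved chunks.

    Relevance has already done its job by selecting the chunks; here we
    restore document order so chronology and narrative structure survive.
    """
    ordered = sorted(retrieved, key=lambda c: (c.doc_id, c.position))
    return "\n\n".join(c.text for c in ordered)
```

The only behavioral change versus the usual pipeline is the `sorted(...)` call: retrieval quality is untouched, and the token budget is the same, but summaries and chronology questions get chunks in the order the document presented them.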