Add max_chunk_length to SemanticChunker
#18802
Replies: 2 comments 1 reply
-
Exactly the issue I'm having... as soon as you try to index enough documents, the SemanticChunker will produce contexts that exceed the LLM window.
I guess for now I just need to use a different LLM with a bigger context window.
-
After facing the same problem, one solution could be to use the parameter
-
Feature request
Add a way to define the max length of the chunks produced by the SemanticChunker.
Motivation
I split a huge document into chunks using the SemanticChunker and then made some queries in my program, which uses the OpenAI API and the documents stored in the database to generate a prompt. I got an error because the chunks selected from the database were too long (probably because the author of the text had too much to say about this topic). So, defining a max chunk length would help to prevent that.
Proposal (If applicable)
Right now, I am solving it with this subclass. It does the same as the original SemanticChunker but, at the end, it splits each chunk longer than max_chunk_length into sentences. Then, it combines sentences to make a chunk as close as possible to max_chunk_length without exceeding it. When a sentence is about to make the chunk longer than max_chunk_length, it starts a new chunk and keeps combining the following sentences into that new chunk.
It is probably not a bad strategy, but maybe it is not the best way to solve it. Probably the best option would be something that splits the chunks longer than max_chunk_length using criteria more related to the meaning of the text. Some recursive call of the SemanticChunker, maybe?
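The poster's actual subclass is not included in the discussion, but a minimal sketch of the approach described above could look like the following, assuming langchain_experimental's SemanticChunker. The class name BoundedSemanticChunker and the max_chunk_length parameter are illustrative, not part of the library:

```python
import re
from typing import Any, List

from langchain_experimental.text_splitter import SemanticChunker


class BoundedSemanticChunker(SemanticChunker):
    """SemanticChunker whose chunks aim to stay under max_chunk_length characters."""

    def __init__(self, *args: Any, max_chunk_length: int = 4000, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.max_chunk_length = max_chunk_length

    def split_text(self, text: str) -> List[str]:
        # Start from the semantically determined chunks...
        chunks = super().split_text(text)
        bounded: List[str] = []
        for chunk in chunks:
            if len(chunk) <= self.max_chunk_length:
                bounded.append(chunk)
                continue
            # ...then break any oversized chunk into sentences and greedily
            # re-combine them without exceeding max_chunk_length.
            sentences = re.split(r"(?<=[.?!])\s+", chunk)
            current = ""
            for sentence in sentences:
                candidate = f"{current} {sentence}" if current else sentence
                if current and len(candidate) > self.max_chunk_length:
                    bounded.append(current)
                    current = sentence
                else:
                    current = candidate
            if current:
                bounded.append(current)
        return bounded
```

It would be used like the regular chunker, e.g. `BoundedSemanticChunker(OpenAIEmbeddings(), max_chunk_length=2000).split_text(text)`. One caveat of this greedy strategy: a single sentence longer than max_chunk_length would still come out as an oversized chunk, so a character-level fallback might be needed for pathological inputs.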