Hi Gilson, great question. I've been through a very similar struggle.

I also started with LangChain's RecursiveCharacterTextSplitter, but noticed the same issue you mentioned: it often cuts sentences awkwardly and breaks the semantic flow.

Eventually, I moved to a different approach. Instead of cutting on token or character counts, I segment based on semantic tension, aiming to keep each chunk internally coherent in meaning. This allows:

- Longer chunks with dense, focused meaning (especially useful for contracts, whitepapers, or scientific texts)
- Chunks that can be reused across different tasks without losing context
- Dynamic overlap depending on the meaning…
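
To make the idea concrete, here is a minimal sketch of meaning-based chunking. It is not the exact method described above, just one common way to approximate it: split the text into sentences, score how similar each sentence is to the one before it, and start a new chunk where the similarity drops. Everything in it is an assumption for illustration: the function names, the `threshold` and `max_chars` parameters, and especially the bag-of-words similarity, which stands in for a real embedding model so the example runs with the standard library only. Dynamic overlap is not covered.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bow_vector(sentence: str) -> Counter:
    """Bag-of-words counts. Assumption: a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.1, max_chars: int = 1000) -> list[str]:
    """Group consecutive sentences; break where adjacent similarity falls
    below `threshold` or where the chunk would exceed `max_chars`."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    chunks: list[str] = []
    current = [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        similarity = cosine(bow_vector(prev), bow_vector(sent))
        too_long = len(" ".join(current)) + 1 + len(sent) > max_chars
        if similarity < threshold or too_long:
            chunks.append(" ".join(current))  # close the chunk at a sentence boundary
            current = [sent]
        else:
            current.append(sent)
    chunks.append(" ".join(current))
    return chunks


text = (
    "The lease term is five years. The lease term may be renewed once. "
    "Payment is due monthly. Late payment incurs a fee."
)
for chunk in semantic_chunks(text):
    print(repr(chunk))
```

On this sample the two lease sentences end up in one chunk and the two payment sentences in another, and no sentence is ever cut in the middle. With real documents you would likely swap `bow_vector` for sentence embeddings and tune `threshold` on your own data, since word overlap is a crude proxy for meaning.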

Answer selected by gilsonfiho