Duplicate page content seen with different metadata as a result of spaCy text splitter on Linux #12959
Unanswered
swathithiyan asked this question in Help: Other Questions
Replies: 1 comment
-
Hi! When creating a new issue or discussion topic, it helps to paste the code you ran (a minimal reproducible example) and to paste examples as text instead of screenshots, which are difficult for us to process. That said, I assume you're using LangChain's SpacyTextSplitter, which was not created by us.
-
Hi,
We are using the spaCy text splitter to chunk a document and write the chunks into Chroma DB.

After chunking (docs = text_splitter.split_documents(documents)), we see duplicate page contents with different metadata. Refer to the screenshot: each key is a duplicated page content, and each value is the number of times it is repeated together with the list of metadata.
Strangely, the issue happens on Linux; on macOS it works perfectly.
The spaCy version used is 3.5.1.
Can you please help here?
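For anyone trying to reproduce this as text rather than from a screenshot, here is a minimal sketch of the kind of duplicate check described above. It groups chunks by page content and reports the repeat count and the list of metadata for each repeated chunk. The `Doc` class and the `find_duplicate_chunks` helper are hypothetical names used only for illustration; `Doc` is a stand-in that assumes the split documents expose `page_content` and `metadata` attributes, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for a split document: chunk text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


def find_duplicate_chunks(docs):
    """Map each repeated page_content to (repeat count, list of metadata)."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc.page_content].append(doc.metadata)
    # Keep only chunks whose text appears more than once.
    return {
        text: (len(metas), metas)
        for text, metas in grouped.items()
        if len(metas) > 1
    }


# Made-up sample chunks, only to show the shape of the result.
docs = [
    Doc("same text", {"source": "a.pdf", "page": 1}),
    Doc("same text", {"source": "a.pdf", "page": 2}),
    Doc("unique text", {"source": "a.pdf", "page": 3}),
]
duplicates = find_duplicate_chunks(docs)
for text, (count, metas) in duplicates.items():
    print(repr(text), count, metas)
```

Running the same check on the output of split_documents on both Linux and macOS, and pasting the printed result, would make the difference between the two systems easier to compare.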