-
-
Notifications
You must be signed in to change notification settings - Fork 50
[Store] Add a way to store document content when using Chroma DB #288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Store] Add a way to store document content when using Chroma DB #288
Conversation
IIRC we had something like this in the past, did we @chr-hertel ? Other than that, it could be a good idea I guess. |
Other than that, with #57 I implemented a So, my theory here: The Idea
(instead a simple ID we could also think of a more sophisticated refernece object or sth, but i'd be lazy on that) Now |
Its the original content, the string from which the embedding vectors are generated.
Looking at #57, I can see that the The I think that what concerns me a bit is that there are now kind-of standardized metadata fields getting introduced, on which the individual store implementations need to start to rely on if they want to implement certain feature (for example, take the VectorDocument->metadata->text and use it to generate documents when batch storing data into Chroma DB). The
Then those should probably be enumerated in some Alternative would be to have these (or some of these) as actual fields on the
And have them more strictly typed in this way, keeping the I am not sure which approach is better overall, but |
@chr-hertel Should I try to go for adding of |
So there's actually a third alternative, that might combine both worlds: I think that's currently my favorite after looking at this - we need to think about other stores as well - what do you think about that? |
Sounds good! 👍 So |
Yes, let's go with text - it relates more to |
This is just a draft for now to explain what I am trying to achieve - I would like to be able to store the original document content in the Chroma DB records, but the
VectorDocument
currently has no way to pass that in, other than the metadata. I feel like this requirement will not be specific just to Chroma DB, but various databases used for embeddings will support storing of the original document content along with the vectors.Any suggestions on how to approach this holistically? Should the
VectorDocument
be expanded with optional field like?string $content = null
?Additionally (see https://symfony.com/releases):
-->