Hi @JaisVJ, when you set id_hash_keys=['content','meta'], Haystack creates a hash of the content and of everything stored in meta, which I believe works for your use case, no? Does selecting only a subset of the data stored in meta make a difference for you?

If two documents have exactly the same content and exactly the same meta, then they are duplicates. If they differ in any value stored in meta, then they are not duplicates. Or, in your use case, could it be that two documents differ in some of their metadata values but should still be treated as duplicates?

Right now, id_hash_keys can only be a subset of [content, content_type, id, score, meta, embedding], so it cannot point at individual keys inside meta.
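To make the duplicate rule above concrete, here is a minimal sketch in plain Python. It does not call Haystack and is not Haystack's actual hashing code: sketch_doc_id, with_selected_meta, and the use of SHA-256 over JSON are all assumptions made for illustration. It only shows the idea that the ID is derived from the fields named in id_hash_keys, and how narrowing meta before hashing changes which documents count as duplicates.

```python
import hashlib
import json

# Field names listed in the answer above.
ALLOWED_KEYS = {"content", "content_type", "id", "score", "meta", "embedding"}


def sketch_doc_id(doc: dict, id_hash_keys: list) -> str:
    """Hypothetical stand-in for a hash-based document ID (not Haystack's code)."""
    unknown = set(id_hash_keys) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unsupported id_hash_keys: {sorted(unknown)}")
    # Serialize each selected field deterministically, then hash the result.
    parts = [json.dumps(doc.get(key), sort_keys=True, default=str) for key in id_hash_keys]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def with_selected_meta(doc: dict, meta_keys: list) -> dict:
    """Return a copy of doc whose meta holds only the chosen keys (hypothetical helper)."""
    meta = doc.get("meta", {})
    return {**doc, "meta": {k: meta[k] for k in meta_keys if k in meta}}


doc_a = {"content": "Hello", "meta": {"source": "a.pdf", "page": 1}}
doc_b = {"content": "Hello", "meta": {"source": "a.pdf", "page": 1}}
doc_c = {"content": "Hello", "meta": {"source": "a.pdf", "page": 2}}

keys = ["content", "meta"]

# Same content and same meta: same ID, so the documents are duplicates.
same_ab = sketch_doc_id(doc_a, keys) == sketch_doc_id(doc_b, keys)

# One meta value differs: different ID, so the documents are not duplicates.
same_ac = sketch_doc_id(doc_a, keys) == sketch_doc_id(doc_c, keys)

# Hashing only a chosen part of meta makes doc_a and doc_c collide on purpose.
same_ac_by_source = (
    sketch_doc_id(with_selected_meta(doc_a, ["source"]), keys)
    == sketch_doc_id(with_selected_meta(doc_c, ["source"]), keys)
)

print(same_ab, same_ac, same_ac_by_source)
```

If documents that differ only in some metadata values should still be treated as duplicates, one possible workaround along these lines is to compute an ID yourself from the fields you care about and assign it to each document explicitly, rather than relying on id_hash_keys. Whether that fits depends on your Haystack version and document store, so treat it as a suggestion to verify.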

Answer selected by jais001