Work around JSON or gRPC limit on upsert with large multivector #7213
-
Hi everyone,

I'm hitting the JSON/gRPC message size limit when upserting points with large multivectors (I also tried upload_points()). Would it be better to move away from multivectors and store the vectors independently, with my external document ID in the payload and the Qdrant ID auto-generated? A sketch of what I mean is below.

The final scale should be two- to three-digit millions of documents with dozens to thousands of vectors per external document, so potentially a couple of billion vectors.

Thanks
Grumpy
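Concretely, the layout I have in mind would be something like this (just a sketch; the collection name `docs` and the payload field `document_id` are placeholders):

```python
import uuid

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# One point per vector instead of one huge multivector per document.
# The external document id goes into the payload; the Qdrant point id
# is an auto-generated UUID.
def iter_points(doc_id, vectors):
    for vec in vectors:
        yield models.PointStruct(
            id=str(uuid.uuid4()),
            vector=vec,
            payload={"document_id": doc_id},
        )

vectors = [[0.1] * 768 for _ in range(500)]  # placeholder vectors

# upload_points() streams points in batches, so no single request has
# to carry all vectors of a document at once.
client.upload_points(
    collection_name="docs",
    points=iter_points("doc-42", vectors),
    batch_size=256,
)
```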
-
Yes, that's what we normally recommend. Multivectors also have limited scalability in terms of indexing performance. Such large multivectors will definitely become a problem, which is why we currently have a hard cap of 1 MB.

Qdrant is built with this kind of scale in mind. Please create an appropriate payload index so searches stay performant. Vector data is large, so when you're talking about billions of vectors you'll likely need large machines and a number of nodes. It is hard for me to give a proper estimate here; I'd recommend doing a rough calculation and running your own benchmark to see whether you meet your requirements. At this scale you're likely looking at binary quantization, assuming it achieves good enough accuracy on your dataset.
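To make that concrete, here is a minimal sketch of such a setup with the Python client; the names (`docs`, `document_id`) and the vector size are assumptions for illustration, not prescriptions:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Plain dense vectors, one per point, with binary quantization to keep
# memory usage manageable at billion-vector scale.
client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(
        size=768,  # placeholder dimensionality
        distance=models.Distance.COSINE,
    ),
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)

# Payload index on the external document id so filtered searches
# stay performant even with billions of points.
client.create_payload_index(
    collection_name="docs",
    field_name="document_id",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```

Whether binary quantization retains enough accuracy depends on your embeddings, so benchmark it against an unquantized setup on your own data first.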