Efficient Filtering Strategy in Zilliz When User Selects Large Dynamic File Sets #47136
Replies: 3 comments 10 replies
-
We usually recommend storing your metadata directly in Milvus rather than in another system, so that filtering is much easier. If there is a reason you must keep it external, try the filter template; it can be a bit faster on expression parsing.
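A minimal sketch of this suggestion: keep the filter metadata inside Milvus as a scalar field, so a search can narrow itself with a short expression instead of an externally assembled 200k-id list. The `groupId` field name and values below are hypothetical.

```python
# Sketch of the suggestion above: keep filter metadata inside Milvus as a
# scalar field, so a search narrows itself with a short expression instead
# of an externally built 200k-id list. "groupId" is a hypothetical field.

def group_filter(group_ids):
    """Build a compact filter expression over a scalar field."""
    if len(group_ids) == 1:
        return f"groupId == {group_ids[0]}"
    return f"groupId in [{', '.join(map(str, group_ids))}]"

# A user's large file selection can collapse to a handful of group ids:
expr = group_filter([7, 42])
print(expr)  # groupId in [7, 42]
# `expr` would then be passed as the filter argument of a vector search.
```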
-
You can use a filter template to pass the 200,000 ids. The "filter_params" argument passes the values as a list instead of a string, which is much more efficient than a string expression. For one million integer ids, the payload size is 8 bytes * 1M = 8 MB, well under the gRPC size limit.
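A sketch of the templated filter, assuming a pymilvus `MilvusClient.search` call that accepts `filter` plus `filter_params` (pymilvus 2.4+); the collection and field names are hypothetical, and the server call itself is shown commented out so the snippet runs standalone.

```python
# Sketch: pass a large id list through a filter template instead of
# inlining it into the expression string. Assumes pymilvus >= 2.4, where
# search() accepts `filter` plus `filter_params`; the collection and field
# names are hypothetical.
ids = list(range(200_000))  # the user's selected file ids

# Naive approach: a multi-megabyte string the expression parser must chew.
naive_expr = f"fileId in [{','.join(map(str, ids))}]"

# Templated approach: the expression stays tiny, ids travel as a typed list.
filter_expr = "fileId in {file_ids}"
filter_params = {"file_ids": ids}

# results = client.search(
#     collection_name="files",
#     data=[query_vector],
#     filter=filter_expr,
#     filter_params=filter_params,
#     limit=10,
# )

print(len(naive_expr))   # well over a million characters
print(len(filter_expr))  # 20 characters, regardless of how many ids
```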
-
Hi again @yhmo, we are trying this code, but it looks like it is not filtering and returns this error message: "[code, 1100]".
-
We are facing a design challenge related to filtering vector search results in Zilliz at scale.
The issue is not how to send data to Zilliz, but rather how to efficiently restrict search results to a large, dynamic subset of files selected by the user at query time.
Scenario
Total files indexed in Zilliz: ~1,000,000
Each file has one or more embeddings stored in Zilliz
A user:
Has access to all 1M files
Can dynamically select up to 200,000 files per search
The selection can change frequently between searches
Current Limitation
We cannot use a filter like:
fileId IN (1,2,3,...)
because:
Filter expressions have size limits
Performance degrades significantly with very large IN lists
Core Question
What is the recommended or best-practice approach in Zilliz to perform vector search only within a large, dynamic user-selected subset of entities (e.g. 200k out of 1M), without passing large ID lists or frequently updating vector metadata?