feat: enhance search functionality with ID support #499
- Updated SearchReq and SearchSimpleReq interfaces to include optional `ids` field for searching by primary keys.
- Modified buildSearchRequest function to handle ID-based searches, including validation for primary key types.
- Added tests to validate search functionality using IDs, ensuring correct error handling for type mismatches and missing primary keys.

Signed-off-by: ryjiang <jiangruiyi@gmail.com>
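The validation described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `PrimaryKeyType`, `SearchByIdsParams`, and `validateIds` are hypothetical names, and the real `SearchReq` and `buildSearchRequest` carry many more fields and checks.

```typescript
// Hypothetical, simplified stand-ins for the SDK's types (assumption, not the real API).
type PrimaryKeyType = 'Int64' | 'VarChar';

interface SearchByIdsParams {
  collection_name: string;
  ids?: (string | number)[]; // optional primary keys to search by
}

// Throws a TypeError if any ID does not match the collection's primary key type.
function validateIds(params: SearchByIdsParams, pkType: PrimaryKeyType): void {
  for (const id of params.ids ?? []) {
    if (pkType === 'VarChar') {
      if (typeof id !== 'string') {
        throw new TypeError(`expected string ID for VarChar primary key, got ${typeof id}`);
      }
    } else {
      // Int64: accept integers, or strings made only of digits with an optional sign.
      const ok = typeof id === 'number' ? Number.isInteger(id) : /^-?\d+$/.test(id);
      if (!ok) {
        throw new TypeError(`expected integer ID for Int64 primary key, got ${String(id)}`);
      }
    }
  }
}

// Usage: numeric strings pass for an Int64 key; a number fails for a VarChar key.
validateIds({ collection_name: 'docs', ids: [1, '2'] }, 'Int64');
```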
[APPROVALNOTIFIER] This PR is NOT APPROVED.

This pull request has been approved by: shanghaikid
Summary of Changes

Hello @shanghaikid, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the search capabilities by introducing the ability to perform searches directly using primary keys. This allows users to retrieve specific data points by their unique identifiers, complementing the existing vector-based search methods. The changes ensure data integrity through robust validation of ID types against the collection's primary key schema, making the search API more versatile and user-friendly.
Code Review
This pull request introduces a valuable enhancement by enabling search functionality using primary key IDs. The implementation is mostly solid, with good test coverage for the new logic. However, I've identified a critical bug concerning the handling of Int64 IDs that could lead to runtime errors. Additionally, I've included several suggestions to improve code quality and the developer experience by simplifying logic, reducing redundancy, and refining the API design to make anns_field optional for ID-based searches. Addressing these points will make the new feature more robust and easier to use.
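One way to express "anns_field is optional for ID-based searches" at the type level is a union of two request shapes. The sketch below is only an illustration of that design idea under simplifying assumptions: `SearchCommon`, `VectorSearch`, `IdSearch`, and `isIdSearch` are hypothetical names, and requiring `anns_field` for vector searches here is a simplification rather than a statement about the SDK's actual `SearchReq`.

```typescript
// Hypothetical request shapes for illustration; not the SDK's real SearchReq.
interface SearchCommon {
  collection_name: string;
  limit?: number;
}

// Vector search: query vectors plus the vector field to search on.
interface VectorSearch extends SearchCommon {
  data: number[][];
  anns_field: string;
  ids?: undefined;
}

// ID-based search: primary keys are given, so anns_field may be omitted.
interface IdSearch extends SearchCommon {
  ids: (string | number)[];
  anns_field?: string;
}

type SearchParams = VectorSearch | IdSearch;

// Type guard: true when the request carries a non-empty ID list.
function isIdSearch(params: SearchParams): params is IdSearch {
  return Array.isArray(params.ids) && params.ids.length > 0;
}

const byId: SearchParams = { collection_name: 'docs', ids: [1, 2, 3] };
const byVector: SearchParams = {
  collection_name: 'docs',
  data: [[0.1, 0.2]],
  anns_field: 'embedding',
};
```

With shapes like these, the compiler accepts an ID-based request without `anns_field` and a request builder can branch on `isIdSearch` before touching vector-only fields.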
if (ids && ids.length > 0) {
  const pkDataType = pkField!.dataType || DataTypeMap[pkField!.data_type];
  if (pkDataType === DataType.Int64) {
    request.ids = { int_id: { data: ids as number[] } };
There's a critical type mismatch here. The validation for Int64 primary keys allows ids to be an array of strings representing numbers. However, this line performs a direct cast ids as number[] without converting the string values to numbers. This will cause a runtime error or send incorrect data to Milvus if ids contains strings. You must map the values to numbers before assigning them.
Suggested change:
- request.ids = { int_id: { data: ids as number[] } };
+ request.ids = { int_id: { data: (ids as (string | number)[]).map(Number) } };
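A related point worth keeping in mind: a JavaScript `number` represents integers exactly only up to 2^53 - 1, which is narrower than the Int64 range, and `Number('')` evaluates to `0`. The sketch below shows one possible stricter conversion under those constraints. `toSafeIntIds` is a hypothetical helper, not part of the SDK, and whether the PR needs this extra check depends on how the SDK serializes Int64 values, which this thread does not show.

```typescript
// Convert mixed string/number IDs to numbers, rejecting anything that is not
// an integer a JS number can represent exactly. Hypothetical helper.
function toSafeIntIds(ids: (string | number)[]): number[] {
  return ids.map(id => {
    // Strings must be plain integers; this also rejects '' and whitespace,
    // which Number() would otherwise turn into 0.
    if (typeof id === 'string' && !/^-?\d+$/.test(id)) {
      throw new TypeError(`ID "${id}" is not an integer string`);
    }
    const n = Number(id);
    if (!Number.isSafeInteger(n)) {
      throw new RangeError(`ID ${String(id)} cannot be represented exactly as a number`);
    }
    return n;
  });
}

// Usage: numeric strings and numbers are both accepted.
const converted = toSafeIntIds(['1', 2, '3']);
```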
milvus/utils/Search.ts
Outdated
let pkField: FieldSchema | undefined;
for (let i = 0; i < collectionInfo.schema.fields.length; i++) {
  const f = collectionInfo.schema.fields[i];
  if (f.is_primary_key) {
    pkField = f;
    break;
  }
}
This for loop to find the primary key field can be made more concise and idiomatic by using the Array.prototype.find() method.
Suggested change:
- let pkField: FieldSchema | undefined;
- for (let i = 0; i < collectionInfo.schema.fields.length; i++) {
-   const f = collectionInfo.schema.fields[i];
-   if (f.is_primary_key) {
-     pkField = f;
-     break;
-   }
- }
+ const pkField = collectionInfo.schema.fields.find(f => f.is_primary_key);
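Because `Array.prototype.find()` returns `undefined` when nothing matches, the lookup pairs naturally with an explicit check instead of a later non-null assertion (`pkField!`). A minimal sketch, assuming a simplified field shape: `FieldLike` and `requirePrimaryKey` are hypothetical names, and the real `FieldSchema` has more properties.

```typescript
// Hypothetical minimal field shape; the real FieldSchema has more properties.
interface FieldLike {
  name: string;
  is_primary_key?: boolean;
}

// Return the first primary key field, or throw if the schema has none.
function requirePrimaryKey(fields: FieldLike[]): FieldLike {
  const pkField = fields.find(f => f.is_primary_key);
  if (!pkField) {
    throw new Error('no primary key field found in schema');
  }
  return pkField;
}

// Usage with a toy schema.
const pk = requirePrimaryKey([
  { name: 'vector' },
  { name: 'id', is_primary_key: true },
]);
```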
milvus/utils/Search.ts
Outdated
};

if (ids && ids.length > 0) {
  const pkDataType = pkField!.dataType || DataTypeMap[pkField!.data_type];
Signed-off-by: ryjiang <jiangruiyi@gmail.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #499      +/-   ##
==========================================
+ Coverage   97.19%   97.29%   +0.10%
==========================================
  Files          52       52
  Lines        3674     3700      +26
  Branches      978      996      +18
==========================================
+ Hits         3571     3600      +29
+ Misses         98       95       -3
  Partials        5        5