-
Notifications
You must be signed in to change notification settings - Fork 55
feat(storage): implement milvus vector store #744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #744 +/- ##
===================================================
+ Coverage 87.39556% 87.58866% +0.19310%
===================================================
Files 278 285 +7
Lines 36146 37361 +1215
===================================================
+ Hits 31590 32724 +1134
- Misses 2977 3002 +25
- Partials 1579 1635 +56
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4829443 to
28587e5
Compare
|
|
||
| func (vs *VectorStore) queryMetadataBatch(ctx context.Context, limit, offset int, ids []string, filter map[string]any) (map[string]vectorstore.DocumentMetadata, error) { | ||
| filterExpr := "" | ||
| if len(ids) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这两个条件是不是用 and 的关系也可以,现在是互斥的?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, 可以用and,改成用searchfilter重新构建语句了
| annReqs = append(annReqs, annReq) | ||
| } | ||
|
|
||
| hybridOption := client.NewHybridSearchOption(vs.option.collectionName, vs.getMaxResults(query.Limit), annReqs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
milvus 默认使用的RRFReranker ,这里返回的score会归一化吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
看了下milvus请求的构建和文档,如果没有指定的话,应该没有使用RRFreranker
我另外加了reranker option,可以自定义配置
5e08ae2 to
196c8b0
Compare
| return "", nil | ||
| } | ||
|
|
||
| switch condition.Operator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的and 和 or条件构建函数是不是可以合并成一个函数, 以及 下面的 ==、!=、 >、>= 、< 、<= 这些也可以合并成一个函数。 这里的代码重复率太高了。可以参考:tcvector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // milvusFilterConverter converts searchfilter conditions to Milvus expressions | ||
| type milvusFilterConverter struct{} | ||
|
|
||
| func (c *milvusFilterConverter) Convert(condition *searchfilter.UniversalFilterCondition) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里最好加一个recover 处理一下可能的panic。可以参考 tcvector的实现
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| return c.convertInCondition(condition) | ||
| case searchfilter.OperatorNotIn: | ||
| return c.convertNotInCondition(condition) | ||
| case searchfilter.OperatorLike: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是不是少了一个not like 的实现
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好像也还少了一个 searchfilter.OperatorBetween
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done,还有between
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要实现一下docBuilder, 方便支持用户自定义存储结构的转换处理。参考
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里其实已经实现了,不需要docBuilder,在创建存储collection和查询的时候都已经支持自定义存储结构
这里是指用户不是通过框架来创建的collection, 自己提前创建非标准的collection (这里主要是业务可能有自己独立的离线知识库处理流程,只用knowledge的retriever能力)。 所以需要一个WithDocBuilder, 让用户把非标的document存储结构转换成标准结构
可以看下docBuilder使用例子
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
理解了,加上docBuilder了,单元测试里也加上了
| ) | ||
|
|
||
| // Client is the interface for the milvus client. | ||
| type Client interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里看着都是按照v2的返回值定义的interface。 会有和v1有兼容性问题吗? 或者将来出v3是不是有可能接口兼容不了呢? 这里是不是应该定义和版本无关的接口。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里没有啥v1,之前的milvus-go-sdk已经废弃了,以后都会在milvus仓库维护了
这里主要是为了实现mock client,和milvus server本身没有啥兼容性问题
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里用过滤器模版是不是会好点
| conditions = append(conditions, fmt.Sprintf("%s in [%s]", vs.option.idField, strings.Join(idFilters, ", "))) | ||
| } | ||
|
|
||
| if len(filter.Metadata) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter.Metadata 以及 filter.IDs的条件构建是不是可以统一转换到filter.FilterCondition中,交给Converter统一来处理
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| return maxResults | ||
| } | ||
|
|
||
| func (vs *VectorStore) buildMetadataFilter(filter map[string]any) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的格式转换建议统一放到Converter中来处理。filter.FilterCondition中可能也会存在metadata.xx的条件
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
已经没有buildMetadataFilter了,全部交由Converter来处理
|
|
||
| func (c *milvusFilterConverter) convertCondition(condition *searchfilter.UniversalFilterCondition) (string, error) { | ||
| if condition == nil { | ||
| return "", nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
考虑与其他db的对齐一下,return error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| func (c *milvusFilterConverter) convertAndCondition(condition *searchfilter.UniversalFilterCondition) (string, error) { | ||
| if condition.Value == nil { | ||
| return "", nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不符合预期的条件或者value,建议都返回error。尽量别把error隐藏了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
196c8b0 to
137f586
Compare
| } | ||
| }() | ||
|
|
||
| if condition == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个条件判断貌似可以去掉, convertCondition方法中已经做了统一判断
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里是不是搞反了,应该保留convertCondition中的判断, 去掉Convert中的判断。 因为AND OR逻辑条件时会递归调用convertCondition,所以这里统一在convertCondition做判断处理会完整一点
Signed-off-by: joey <[email protected]>
137f586 to
9ba3bfe
Compare
| // - String values must be enclosed in quotes | ||
| // - Numeric values (int, float) should not be quoted | ||
| // - Boolean values should not be quoted | ||
| func (c *milvusFilterConverter) formatValue(value interface{}) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time.Time类型需要做特殊处理吗?
| // - String values must be enclosed in quotes | ||
| // - Numeric values (int, float) should not be quoted | ||
| // - Boolean values should not be quoted | ||
| func (c *milvusFilterConverter) formatValue(value interface{}) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
milvus文档 这里使用过滤器模版是不是在性能和value兼容性上会好点?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
对的,用过滤器模版会好点,这个我改下
No description provided.