feat: add transform method to FlowBuilder
#675
We opened issue #654 because of a limitation in the current chain-style API (…). Here we propose another solution, which is to add a transform method to FlowBuilder:

    import cocoindex

    @cocoindex.flow_def(name="TextEmbedding")
    def text_embedding_flow(
        flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
    ) -> None:
        """
        Define an example flow that embeds text into a vector database.
        """
        data_scope["documents"] = flow_builder.add_source(
            cocoindex.sources.LocalFile(path="markdown_files")
        )
        doc_embeddings = data_scope.add_collector()
        with data_scope["documents"].row() as doc:
            # Chain-style: call transform on the input data slice
            doc["chunks"] = doc["content"].transform(
                cocoindex.functions.SplitRecursively(),
                language="markdown",
                chunk_size=2000,
                chunk_overlap=500,
            )
            # Non-chain-style: call the function spec directly
            func = cocoindex.functions.SplitRecursively()
            chunks = func(
                text=doc["content"],
                language="markdown",
                chunk_size=2000,
                chunk_overlap=500,
            )
            # FlowBuilder Transform: pass the function spec and its inputs to flow_builder
            doc["chunks"] = flow_builder.transform(
                cocoindex.functions.SplitRecursively(),
                doc["content"],
                language="markdown",
                chunk_size=2000,
                chunk_overlap=500,
            )

Compared to the non-chain-style approach, the FlowBuilder Transform approach has one more drawback I'd like to add here: the editor cannot offer intelligent hints about which parameters to pass when using the built-in CocoIndex functions, like …
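To illustrate the editor-hint drawback described above, here is a minimal sketch in plain Python. It does not use CocoIndex: `split_recursively` and `transform` are hypothetical stand-ins, and the assumption is that the forwarding method accepts untyped `*args` / `**kwargs`. Under that assumption, the forwarding helper exposes none of the wrapped function's parameter names, which is the information an editor relies on for hints.

```python
import inspect


def split_recursively(
    text: str, *, language: str = "", chunk_size: int = 1000, chunk_overlap: int = 0
) -> list[str]:
    """Hypothetical stand-in for a built-in function (not the CocoIndex implementation)."""
    step = max(chunk_size - chunk_overlap, 1)
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


def transform(fn, *args, **kwargs):
    """Hypothetical forwarding helper with an untyped *args/**kwargs signature."""
    return fn(*args, **kwargs)


# Calling the function directly exposes its named parameters to tooling.
direct_params = list(inspect.signature(split_recursively).parameters)

# The forwarding helper only exposes generic placeholders.
forwarded_params = list(inspect.signature(transform).parameters)
```

Both calls still behave identically at runtime; only the information available to editors and type checkers differs.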
Yes, noted. This is indeed a limitation. I think some tricks in Python's generic type annotations or metaprogramming may help resolve it. I haven't had time to think it through carefully yet, but typing.Concatenate may be related. With it, we may be able to define a placeholder method (or some sort of annotation) for function specs with parameter types, and annotate …

Will merge this PR for now. We can resolve this separately.
This PR implements the approach suggested in #654.
Relates to #653, and closes #654.