@lemorage
Contributor

This PR implements the approach suggested in #654.

Relates to #653, and closes #654.

@lemorage
Contributor Author

lemorage commented Jul 1, 2025

We opened issue #654 because of a limitation in the current chain-style API (arg0.transform(...)), which assumes a clear primary input. This breaks down in cases like extract_from_llm(text=None, image=None), where every input is optional and no single argument acts as “self”. The solution proposed there was to add a non-chain-style API.
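To make the limitation concrete, here is a toy model in plain Python. Slice and Builder are hypothetical stand-ins written only for this illustration, not CocoIndex's real classes, and the body of extract_from_llm is invented; only its signature comes from the example above. It shows how a chain-style call always binds the receiver to the first parameter, while a builder-level call lets the caller pick the parameter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Slice:
    """Hypothetical stand-in for a field value in a flow."""

    value: Any

    def transform(self, fn: Callable[..., Any], **kwargs: Any) -> Slice:
        # Chain style: the receiver is always passed as the first positional argument.
        return Slice(fn(self.value, **kwargs))


def _unwrap(v: Any) -> Any:
    # Replace a Slice with the value it carries; pass plain values through.
    return v.value if isinstance(v, Slice) else v


class Builder:
    """Hypothetical stand-in for a flow builder with its own transform()."""

    def transform(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Slice:
        # Builder style: no argument is privileged; each one goes where the caller puts it.
        return Slice(
            fn(*(_unwrap(a) for a in args), **{k: _unwrap(v) for k, v in kwargs.items()})
        )


def extract_from_llm(text: Optional[str] = None, image: Optional[bytes] = None) -> str:
    # Invented body: report which inputs were supplied.
    parts = []
    if text is not None:
        parts.append(f"text:{len(text)}")
    if image is not None:
        parts.append(f"image:{len(image)}")
    return ",".join(parts) or "empty"


image = Slice(b"\x89PNG")

# Chain style silently binds the image bytes to the `text` parameter.
chained = image.transform(extract_from_llm)

# Builder style lets the caller name the parameter explicitly.
built = Builder().transform(extract_from_llm, image=image)
```

In this toy, chained.value reports a text input even though an image was passed, which is the kind of ambiguity the issue describes.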

Here we propose another solution: adding a transform method to FlowBuilder. The three styles compare as follows in terms of usage:

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    """
    Define an example flow that embeds text into a vector database.
    """
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files")
    )

    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        # Chain-style
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown",
            chunk_size=2000,
            chunk_overlap=500,
        )
        
        # Non-chain-style
        func = cocoindex.functions.SplitRecursively()
        chunks = func(
            text=doc["content"],
            language="markdown",
            chunk_size=2000,
            chunk_overlap=500,
        )
        
        # FlowBuilder Transform
        doc["chunks"] = flow_builder.transform(
            cocoindex.functions.SplitRecursively(),
            doc["content"],
            language="markdown",
            chunk_size=2000,
            chunk_overlap=500,
        )

Compared to the non-chain-style approach, the FlowBuilder transform approach has one more drawback I'd like to note: editors cannot offer parameter hints showing which arguments to pass when using built-in CocoIndex functions such as ExtractByLlm or SplitRecursively.

lemorage changed the title from "feat: support function calling in non-chain style" to "feat: add transform method to FlowBuilder" on Jul 1, 2025
@badmonster0
Member

> Compared to the non-chain-style approach, the FlowBuilder transform approach has one more drawback I'd like to note: editors cannot offer parameter hints showing which arguments to pass when using built-in CocoIndex functions such as ExtractByLlm or SplitRecursively.

Yes, noted. This is indeed a limitation.

I think some tricks in Python's generic type annotations / metaprogramming may help resolve this. I haven't had time to think it through carefully yet, but typing.Concatenate may be relevant. With it, we may be able to define a placeholder method (or some sort of annotation) on function specs that carries the parameter types, and annotate transform() so that it forwards those types correctly.

Will merge this PR for now. We can resolve this separately.

badmonster0 merged commit 000fae8 into cocoindex-io:main on Jul 2, 2025
6 checks passed
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support non-chain-style cocoindex function call

2 participants