Skip to content

[BUG] Transform flow: Fields added via DataScope assignment in row context don't persist to parent table schemaΒ #818

@lemorage

Description

@lemorage

Describe the bug
Based on the following MRE:

  1. Schema Issue: The _engine_data_slice doesn't update the children table's schema to include new_field. The children table schema remains LTable(Struct(value: Int64)) instead of becoming LTable(Struct(value: Int64, new_field: Int64)).
  2. Field Scope: The new_field exists temporarily within the row context (DataScope) but doesn't persist to the parent table's schema.
  3. Lost on Return: When returning data, the new_field is not part of the table schema, so the decoder falls back to the default value (0) from the NewChild dataclass.

Relevant Context: #737

To Reproduce
Create a new test file under python/cocoindex/tests and run this MRE code:

import typing
from dataclasses import dataclass
import pytest
import cocoindex

@dataclass
class Child:
    value: int

@dataclass
class Parent:
    children: list[Child]

@dataclass
class NewChild:
    value: int
    new_field: int = 0

@dataclass
class ParentWithNewField:
    children: list[NewChild]

@pytest.fixture(scope="session", autouse=True)
def init_cocoindex() -> typing.Generator[None, None, None]:
    cocoindex.init()
    yield

@cocoindex.op.function()
def extract_value(value: int) -> int:
    return value

# Transform flow that should add new_field
@cocoindex.transform_flow()
def for_each_transform(
    data: cocoindex.DataSlice[Parent],
) -> cocoindex.DataSlice[ParentWithNewField]:
    with data["children"].row() as child:
        child["new_field"] = child["value"].transform(extract_value)
    return data

def test_for_each_transform():
    input_data = Parent(children=[Child(1), Child(2), Child(3)])
    result = for_each_transform.eval(input_data)
    print(f"Result: {result}")
    expected = ParentWithNewField(
        children=[
            NewChild(value=1, new_field=1),
            NewChild(value=2, new_field=2),
            NewChild(value=3, new_field=3),
        ]
    )
    assert result == expected, f"Expected {expected}, got {result}"

# Expected ParentWithNewField(children=
# [NewChild(value=1, new_field=1),
# NewChild(value=2, new_field=2),
# NewChild(value=3, new_field=3)]),
# got ParentWithNewField(children=
# [NewChild(value=1, new_field=0),
# NewChild(value=2, new_field=0),
# NewChild(value=3, new_field=0)])

Expected behavior

  • Intended: child["new_field"] = ... should modify the table schema and persist
  • Actual: The field assignment is temporary and doesn't affect the parent table schema

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions