@johnnygreco johnnygreco commented Nov 5, 2025

This PR is in preparation for the plugin framework.

Main Change

column_type is now a Pydantic field on column configs rather than a property.

The reasons for this change are:

  • It makes it more straightforward for the plugin framework to update the column types dynamically. The column_type field is now declared as a Literal["column-type"] on each config, so it can be defined before the DataDesignerColumnType string enum exists. We can then use the type union (i.e., ColumnConfigT) to create DataDesignerColumnType after all the configs have been defined.

  • We can now use this field as the discriminator of the type union, which is Pydantic's most reliable way to identify the concrete config class when loading serialized configs.
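As a minimal sketch of the two points above (not the actual data_designer classes), the example below assumes Pydantic v2 and two simplified, hypothetical configs whose column_type field is a Literal with a default. The field names `sampler_type` and `expr` and the literal values are assumptions made for illustration.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class SamplerColumnConfig(BaseModel):  # hypothetical, simplified config
    name: str
    sampler_type: str
    # Assumed default, so SDK users do not have to pass column_type explicitly.
    column_type: Literal["sampler"] = "sampler"


class ExpressionColumnConfig(BaseModel):  # hypothetical, simplified config
    name: str
    expr: str
    column_type: Literal["expression"] = "expression"


ColumnConfigT = Union[SamplerColumnConfig, ExpressionColumnConfig]

# The discriminator tells Pydantic which union member to build from the tag value.
adapter = TypeAdapter(Annotated[ColumnConfigT, Field(discriminator="column_type")])

sdk_config = SamplerColumnConfig(name="code_id", sampler_type="uuid")
serialized = sdk_config.model_dump()  # the dump includes column_type
restored = adapter.validate_python(serialized)
```

Because the serialized form carries the tag, the round trip resolves straight to the concrete class instead of trying each union member in turn.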

Other Notes

  • This changes the API: column configs now require the column_type field. Users never need to set it in the SDK, but it is required in YAML configs. That is arguably a good thing, since it makes column types easy to identify in the config.

Comment on lines +146 to +150
DataDesignerColumnType = create_str_enum_from_discriminated_type_union(
    enum_name="DataDesignerColumnType",
    type_union=ColumnConfigT,
    discriminator_field_name="column_type",
)
Contributor Author

Here's where we dynamically create this StrEnum.
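The implementation of create_str_enum_from_discriminated_type_union is not shown in this thread. As a rough, stdlib-only sketch of the idea, assuming each config class annotates its discriminator with a Literal, one could collect those literal values into a str-valued enum. The config classes, the helper name `create_str_enum_sketch`, and the "llm-text" value below are hypothetical stand-ins.

```python
from enum import Enum
from typing import Literal, Union, get_args, get_type_hints


class SamplerColumnConfig:  # hypothetical stand-in for a real column config
    column_type: Literal["sampler"]


class LLMTextColumnConfig:  # hypothetical stand-in; the literal value is assumed
    column_type: Literal["llm-text"]


ColumnConfigT = Union[SamplerColumnConfig, LLMTextColumnConfig]


def create_str_enum_sketch(enum_name: str, type_union, discriminator_field_name: str):
    """Collect each union member's Literal discriminator value into a str-valued Enum."""
    members = {}
    for config_cls in get_args(type_union):
        literal_type = get_type_hints(config_cls)[discriminator_field_name]
        for value in get_args(literal_type):
            # Member names are derived from values, e.g. "llm-text" -> "LLM_TEXT".
            members[value.upper().replace("-", "_")] = value
    return Enum(enum_name, members, type=str)


ColumnTypeSketch = create_str_enum_sketch(
    enum_name="ColumnTypeSketch",
    type_union=ColumnConfigT,
    discriminator_field_name="column_type",
)
```

Since the enum is derived from the union, adding a config class to the union is enough to add a member, which is the property a plugin framework would rely on.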

columns:
  - name: code_id
    sampler_type: uuid
    column_type: sampler
Contributor Author

YAML configs now include the column_type field.

Comment on lines 166 to 185
def column_type_is_in_dag(column_type: Union[str, DataDesignerColumnType]) -> bool:
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    return column_type in [
        DataDesignerColumnType.EXPRESSION,
        DataDesignerColumnType.LLM_CODE,
        DataDesignerColumnType.LLM_JUDGE,
        DataDesignerColumnType.LLM_STRUCTURED,
        DataDesignerColumnType.LLM_TEXT,
        DataDesignerColumnType.VALIDATION,
    ]


def column_type_is_llm_generated(column_type: Union[str, DataDesignerColumnType]) -> bool:
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    return column_type in [
        DataDesignerColumnType.LLM_TEXT,
        DataDesignerColumnType.LLM_CODE,
        DataDesignerColumnType.LLM_STRUCTURED,
        DataDesignerColumnType.LLM_JUDGE,
    ]
Contributor Author

These are defined as standalone functions rather than methods on DataDesignerColumnType. Once plugins are implemented, the lists they check will need to be built dynamically.

"""

-    columns: list[ColumnConfigT] = Field(min_length=1)
+    columns: list[Annotated[ColumnConfigT, Field(discriminator="column_type")]] = Field(min_length=1)
Contributor Author

Now that column_type is a field, we can use it as the union discriminator here.
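To illustrate what the discriminated `columns` field changes for serialized configs, here is a small sketch assuming Pydantic v2. The container name `DataDesignerConfigSketch` and the two simplified config classes are hypothetical; only the `columns` annotation mirrors the diff above. A plain dict stands in for a parsed YAML file.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, ValidationError


class SamplerColumnConfig(BaseModel):  # hypothetical, simplified config
    name: str
    sampler_type: str
    column_type: Literal["sampler"] = "sampler"


class ExpressionColumnConfig(BaseModel):  # hypothetical, simplified config
    name: str
    expr: str
    column_type: Literal["expression"] = "expression"


ColumnConfigT = Union[SamplerColumnConfig, ExpressionColumnConfig]


class DataDesignerConfigSketch(BaseModel):  # hypothetical container model
    columns: list[Annotated[ColumnConfigT, Field(discriminator="column_type")]] = Field(min_length=1)


# With the tag present, the entry resolves directly to the matching class.
ok = DataDesignerConfigSketch.model_validate(
    {"columns": [{"name": "code_id", "sampler_type": "uuid", "column_type": "sampler"}]}
)

# Without the tag, validation fails even though the class declares a default,
# because the discriminator is read from the input before a class is chosen.
try:
    DataDesignerConfigSketch.model_validate(
        {"columns": [{"name": "code_id", "sampler_type": "uuid"}]}
    )
    missing_tag_rejected = False
except ValidationError:
    missing_tag_rejected = True
```

This is why the field is required in serialized configs even though SDK users constructing a config class directly never have to pass it.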

@johnnygreco johnnygreco force-pushed the johnny/make-column-type-a-field-in-configs branch from a436172 to 17b984f Compare November 6, 2025 21:05
Comment on lines +169 to +174
        DataDesignerColumnType.EXPRESSION,
        DataDesignerColumnType.LLM_CODE,
        DataDesignerColumnType.LLM_JUDGE,
        DataDesignerColumnType.LLM_STRUCTURED,
        DataDesignerColumnType.LLM_TEXT,
        DataDesignerColumnType.VALIDATION,
Contributor

Does our type checker have an issue with this, since DataDesignerColumnType is dynamically resolved?

Contributor Author

We're not running ty yet, but my IDE seems happy with all of it. If you can pull the branch to check your IDE settings, that might be helpful.

Contributor

Let's change these lists to sets.
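One way the set-based version could look, as a sketch only: the enum below is a hand-written stand-in for the dynamically created DataDesignerColumnType, its string values are assumed, and calling the enum on the input replaces the real resolve_string_enum helper, whose implementation is not shown in this thread.

```python
from enum import Enum
from typing import Union


class ColumnType(str, Enum):  # stand-in for DataDesignerColumnType; values are assumed
    EXPRESSION = "expression"
    LLM_CODE = "llm-code"
    LLM_JUDGE = "llm-judge"
    LLM_STRUCTURED = "llm-structured"
    LLM_TEXT = "llm-text"
    SAMPLER = "sampler"
    VALIDATION = "validation"


# Immutable module-level sets give constant-time membership checks.
LLM_GENERATED_TYPES = frozenset(
    {ColumnType.LLM_CODE, ColumnType.LLM_JUDGE, ColumnType.LLM_STRUCTURED, ColumnType.LLM_TEXT}
)
DAG_TYPES = LLM_GENERATED_TYPES | {ColumnType.EXPRESSION, ColumnType.VALIDATION}


def column_type_is_in_dag(column_type: Union[str, ColumnType]) -> bool:
    # ColumnType(...) accepts either a member or its string value.
    return ColumnType(column_type) in DAG_TYPES


def column_type_is_llm_generated(column_type: Union[str, ColumnType]) -> bool:
    return ColumnType(column_type) in LLM_GENERATED_TYPES
```

Deriving the DAG set from the LLM set also removes the duplicated member list between the two functions.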

import networkx as nx

- from data_designer.config.columns import ColumnConfigT
+ from data_designer.config.columns import ColumnConfigT, column_type_is_in_dag
Contributor

nit: rename column_type_is_in_dag to column_type_used_in_execution_dag, or something like that, to be specific about which DAG this is.

nabinchha previously approved these changes Nov 6, 2025
Contributor

@nabinchha nabinchha left a comment

just some nits, thanks!

Contributor

@nabinchha nabinchha left a comment

:shipit:

@johnnygreco johnnygreco merged commit 728e319 into main Nov 6, 2025
10 checks passed
github-actions bot commented Nov 6, 2025

Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:


I have read the DCO document and I hereby sign the DCO.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.

@nabinchha nabinchha deleted the johnny/make-column-type-a-field-in-configs branch December 9, 2025 19:45