
@johnnygreco
Contributor

The problem we are solving

ConfigurableTaskMetadata requires you to specify the resources required to execute the task. In practice, however, all tasks always have access to all resources. While this issue isn’t a big deal, it is confusing for plugin builders who don’t have the above context.

Changes

  • Remove required_resources from metadata

  • Always assume all tasks have access to all resources.

  • Use subclasses (instead of mixins) to streamline and simplify plugin development.

An important note: I think we still need some way for a generator to specify its required resources. For example, say we want to filter plugin types to grab only the ones that need the model registry, or only the ones that need the datastore. The solution implemented here is an abstract method called get_required_resources that must be implemented on generators. This effectively pushes the complication down to a lower level, where most developers won't need to worry about it. I'll highlight some places in the code to show what I mean.
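To make the idea concrete, here is a minimal sketch of how an abstract get_required_resources could push the resource declaration down into the class hierarchy. It is not the code in this PR: the base class name ColumnGenerator, the classmethod signature, the string resource names, the two Example* classes, and the filter_by_required_resource helper are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class ColumnGenerator(ABC):
    """Hypothetical base class; the real base class in this PR may differ."""

    @classmethod
    @abstractmethod
    def get_required_resources(cls) -> list[str]:
        """Return the names of the resources this generator needs."""


class ColumnGeneratorWithModelRegistry(ColumnGenerator):
    """Intermediate subclass that declares the resource once for model-backed generators."""

    @classmethod
    def get_required_resources(cls) -> list[str]:
        # Assumed resource name, matching the "model_registry" string used in the old filter.
        return ["model_registry"]


class ExamplePluginGenerator(ColumnGeneratorWithModelRegistry):
    """Hypothetical plugin: inherits the declaration and never touches it."""


class ExampleDatastoreGenerator(ColumnGenerator):
    """Hypothetical generator that only needs the datastore."""

    @classmethod
    def get_required_resources(cls) -> list[str]:
        return ["datastore"]


def filter_by_required_resource(generators, resource):
    """Keep only the generator classes that declare the given resource."""
    return [g for g in generators if resource in g.get_required_resources()]
```

With this shape, a plugin author who subclasses an intermediate class like ColumnGeneratorWithModelRegistry never writes get_required_resources, while code that needs to filter plugin types can still ask each generator class what it requires.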



- class WithChatCompletionGeneration(WithModelGeneration):
+ class ColumnGeneratorWithSingleModelChatCompletion(ColumnGeneratorWithSingleModel[TaskConfigT]):
Contributor Author

This class name is getting a bit long for my liking lol

Contributor

Maybe we drop "Single" from these names, assuming we mostly work with one model by default.

@johnnygreco changed the title from "refactor: remove required resources from metadata and leverage subclasses over mixins" to "refactor: update required resources treatment and leverage subclasses over mixins" on Jan 7, 2026
@johnnygreco changed the title from "refactor: update required resources treatment and leverage subclasses over mixins" to "refactor: update required resources treatment and use subclasses over mixins" on Jan 7, 2026
@johnnygreco force-pushed the johnny/refactor/remove-required-resources-from-metadata branch from d7d6e9c to c4f717b on January 7, 2026 23:01
logger.info(f" |-- column name: {self.config.name!r}")
logger.info(f" |-- model config:\n{self.model_config.model_dump_json(indent=4)}")
if self.model_config.provider is None:
    logger.info(f" |-- default model provider: {self._get_provider_name()!r}")
Contributor

We're losing this log message, which was added at some point to indicate the default model provider being used when the model config itself doesn't reference a provider.

Contributor Author

The update makes it so this message is always logged, right? At least that was my intention. WDYT?




)

- def _fan_out_with_threads(self, generator: WithModelGeneration, max_workers: int) -> None:
+ def _fan_out_with_threads(self, generator: ColumnGeneratorWithSingleModel, max_workers: int) -> None:
Contributor

Should this be at the ColumnGeneratorWithModelRegistry level?

@johnnygreco force-pushed the johnny/refactor/remove-required-resources-from-metadata branch from c4f717b to 1b18b73 on January 9, 2026 15:56
Comment on lines -65 to -99
def column_type_used_in_execution_dag(column_type: str | DataDesignerColumnType) -> bool:
    """Return True if the column type is used in the workflow execution DAG."""
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    dag_column_types = {
        DataDesignerColumnType.EXPRESSION,
        DataDesignerColumnType.LLM_CODE,
        DataDesignerColumnType.LLM_JUDGE,
        DataDesignerColumnType.LLM_STRUCTURED,
        DataDesignerColumnType.LLM_TEXT,
        DataDesignerColumnType.VALIDATION,
        DataDesignerColumnType.EMBEDDING,
    }
    dag_column_types.update(plugin_manager.get_plugin_column_types(DataDesignerColumnType))
    return column_type in dag_column_types


def column_type_is_model_generated(column_type: str | DataDesignerColumnType) -> bool:
    """Return True if the column type is a model-generated column."""
    column_type = resolve_string_enum(column_type, DataDesignerColumnType)
    model_generated_column_types = {
        DataDesignerColumnType.LLM_TEXT,
        DataDesignerColumnType.LLM_CODE,
        DataDesignerColumnType.LLM_STRUCTURED,
        DataDesignerColumnType.LLM_JUDGE,
        DataDesignerColumnType.EMBEDDING,
    }
    model_generated_column_types.update(
        plugin_manager.get_plugin_column_types(
            DataDesignerColumnType,
            required_resources=["model_registry"],
        )
    )
    return column_type in model_generated_column_types


Contributor Author

Moved both of these to engine.


4 participants