-
Notifications
You must be signed in to change notification settings - Fork 51
refactor: update required resources treatment and use subclasses over mixins #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor: update required resources treatment and use subclasses over mixins #184
Conversation
|
|
||
|
|
||
| class WithChatCompletionGeneration(WithModelGeneration): | ||
| class ColumnGeneratorWithSingleModelChatCompletion(ColumnGeneratorWithSingleModel[TaskConfigT]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class name is getting a bit long for my liking lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be we drop single from these names assuming we mostly work with one model by default
d7d6e9c to
c4f717b
Compare
| logger.info(f" |-- column name: {self.config.name!r}") | ||
| logger.info(f" |-- model config:\n{self.model_config.model_dump_json(indent=4)}") | ||
| if self.model_config.provider is None: | ||
| logger.info(f" |-- default model provider: {self._get_provider_name()!r}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're loosing this log message, which was added at some point to indicate the default mode provider being used when model config itself doesn't reference a provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The update makes it so this message is always logged, right? At least that was my intention. WDYT?
|
|
||
|
|
||
| class WithChatCompletionGeneration(WithModelGeneration): | ||
| class ColumnGeneratorWithSingleModelChatCompletion(ColumnGeneratorWithSingleModel[TaskConfigT]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be we drop single from these names assuming we mostly work with one model by default
| ) | ||
|
|
||
| def _fan_out_with_threads(self, generator: WithModelGeneration, max_workers: int) -> None: | ||
| def _fan_out_with_threads(self, generator: ColumnGeneratorWithSingleModel, max_workers: int) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be at the ColumnGeneratorWithModelRegistry level?
c4f717b to
1b18b73
Compare
| def column_type_used_in_execution_dag(column_type: str | DataDesignerColumnType) -> bool: | ||
| """Return True if the column type is used in the workflow execution DAG.""" | ||
| column_type = resolve_string_enum(column_type, DataDesignerColumnType) | ||
| dag_column_types = { | ||
| DataDesignerColumnType.EXPRESSION, | ||
| DataDesignerColumnType.LLM_CODE, | ||
| DataDesignerColumnType.LLM_JUDGE, | ||
| DataDesignerColumnType.LLM_STRUCTURED, | ||
| DataDesignerColumnType.LLM_TEXT, | ||
| DataDesignerColumnType.VALIDATION, | ||
| DataDesignerColumnType.EMBEDDING, | ||
| } | ||
| dag_column_types.update(plugin_manager.get_plugin_column_types(DataDesignerColumnType)) | ||
| return column_type in dag_column_types | ||
|
|
||
|
|
||
| def column_type_is_model_generated(column_type: str | DataDesignerColumnType) -> bool: | ||
| """Return True if the column type is a model-generated column.""" | ||
| column_type = resolve_string_enum(column_type, DataDesignerColumnType) | ||
| model_generated_column_types = { | ||
| DataDesignerColumnType.LLM_TEXT, | ||
| DataDesignerColumnType.LLM_CODE, | ||
| DataDesignerColumnType.LLM_STRUCTURED, | ||
| DataDesignerColumnType.LLM_JUDGE, | ||
| DataDesignerColumnType.EMBEDDING, | ||
| } | ||
| model_generated_column_types.update( | ||
| plugin_manager.get_plugin_column_types( | ||
| DataDesignerColumnType, | ||
| required_resources=["model_registry"], | ||
| ) | ||
| ) | ||
| return column_type in model_generated_column_types | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these both to engine
The problem we are solving
ConfigurableTaskMetadatarequires you to specify the resources required to execute the task. In practice, however, all tasks always have access to all resources. While this issue isn’t a big deal, it is confusing for plugin builders who don’t have the above context.Changes
Remove
required_resourcesfrom metadataAlways assume all tasks have access to all resources.
Use subclasses (instead of mixins) to streamline development and simplify plugin development.
An important note is that I think we still need some what for a generator to specify its required resources. For example, say we want to filter plugin types to only grab the ones that need the model registry or just need the datastore. The solution implemented here is an abstract method called
get_required_resourcesthat must be implemented on generators. This effectively pushes this complication to a lower lever, where most developers won't need to worry about it. I'll highlight some places in the code to show what I mean.