@mikeknep mikeknep commented Dec 31, 2025

This PR updates the plugin system. Development notes are captured below.

Requirements

  1. Plugins MUST support defining both a configuration object (a Pydantic model) and some engine-related implementation object (ConfigurableTask, ColumnGenerator, etc.).
  2. The UX for making plugins discoverable MUST be simple. We should only require users to define a single Plugin object that gets referenced in a single entry point.
  3. The plugin system MUST NOT introduce a dependency chain that makes any config module depend on any engine module.
    a. Such a dependency breaks "slim install" support, because engine code may import third-party deps that a config-only slim install will not include.
    b. It also introduces a high risk of circular imports, because engine code already depends on config modules.
  4. A client using a slim-install of the library SHOULD be able to use plugins.
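To make REQ 2 concrete, here is a rough sketch of how a single entry point per plugin could be discovered with the standard library. The group name and the `discover_plugins` helper are assumptions made up for this sketch; the actual group name used by the library is not stated in this PR.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group name, chosen only for this sketch.
PLUGIN_GROUP = "data_designer.plugins"


def discover_plugins(group: str = PLUGIN_GROUP) -> dict:
    """Load every object registered under the given entry point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    # ep.load() imports the module named by the entry point and returns the
    # referenced object (here, that would be the plugin author's Plugin instance).
    return {ep.name: ep.load() for ep in selected}
```

With this shape, a plugin author only has to point one entry point at their Plugin object; nothing else needs to be registered.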

Current state

The current plugin system violates REQ 3 (and by extension REQ 4):

data_designer.config.column_types -> data_designer.plugin_manager -> data_designer.plugins.plugin -> data_designer.engine.configurable_task

(-> means "imports" aka "depends on")

Updates here

Make the Plugin object "lazy" by defining the config and task types as fully-qualified strings rather than objects.

Because the Plugin fields are strings, a plugin that is split across multiple files (e.g. config.py and task.py)* lets the core library's config code, which uses plugins to extend discriminated union types, load the plugin and resolve only the config class. It never needs to resolve, load, or import the plugin's task-related module, where engine base classes are imported and subclassed.

*This multi-file setup wouldn't be required out of the box; see "Plugin development lifecycle" below.

Example:

# src/my_plugin/config.py
from data_designer.config.column_types import SingleColumnConfig

class MyPluginConfig(SingleColumnConfig):
    foo: str



# src/my_plugin/generator.py
from data_designer.engine.column_generators.generators.base import ColumnGenerator
from my_plugin.config import MyPluginConfig

class MyPluginGenerator(ColumnGenerator[MyPluginConfig]):
    pass



# src/my_plugin/plugin.py
from data_designer.plugins.plugin import Plugin, PluginType

plugin = Plugin(
    config_qualified_name="my_plugin.config.MyPluginConfig",
    task_qualified_name="my_plugin.generator.MyPluginGenerator",
    plugin_type=PluginType.COLUMN_GENERATOR,
)
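The lazy resolution itself can be sketched with the standard library alone. This is not the code in this branch: `LazyPluginSketch`, `resolve_qualified_name`, and the method names are hypothetical stand-ins, and a stdlib class plays the role of the config class so the snippet runs without the library installed.

```python
import importlib
from dataclasses import dataclass


def resolve_qualified_name(qualified_name: str) -> type:
    """Import the module part of 'pkg.module.ClassName' and return the class."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


@dataclass(frozen=True)
class LazyPluginSketch:
    """Hypothetical stand-in for Plugin: stores names, imports nothing on construction."""

    config_qualified_name: str
    task_qualified_name: str

    def resolve_config(self) -> type:
        # Only the config module is imported here.
        return resolve_qualified_name(self.config_qualified_name)

    def resolve_task(self) -> type:
        # Only callers that need engine code should ever call this.
        return resolve_qualified_name(self.task_qualified_name)


# The task module deliberately does not exist, mimicking a slim install
# where the engine side of a plugin cannot be imported.
sketch = LazyPluginSketch(
    config_qualified_name="collections.OrderedDict",
    task_qualified_name="missing_engine_pkg.generator.MyGenerator",
)
config_cls = sketch.resolve_config()  # succeeds; the task module is never touched
```

Calling `sketch.resolve_task()` in this setup raises `ModuleNotFoundError`, which is the point: config-only code paths never trigger that import.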

Strings instead of concrete types?

Yeah, a little sad, but it seems like a reasonable compromise given the benefits this unlocks.

A Pydantic validator can verify that the referenced module exists. Checking that the module actually contains a class with the given name would require importing it, so that check is left to a test helper.

Going one step further, we ship a test helper function that we'd encourage plugin authors to use in their unit tests:

# my_plugin/tests/test_plugin.py
from data_designer.plugins.testing import assert_valid_plugin
from my_plugin.plugin import plugin


def test_plugin_validity():
    assert_valid_plugin(plugin)

(Similar to pd.testing.assert_frame_equal.)

To start, that test helper would ensure two things:

  1. The string class names resolve to concrete types that do exist
  2. The resolved concrete types are subclasses of the expected base classes
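Those two checks could look roughly like the sketch below. It is not the implementation of `assert_valid_plugin`; `assert_resolves_to_subclass` is a hypothetical helper, and stdlib classes stand in for the plugin classes and their expected base classes.

```python
import importlib


def resolve(qualified_name: str) -> type:
    """Import the module part of 'pkg.module.ClassName' and return the attribute."""
    module_name, _, class_name = qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def assert_resolves_to_subclass(qualified_name: str, expected_base: type) -> None:
    """Check 1: the name resolves to a class. Check 2: it subclasses expected_base."""
    try:
        resolved = resolve(qualified_name)
    except (ImportError, AttributeError, ValueError) as exc:
        raise AssertionError(f"{qualified_name!r} does not resolve: {exc}") from exc
    if not isinstance(resolved, type):
        raise AssertionError(f"{qualified_name!r} is not a class")
    if not issubclass(resolved, expected_base):
        raise AssertionError(
            f"{qualified_name!r} is not a subclass of {expected_base.__name__}"
        )


# Passes silently: OrderedDict exists and is a subclass of dict.
assert_resolves_to_subclass("collections.OrderedDict", dict)
```

A real helper would presumably run this once for the config name and once for the task name, each against the base class implied by the plugin type.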

In the future, we could extend the helper to validate other things that are more complex than just Pydantic field type validations.

Remember: we can't implement this validation as a Pydantic validator because it would break the laziness. We can at least validate that the module exists (and this branch does so), but only the test helper can go further and actually fully resolve the two fields.
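For the module-exists check, one possible approach (an assumption, not necessarily what this branch does) is `importlib.util.find_spec`, which locates a module without executing it. One caveat: for a dotted name it does import the parent packages, just not the target module itself. `module_exists` is a hypothetical name.

```python
import importlib.util


def module_exists(qualified_name: str) -> bool:
    """Return True if the module part of 'pkg.module.ClassName' can be located."""
    module_name, _, _class_name = qualified_name.rpartition(".")
    if not module_name:
        # No dot at all, so there is no module part to look up.
        return False
    try:
        # find_spec locates the module without executing it
        # (parent packages of a dotted name are imported, though).
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False
```

A function like this could be called from a Pydantic field validator on each of the two string fields, since it never imports the config or task module being named.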

Plugin development lifecycle

A plugin author could continue defining everything in one Python file and things would still work in the library. The limitation would be that a plugin defined that way would not support slim installs, and so clients like NMP would not be able to use it. This might be perfectly fine for many plugins, especially in the early going. A reasonable "plugin development lifecycle" might be:

  1. Develop everything in one file and get it working with the library
  2. Refactor the plugin to support slim installs (if ever desired)

Plugin authors would only need to do step 2 if/when we want to make the plugin available to a client that requires a slim install (NMP, possibly others). That step 2 refactor would involve splitting the plugin implementation into multiple files and (if necessary) making sure any heavyweight, task-only third-party dependencies are declared under an engine extra.

@mikeknep mikeknep force-pushed the mknepper/refactor/plugins branch 2 times, most recently from 29aa3b6 to f4c501e Compare December 31, 2025 20:43
@mikeknep mikeknep force-pushed the mknepper/refactor/plugins branch from 4a2bdac to 63487fd Compare January 5, 2026 23:25
@mikeknep mikeknep force-pushed the mknepper/refactor/plugins branch from f67f352 to 54206bf Compare January 5, 2026 23:58
@johnnygreco johnnygreco left a comment

🛸

Thanks for this @mikeknep!

@mikeknep mikeknep merged commit 36a174a into main Jan 6, 2026
13 checks passed
@mikeknep mikeknep deleted the mknepper/refactor/plugins branch January 6, 2026 16:29