feat: 🔌 Initial plugin system implementation #23

johnnygreco · 2025-11-10T02:55:15Z

🔌 Plugin System Overview

This PR implements the initial scaffolding for Data Designer's plugin system. Column generators are the first plugin type we will support. For now, plugin developers will need to write all the boilerplate themselves and build their generators from the ground up, which means importing and correctly using objects like the ConfigurableTask base class. In a follow-up PR, we will implement a decorator to streamline plugin development.

Update: I'm starting to have doubts about the decorator approach mentioned above, because I don't think it will play nicely with IDEs. Instead, I'm thinking about implementing helper objects that make it easier to put column generators together, for example, so you don't have to define the task metadata or inherit from ColumnGenerator yourself (not sure we can avoid the latter, though).
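To make the boilerplate concrete, here is a minimal, self-contained sketch of what a hand-written column generator plugin involves: inheriting from a generator base class and defining task metadata by hand. Every class here (TaskMetadata, ConfigurableTask, ColumnGenerator, UpperCaseColumnConfig, UpperCaseColumnGenerator) is a hypothetical stand-in, not Data Designer's actual API; the real base classes and their method names may differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TaskMetadata:
    """Hypothetical stand-in for the metadata every task must declare."""

    name: str
    description: str


class ConfigurableTask(ABC):
    """Hypothetical stand-in: a task constructed from a config object."""

    def __init__(self, config: Any) -> None:
        self.config = config

    @staticmethod
    @abstractmethod
    def metadata() -> TaskMetadata: ...


class ColumnGenerator(ConfigurableTask):
    """Hypothetical stand-in: produces one column value per record."""

    @abstractmethod
    def generate(self, record: dict[str, Any]) -> Any: ...


@dataclass
class UpperCaseColumnConfig:
    name: str
    source_column: str


class UpperCaseColumnGenerator(ColumnGenerator):
    # Boilerplate 1: the plugin author inherits from the generator base class.

    @staticmethod
    def metadata() -> TaskMetadata:
        # Boilerplate 2: the plugin author writes the task metadata by hand.
        return TaskMetadata(name="upper_case", description="Upper-cases another column.")

    def generate(self, record: dict[str, Any]) -> Any:
        return record[self.config.source_column].upper()
```

The metadata() method and the ColumnGenerator inheritance are the two pieces that the helper objects described in the update would ideally absorb.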

💡 Plugin Discovery

Failed attempt: file search

I initially built plugin discovery to recursively search for plugins in Python files located in ~/.data_designer/plugins. However, it became clear that this approach (1) wouldn't transition easily to pip-installable plugins and (2) doesn't play nicely with IDEs (e.g., how are the plugins loaded? Is the plugin file on the user's Python path?).

Successful approach (hopefully): entrypoints

Instead, plugin discovery is implemented using entry points. In short, plugin developers need to create a Python package for their plugin and declare one or more entry points that reference their plugin object(s).

For example, a plugin package might declare an entry point like this in its pyproject.toml:

```toml
[project.entry-points."data_designer.plugins"]
example-plugin = "example_plugin.myplugin:plugin"
```

Data Designer can then discover this plugin as follows:

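```python
from importlib.metadata import entry_points

# The group= keyword requires Python 3.10+.
for ep in entry_points(group="data_designer.plugins"):
    plugin = ep.load()  # imports the module before the colon and returns the named object
```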
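Building on the loop above, a slightly more defensive discovery step might also handle Python versions before 3.10 (where entry_points() takes no arguments and returns a dict keyed by group) and reject duplicate plugin names. This is a sketch under those assumptions; iter_plugin_entry_points and load_plugins are hypothetical helper names, not functions from this PR.

```python
from __future__ import annotations

import sys
from importlib.metadata import EntryPoint, entry_points
from typing import Any, Iterable

PLUGIN_GROUP = "data_designer.plugins"


def iter_plugin_entry_points(group: str = PLUGIN_GROUP) -> list[EntryPoint]:
    """Return the installed entry points registered under `group`."""
    if sys.version_info >= (3, 10):
        return list(entry_points(group=group))
    # Python 3.8/3.9: entry_points() takes no arguments and returns a dict of groups.
    return list(entry_points().get(group, []))


def load_plugins(eps: Iterable[EntryPoint]) -> dict[str, Any]:
    """Load each entry point, keyed by its name, rejecting duplicate names."""
    plugins: dict[str, Any] = {}
    for ep in eps:
        if ep.name in plugins:
            raise ValueError(f"Duplicate plugin name: {ep.name!r}")
        # load() imports the module before the colon and returns the named attribute.
        plugins[ep.name] = ep.load()
    return plugins
```

With the example entry point installed, load_plugins(iter_plugin_entry_points()) would return a dict mapping "example-plugin" to the loaded plugin object.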

While this approach requires a bit more effort from the plugin developer, it supports pip-installed plugins out of the box. It also plays nicely with IDEs, since users can import plugin configs directly and use them with Data Designer:

```python
from my_plugin.config import MyPluginColumnConfig

config_builder.add_column(MyPluginColumnConfig(...))
```

📢 Callouts

- I split data_designer/config/columns.py into two files: column_configs.py and column_types.py. The latter is where plugins are injected into type unions, emoji maps, the column type enum, etc. Importantly, column_types.py should not be imported in plugin implementations. With plugin extensions, column_types.py should be thought of as something that is fully defined at runtime (with all plugins loaded). I'm thinking about renaming it to _column_types.py to make this clearer.
- A new PluginManager handles plugin discovery, registration, and loading/fetching.
- The Plugin object is what plugin developers define and expose through the entry point (i.e., the ":plugin" at the end of the entry point definition above).
- The plugin helper functions are how client-side code interacts with plugins, since the client side can't depend on engine or plugin code directly.
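The callout about column_types.py being fully defined at runtime can be illustrated with a minimal sketch: built-in and plugin configs are merged, then the column type enum and the config type union are built from the merged mapping. The config classes, the build_column_types function, and the enum member naming scheme are all hypothetical; the real column_types.py also covers things like emoji maps, which are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any, Union


# Hypothetical stand-ins for built-in configs from column_configs.py.
@dataclass
class SamplerColumnConfig:
    name: str


@dataclass
class LLMTextColumnConfig:
    name: str
    prompt: str


BUILTIN_CONFIGS: dict[str, type] = {
    "sampler": SamplerColumnConfig,
    "llm-text": LLMTextColumnConfig,
}


def build_column_types(plugin_configs: dict[str, type]) -> tuple[type[Enum], Any]:
    """Build the column type enum and config union once all plugins are loaded."""
    clashes = BUILTIN_CONFIGS.keys() & plugin_configs.keys()
    if clashes:
        raise ValueError(f"Plugin column types clash with built-ins: {sorted(clashes)}")
    all_configs = {**BUILTIN_CONFIGS, **plugin_configs}
    # Functional Enum API: member names like LLM_TEXT, values like "llm-text".
    column_type = Enum(
        "ColumnType",
        {key.upper().replace("-", "_"): key for key in all_configs},
    )
    # Union[...] accepts a tuple of types, so the union can be built at runtime.
    column_config_union = Union[tuple(all_configs.values())]
    return column_type, column_config_union
```

This is also why plugin code should not import column_types.py: the enum and union only take their final shape after every plugin has been discovered.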

nabinchha

Thanks for partitioning the changes into multiple scoped PRs. It's helpful to follow along and easier to review.

johnnygreco · 2025-11-10T22:24:56Z

> Thanks for partitioning the changes into multiple scoped PRs. It's helpful to follow along and easier to review.

Trying to follow your lead, my friend 👍

Co-authored-by: Nabin Mulepati

johnnygreco · 2025-11-11T17:22:48Z

@nabinchha: I made some internal updates to the PluginRegistry in c752bd4. I moved the thread lock into initialization, which calls _discover (now private), where all plugin registration happens. I also removed the register method. The main idea is that discovery should only ever happen once, and users shouldn't manually discover or register plugins.
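A minimal sketch of the registry behavior described above, assuming a process-wide singleton: the lock is taken during initialization, the private _discover method is the only place plugins are registered, and there is no public register method, so discovery runs exactly once. The Plugin fields (name, config_cls, task_cls), the get_plugin/get_plugin_names accessors, and the injectable load_entry_points argument (added only for testability) are hypothetical, not the code in c752bd4.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from importlib.metadata import entry_points
from typing import Any, Callable, Iterable

PLUGIN_GROUP = "data_designer.plugins"


@dataclass(frozen=True)
class Plugin:
    """Hypothetical shape of the object an entry point's ":plugin" resolves to."""

    name: str
    config_cls: type
    task_cls: type


def _installed_entry_points() -> Iterable[Any]:
    return entry_points(group=PLUGIN_GROUP)  # group= requires Python 3.10+


class PluginRegistry:
    """Process-wide registry; discovery runs once, the first time it is built."""

    _instance: PluginRegistry | None = None
    _lock = threading.Lock()

    def __new__(cls, *args: Any, **kwargs: Any) -> PluginRegistry:
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance._discovered = False
                cls._instance = instance
        return cls._instance

    def __init__(self, load_entry_points: Callable[[], Iterable[Any]] = _installed_entry_points) -> None:
        # Holding the lock means concurrent first calls cannot both run _discover.
        with self._lock:
            if self._discovered:
                return
            self._plugins: dict[str, Plugin] = {}
            self._discover(load_entry_points)
            self._discovered = True

    def _discover(self, load_entry_points: Callable[[], Iterable[Any]]) -> None:
        # Private: the only place plugins get registered; there is no register().
        for ep in load_entry_points():
            plugin = ep.load()
            if not isinstance(plugin, Plugin):
                raise TypeError(f"Entry point {ep.name!r} did not resolve to a Plugin")
            if plugin.name in self._plugins:
                raise ValueError(f"Duplicate plugin name: {plugin.name!r}")
            self._plugins[plugin.name] = plugin

    def get_plugin(self, name: str) -> Plugin:
        return self._plugins[name]

    def get_plugin_names(self) -> list[str]:
        return sorted(self._plugins)
```

Because the loader is injectable, tests can pass stub entry points instead of relying on installed packages; a second PluginRegistry() call returns the same instance without rediscovering.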

src/data_designer/plugin_manager.py