Separate project creation and execution dependencies (inspired by Dagster) #5280

@DimedS

Dagster implements a clean separation between project creation and project execution. I believe the same approach could be very valuable for Kedro, because it would address our struggles with heavy dependencies that are not needed in production.

This separation is not only about keeping creation-time and runtime dependencies separate (which helps keep production environments lean), but also about improving the overall developer experience.

1. Project creation

A new Dagster project is created using uvx and a dedicated project-creation tool:

uvx create-dagster@latest project my-project
cd my-project
source .venv/bin/activate

Here, uvx runs the project generator in a temporary, isolated environment, so users do not need to install or manage the generator’s dependencies themselves.

Our quickstart currently recommends something similar, running kedro new through uvx, but it makes less sense there: kedro new ships in the same kedro package that runs the project, so afterwards I still need to create a proper .venv with kedro installed in order to run the project.

create-dagster automatically asks whether you want to run uv sync. If you answer yes, it creates a .venv with all dependencies from pyproject.toml installed.

A `uv` installation was detected. Run `uv sync`? This will create a uv.lock file and the virtual environment you need to activate in order to work on this project. If you wish to use a non-uv package manager, choose "n". (y/n) [y]:

At this step, we recommend using uv run kedro run, which does not seem very clear to me: it creates a .venv inside your project but uses it only for the duration of that command. If I want to continue working with my new project, it feels better to activate the environment explicitly.

2. Project execution

Once the project is created and the .venv is activated, execution and orchestration are handled by the Dagster runtime:

dg dev

If you need to modify your Dagster project and add some assets, you should use commands from the main package, such as dg scaffold defs.

I think implementing the same approach would allow us to make core Kedro less heavy:

  1. Move kedro-new into a separate library, since it brings Cookiecutter with it. I think it is possible to do this without a major release if we add a thin wrapper around kedro new in the Kedro CLI, which would use the kedro-new library when it is installed or prompt the user to install it.
  2. Keep the rest, such as kedro pipeline create, inside kedro.
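To make the thin-wrapper idea concrete, here is a minimal sketch of how the dispatch could look. The module name kedro_new, its main(args) entry point, and the install hint are all assumptions for illustration; no such package or API exists yet, and the real Kedro CLI is built on Click rather than plain functions.

```python
import importlib
import importlib.util
import sys

# Hypothetical import name for the extracted project generator.
GENERATOR_MODULE = "kedro_new"


def generator_available(module_name: str = GENERATOR_MODULE) -> bool:
    """Return True if the optional generator package can be imported."""
    return importlib.util.find_spec(module_name) is not None


def new_command(args: list[str], module_name: str = GENERATOR_MODULE) -> int:
    """Delegate to the generator if it is installed, otherwise explain how to get it."""
    if not generator_available(module_name):
        # Assumed distribution name: the import name with dashes instead of underscores.
        dist_name = module_name.replace("_", "-")
        print(
            f"'{module_name}' is not installed. "
            f"Install it (for example: pip install {dist_name}) and re-run.",
            file=sys.stderr,
        )
        return 1
    module = importlib.import_module(module_name)
    # Assumed entry point: the generator exposes main(args) -> int.
    return module.main(args)
```

With this shape, core Kedro never imports Cookiecutter itself; the dependency is only pulled in when the separate generator package is installed.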

I also think it would be nice to embed uv sync into kedro new, the same way Dagster does.
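As a rough sketch of what that could look like, the function below checks for uv, asks the user, and runs uv sync in the new project directory. The function name maybe_uv_sync, its parameters, and the prompt wording are hypothetical; this is not how create-dagster or Kedro actually implement it.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional


def maybe_uv_sync(
    project_dir: Path,
    ask: Callable[[str], str] = input,
    uv_path: Optional[str] = None,
) -> bool:
    """Offer to run `uv sync` in project_dir; return True only if it ran and succeeded."""
    # Look up uv on PATH unless the caller supplies an explicit path.
    uv = uv_path or shutil.which("uv")
    if uv is None:
        return False  # No uv available: leave environment setup to the user.

    answer = ask("A uv installation was detected. Run `uv sync`? (y/n) [y]: ")
    if answer.strip().lower() not in ("", "y", "yes"):
        return False  # User opted out, e.g. to use another package manager.

    # `uv sync` creates uv.lock and the project's .venv from pyproject.toml.
    result = subprocess.run([uv, "sync"], cwd=project_dir, check=False)
    return result.returncode == 0
```

The ask parameter is injectable so the prompt can be skipped or scripted in non-interactive runs.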

Metadata

Labels: Issue: Feature Request (new feature or improvement to existing feature)

Status: Done