
Conversation

@phil-opp
Collaborator

for easier debugging

@phil-opp
Collaborator Author

@haixuanTao I tried multiple approaches, but the Node example keeps failing. I'm a bit out of ideas; maybe you know what's going wrong?

I also tried the example locally, but I had some issues with CUDA.

In my opinion the Python node example is too complex for the main dora repo since it uses some heavy dependencies (yolo, rerun). There were various CI failures caused by it, which forced me to spend a lot of time debugging them. I still don't know if they were caused by my changes or if I was just unlucky. Perhaps we can move the example to the node-hub repo and add a simpler example with no external dependencies? Would that be ok with you?

@haixuanTao
Collaborator

I think the fact that it works on Windows but not on the other OSes probably means there's nothing wrong with the split of the CI jobs.

I wonder if it might be linked to a CLI/API version mismatch or something, but yeah, we would probably need to dive deeper, beyond the changes included in this PR.

@phil-opp
Collaborator Author

phil-opp commented Jan 7, 2026

What do you think about moving the Python example to the node-hub repo?

@haixuanTao
Collaborator

I think we need to test Python features, especially when we develop them, so I'm not sure how moving them would make that easier.

@phil-opp
Collaborator Author

phil-opp commented Jan 9, 2026

Yes, we should keep testing Python code, just not with something so dependency-heavy and fragile. Right now we don't even know why splitting the CI jobs breaks the test, so this mainly adds complexity to our development cycle.

@phil-opp
Collaborator Author

phil-opp commented Jan 9, 2026

I probably spent multiple days trying to fix the failing tests in this PR and #1278, and it is still not clear to me whether these errors are actually related to my changes or whether they are just an unrelated side effect. Reproducing these errors locally is difficult, or even impossible, for me, so I have to debug through the CI, which is extremely slow.

I'm fine with keeping these tests in the repo if you prefer, as long as we move them to a separate, optional job. This way we could still merge with a failing job when the error seems unrelated.

@haixuanTao
Collaborator

I understand the frustration.

I think the thing is that the CI failing on two different OSes means there is definitely a bug somewhere that users will face at one point or another.

And I believe moving those tests to an optional CI job would make our code less reliable and more prone to errors, as we wouldn't be fixing errors that are lurking within our repo.

As we don't have all the time in the world to fix all the issues within dora, I would prefer that we either keep this PR open and wait for someone to take over or rebase it, or close it.

@phil-opp
Collaborator Author

I created this PR mainly to make debugging the CI failures of #1278 easier. #1278 implements an actual improvement to Dora to reduce the version mismatch errors that users are experiencing. This work is now blocked on some CI errors that I cannot reproduce locally.

I think the thing is that the CI failing on two different OSes means there is definitely a bug somewhere that users will face at one point or another.

I agree with that. We should try our best to find and fix this.

The best way to do that, in my opinion, is to create a simpler Python test case that tells us whether the basic functionality works as expected. Right now we only have a very complex test that involves heavy dependencies, so we have no idea where to look (is this even an issue in Dora, or is some dependency hanging?).
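
To sketch what I mean, something along these lines would already cover the basics. The node ids, file names, and exact keys here are only illustrative assumptions, not a finished example:

# Hypothetical minimal dataflow: one Python node emits a counter value,
# a second node receives it. Nothing beyond the dora Python API is needed.
nodes:
  - id: sender
    path: sender.py        # illustrative producer script
    outputs:
      - counter
  - id: receiver
    path: receiver.py      # illustrative consumer script
    inputs:
      counter: sender/counter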

As we don't have all the time in the world to fix all the issues within dora, I would prefer that we either keep this PR open and wait for someone to take over or rebase it, or close it.

If this issue is too complex for us to debug, it will probably not be fixed by a contributor who is less familiar with Dora either. So the bug will probably stay unfixed.

And given that it affects other PRs as well, we will waste a lot of time on CI failures that aren't related to the changes we make.

@haixuanTao
Collaborator

Ok, so what you are saying is that we replace the current Python CI example with something simpler?

Yeah that's fine with me!

@phil-opp
Collaborator Author

Ok great! I opened #1306 for this.

haixuanTao added a commit that referenced this pull request Jan 13, 2026
Rename the object detection dataflow example to
`dataflow-object-detection.yml`

This was discussed in
#1279 (comment)
@haixuanTao
Collaborator

@claude, I think there is an env mismatch introduced by this PR's CI/CD; could you create a new PR on top of this one that aligns the environments? Note that at the moment uv can be confused about which env to use and can fall back to a new env if there is a pyproject in the current working directory. It could also potentially select a virtual env different from the active env if there is a venv within the current directory, or sometimes if there is one in a parent directory.

To make things simpler, maybe we should either have only one virtual env for the whole CI/CD, or make sure that each virtual env is well isolated and not conflicting with any pyproject.toml.

@claude

claude bot commented Jan 14, 2026

Claude finished @haixuanTao's task in 1m 18s


CI/CD Environment Mismatch Analysis

Issue Identified

The PR splits Python CLI tests into smaller jobs, but introduces a potential environment mismatch where uv can get confused about which virtual environment to use.

Root Cause:

  1. A single venv is created at the repository root (.venv/)
  2. Tests navigate into subdirectories with their own pyproject.toml files (e.g., test_python_project, various examples)
  3. uv may:
    • Fall back to creating a new environment when it finds a pyproject.toml in the current directory
    • Select a different virtual environment if one exists in the current or parent directory
    • Not respect the activated environment in some cases
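
As a concrete (hypothetical) illustration of the failure mode, a step along these lines would exhibit the problem; the step name and directory are assumptions, not the exact workflow contents:

- name: "Test CLI (Python): Template Project"   # illustrative step
  shell: bash
  run: |
    cd test_python_project   # directory with its own pyproject.toml
    # Without an explicit environment setting, uv's project commands treat
    # this directory as a standalone project and may sync or create a local
    # .venv here instead of reusing the environment prepared at the repo root.
    uv run pytest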

Affected Test Steps:

  • Test CLI (Python): Template Project - creates and enters test_python_project/ directory
  • Various example tests that have their own pyproject.toml files

Limitation

⚠️ I cannot modify .github/workflows/ci.yml files due to GitHub App permissions restrictions. Workflow files require elevated permissions for security reasons.

You can find more information in the FAQ.

Proposed Solutions

Option 1: Force explicit venv path (Recommended)

Set the UV_PROJECT_ENVIRONMENT environment variable to force uv to use a specific venv regardless of directory:

- name: "Set up venv"
  shell: bash
  run: |
    uv venv --seed -p 3.12 ${{ github.workspace }}/.venv
    echo "UV_PROJECT_ENVIRONMENT=${{ github.workspace }}/.venv" >> $GITHUB_ENV
    if [ "$RUNNER_OS" != "Windows" ]; then
      source .venv/bin/activate
    else
      source .venv/Scripts/activate
    fi
    
    uv pip install -e apis/python/node
    uv pip install ruff pytest

This ensures all uv commands use the same virtual environment location, preventing confusion when working in subdirectories.
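
For example, a later step that works in a subdirectory would then keep resolving to the root venv; the step name and path below are illustrative assumptions:

- name: "Run example in subdirectory"           # illustrative step
  shell: bash
  working-directory: examples/python-dataflow   # hypothetical path
  run: |
    # UV_PROJECT_ENVIRONMENT (exported to GITHUB_ENV above) points uv's
    # project commands at the root .venv, even though this directory has
    # its own pyproject.toml.
    uv run python main.py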

Option 2: Use system-level installation

Install packages at the system level (which is safe in CI runners):

- name: "Install Python dependencies"
  shell: bash
  run: |
    uv pip install --system -e apis/python/node
    uv pip install --system ruff pytest

Then remove all venv activation blocks and use commands directly (or with uv run --system).
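
Under this option, a later test step would call the tools directly against the runner's Python; the step below is a hypothetical illustration, not part of the existing workflow:

- name: "Test CLI (Python)"                      # illustrative step
  shell: bash
  run: |
    # ruff and pytest were installed with `uv pip install --system` above,
    # so no venv activation is needed here.
    ruff check apis/python/node                  # path is illustrative
    pytest apis/python/node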

Option 3: Isolate each test with its own venv

Create a fresh venv for each test step to ensure complete isolation, but this would be slower and use more disk space.

Recommendation

Option 1 is the best approach because:

  • ✅ Maintains a single venv (faster, less disk usage)
  • ✅ Explicit control over which environment uv uses
  • ✅ No ambiguity when navigating directories
  • ✅ Minimal changes to existing workflow

Next Steps

Since I cannot modify the workflow file directly, here are your options:

  1. Apply the changes manually - Add echo "UV_PROJECT_ENVIRONMENT=${{ github.workspace }}/.venv" >> $GITHUB_ENV to the "Set up venv" step
  2. Create a patch file - I can create a detailed patch file with all the changes
  3. Share the exact diff - I can provide the precise changes needed

Which would you prefer?


@phil-opp
Collaborator Author

@claude Please create a patch file for Option 1. Could you also include a new job for running the example added in #1298?

@claude

claude bot commented Jan 14, 2026

Claude encountered an error


I'll analyze this and get back to you.

This same example will be tested in the `node-hub` repo as part of dora-rs/dora-hub#23

Testing with external deps will likely lead to version mismatches of the message crate. So it requires synchronous changes to two repos, which makes development difficult.
@phil-opp
Collaborator Author

Since the issue is likely caused by the external dependencies that use pinned commits to the node-hub repo, I just removed the object detection example from this repo and from the CI in 7f477f1. The same example already lives in https://github.com/dora-rs/dora-hub/blob/main/examples/object-detection/yolo.yml, so I opened dora-rs/dora-hub#23 to run the node hub examples on CI too. This way, we keep testing the example with the latest version of dora.

@phil-opp phil-opp enabled auto-merge (squash) January 14, 2026 13:17