ci: test-e2e reworked to use TestPypi by tarilabs · Pull Request #105 · eval-hub/eval-hub-sdk

tarilabs · 2026-03-31T09:56:24Z

What and why

Refactor and simplify test-e2e to use the dev release of eval-hub-server from TestPyPI.

Type

Testing

Tests added or updated
Tested manually

Breaking changes

none

Summary by CodeRabbit

Release Notes

Tests
- Added debug logging option for end-to-end tests to improve troubleshooting
- Enhanced test execution with improved marker-based test selection and debugging capabilities
- Improved temporary directory preservation for easier failure investigation
Chores
- Updated eval-hub-server dependency sourcing for better accessibility
- Streamlined test infrastructure and automated workflow configuration

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: tarilabs <matteo.mortari@gmail.com>

Signed-off-by: tarilabs <matteo.mortari@gmail.com>

coderabbitai · 2026-03-31T10:56:55Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef2d023e-c827-4586-b3a9-8076fb27517b

📥 Commits

Reviewing files that changed from the base of the PR and between 6f217e0 and 76c50d9.

📒 Files selected for processing (2)

Makefile
tests/e2e/conftest.py

📝 Walkthrough


This PR removes the matrix-driven workflow step that fetches eval-hub-server from a git repository, replacing it with a TestPyPI-based distribution model. The test-e2e target is updated with improved debug logging capabilities, and the E2E fixture now explicitly manages temporary directories for debugging purposes.

Changes

Cohort / File(s)	Summary
CI/CD Workflow `.github/workflows/test.yml`	Removed matrix configuration that conditionally fetched eval-hub-server from external repository; deleted the `latest_eval_hub_server` matrix variable and its conditional fetch step.
Build Configuration `Makefile`	Updated `test-e2e` target to remove `--no-sync` flag, add `-m e2e` marker selection, and enable `--e2e-debug` flag for pytest invocation; removed warning echo.
Dependency Management `pyproject.toml`	Changed `eval-hub-server` from VCS-based git reference to plain package name; added custom TestPyPI index configuration with explicit flag and source mapping for `eval-hub-server`.
Test Infrastructure `tests/conftest.py`, `tests/e2e/conftest.py`	Added `--e2e-debug` CLI option with conditional DEBUG-level logging setup; replaced print statements with logger calls in E2E fixture; removed `_run_server()` function; simplified binary path validation; changed temp directory from context manager to explicit mkdtemp with try/finally for debugging preservation.
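The `--e2e-debug` option described in the table can be sketched roughly as follows. This is a minimal illustration and not the PR's actual `tests/conftest.py`: the logger name `tests.e2e` and the help text are assumptions, and only the standard pytest hooks `pytest_addoption` and `pytest_configure` are relied on.

```python
import logging

# Hypothetical logger name; the real conftest.py may use a different one.
E2E_LOGGER_NAME = "tests.e2e"


def pytest_addoption(parser):
    """Register the --e2e-debug flag with pytest's command-line parser."""
    parser.addoption(
        "--e2e-debug",
        action="store_true",
        default=False,
        help="Enable DEBUG-level logging for end-to-end tests",
    )


def pytest_configure(config):
    """Raise the E2E logger to DEBUG only when the flag was passed."""
    if config.getoption("--e2e-debug"):
        logging.getLogger(E2E_LOGGER_NAME).setLevel(logging.DEBUG)
```

With hooks like these in a `conftest.py`, an invocation such as `pytest -m e2e --e2e-debug` would select the marked tests and turn on the extra logging. Whether the records are shown live depends on pytest's own log-capture settings.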

Sequence Diagram(s)

sequenceDiagram
    participant Workflow as Workflow
    participant OldSource as Git Repository
    participant NewSource as TestPyPI
    participant Test as E2E Test

    rect rgba(100, 100, 150, 0.5)
    Note over Workflow,Test: Previous Flow (Removed)
    Workflow->>Workflow: Matrix includes latest_eval_hub_server
    Workflow->>OldSource: Fetch eval-hub-server from Git subdirectory
    OldSource-->>Workflow: eval-hub-server binary
    Workflow->>Test: Run E2E tests with fetched binary
    end

    rect rgba(100, 150, 100, 0.5)
    Note over Workflow,Test: New Flow (Current)
    Workflow->>NewSource: Resolve eval-hub-server from TestPyPI
    NewSource-->>Workflow: eval-hub-server package
    Workflow->>Test: Run E2E tests with --e2e-debug logging
    Test->>Test: Preserve temp directory for debugging
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

ci(gha): optional E2E with latest eval-hub-server #44: Directly reverses the matrix-driven workflow behavior that fetches eval-hub-server from git, replacing it with TestPyPI distribution as the primary mechanism.

Suggested reviewers

ppadashe-psp
ruivieira
gnaulak-redhat
julpayne

Poem

🐰 The matrix once multiplied with care,
Fetching binaries from branches fair,
But TestPyPI proved far more true,
With debug logs painted fresh and new!
Temp dirs now dance in gentle keep,
While tests their precious secrets sleep. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly references the main change: adopting TestPyPI for the test-e2e workflow, which is the core purpose of this refactoring.
Description check	✅ Passed	The description provides context (TestPyPI usage for test-e2e), correctly marks the PR type (refactor/chore and test/ci), confirms testing was performed, and explicitly states no breaking changes—meeting template requirements.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/e2e/conftest.py (1)

169-176: ⚠️ Potential issue | 🟠 Major

Potential UnboundLocalError if server startup fails early.

If an exception occurs before server_process is assigned (e.g., during shutil.copytree at line 104 or any earlier operation), the finally block will raise UnboundLocalError: local variable 'server_process' referenced before assignment.

🐛 Proposed fix: guard the cleanup with existence check

     finally:
         # Cleanup: terminate the server subprocess
-        try:
-            server_process.terminate()
-            server_process.wait(timeout=5)
-        except subprocess.TimeoutExpired:
-            server_process.kill()
-            server_process.wait()
+        if "server_process" in dir():
+            try:
+                server_process.terminate()
+                server_process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                server_process.kill()
+                server_process.wait()

Alternatively, initialize server_process = None before the try block:

     tmpdir = tempfile.mkdtemp(prefix="evalhub-e2e-")
+    server_process = None
     try:
         ...
     finally:
         # Cleanup: terminate the server subprocess
-        try:
-            server_process.terminate()
+        if server_process is not None:
+            try:
+                server_process.terminate()
+                server_process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                server_process.kill()
+                server_process.wait()
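As a self-contained illustration of the second option, the sketch below initializes the handle to `None` before the `try` and guards the cleanup. The helper names `stop_process` and `run_with_server` and the `prepare` callback are hypothetical and stand in for the fixture's setup steps; they are not taken from `tests/e2e/conftest.py`.

```python
import subprocess
import sys


def stop_process(proc, timeout=5.0):
    """Terminate proc, escalating to kill on timeout; no-op if never started."""
    if proc is None:
        return
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def run_with_server(prepare, command):
    """Run setup, start the server, and always clean up safely."""
    server_process = None  # bound before the try, so finally can always read it
    try:
        prepare()  # stands in for copytree etc.; may raise before Popen runs
        server_process = subprocess.Popen(command)
        return server_process.pid
    finally:
        stop_process(server_process)
```

If `prepare()` raises, the original exception propagates unchanged instead of being masked by an `UnboundLocalError` from the cleanup code.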

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/conftest.py` around lines 169 - 176, The finally block can
reference server_process before assignment causing UnboundLocalError; initialize
server_process = None prior to the try that starts the server (or alternatively
guard the cleanup) and then in the finally only call
server_process.terminate()/wait()/kill() if server_process is not None (or if
'server_process' in locals()); update the variables around the server startup
code where server_process is created and the cleanup code that calls
server_process.terminate(), server_process.wait(), and server_process.kill()
accordingly.

🧹 Nitpick comments (1)

tests/e2e/conftest.py (1)

98-100: Temp directory intentionally preserved for debugging — consider documenting cleanup.

The switch from TemporaryDirectory (auto-cleanup) to mkdtemp (manual) preserves the directory after test runs for debugging server logs. This is useful but will accumulate directories over time.

Consider adding a comment or log message indicating where the directory is and that it won't be auto-cleaned, or add a conditional cleanup based on test success.

     # Create temporary directory for server files (preserved after run for debugging of server logfiles, etc)
     tmpdir = tempfile.mkdtemp(prefix="evalhub-e2e-")
+    # NOTE: This directory is intentionally NOT cleaned up to allow post-run debugging.
+    # Clean up manually or run: find /tmp -name 'evalhub-e2e-*' -type d -mtime +1 -exec rm -rf {} +
     try:
         logger.debug(f"\nTemp directory for this run: {tmpdir}")
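The conditional cleanup mentioned above could look roughly like this. It is a sketch under stated assumptions: the context manager `e2e_tmpdir` and the environment variable `E2E_KEEP_TMPDIR` are hypothetical names, not part of the PR. The directory is kept when the body raises or when the variable is set to `1`, and removed otherwise.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def e2e_tmpdir(keep_env="E2E_KEEP_TMPDIR"):
    """Yield a temp dir; keep it on failure or when keep_env is set to "1"."""
    tmpdir = tempfile.mkdtemp(prefix="evalhub-e2e-")
    failed = False
    try:
        yield tmpdir
    except BaseException:
        failed = True  # keep the directory so server logs can be inspected
        raise
    finally:
        keep = failed or os.environ.get(keep_env) == "1"
        if not keep:
            shutil.rmtree(tmpdir, ignore_errors=True)
```

Note that inside a pytest `yield` fixture a failing test does not raise into the fixture body, so detecting test failure there would need an additional mechanism; the environment-variable switch works in either setting.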

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/conftest.py` around lines 98 - 100, The tmpdir is created with
tempfile.mkdtemp (variable tmpdir) and left on disk, which can accumulate;
update conftest.py to either add a clear inline comment near the
tempfile.mkdtemp call explaining the intentional preservation and its path
(tmpdir) or emit a log message (e.g., via pytest's logging or print) that shows
the tmpdir path and that it is not auto-cleaned, and optionally implement
conditional cleanup after the test run (based on a success flag or an
environment variable) that removes tmpdir when desired; reference the
tempfile.mkdtemp call and the tmpdir variable when making the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Makefile`:
- Around line 10-12: The test selection now uses pytest marker filtering so the
missing `@pytest.mark.e2e` decorators are silently excluding tests; add
`@pytest.mark.e2e` to the TestOCIArtifactPersistenceE2E class in
tests/e2e/test_persist_with_mock_persister.py (so all 3 methods inherit the
marker), and add `@pytest.mark.e2e` to each of the five standalone test functions
in tests/e2e/test_smoke_test_eval_hub_server.py and the four standalone test
functions in tests/e2e/test_eval_hub_server_with_real_config.py so they are
included when running make test-e2e.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 768188fa-304a-4a8a-bb6e-752be4b6bc46

📥 Commits

Reviewing files that changed from the base of the PR and between ba08ff0 and 6f217e0.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

.github/workflows/test.yml
Makefile
pyproject.toml
tests/conftest.py
tests/e2e/conftest.py


tests/conftest.py

gnaulak-redhat

~~added a comment, but not an objection to PR merge~~ LGTM

tarilabs and others added 2 commits March 31, 2026 11:55

ci: test-e2e reworked to use TestPypi

52dc7f2

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: tarilabs <matteo.mortari@gmail.com>

chore: linting

6f217e0

Signed-off-by: tarilabs <matteo.mortari@gmail.com>

coderabbitai bot reviewed Mar 31, 2026

Makefile (outdated, resolved)

tarilabs and others added 2 commits March 31, 2026 13:12

chore: impl code review feedback

f0638a0

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: tarilabs <matteo.mortari@gmail.com>

chore: impl review feedback

76c50d9

Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: tarilabs <matteo.mortari@gmail.com>

tarilabs requested review from gnaulak-redhat, julpayne, mariusdanciu, nbs-rh, ppadashe-psp, ruivieira and scheruku-rh March 31, 2026 11:23

gnaulak-redhat reviewed Mar 31, 2026

tests/conftest.py (resolved)

gnaulak-redhat approved these changes Mar 31, 2026


tarilabs merged commit 3bc6c62 into eval-hub:main Mar 31, 2026
8 checks passed

tarilabs merged 4 commits into eval-hub:main from tarilabs:tarilabs-20260331-e2e
