ci: test-e2e reworked to use TestPypi #105

Merged
tarilabs merged 4 commits into eval-hub:main from tarilabs:tarilabs-20260331-e2e
Mar 31, 2026
Conversation

@tarilabs
Member

@tarilabs tarilabs commented Mar 31, 2026

What and why

Refactor and simplify test-e2e by using the dev release of eval-hub-server from TestPyPI.

Type

  • feat
  • fix
  • docs
  • refactor / chore
  • test / ci

Testing

  • Tests added or updated
  • Tested manually

Breaking changes

none

Summary by CodeRabbit

Release Notes

  • Tests

    • Added debug logging option for end-to-end tests to improve troubleshooting
    • Enhanced test execution with improved marker-based test selection and debugging capabilities
    • Improved temporary directory preservation for easier failure investigation
  • Chores

    • Updated eval-hub-server dependency sourcing for better accessibility
    • Streamlined test infrastructure and automated workflow configuration

tarilabs and others added 2 commits March 31, 2026 11:55
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

Warning

Rate limit exceeded

@tarilabs has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 13 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 12 minutes and 13 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef2d023e-c827-4586-b3a9-8076fb27517b

📥 Commits

Reviewing files that changed from the base of the PR and between 6f217e0 and 76c50d9.

📒 Files selected for processing (2)
  • Makefile
  • tests/e2e/conftest.py
📝 Walkthrough

Walkthrough

This PR removes the matrix-driven workflow step that fetches eval-hub-server from a git repository, replacing it with a TestPyPI-based distribution model. The test-e2e target is updated with improved debug logging capabilities, and the E2E fixture now explicitly manages temporary directories for debugging purposes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI/CD Workflow: .github/workflows/test.yml | Removed matrix configuration that conditionally fetched eval-hub-server from external repository; deleted the latest_eval_hub_server matrix variable and its conditional fetch step. |
| Build Configuration: Makefile | Updated test-e2e target to remove --no-sync flag, add -m e2e marker selection, and enable --e2e-debug flag for pytest invocation; removed warning echo. |
| Dependency Management: pyproject.toml | Changed eval-hub-server from VCS-based git reference to plain package name; added custom TestPyPI index configuration with explicit flag and source mapping for eval-hub-server. |
| Test Infrastructure: tests/conftest.py, tests/e2e/conftest.py | Added --e2e-debug CLI option with conditional DEBUG-level logging setup; replaced print statements with logger calls in E2E fixture; removed _run_server() function; simplified binary path validation; changed temp directory from context manager to explicit mkdtemp with try/finally for debugging preservation. |
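
For readers unfamiliar with how a flag such as --e2e-debug is usually wired into pytest, here is a minimal sketch of the two conftest.py hooks involved. It is not the code from this PR: the logger name "tests.e2e" and the help text are assumptions made for illustration, and only the flag name comes from the summary above.

```python
import logging


def pytest_addoption(parser):
    """Register the --e2e-debug flag; pytest calls this hook at startup."""
    parser.addoption(
        "--e2e-debug",
        action="store_true",
        default=False,
        help="Enable DEBUG-level logging for end-to-end tests",  # assumed wording
    )


def pytest_configure(config):
    """Raise log verbosity only when the flag was passed on the command line."""
    if config.getoption("--e2e-debug"):
        # "tests.e2e" is a hypothetical logger name, not taken from the PR.
        logging.getLogger("tests.e2e").setLevel(logging.DEBUG)
```

With hooks like these in place, an invocation along the lines of `pytest -m e2e --e2e-debug` would select the marked tests and switch the chosen logger to DEBUG, while a plain run keeps the default level.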

Sequence Diagram(s)

sequenceDiagram
    participant Workflow as Workflow
    participant OldSource as Git Repository
    participant NewSource as TestPyPI
    participant Test as E2E Test

    rect rgba(100, 100, 150, 0.5)
    Note over Workflow,Test: Previous Flow (Removed)
    Workflow->>Workflow: Matrix includes latest_eval_hub_server
    Workflow->>OldSource: Fetch eval-hub-server from Git subdirectory
    OldSource-->>Workflow: eval-hub-server binary
    Workflow->>Test: Run E2E tests with fetched binary
    end

    rect rgba(100, 150, 100, 0.5)
    Note over Workflow,Test: New Flow (Current)
    Workflow->>NewSource: Resolve eval-hub-server from TestPyPI
    NewSource-->>Workflow: eval-hub-server package
    Workflow->>Test: Run E2E tests with --e2e-debug logging
    Test->>Test: Preserve temp directory for debugging
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • ppadashe-psp
  • ruivieira
  • gnaulak-redhat
  • julpayne

Poem

🐰 The matrix once multiplied with care,
Fetching binaries from branches fair,
But TestPyPI proved far more true,
With debug logs painted fresh and new!
Temp dirs now dance in gentle keep,
While tests their precious secrets sleep. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly references the main change: adopting TestPyPI for the test-e2e workflow, which is the core purpose of this refactoring. |
| Description check | ✅ Passed | The description provides context (TestPyPI usage for test-e2e), correctly marks the PR type (refactor/chore and test/ci), confirms testing was performed, and explicitly states no breaking changes, meeting template requirements. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/conftest.py (1)

169-176: ⚠️ Potential issue | 🟠 Major

Potential UnboundLocalError if server startup fails early.

If an exception occurs before server_process is assigned (e.g., during shutil.copytree at line 104 or any earlier operation), the finally block will raise UnboundLocalError: local variable 'server_process' referenced before assignment.

🐛 Proposed fix: guard the cleanup with existence check
     finally:
         # Cleanup: terminate the server subprocess
-        try:
-            server_process.terminate()
-            server_process.wait(timeout=5)
-        except subprocess.TimeoutExpired:
-            server_process.kill()
-            server_process.wait()
+        if "server_process" in dir():
+            try:
+                server_process.terminate()
+                server_process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                server_process.kill()
+                server_process.wait()

Alternatively, initialize server_process = None before the try block:

     tmpdir = tempfile.mkdtemp(prefix="evalhub-e2e-")
+    server_process = None
     try:
         ...
     finally:
         # Cleanup: terminate the server subprocess
-        try:
-            server_process.terminate()
+        if server_process is not None:
+            try:
+                server_process.terminate()
+                server_process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                server_process.kill()
+                server_process.wait()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/conftest.py` around lines 169 - 176, The finally block can
reference server_process before assignment causing UnboundLocalError; initialize
server_process = None prior to the try that starts the server (or alternatively
guard the cleanup) and then in the finally only call
server_process.terminate()/wait()/kill() if server_process is not None (or if
'server_process' in locals()); update the variables around the server startup
code where server_process is created and the cleanup code that calls
server_process.terminate(), server_process.wait(), and server_process.kill()
accordingly.
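
The failure mode and the None-initialisation fix described above can be reproduced outside the fixture. The following is a standalone sketch, not the fixture from tests/e2e/conftest.py: the function names, the fail_before_start switch, and the sleeping child process are all made up for illustration.

```python
import subprocess
import sys


def terminate_quietly(proc, timeout=5.0):
    """Terminate proc if it was ever started; escalate to kill on timeout."""
    if proc is None:
        return
    try:
        proc.terminate()
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def run_with_server(fail_before_start=False):
    """Start a stand-in server process and always clean it up."""
    server_process = None  # bound before the try, so finally can always read it
    try:
        if fail_before_start:
            # Stands in for an early failure such as a copy step raising.
            raise RuntimeError("setup failed before the server was started")
        server_process = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(30)"]
        )
        return server_process.pid
    finally:
        terminate_quietly(server_process)
```

Because the name is bound to None up front, an early failure surfaces as the original exception rather than being masked by an UnboundLocalError raised from the cleanup code.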
🧹 Nitpick comments (1)
tests/e2e/conftest.py (1)

98-100: Temp directory intentionally preserved for debugging — consider documenting cleanup.

The switch from TemporaryDirectory (auto-cleanup) to mkdtemp (manual) preserves the directory after test runs for debugging server logs. This is useful but will accumulate directories over time.

Consider adding a comment or log message indicating where the directory is and that it won't be auto-cleaned, or add a conditional cleanup based on test success.

     # Create temporary directory for server files (preserved after run for debugging of server logfiles, etc)
     tmpdir = tempfile.mkdtemp(prefix="evalhub-e2e-")
+    # NOTE: This directory is intentionally NOT cleaned up to allow post-run debugging.
+    # Clean up manually or run: find /tmp -name 'evalhub-e2e-*' -type d -mtime +1 -exec rm -rf {} +
     try:
         logger.debug(f"\nTemp directory for this run: {tmpdir}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/conftest.py` around lines 98 - 100, The tmpdir is created with
tempfile.mkdtemp (variable tmpdir) and left on disk, which can accumulate;
update conftest.py to either add a clear inline comment near the
tempfile.mkdtemp call explaining the intentional preservation and its path
(tmpdir) or emit a log message (e.g., via pytest's logging or print) that shows
the tmpdir path and that it is not auto-cleaned, and optionally implement
conditional cleanup after the test run (based on a success flag or an
environment variable) that removes tmpdir when desired; reference the
tempfile.mkdtemp call and the tmpdir variable when making the change.
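
One way the optional conditional cleanup could look is sketched below. It is an assumption-laden illustration rather than a proposal for the actual fixture: the environment variable EVALHUB_E2E_KEEP_TMPDIR and the helper name e2e_tmpdir are hypothetical. Note also that pytest does not raise test failures at the yield of a fixture, so inside a yield fixture the environment variable would be the dependable switch; the except branch only catches errors raised inside the with block itself.

```python
import contextlib
import logging
import os
import shutil
import tempfile

logger = logging.getLogger(__name__)

# Hypothetical environment variable; not defined by the PR.
KEEP_ENV = "EVALHUB_E2E_KEEP_TMPDIR"


@contextlib.contextmanager
def e2e_tmpdir(prefix="evalhub-e2e-"):
    """Yield a temp directory; delete it afterwards unless asked to keep it."""
    tmpdir = tempfile.mkdtemp(prefix=prefix)
    logger.debug("Temp directory for this run: %s", tmpdir)
    keep = os.environ.get(KEEP_ENV) == "1"
    try:
        yield tmpdir
    except BaseException:
        keep = True  # an error inside the block: keep the files for inspection
        raise
    finally:
        if keep:
            logger.warning("Preserving %s (not auto-cleaned)", tmpdir)
        else:
            shutil.rmtree(tmpdir, ignore_errors=True)
```

The trade-off is that successful runs no longer leave evalhub-e2e-* directories behind, at the cost of having to opt in before a run when the server logs are wanted.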
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Makefile`:
- Around line 10-12: The test selection now uses pytest marker filtering so the
missing `@pytest.mark.e2e` decorators are silently excluding tests; add
`@pytest.mark.e2e` to the TestOCIArtifactPersistenceE2E class in
tests/e2e/test_persist_with_mock_persister.py (so all 3 methods inherit the
marker), and add `@pytest.mark.e2e` to each of the five standalone test functions
in tests/e2e/test_smoke_test_eval_hub_server.py and the four standalone test
functions in tests/e2e/test_eval_hub_server_with_real_config.py so they are
included when running make test-e2e.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 768188fa-304a-4a8a-bb6e-752be4b6bc46

📥 Commits

Reviewing files that changed from the base of the PR and between ba08ff0 and 6f217e0.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • .github/workflows/test.yml
  • Makefile
  • pyproject.toml
  • tests/conftest.py
  • tests/e2e/conftest.py

tarilabs and others added 2 commits March 31, 2026 13:12
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Collaborator

@gnaulak-redhat gnaulak-redhat left a comment

Added a comment, but it is not an objection to merging the PR. LGTM.

@tarilabs tarilabs merged commit 3bc6c62 into eval-hub:main Mar 31, 2026
8 checks passed