build-template: add codeserver build optimizations and monitoring #3039

Merged
ysok merged 1 commit into opendatahub-io:main from ysok-red-hat-data-services:odh-template-codeserver-debug
Feb 27, 2026

Conversation

@ysok ysok commented Feb 27, 2026

Summary

Codeserver hermetic builds on GHA runners are failing silently (runner loses communication) due to OOM / disk exhaustion. This PR adds:

  • Free disk space for codeserver targets — same treatment as rocm, cuda, pytorch, tensorflow
  • --layers=false to reduce peak disk usage — codeserver compiles from source every run so there's no layer cache to benefit from
  • --build-arg GHA_BUILD=true to reduce VS Code build parallelism — GHA runners only have 16GB RAM
  • Build monitoring with timestamps + free -h — so future OOM/disk-full failures leave a clear breadcrumb trail in CI logs instead of the uninformative "runner lost communication" message

These changes are needed on main so that pull_request_target workflows pick them up.
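
The codeserver-specific flags listed above could be selected with a small shell helper along these lines. This is a minimal sketch, not the workflow's actual code: the function name `extra_build_args`, the substring match on `codeserver`, and the example target names are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the extra container build flags for a target.
# The real workflow's variable names and matching rule are not shown in this PR page.
extra_build_args() {
  local target="$1"
  local args=""
  case "$target" in
    *codeserver*)
      # No reusable layer cache for a from-source build, so skip layer caching,
      # and ask the image build to lower its parallelism on small runners.
      args="--layers=false --build-arg GHA_BUILD=true"
      ;;
  esac
  printf '%s\n' "$args"
}

# Example call (target name is made up):
#   extra_build_args "codeserver-ubi9-python-3.12"
```

Non-codeserver targets get an empty string, so their build command stays unchanged.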

Context

Test plan

Made with Cursor

Summary by CodeRabbit

Release Notes

  • Chores
    • Optimized codeserver target builds with adjusted caching configuration
    • Enhanced build process monitoring with timestamped progress tracking and memory usage reporting for improved visibility into build operations

- Free disk space for codeserver targets (same as rocm/cuda/pytorch)
- Use --layers=false to halve peak disk use (no layer cache to reuse)
- Pass GHA_BUILD=true for reduced VS Code build parallelism on 16GB runners
- Add timestamps + free -h to build monitoring loop for OOM/disk debugging

Made-with: Cursor
@openshift-ci openshift-ci bot requested review from atheo89 and jiridanek February 27, 2026 20:37
@github-actions github-actions bot added the review-requested label (GitHub Bot creates notification on #pr-review-ai-ide-team slack channel) Feb 27, 2026
@openshift-ci openshift-ci bot added the size/s label Feb 27, 2026

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Modified the GitHub Actions notebook build workflow to append --layers=false (disabling layer caching) and a GHA_BUILD=true build argument when a codeserver target is built, and extended the disk usage loop to print timestamps and memory usage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Workflow Configuration: .github/workflows/build-notebooks-TEMPLATE.yaml | Added conditional codeserver build logic that appends --layers=false and GHA_BUILD=true build-arg. Enhanced disk usage monitoring loop with timestamped progress output and memory usage reporting. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding codeserver build optimizations and enhanced monitoring to the build template workflow. |
| Description check | ✅ Passed | The description comprehensively covers the rationale, specific changes, context, and test plan, meeting all essential requirements of the template despite some merge criteria checkboxes being unchecked. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@openshift-ci openshift-ci bot added size/s and removed size/s labels Feb 27, 2026

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
.github/workflows/build-notebooks-TEMPLATE.yaml (1)

313-320: Consider trapping and killing the background monitoring process.

The background monitoring loop is useful for debugging OOM/disk issues, but it continues running after make completes. While GitHub Actions typically cleans up orphaned processes at step boundaries, explicitly capturing the PID and killing it ensures deterministic cleanup and prevents potential interference with subsequent steps.

♻️ Optional: Add explicit cleanup
          # Print disk and memory stats every 30s so OOM/disk-full failures
          # leave a breadcrumb trail in the logs.
          (while true; do
            echo "=== $(date -u '+%H:%M:%S') ==="
            df -h | grep "${HOME}/.local/share/containers"
            free -h
            sleep 30
-         done) &
+         done) &
+         MONITOR_PID=$!
+         trap "kill $MONITOR_PID 2>/dev/null || true" EXIT

          make ${{ inputs.target }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build-notebooks-TEMPLATE.yaml around lines 313 - 320, The
background monitoring subshell started at the end of the step should capture its
PID and ensure it is killed when the step ends; modify the subshell invocation
that starts the loop (the "(while true; do ... done) &" block) to save the PID
(e.g. pid=$!) and add a trap/cleanup that kills $pid on EXIT (or explicitly kill
$pid after make completes) so the monitoring process is deterministically
cleaned up.
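
The alternative mentioned in the prompt (killing the monitor explicitly once the build finishes, rather than relying on an EXIT trap) might look like the sketch below. The wrapper name `run_with_monitor` and its interval argument are assumptions for illustration; the loop body mirrors the snippet in the review comment, with a guard added in case `free` is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command while a background loop prints disk and
# memory stats, then stop the loop and return the command's own exit status.
run_with_monitor() {
  local interval="$1"
  shift
  (
    while true; do
      echo "=== $(date -u '+%H:%M:%S') ==="
      # grep prints nothing if the containers storage path is not a mount point
      df -h | grep "${HOME}/.local/share/containers" || true
      if command -v free >/dev/null 2>&1; then
        free -h
      fi
      sleep "$interval"
    done
  ) &
  MONITOR_PID=$!

  local status=0
  "$@" || status=$?

  # Deterministic cleanup: stop the monitor and reap it before returning.
  kill "$MONITOR_PID" 2>/dev/null || true
  wait "$MONITOR_PID" 2>/dev/null || true
  return "$status"
}

# Example call (the make target is whatever the workflow passes in):
#   run_with_monitor 30 make "$TARGET"
```

Capturing the status before cleanup keeps a failing build visible to the step, which a bare `kill` placed after `make` would not guarantee if the step runs with `set -e`.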
ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 137aa9b and fc15cbc.

📒 Files selected for processing (1)
  • .github/workflows/build-notebooks-TEMPLATE.yaml

@daniellutz daniellutz self-requested a review February 27, 2026 20:43
@daniellutz

/lgtm

openshift-ci bot commented Feb 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ide-developer
Once this PR has been reviewed and has the lgtm label, please ask for approval from daniellutz. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ysok ysok merged commit ae2bb4c into opendatahub-io:main Feb 27, 2026
14 of 16 checks passed
Labels

lgtm, review-requested (GitHub Bot creates notification on #pr-review-ai-ide-team slack channel), size/s

3 participants