build-template: add codeserver build optimizations and monitoring #3039
- Free disk space for codeserver targets (same as rocm/cuda/pytorch)
- Use --layers=false to halve peak disk use (no layer cache to reuse)
- Pass GHA_BUILD=true for reduced VS Code build parallelism on 16GB runners
- Add timestamps + free -h to build monitoring loop for OOM/disk debugging

Made-with: Cursor
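The conditional flag handling listed above could look roughly like the following. This is a minimal sketch, not the workflow's actual code: the function name `extra_build_args`, the substring match on "codeserver", and the example target names are all assumptions made for illustration. Only the two flags themselves come from the PR description.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the extra build flags for a given make target.
# The flags are taken from the PR description; the matching rule is assumed.
extra_build_args() {
  local target="$1"
  local args=""
  if [[ "$target" == *codeserver* ]]; then
    # No layer cache to reuse, so skip intermediate layers to save disk,
    # and ask the image build to lower its parallelism on small runners.
    args="--layers=false --build-arg GHA_BUILD=true"
  fi
  # Prints an empty line for targets that need no extra flags.
  echo "$args"
}

# Example target names below are made up for the demonstration.
extra_build_args "codeserver-example-target"
extra_build_args "jupyter-example-target"
```

Keeping the decision in one small function makes it easy to check locally which targets receive the extra flags before the change reaches CI.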
📝 Walkthrough
Modified the GitHub Actions notebook build workflow to conditionally disable layer caching and pass a GHA build argument when codeserver targets are included, and enhanced the disk usage loop to include timestamped progress reporting and memory usage metrics.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
🧹 Nitpick comments (1)
.github/workflows/build-notebooks-TEMPLATE.yaml (1)
313-320: Consider trapping and killing the background monitoring process.
The background monitoring loop is useful for debugging OOM/disk issues, but it continues running after make completes. While GitHub Actions typically cleans up orphaned processes at step boundaries, explicitly capturing the PID and killing it ensures deterministic cleanup and prevents potential interference with subsequent steps.
♻️ Optional: Add explicit cleanup
  # Print disk and memory stats every 30s so OOM/disk-full failures
  # leave a breadcrumb trail in the logs.
  (while true; do
    echo "=== $(date -u '+%H:%M:%S') ==="
    df -h | grep "${HOME}/.local/share/containers"
    free -h
    sleep 30
- done) &
+ done) &
+ MONITOR_PID=$!
+ trap "kill $MONITOR_PID 2>/dev/null || true" EXIT
  make ${{ inputs.target }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/build-notebooks-TEMPLATE.yaml around lines 313 - 320, The background monitoring subshell started at the end of the step should capture its PID and ensure it is killed when the step ends; modify the subshell invocation that starts the loop (the "(while true; do ... done) &" block) to save the PID (e.g. pid=$!) and add a trap/cleanup that kills $pid on EXIT (or explicitly kill $pid after make completes) so the monitoring process is deterministically cleaned up.
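As a standalone illustration of the cleanup pattern the review suggests, here is a minimal sketch that can be run outside CI. The `monitor` function name, the `INTERVAL` variable, the `df -h /` target, and the `sleep 2` stand-in for the real `make` invocation are assumptions for the demo, not part of the workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed for the demo: a short interval instead of the workflow's 30s.
INTERVAL="${INTERVAL:-1}"

# Periodically print a UTC timestamp plus disk and memory stats.
monitor() {
  while true; do
    echo "=== $(date -u '+%H:%M:%S') ==="
    df -h / || true               # the workflow filters for the containers dir
    free -h 2>/dev/null || true   # free may be missing on non-Linux hosts
    sleep "$INTERVAL"
  done
}

monitor &
MONITOR_PID=$!
# Single quotes defer expansion until the trap fires; the kill is allowed
# to fail quietly if the monitor has already exited.
trap 'kill "$MONITOR_PID" 2>/dev/null || true' EXIT

sleep 2   # stand-in for the real build command
```

The `|| true` guards are a deliberate choice here: with errexit enabled, a failing `grep` or a missing `free` inside the loop would otherwise end the monitor early and silently.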
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
.github/workflows/build-notebooks-TEMPLATE.yaml
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: ide-developer
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Summary
Codeserver hermetic builds on GHA runners are failing silently (runner loses communication) due to OOM / disk exhaustion. This PR adds:
- Free disk space for codeserver targets: same treatment as rocm, cuda, pytorch, tensorflow
- --layers=false to reduce peak disk usage: codeserver compiles from source every run, so there's no layer cache to benefit from
- --build-arg GHA_BUILD=true to reduce VS Code build parallelism: GHA runners only have 16 GB RAM
- Timestamps + free -h in the build monitoring loop: future OOM/disk-full failures then leave a clear breadcrumb trail in CI logs instead of the uninformative "runner lost communication" message

These changes are needed on main so that pull_request_target workflows pick them up.
Context
Test plan
Made with Cursor
Summary by CodeRabbit
Release Notes