Issue #2404: chore(ci): reduce swap size in gha_lvm_overlay.sh #2529
base: main
WalkthroughChanged Changes
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Possibly related issues
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
ci/cached-builds/gha_lvm_overlay.sh (2)
14-15: Make swap size configurable; 512MiB may be too low for source builds
512MiB risks OOM during codeserver/source compiles. Keep the lower default but allow easy override via env. Consider later aligning with Issue #2398 for dynamic sizing (≥1024MiB, ~20% VG free).
Apply this minimal change:
-# https://github.com/opendatahub-io/notebooks/issues/2404
-swap_size_mb=512
+# https://github.com/opendatahub-io/notebooks/issues/2404
+# Allow override; default intentionally small. Adjust via SWAP_SIZE_MB when needed.
+swap_size_mb="${SWAP_SIZE_MB:-512}"
Please verify in CI that memory-heavy jobs (e.g., codeserver builds) don't hit OOM with the new default. If they do, bump via SWAP_SIZE_MB or implement the dynamic approach from Issue #2398.
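The override suggested above relies on the `${VAR:-default}` parameter expansion. A minimal sketch of how that could look with input validation added; the function name `resolve_swap_size_mb` and the validation rule are assumptions for illustration and are not part of gha_lvm_overlay.sh:

```shell
# Hypothetical helper (not in the actual script): resolve the swap size in MiB.
# Uses SWAP_SIZE_MB when it is set and non-empty, otherwise the default in $1.
resolve_swap_size_mb() {
  _value="${SWAP_SIZE_MB:-${1:-512}}"
  # Accept only positive integers without a leading zero, so a typo in the
  # environment variable fails early instead of reaching lvcreate.
  case "$_value" in
    ''|0*|*[!0-9]*)
      echo "invalid SWAP_SIZE_MB: ${_value}" >&2
      return 1
      ;;
  esac
  echo "$_value"
}

# Usage: keeps the small default unless the caller exports SWAP_SIZE_MB.
swap_size_mb="$(resolve_swap_size_mb 512)" && echo "swap_size_mb=${swap_size_mb}"
```

A job that needs more swap would then export SWAP_SIZE_MB before invoking the script, while all other jobs keep the default.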
58-59: Harden swap creation: force mkswap and lower swappiness (per Issue #2398)
Prevents interactive prompts on stale signatures and reduces aggressive swapping on CI.
Apply:
-sudo mkswap "/dev/mapper/${VG_NAME}-swap"
-sudo swapon "/dev/mapper/${VG_NAME}-swap"
+sudo mkswap -f "/dev/mapper/${VG_NAME}-swap"
+sudo swapon "/dev/mapper/${VG_NAME}-swap"
+# Reduce swap aggressiveness on CI
+sudo sysctl -w vm.swappiness=10 || true
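The hardened sequence can be wrapped in a function so it can be dry-run before touching a real device. This is a sketch only: the function name `enable_swap` and the `RUN` variable (command prefix, defaulting to `sudo`) are assumptions introduced here, not names from the script. The `mkswap -f`, `swapon`, and `sysctl -w vm.swappiness=10` invocations are the ones proposed in the review.

```shell
# Hypothetical wrapper around the commands from the review comment.
# RUN is an assumed knob: the command prefix, "sudo" by default; set it to
# "echo" to print the commands instead of executing them.
enable_swap() {
  _dev="$1"
  _swappiness="${2:-10}"
  # -f forces mkswap to proceed where it would otherwise refuse or prompt.
  ${RUN:-sudo} mkswap -f "$_dev" || return 1
  ${RUN:-sudo} swapon "$_dev" || return 1
  # Tuning is best-effort: a failure here should not fail the CI job.
  ${RUN:-sudo} sysctl -w "vm.swappiness=${_swappiness}" || true
}

# Dry run in a subshell so RUN=echo does not leak into the caller.
( RUN=echo; enable_swap "/dev/mapper/example-swap" )
```

In the real script the call would presumably be `enable_swap "/dev/mapper/${VG_NAME}-swap"` with `RUN` left unset.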
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ci/cached-builds/gha_lvm_overlay.sh
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-09-10T21:21:02.884Z
Learning: jiridanek requested GitHub issue creation for swap creation resilience improvement in ci/cached-builds/gha_lvm_overlay.sh during PR #2356 review. Issue #2398 was created addressing lvcreate failures on small runners by implementing dynamic swap sizing (20% of VG free space, minimum 1024MiB), using mkswap -f flag to prevent signature prompts, and reducing vm.swappiness to 10 for less aggressive swapping, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-09-10T21:21:02.884Z
Learning: jiridanek requested GitHub issue creation for swap creation resilience improvement in ci/cached-builds/gha_lvm_overlay.sh during PR #2356 review. The issue addresses lvcreate failures on small runners by implementing dynamic swap sizing (20% of VG free space, minimum 1024MiB), using mkswap -f flag to prevent signature prompts, and reducing vm.swappiness to 10 for less aggressive swapping, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-22T07:30:12.760Z
Learning: jiridanek requested GitHub issue creation for GitHub Actions LVM setup failure during PR #1425 review, specifically addressing ext4 signature detection causing lvcreate to fail with interactive prompts. Issue #1434 was successfully created with comprehensive problem description covering stale filesystem signatures, interactive prompt failures, runner cleanup issues, disk space pressure, detailed technical analysis, multiple solution options (immediate signature wiping, robust pre-cleanup logic, enhanced runner cleanup, alternative volume strategies), clear acceptance criteria, implementation guidance, risk mitigation strategies, and proper context linking. This infrastructure issue is related to but distinct from existing Issue #1196 about fallocate failures in the same gha_lvm_overlay.sh script, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-08-05T17:24:08.616Z
Learning: jiridanek requested PR review for #1521 covering s390x architecture support improvements, demonstrating continued focus on systematic multi-architecture compatibility enhancements in the opendatahub-io/notebooks repository through clean implementation with centralized configuration, proper CI integration, and architecture-aware testing patterns.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2215
File: runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu:0-0
Timestamp: 2025-09-05T11:27:31.040Z
Learning: jiridanek requested GitHub issue creation for build toolchain optimization in datascience runtime during PR #2215 review. Issue #2308 was created addressing unnecessary build dependencies (gcc-toolset-13, cmake, ninja-build, rust, cargo) in final runtime image for ppc64le architecture, covering comprehensive problem analysis with specific line numbers, multiple solution options for builder-only toolchains, clear acceptance criteria for size reduction and security improvement, detailed implementation guidance for package segregation, and proper context linking to PR #2215 review comment, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2265
File: .tekton/odh-workbench-jupyter-pytorch-llmcompressor-cuda-py312-ubi9-pull-request.yaml:16-16
Timestamp: 2025-09-05T12:10:28.916Z
Learning: jiridanek requested GitHub issue creation for trigger path cleanup in pytorch+llmcompressor pipeline during PR #2265 review. Issue #2310 was successfully created addressing copy-paste errors where irrelevant Minimal/DataScience trigger paths were included in the pytorch+llmcompressor pipeline on-cel-expression, causing unnecessary pipeline triggers. The issue includes comprehensive problem description covering specific irrelevant paths, detailed solution with before/after YAML code examples, clear acceptance criteria for implementation and testing, repository-wide scope consideration for similar issues, and proper context linking to PR #2265 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-09-10T21:24:07.914Z
Learning: jiridanek requested GitHub issue creation for Docker chown optimization in codeserver/ubi9-python-3.12/Dockerfile.cpu during PR #2356 review. Issue #2403 was created addressing performance impact of broad recursive chown on entire /opt/app-root directory (line 235), proposing three solution approaches: scoped chown targeting specific changed paths, root cause fix during file creation, and test modification for permission validation, with detailed benefits analysis covering layer size reduction and build time optimization, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu:70-70
Timestamp: 2025-08-27T15:00:11.931Z
Learning: jiridanek requested GitHub issue creation for uv pip sync optimization during PR #2145 review. Issue #2150 was created addressing the systematic replacement of `uv pip install --requirements=` with `uv pip sync` across 34+ Dockerfiles migrated from requirements.txt to pylock.toml. The issue includes comprehensive problem description covering drift risk and redundant flags, detailed solution with benefits analysis, clear acceptance criteria for all affected files, and proper context linking to PR #2145 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2217
File: .github/workflows/security.yaml:22-30
Timestamp: 2025-08-29T15:01:07.674Z
Learning: jiridanek created PR #2223 to systematically address uv setup configuration improvements (version pinning, caching, environment activation) during PR #2217 review, continuing the established pattern of implementing systematic improvements through dedicated PRs rather than applying individual point fixes during reviews.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2185
File: jupyter/pytorch/ubi9-python-3.12/Dockerfile.cuda:109-117
Timestamp: 2025-08-29T08:48:55.985Z
Learning: jiridanek prefers to implement systematic cleanup improvements through dedicated GitHub issues (like #2076) rather than applying individual point fixes during PR reviews, maintaining consistency with their established pattern of comprehensive code quality improvements.
📚 Learning: 2025-09-10T21:21:02.884Z
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-09-10T21:21:02.884Z
Learning: jiridanek requested GitHub issue creation for swap creation resilience improvement in ci/cached-builds/gha_lvm_overlay.sh during PR #2356 review. Issue #2398 was created addressing lvcreate failures on small runners by implementing dynamic swap sizing (20% of VG free space, minimum 1024MiB), using mkswap -f flag to prevent signature prompts, and reducing vm.swappiness to 10 for less aggressive swapping, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Applied to files:
ci/cached-builds/gha_lvm_overlay.sh
📚 Learning: 2025-09-10T21:21:02.884Z
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-09-10T21:21:02.884Z
Learning: jiridanek requested GitHub issue creation for swap creation resilience improvement in ci/cached-builds/gha_lvm_overlay.sh during PR #2356 review. The issue addresses lvcreate failures on small runners by implementing dynamic swap sizing (20% of VG free space, minimum 1024MiB), using mkswap -f flag to prevent signature prompts, and reducing vm.swappiness to 10 for less aggressive swapping, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Applied to files:
ci/cached-builds/gha_lvm_overlay.sh
📚 Learning: 2025-07-22T07:30:12.760Z
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-22T07:30:12.760Z
Learning: jiridanek requested GitHub issue creation for GitHub Actions LVM setup failure during PR #1425 review, specifically addressing ext4 signature detection causing lvcreate to fail with interactive prompts. Issue #1434 was successfully created with comprehensive problem description covering stale filesystem signatures, interactive prompt failures, runner cleanup issues, disk space pressure, detailed technical analysis, multiple solution options (immediate signature wiping, robust pre-cleanup logic, enhanced runner cleanup, alternative volume strategies), clear acceptance criteria, implementation guidance, risk mitigation strategies, and proper context linking. This infrastructure issue is related to but distinct from existing Issue #1196 about fallocate failures in the same gha_lvm_overlay.sh script, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
Applied to files:
ci/cached-builds/gha_lvm_overlay.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: code-static-analysis
- GitHub Check: Trivy scan (fs)
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: ide-developer
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
New changes are detected. LGTM label has been removed.
Actionable comments posted: 0
🧹 Nitpick comments (1)
ci/cached-builds/gha_lvm_overlay.sh (1)
14-16: Prefer 8192MiB (power-of-two) and add mkswap -f; consider dynamic sizing
8196 looks accidental; change to 8192. Use mkswap -f to avoid interactive signature prompts. Optional follow-up: derive swap size from VG free (≈20% with 1024MiB min) and set vm.swappiness=10 (track under #2398).
Apply within this hunk:
-# 16GB is wasteful, 0.5GB is not enough, arm64 froze with 2GB and 4GB
-# https://github.com/opendatahub-io/notebooks/issues/2404
-swap_size_mb=8196
+# 16GiB is wasteful; 0.5GiB insufficient; arm64 froze with 2GiB and 4GiB
+# https://github.com/opendatahub-io/notebooks/issues/2404
+swap_size_mb=8192
Outside this hunk:
- sudo mkswap "/dev/mapper/${VG_NAME}-swap"
+ sudo mkswap -f "/dev/mapper/${VG_NAME}-swap"
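The optional follow-up (about 20% of VG free space with a 1024MiB minimum) comes down to a small piece of integer arithmetic. A sketch under stated assumptions: `compute_swap_mb` is a hypothetical name, the percentage and floor are taken from the review text, and how Issue #2398 handles a volume group with less free space than the floor is not covered here.

```shell
# Hypothetical helper: derive the swap size (MiB) from the free space in the
# volume group. Arguments: free MiB, percentage (default 20), floor in MiB
# (default 1024).
compute_swap_mb() {
  _free_mb="$1"
  _pct="${2:-20}"
  _min_mb="${3:-1024}"
  # Integer arithmetic; the division truncates toward zero.
  _size=$(( ${_free_mb} * ${_pct} / 100 ))
  if [ "$_size" -lt "$_min_mb" ]; then
    _size="$_min_mb"
  fi
  echo "$_size"
}

# The free space could come from LVM, for example (an assumption, not taken
# from the script):
#   free="$(sudo vgs --noheadings --units m --nosuffix -o vg_free "$VG_NAME")"
#   free_mb="$(echo "${free%.*}" | tr -d ' ')"   # drop decimals and padding
compute_swap_mb 40960   # 20% of 40GiB free
compute_swap_mb 2048    # 20% would be 409MiB, so the floor applies
```

With 40960MiB free this yields 8192, which matches the fixed value proposed in the hunk above; with 2048MiB free the floor of 1024 is returned.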
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ci/cached-builds/gha_lvm_overlay.sh
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1259
File: jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/service.yaml:5-15
Timestamp: 2025-07-02T18:59:15.788Z
Learning: jiridanek creates targeted GitHub issues for specific test quality improvements identified during PR reviews in opendatahub-io/notebooks. Issue #1268 demonstrates this by converting a review comment about insufficient tf2onnx conversion test validation into a comprehensive improvement plan with clear acceptance criteria, code examples, and ROCm-specific context.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1909
File: runtimes/pytorch+llmcompressor/ubi9-python-3.11/Dockerfile.cuda:11-15
Timestamp: 2025-08-12T08:40:55.286Z
Learning: jiridanek requested GitHub issue creation for redundant CUDA upgrade optimization during PR #1909 review. Analysis revealed all 14 CUDA Dockerfiles contain redundant `yum upgrade -y` commands in cuda-base stages that execute after base stages already performed comprehensive `dnf upgrade` via pre-upgrade blocks, causing unnecessary CI latency and build inefficiency. Issue includes complete scope analysis with specific line numbers, investigation framework requiring NVIDIA upstream documentation review, multiple solution options, comprehensive acceptance criteria covering systematic testing and performance measurement, and proper context linking to PR #1909 review comment.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2215
File: runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu:0-0
Timestamp: 2025-09-05T12:10:50.856Z
Learning: jiridanek requested GitHub issue creation for Dockerfile environment variable refactoring during PR #2215 review. Issue #2311 was created addressing build-only variables (OPENBLAS_VERSION, ONNX_VERSION, GRPC_PYTHON_BUILD_SYSTEM_OPENSSL) being unnecessarily written to /etc/profile.d/ppc64le.sh in runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu, causing variable duplication across stages, unreliable sourcing in non-login build contexts, and violation of DRY principles. The issue includes comprehensive problem description covering affected lines 30-37, detailed impact analysis of build reliability and maintenance overhead, three solution options with centralized ARG/ENV approach as recommended, clear acceptance criteria for version centralization and build-only variable cleanup, and specific implementation guidance with code examples, assigned to jiridanek, continuing the established pattern of systematic infrastructure improvements through detailed issue tracking.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: code-static-analysis
@jiridanek: The following test failed, say
/hold could not get a clean build on gha with this
Description
How Has This Been Tested?
Self checklist (all need to be checked):
- Ensure that you have run make test (gmake on macOS) before asking for review
- Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.
Merge criteria:
Summary by CodeRabbit