-
Notifications
You must be signed in to change notification settings - Fork 70
fix the universal image build #515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix the universal image build #515
Conversation
|
Caution There are some errors in your PipelineRun template.
|
WalkthroughTwo new Tekton CI/CD pipeline configurations for a training image (th03-cuda128-torch28-py312-rhel9) are introduced for pull request and push workflows. The accompanying Dockerfile is updated with dependency version bumps, additional runtime packages, build tooling additions, and modified source references. Changes
Sequence DiagramsequenceDiagram
participant User as Developer
participant Git as Git
participant Tekton as Tekton Pipeline
participant Build as Build Task
participant Scan as Scan Tasks
participant Registry as Image Registry
User->>Git: Push/PR Code
Git->>Tekton: Trigger Pipeline
Tekton->>Tekton: Init & Clone
Tekton->>Build: Prefetch Dependencies
Tekton->>Build: Build Images (Multi-platform)
Build->>Build: Build Image Index
Tekton->>Scan: Run Security Checks<br/>(clair-scan, sast, clamav)
Scan->>Scan: Deprecated Base Image Check
Scan->>Scan: Ecosystem Cert Preflight
Tekton->>Registry: Apply Tags & Push
Tekton->>Registry: Push Dockerfile
Registry->>User: Artifact Published
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
79abe4d to
409b486
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
images/universal/training/th03-cuda128-torch280-py312/Dockerfile (1)
119-157: Address Hadolint warning: multiple redirections.Hadolint (SC2261) flagged potential multiple redirections in this RUN block. While the code appears syntactically correct, ensure there are no unintended shell redirections. The multi-line continuation is valid, but verify the intent is correct.
If this is a false positive from Hadolint, we can document it. If there is a shell issue, I can help refactor to resolve it.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.tekton/odh-training-th03-cuda128-torch28-py312-rhel9-pull-request.yaml(1 hunks).tekton/odh-training-th03-cuda128-torch28-py312-rhel9-push.yaml(1 hunks)images/universal/training/th03-cuda128-torch280-py312/Dockerfile(3 hunks)
🧰 Additional context used
🪛 Hadolint (2.14.0)
images/universal/training/th03-cuda128-torch280-py312/Dockerfile
[error] 119-119: Multiple redirections compete for stdout. Use cat, tee, or pass filenames instead.
(SC2261)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / odh-training-th03-cuda128-torch28-py312-rhel9-on-pull-request
🔇 Additional comments (6)
images/universal/training/th03-cuda128-torch280-py312/Dockerfile (3)
111-111: Verify the SDK git branch reference.The git URL now references
add-training-hubbranch instead of a stable ref. For production builds, confirm this branch is stable and the intent is not to usemainor a tagged release.
167-169: Approve the deterministic dependency install sequence.The 2-step no-build-isolation install for
causal-conv1dandmamba-ssmfollowed byfix-permissionsis a good practice for ensuring reproducible builds and handling build-time dependencies correctly.
172-172: No action required—entrypoint file location is correct.The search confirms that
entrypoint-universal.shexists atimages/universal/training/th03-cuda128-torch280-py312/entrypoint-universal.sh, which is in the same directory as the Dockerfile. TheCOPYcommand on line 172 correctly references the file from the build context root and will succeed without modification..tekton/odh-training-th03-cuda128-torch28-py312-rhel9-push.yaml (2)
1-50: Push workflow configuration is appropriate for release builds.Key differences from PR pipeline are correct:
- Output tag is
latest(no revision suffix)- No image expiration (permanent release)
cancel-in-progress: false(allow concurrent pushes)- Trigger on push to main vs pull_request
Configuration aligns with release workflow expectations.
10-11: Verify file path watchers in on-push trigger.The
pathChanged()expressions monitor Dockerfile, entrypoint, and pipeline files. Confirm these paths match your repository structure. If the entrypoint file is not at the expected location (see Dockerfile review comment), update the path here as well.Also applies to: 12-12
.tekton/odh-training-th03-cuda128-torch28-py312-rhel9-pull-request.yaml (1)
1-50: Bundle verification requires live Konflux environment access.The verification script could not reach the quay.io registry due to sandbox environment limitations. The task bundle references appear syntactically valid with correct SHA256 digests, but this does not confirm their availability or compatibility with your actual Konflux instance. Run the verification script in an environment with network access to quay.io and appropriate credentials to confirm bundle accessibility and compatibility before merging.
|
|
||
| # Provide ninja via pip (RHEL/UBI repo ninja-build may be unavailable) | ||
| RUN pip install --retries 5 --timeout 300 --no-cache-dir ninja | ||
| # RUN pip install --retries 5 --timeout 300 --no-cache-dir ninja |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Remove redundant ninja installation.
Ninja is commented out on line 114 but re-installed on line 161 via pip install -U pip setuptools wheel ninja cmake. Consolidate by removing the comment on line 114 (since ninja will be installed via pip), or clarify the intent.
Apply this diff:
- # Provide ninja via pip (RHEL/UBI repo ninja-build may be unavailable)
- # RUN pip install --retries 5 --timeout 300 --no-cache-dir ninja
# Install remaining runtime packages (resolved from default PyPI), including FlashAttentionAlso applies to: 161-161
🤖 Prompt for AI Agents
In images/universal/training/th03-cuda128-torch280-py312/Dockerfile around lines
114 and 161, the Dockerfile redundantly references ninja: line 114 has a
commented pip install of ninja while line 161 installs ninja via a combined pip
command; remove the commented-out ninja line (or if intent was to pre-install
ninja earlier, instead uncomment and remove it from the later combined pip
install) so ninja is installed exactly once and the intent is clear—update the
Dockerfile to either delete line 114 or adjust line 161 accordingly and keep a
single pip install for ninja.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: briangallagher The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description
Universal image konflux builds fix
Dockerfile based of current state of #502
Merge criteria:
Konflux successfully completes
The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work
Summary by CodeRabbit
Infrastructure
Chores