Skip to content

Issue #953: add MAKEFLAGS="-j4" to RStudio Dockerfiles for parallel compilation #1395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
1 change: 1 addition & 0 deletions rstudio/c9s-python-3.11/Dockerfile.cpu
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai the benefits on dockerfile buildtime are small, suggest ways to speed up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiridanek Looking at the RStudio Dockerfile, I can suggest several optimization strategies that would likely provide more significant build time improvements than the MAKEFLAGS change:

Layer Caching Optimizations

1. Consolidate Package Operations

# Instead of multiple dnf/yum calls, combine them:
RUN dnf install -y yum-utils && \
    yum-config-manager --enable crb && \
    yum install -y https://download.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm \
                   R-core R-core-devel R-java R-Rcpp R-highlight \
                   R-littler R-littler-examples openssl-libs compat-openssl11 \
                   libsodium-devel.x86_64 libgit2-devel.x86_64 libcurl-devel \
                   harfbuzz-devel.x86_64 fribidi-devel.x86_64 cmake "flexiblas-*" \
                   texlive-framed && \
    dnf clean all && rm -rf /var/cache/yum

2. Move Stable Dependencies Earlier

# Move R package installation before RStudio installation
# R packages change less frequently than RStudio versions

Build Context Optimizations

3. Use .dockerignore

# Create .dockerignore to exclude unnecessary files
*.md
.git/
.github/
test/
docs/
*.log

4. Multi-stage Build Improvements

# Use BuildKit for parallel stage execution
# syntax=docker/dockerfile:1
FROM base AS r-packages
# Install R packages in parallel stage

FROM base AS rstudio-download
# Download RStudio in parallel

Network and Download Optimizations

5. Pre-download Large Assets

# Use multi-stage to download RStudio RPM in parallel
FROM base AS rstudio-downloader
RUN wget --progress=dot:giga https://download2.rstudio.org/server/rhel8/x86_64/${RSTUDIO_RPM} -O /tmp/${RSTUDIO_RPM}

FROM base AS rstudio
COPY --from=rstudio-downloader /tmp/${RSTUDIO_RPM} /tmp/

6. Use Package Manager Caching

# Enable DNF caching for faster rebuilds
RUN echo 'keepcache=1' >> /etc/dnf/dnf.conf

Build-time Performance

7. Parallel R Package Installation

# In install_packages.R, use parallel installation
options(Ncpus = parallel::detectCores())

8. Use BuildKit Cache Mounts

# Use cache mounts for package managers
RUN --mount=type=cache,target=/var/cache/dnf \
    dnf install -y [packages]

Specific to This Dockerfile

9. Optimize Permission Changes

# Combine permission changes into fewer operations
RUN chmod -R a+w ${LIBLOC} ${R_LIBS_USER} && \
    chown -R 1001:0 ${NGINX_CONF_PATH} ${NGINX_APP_ROOT}/etc /var/lib/nginx /var/log/nginx /run

10. Use Faster Base Images
Consider using images with more packages pre-installed or create a custom base image for RStudio builds.

The most impactful optimizations would likely be:

  1. Layer consolidation (immediate 20-30% improvement)
  2. Multi-stage parallelization (15-25% improvement)
  3. Build context optimization (10-15% improvement)
  4. Package manager caching (significant on rebuilds)

Would you like me to create a GitHub issue to track these Docker build optimization improvements?

ᕱ⁾⁾ᕱ thoughtful about build optimization strategies


🧠 Learnings used
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:132-134
Timestamp: 2025-07-08T19:07:58.135Z
Learning: jiridanek requested GitHub issue creation for Docker build robustness problem with rm glob patterns during PR #1306 review. Issue was created covering 11 affected Dockerfiles with fragile rm commands that fail on empty globs, comprehensive problem description, multiple solution options, acceptance criteria, implementation guidance, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:132-134
Timestamp: 2025-07-08T19:07:58.135Z
Learning: jiridanek requested GitHub issue creation for Docker build robustness problem with rm glob patterns during PR #1306 review. Issue #1337 was successfully created covering 11 affected Dockerfiles with fragile rm commands that fail on empty globs, comprehensive problem description, multiple solution options, acceptance criteria, implementation guidance, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:4-4
Timestamp: 2025-07-04T05:49:10.314Z
Learning: jiridanek directs base image pinning security concerns to existing comprehensive issue #1242 "Improve Docker FROM image versioning by avoiding :latest tags" rather than addressing them in individual PRs, continuing the established pattern of systematic security and quality tracking in opendatahub-io/notebooks.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: jupyter/pytorch/ubi9-python-3.12/Dockerfile.cuda:42-52
Timestamp: 2025-07-09T12:29:56.162Z
Learning: jiridanek requested GitHub issue creation for OpenShift client architecture mapping problem affecting 29 Dockerfiles during PR #1320 review. Issue was created with comprehensive analysis covering all affected files using $(uname -m) returning 'aarch64' but OpenShift mirror expecting 'arm64', systematic solution using BuildKit TARGETARCH mapping with proper amd64→x86_64 and arm64→arm64 conversion, detailed acceptance criteria, and implementation guidance, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1269
File: codeserver/ubi9-python-3.12/utils/process.sh:17-19
Timestamp: 2025-07-03T14:00:00.909Z
Learning: jiridanek efficiently identifies when CodeRabbit review suggestions are already covered by existing comprehensive issues, demonstrating excellent issue management and avoiding duplicate tracking of the same improvements across multiple locations.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: jupyter/pytorch/ubi9-python-3.12/Dockerfile.cuda:65-66
Timestamp: 2025-07-09T12:31:02.033Z
Learning: jiridanek requested GitHub issue creation for MSSQL repo file hardcoding problem during PR #1320 review. Issue #1357 was created with comprehensive problem description covering hardcoded x86_64 MSSQL repo files breaking multi-architecture builds across 6 affected Dockerfiles, detailed root cause analysis, three solution options with code examples, clear acceptance criteria, implementation guidance following established multi-architecture patterns, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: jupyter/pytorch/ubi9-python-3.12/Dockerfile.cuda:65-66
Timestamp: 2025-07-09T12:31:02.033Z
Learning: jiridanek requested GitHub issue creation for MSSQL repo file hardcoding problem during PR #1320 review. Issue #1363 was created and updated with comprehensive problem description covering hardcoded x86_64 MSSQL repo files breaking multi-architecture builds across 10 affected Dockerfiles (including datascience, CUDA, ROCm, and TrustyAI variants), detailed root cause analysis, three solution options with code examples, clear acceptance criteria for all image types, implementation guidance following established multi-architecture patterns, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-11T11:16:05.131Z
Learning: jiridanek requested GitHub issue creation for RStudio py311 Tekton push pipelines during PR #1379 review. Issue #1384 was successfully created covering two RStudio variants (CPU and CUDA) found in manifests/base/params-latest.env, with comprehensive problem description, implementation requirements following the same pattern as other workbench pipelines, clear acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-11T11:16:05.131Z
Learning: jiridanek requested GitHub issue creation for adding RStudio py311 Tekton push pipelines during PR #1379 review, referencing existing registry entries in manifests/base/params-latest.env but missing corresponding .tekton pipeline files. A comprehensive issue was created with detailed problem description, implementation requirements following the same pattern as other workbench pipelines, clear acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1259
File: jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile:13-15
Timestamp: 2025-07-03T07:03:45.020Z
Learning: The Python 3.11 infrastructure for ROCm TensorFlow images in opendatahub-io/notebooks is already properly configured in the Makefile with both BASE_DIRS entries for Pipfile lock renewals and all-images targets for CI builds, requiring only commenting out the corresponding Python 3.12 entries when downgrading due to wheel availability issues.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/kustomize/base/kustomization.yaml:8-12
Timestamp: 2025-07-08T19:09:48.746Z
Learning: jiridanek requested GitHub issue creation for misleading CUDA prefix in TrustyAI image tags during PR #1306 review, affecting both Python 3.11 and 3.12 versions. Issue #1338 was created with comprehensive problem description covering both affected images, repository pattern analysis comparing correct vs incorrect naming conventions, clear solution with code examples, detailed acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: atheo89
PR: opendatahub-io/notebooks#1258
File: codeserver/ubi9-python-3.11/Dockerfile.cpu:32-32
Timestamp: 2025-07-07T11:08:48.524Z
Learning: atheo89 requested GitHub issue creation for multi-architecture Dockerfile improvements during PR #1258 review, specifically for enhancing structural consistency across Docker stages, replacing $(uname -m) with ${TARGETARCH} for cross-architecture builds, and adding OCI-compliant metadata labels. Issue #1332 was created with comprehensive problem description, phased implementation approach, detailed acceptance criteria, implementation guidance with code examples, and proper context linking, continuing the established pattern of systematic code quality improvements.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/kustomize/base/kustomization.yaml:8-12
Timestamp: 2025-07-08T19:09:48.746Z
Learning: jiridanek requested GitHub issue creation for misleading CUDA prefix in TrustyAI image tags during PR #1306 review. Issue was created with comprehensive problem description covering both Python 3.11 and 3.12 versions, repository pattern analysis showing correct vs incorrect naming, clear solution with code examples, detailed acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: grdryn
PR: opendatahub-io/notebooks#1320
File: rstudio/rhel9-python-3.11/Dockerfile.cuda:34-35
Timestamp: 2025-07-04T10:41:13.061Z
Learning: In the opendatahub-io/notebooks repository, when adapting NVIDIA CUDA Dockerfiles, the project intentionally maintains consistency with upstream NVIDIA patterns even when it might involve potential risks like empty variable expansions in package installation commands. This is considered acceptable because the containers only run on RHEL 9 with known yum/dnf behavior, and upstream consistency is prioritized over defensive coding practices.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:135-136
Timestamp: 2025-07-04T05:52:49.464Z
Learning: jiridanek requested GitHub issue creation for improving fragile sed-based Jupyter kernel display_name modification in jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu during PR #1306 review. Issue #1321 was created with comprehensive problem description covering JSON corruption risks, greedy regex patterns, maintenance burden, and proposed Python-based JSON parsing solution with detailed acceptance criteria, implementation guidance, and proper context linking, continuing the established pattern of systematic code quality improvements.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1269
File: codeserver/ubi9-python-3.12/nginx/root/usr/share/container-scripts/nginx/common.sh:1-3
Timestamp: 2025-07-03T12:07:19.365Z
Learning: jiridanek consistently requests GitHub issue creation for technical improvements identified during code reviews in opendatahub-io/notebooks, ensuring systematic tracking of code quality enhancements like shell script portability issues with comprehensive descriptions, solution options, and acceptance criteria.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb:31-34
Timestamp: 2025-07-01T07:03:05.385Z
Learning: jiridanek demonstrates excellent pattern recognition for identifying duplicated code issues across the opendatahub-io/notebooks repository. When spotting a potential problem in test notebooks, he correctly assesses that such patterns are likely replicated across multiple similar files rather than being isolated incidents, leading to more effective systematic solutions.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1259
File: jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm:56-66
Timestamp: 2025-07-02T18:19:49.397Z
Learning: jiridanek consistently creates comprehensive follow-up GitHub issues for security concerns raised during PR reviews in opendatahub-io/notebooks, ensuring systematic tracking and resolution of supply-chain security improvements like GPG signature verification for package repositories.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: scripts/buildinputs/dockerfile.go:35-35
Timestamp: 2025-07-10T15:39:23.433Z
Learning: jiridanek requested GitHub issue creation for improving architecture detection in buildinputs tool during PR #1320 review. Issue #1373 was successfully created on the second attempt with comprehensive problem description covering qemu-user s390x build failures where runtime.GOARCH returns host architecture instead of target architecture, detailed impact analysis, four solution options (environment variable detection, configuration parameter, platform detection logic, hybrid approach), clear acceptance criteria for multi-architecture support, phased implementation guidance, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: scripts/buildinputs/dockerfile.go:35-35
Timestamp: 2025-07-10T15:02:13.228Z
Learning: jiridanek requested GitHub issue creation for improving architecture detection in buildinputs tool during PR #1320 review. Issue was created with comprehensive problem description covering qemu-user s390x build failures where runtime.GOARCH returns host architecture instead of target architecture, detailed impact analysis, four solution options (environment variable detection, configuration parameter, platform detection logic, hybrid approach), clear acceptance criteria for multi-architecture support, phased implementation guidance, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ LABEL name="odh-notebook-rstudio-server-c9s-python-3.11" \
USER 0

ENV R_VERSION=4.4.3
ENV MAKEFLAGS="-j4"

# Install R
RUN yum install -y yum-utils && \
Expand Down
1 change: 1 addition & 0 deletions rstudio/c9s-python-3.11/Dockerfile.cuda
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,7 @@ LABEL name="odh-notebook-rstudio-server-cuda-c9s-python-3.11" \
USER 0

ENV R_VERSION=4.4.3
ENV MAKEFLAGS="-j4"

# Install R
RUN yum install -y yum-utils && \
Expand Down
1 change: 1 addition & 0 deletions rstudio/rhel9-python-3.11/Dockerfile.cpu
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ RUN if [ -d "${SECRET_DIR}" ]; then \
fi

ENV R_VERSION=4.4.3
ENV MAKEFLAGS="-j4"

# Install R
RUN yum install -y yum-utils && \
Expand Down
1 change: 1 addition & 0 deletions rstudio/rhel9-python-3.11/Dockerfile.cuda
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,7 @@ RUN if [ -d "${SECRET_DIR}" ]; then \
fi

ENV R_VERSION=4.4.3
ENV MAKEFLAGS="-j4"

# Install R
RUN yum install -y yum-utils && \
Expand Down
Loading