
@danieljvickers (Member) commented Nov 10, 2025

User description

Description

Fixes an issue when launching MPI jobs with multiple ranks on HiperGator.


PR Type

Bug fix


Description

  • Fixes LD_LIBRARY_PATH configuration for MPI multi-rank execution

  • Adds CUDA library path to environment on HiperGator cluster

  • Ensures proper library resolution for OpenMPI with NVHPC


Diagram Walkthrough

flowchart LR
  A["MPI Multi-rank Job"] -->|Missing CUDA libs| B["Library Resolution Fails"]
  C["Add LD_LIBRARY_PATH"] -->|CUDA 12.8.1 lib64| D["Proper Library Loading"]
  B -->|Fix Applied| D

File Walkthrough

Relevant files

Bug fix: toolchain/modules (+1/-0)
Add CUDA library path to environment configuration

  • Added LD_LIBRARY_PATH environment variable configuration for the
    HiperGator cluster
  • Points to the CUDA 12.8.1 lib64 directory, prepended to the existing path
  • Applied globally to all HiperGator nodes (h-all) to ensure consistent
    library resolution

@danieljvickers requested a review from a team as a code owner on November 10, 2025 00:42
@qodo-merge-pro (Contributor)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Prepending to LD_LIBRARY_PATH must preserve its existing value; otherwise downstream library resolution can break. Ensure the variable expansion keeps existing entries and handles an empty or undefined value robustly across shells.

h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH
h-gpu MFC_CUDA_CC=100 NVHPC_CUDA_HOME="/apps/compilers/cuda/12.8.1"

Portability

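If LD_LIBRARY_PATH is unset or empty, the bare expansion leaves a trailing colon (an empty entry, which glibc's dynamic loader treats as the current directory), and it aborts in shells running with set -u. Consider a conditional expansion such as ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}, or module-system constructs, to avoid stray leading/trailing colons and shell-compatibility issues.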

h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH
h-gpu MFC_CUDA_CC=100 NVHPC_CUDA_HOME="/apps/compilers/cuda/12.8.1"
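A minimal Python sketch of the difference between the bare NEW:$LD_LIBRARY_PATH expansion and the conditional NEW${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} form recommended above. The helper names are hypothetical; they model POSIX shell expansion semantics only and are not how MFC's toolchain actually processes toolchain/modules.

```python
from typing import Optional

# Path taken from the toolchain/modules line under review.
CUDA_LIB = "/apps/compilers/cuda/12.8.1/lib64"


def bare_prepend(new_dir: str, current: Optional[str]) -> str:
    """Models NEW:$LD_LIBRARY_PATH; an unset or empty variable expands to ''."""
    return f"{new_dir}:{current or ''}"


def conditional_prepend(new_dir: str, current: Optional[str]) -> str:
    """Models NEW${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}: ':' is added only if the value is non-empty."""
    return f"{new_dir}:{current}" if current else new_dir


def empty_entries(path_value: str) -> int:
    """Counts empty components, which glibc's dynamic loader treats as the current directory."""
    return sum(1 for part in path_value.split(":") if part == "")


if __name__ == "__main__":
    # None stands in for "unset", "" for "set but empty"; /opt/site/lib is a hypothetical existing entry.
    for current in (None, "", "/opt/site/lib"):
        for fn in (bare_prepend, conditional_prepend):
            value = fn(CUDA_LIB, current)
            print(f"{fn.__name__:20} {current!r:16} -> {value} (empty entries: {empty_entries(value)})")
```

Both forms agree when LD_LIBRARY_PATH already has a value; they differ only in the unset/empty case, which is exactly where the stray colon appears.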

h-all HPC_OMPI_BIN="/apps/mpi/cuda/12.8.1/nvhpc/25.3/openmpi/5.0.7/bin"
h-all OMPI_MCA_pml=ob1 OMPI_MCA_coll_hcoll_enable=0
h-gpu PATH="/apps/mpi/cuda/12.8.1/nvhpc/25.3/openmpi/5.0.7/bin:${PATH}"
h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH

Suggestion: To prevent potential runtime errors on non-GPU systems, scope the CUDA LD_LIBRARY_PATH setting to GPU hosts only by changing h-all to h-gpu. [possible issue, importance: 8]

Suggested change
- h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH
+ h-gpu LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH
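To illustrate what the suggested scoping changes, here is a hypothetical Python selector for lines of the form "<system>-<scope> TOKENS...". It assumes, based on the walkthrough's description of h-all as applying to all HiperGator nodes, that "-all" lines apply to every build mode while "-gpu" lines apply only to GPU builds, and that a "cpu" mode also exists. entries_for and env_assignments are illustrative helpers, not MFC's actual module loader.

```python
from typing import Dict, List

# Lines from the merged diff and from the reviewer's suggested change.
ORIGINAL = [
    "h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH",
    'h-gpu MFC_CUDA_CC=100 NVHPC_CUDA_HOME="/apps/compilers/cuda/12.8.1"',
]
SUGGESTED = [
    "h-gpu LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH",
    'h-gpu MFC_CUDA_CC=100 NVHPC_CUDA_HOME="/apps/compilers/cuda/12.8.1"',
]


def entries_for(lines: List[str], system: str, mode: str) -> List[str]:
    """Hypothetical: '<system>-all' lines apply to every mode, '<system>-<mode>' lines to one mode."""
    wanted = {f"{system}-all", f"{system}-{mode}"}
    tokens: List[str] = []
    for line in lines:
        prefix, _, rest = line.partition(" ")
        if prefix in wanted:
            # Naive whitespace split; values containing spaces are not handled.
            tokens.extend(rest.split())
    return tokens


def env_assignments(tokens: List[str]) -> Dict[str, str]:
    """Keeps KEY=VALUE tokens only; plain module names contain no '='."""
    return dict(tok.split("=", 1) for tok in tokens if "=" in tok)


if __name__ == "__main__":
    for name, lines in (("h-all", ORIGINAL), ("h-gpu", SUGGESTED)):
        for mode in ("cpu", "gpu"):
            env = env_assignments(entries_for(lines, "h", mode))
            print(f"{name} line, {mode} mode: LD_LIBRARY_PATH set = {'LD_LIBRARY_PATH' in env}")
```

Under these assumptions, the merged h-all line also prepends the CUDA path in CPU mode, while the suggested h-gpu line confines it to GPU builds.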

@sbryngelson merged commit 4f0704f into MFlowCode:master on Nov 10, 2025 (20 checks passed)
@qodo-merge-pro (Contributor)

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified: no security vulnerabilities were detected by AI analysis. Human verification is advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No logging: The added environment variable export does not include any audit logging of critical
actions, but the snippet may be part of a configuration context where logging is handled
elsewhere.

Referred Code
h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH
h-gpu MFC_CUDA_CC=100 NVHPC_CUDA_HOME="/apps/compilers/cuda/12.8.1"

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
No error handling: The new line sets LD_LIBRARY_PATH without any validation or fallback if the path is
missing or unreadable, though such checks may be performed elsewhere in the toolchain.

Referred Code
h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH
h-gpu MFC_CUDA_CC=100 NVHPC_CUDA_HOME="/apps/compilers/cuda/12.8.1"
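The finding above notes that the line has no validation or fallback if the CUDA directory is missing or unreadable. A minimal Python sketch of such a guard, assuming a warn-and-skip fallback is acceptable; prepend_if_usable is a hypothetical helper, not part of MFC's toolchain.

```python
import os
import sys
import tempfile
from typing import Optional


def prepend_if_usable(new_dir: str, current: Optional[str]) -> str:
    """Prepends new_dir to a colon-separated search path only if it is a readable directory."""
    if not (os.path.isdir(new_dir) and os.access(new_dir, os.R_OK | os.X_OK)):
        # Fallback: warn and return the existing value unchanged.
        print(f"warning: {new_dir} is missing or unreadable; search path left unchanged", file=sys.stderr)
        return current or ""
    # Add the ':' separator only when there is an existing, non-empty value.
    return f"{new_dir}:{current}" if current else new_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as existing_dir:
        print(prepend_if_usable(existing_dir, "/opt/site/lib"))
    # The temporary directory has been deleted here, so this call falls back.
    print(prepend_if_usable(existing_dir, "/opt/site/lib"))
```

Warning rather than failing keeps CPU-only or misconfigured nodes usable; a stricter setup might prefer to abort instead.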

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Env var safety: The change prepends a system path to LD_LIBRARY_PATH without validation or sanitization,
which could be acceptable in this controlled module context but warrants verification of
path integrity and quoting.

Referred Code
h-all LD_LIBRARY_PATH=/apps/compilers/cuda/12.8.1/lib64:$LD_LIBRARY_PATH
h-gpu MFC_CUDA_CC=100 NVHPC_CUDA_HOME="/apps/compilers/cuda/12.8.1"

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
