-
Notifications
You must be signed in to change notification settings - Fork 121
--rdma-mpi flag fix
#996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
--rdma-mpi flag fix
#996
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a proper fix for the --rdma-mpi flag that controls whether RDMA MPI tests are included in the test suite. Previously, RDMA MPI tests were always skipped, but now they are conditionally added based on the flag.
Key Changes
- Changed test filtering logic to conditionally add RDMA MPI tests instead of always skipping them
- Set default value for
rdma_mpiargument toFalsein the global state - Updated GitHub Actions workflow scripts to use the flag appropriately for different scenarios
Reviewed Changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| toolchain/mfc/test/test.py | Removed code that unconditionally skipped RDMA MPI tests |
| toolchain/mfc/test/cases.py | Added conditional logic to include RDMA MPI test cases only when flag is enabled |
| toolchain/mfc/state.py | Set default value for rdma_mpi argument to False |
| tests/FA4D8FEF/golden.txt | Test output data for RDMA MPI test case |
| tests/FA4D8FEF/golden-metadata.txt | Test metadata for RDMA MPI test case |
| tests/2C9844EF/golden-metadata.txt | Test metadata for RDMA MPI test case |
| tests/1B300F28/golden-metadata.txt | Test metadata for RDMA MPI test case |
| .github/workflows/frontier/test.sh | Updated test command to use --rdma-mpi flag conditionally |
| .github/workflows/frontier/build.sh | Updated build command to use --rdma-mpi flag |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
| if ARG("rdma_mpi"): | ||
| cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Guard against non-boolean or string values from ARG("rdma_mpi") to avoid truthy string pitfalls. Normalize the flag to a strict boolean before the conditional. [possible issue, importance: 6]
| if ARG("rdma_mpi"): | |
| cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2)) | |
| rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on") | |
| if rdma_enabled: | |
| cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'m': 29, 'n': 29, 'p': 49, 'rdma_mpi': 'T'}, ppn=2)) |
| if ARG("rdma_mpi"): | ||
| cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Mirror the boolean normalization here to keep behavior consistent across 2D and 3D paths. This prevents accidental enabling when ARG("rdma_mpi") is a non-empty string. [general, importance: 6]
| if ARG("rdma_mpi"): | |
| cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2)) | |
| rdma_enabled = bool(ARG("rdma_mpi")) is True if isinstance(ARG("rdma_mpi"), bool) else str(ARG("rdma_mpi")).lower() in ("1", "t", "true", "yes", "y", "on") | |
| if rdma_enabled: | |
| cases.append(define_case_d(stack, '2 MPI Ranks -> RDMA MPI', {'rdma_mpi': 'T'}, ppn=2)) |
|
|
||
| gCFG: MFCConfig = MFCConfig() | ||
| gARG: dict = {} | ||
| gARG: dict = {"rdma_mpi": False} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Ensure the default aligns with expected type usage by explicitly using a boolean and documenting accepted truthy strings. Add a simple helper to parse flags once and reuse. [general, importance: 7]
| gARG: dict = {"rdma_mpi": False} | |
| gARG: dict = {"rdma_mpi": False} | |
| def arg_bool(name: str, default: bool = False) -> bool: | |
| val = ARG(name, default) | |
| if isinstance(val, bool): | |
| return val | |
| return str(val).lower() in ("1", "t", "true", "yes", "y", "on") |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #996 +/- ##
=======================================
Coverage 40.91% 40.91%
=======================================
Files 70 70
Lines 20270 20270
Branches 2520 2520
=======================================
Hits 8293 8293
Misses 10439 10439
Partials 1538 1538 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Nothing else to add, ready for review and merge. |
.github/workflows/frontier/test.sh
Outdated
| ./mfc.sh test --rdma-mpi --max-attempts 3 -j $ngpus -- -c frontier | ||
| else | ||
| ./mfc.sh test -a --rdma-mpi --max-attempts 3 -j 32 -- -c frontier | ||
| ./mfc.sh test --max-attempts 3 -j 32 -- -c frontier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you removed testing of post_process on frontier (that's what -a does). please put it back 😄
User description
Description
Proper implementation of the flag for self-hosted frontier gpu test. Simply, it is intended to append the additional tests when needed instead of including/skipping. Also,
./mfc.sh test --rdma-mpi --gpu -- -c frontierworked as anticipated.Subsequent to (#878)
PR Type
Enhancement, Tests
Description
• Fixed
--rdma-mpiflag implementation to properly append additional tests instead of including/skipping• Added conditional logic to RDMA MPI test case generation in
cases.pyusingARG("rdma_mpi")check• Initialized default
rdma_mpi: Falsevalue in state configuration• Updated frontier workflow commands by removing
-aflags and adjusting--rdma-mpiflag usage• Added new golden reference test data files and metadata for test cases FA4D8FEF, 1B300F28, and 2C9844EF
• Enhanced test metadata to include CMake configuration, environment variables, and CPU details for RDMA-MPI GPU tests
Diagram Walkthrough
flowchart LR A["state.py"] -- "initialize default" --> B["rdma_mpi: False"] B --> C["cases.py"] C -- "conditional check" --> D["ARG('rdma_mpi')"] D -- "wrap test generation" --> E["RDMA MPI test cases"] F["frontier workflows"] -- "remove -a flags" --> G["updated commands"] H["new test files"] --> I["golden reference data"]File Walkthrough
1 files
cases.py
Conditional RDMA MPI test case generationtoolchain/mfc/test/cases.py
• Added import for
ARGfrom..statemodule• Wrapped RDMA MPI test
case generation with conditional check using
ARG("rdma_mpi")• Applied
conditional logic to both 3D and 2D test case scenarios
3 files
state.py
Initialize RDMA MPI flag default valuetoolchain/mfc/state.py
• Initialized
gARGdictionary with defaultrdma_mpi: Falsevaluetest.sh
Update frontier test workflow commands.github/workflows/frontier/test.sh
• Removed
-aflag from GPU test command• Removed
--rdma-mpiflag fromCPU test command
build.sh
Update frontier build workflow command.github/workflows/frontier/build.sh
• Removed
-aflag from test command while keeping--rdma-mpiflag4 files
golden.txt
Add golden reference test datatests/FA4D8FEF/golden.txt
• Added new golden reference file with numerical test data
• Contains
conservative and primitive variable data at different time steps
golden-metadata.txt
Add test metadata for golden referencetests/FA4D8FEF/golden-metadata.txt
• Added test metadata file with build configuration and system
information
• Contains CMake configuration, environment variables, and
CPU details
golden-metadata.txt
Add additional test metadata filetests/1B300F28/golden-metadata.txt
• Added another test metadata file with identical system configuration
• Contains same build and environment information as FA4D8FEF test
golden-metadata.txt
Add golden metadata file for RDMA-MPI GPU test configurationtests/2C9844EF/golden-metadata.txt
• Added a new golden metadata file containing test configuration and
system information
• Includes CMake configuration details for
pre_process, simulation, syscheck, and post_process modules
• Contains
CPU architecture information from an AMD EPYC 7A53 64-Core Processor
system
• Records test invocation with
--rdma-mpiand--gpuflags onfrontier configuration
3 files