@nmorey nmorey commented Nov 7, 2025

While debugging mpich 4.3.2 with ch4:ucx on s390x, several issues popped up.
Found and tested on 1.19.0.

First:

sles@ucx-debug:~/mpich/mpich-4.3.2/src/mpi/romio/test> mpirun -np 2 ./file_info -fname test
[1762529195.399132] [ucx-debug:464075:0]     ucp_context.c:2055 UCX  ERROR UCX_DYNAMIC_TL_PROGRESS_FACTOR must be > 0
[1762529195.399238] [ucx-debug:464074:0]     ucp_context.c:2055 UCX  ERROR UCX_DYNAMIC_TL_PROGRESS_FACTOR must be > 0

This is fixed by the first patch. The config parser was wrongly looking for a time value instead of a regular unsigned integer. Not sure why I only see this on s390x, though.
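One possible explanation for the s390x-only behaviour, offered as an assumption that this PR does not confirm: if the time parser stores an 8-byte value into a field that is only 4 bytes wide, a little-endian host still finds the low half of the value in the field, while a big-endian host finds the (zero) high half. The sketch below simulates both byte orders on any host. The helper names `store_u64`, `load_u32` and `wide_write_narrow_read` are hypothetical and are not UCX functions.

```c
#include <stdint.h>

/* Write all 8 bytes of v into buf, in the requested byte order.
 * This models a parser that believes the destination is 64 bits wide. */
static void store_u64(unsigned char *buf, uint64_t v, int big_endian)
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? (56 - 8 * i) : (8 * i);
        buf[i] = (unsigned char)(v >> shift);
    }
}

/* Read only the first 4 bytes of buf as a 32-bit value.
 * This models the consumer, whose field is really 32 bits wide. */
static uint32_t load_u32(const unsigned char *buf, int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? (24 - 8 * i) : (8 * i);
        v |= (uint32_t)buf[i] << shift;
    }
    return v;
}

/* Hypothetical helper: 8-byte write followed by a 4-byte read at the
 * same address, for the chosen (simulated) byte order. */
static uint32_t wide_write_narrow_read(uint64_t v, int big_endian)
{
    unsigned char storage[8];
    store_u64(storage, v, big_endian);
    return load_u32(storage, big_endian);
}
```

With a small input such as 2, `wide_write_narrow_read(2, 0)` returns 2 and `wide_write_narrow_read(2, 1)` returns 0. A zero read on a big-endian host would be consistent with the "must be > 0" error above, although the real parser also applies a time conversion that this sketch ignores.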

Second:

sles@ucx-debug:~/mpich/mpich-4.3.2/src/mpi/romio/test> ./file_info -fname test
[1762533043.448147] [ucx-debug:488518:0]            self.c:276  UCX  ERROR failed to allocate device resource

This is caused by UCT/SELF scanning the option as a uint while it is a size_t in memory.
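To illustrate why a 4-byte store into an 8-byte field goes unnoticed on little-endian hosts but not on big-endian ones, here is a minimal sketch that simulates both byte orders on any machine. It assumes the field is zero-initialised before parsing, which this PR does not state, and the helper names `store_u32`, `load_u64` and `narrow_write_wide_read` are hypothetical rather than part of UCX.

```c
#include <stdint.h>
#include <string.h>

/* Write the 4 bytes of v at the start of buf, in the requested byte order.
 * This models a parser that believes the destination is 32 bits wide. */
static void store_u32(unsigned char *buf, uint32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? (24 - 8 * i) : (8 * i);
        buf[i] = (unsigned char)(v >> shift);
    }
}

/* Read all 8 bytes of buf as a 64-bit value.
 * This models the consumer, whose field is a 64-bit size_t. */
static uint64_t load_u64(const unsigned char *buf, int big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? (56 - 8 * i) : (8 * i);
        v |= (uint64_t)buf[i] << shift;
    }
    return v;
}

/* Hypothetical helper: 4-byte write followed by an 8-byte read of the
 * same zero-initialised field, for the chosen (simulated) byte order. */
static uint64_t narrow_write_wide_read(uint32_t v, int big_endian)
{
    unsigned char field[8];
    memset(field, 0, sizeof(field));   /* assumption: field starts as zero */
    store_u32(field, v, big_endian);
    return load_u64(field, big_endian);
}
```

For an input of 1, `narrow_write_wide_read(1, 0)` returns 1, whereas `narrow_write_wide_read(1, 1)` returns 4294967296 (1 shifted left by 32 bits). A device count of that size would plausibly fail to allocate, which matches the symptom reported above.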

Summary by CodeRabbit

  • Chores
    • Updated configuration parameter type handling to improve internal parsing and validation consistency.

size_t may be larger than an int. This causes issues on big-endian systems.

Signed-off-by: Nicolas Morey <[email protected]>

coderabbitai bot commented Nov 7, 2025

Walkthrough

Configuration parameter types were modified in two modules. The UCP core's DYNAMIC_TL_PROGRESS_FACTOR changed from a time-units type to an unsigned integer type. The UCT self transport's NUM_DEVICES changed from a signed integer type to an unsigned long type. These changes affect how the configuration values are parsed and validated during initialization.

Changes

| Cohort / File(s) | Summary |
|---|---|
| UCP Core Configuration: src/ucp/core/ucp_context.c | Changed DYNAMIC_TL_PROGRESS_FACTOR config entry type from UCS_CONFIG_TYPE_TIME_UNITS to UCS_CONFIG_TYPE_UINT |
| UCT Self Transport Configuration: src/uct/sm/self/self.c | Changed NUM_DEVICES config entry type from UCS_CONFIG_TYPE_INT to UCS_CONFIG_TYPE_ULONG |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10–15 minutes

  • Verify that config type changes align with parameter semantics and intended value ranges
  • Confirm downstream code consuming these configuration values remains compatible with the new types
  • Check that default values and validation constraints are appropriate for the new types

Poem

🐰 From time units to integers plain,
Config types dance through the lane,
Unsigned long and uint so keen,
Values parse in a different scene,
Transport flows now smooth and light!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Config type fixes' directly summarizes the main changes: correcting configuration type declarations in two files (ucp_context.c and self.c). |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e29e5f5 and da4da6d.

📒 Files selected for processing (2)
  • src/ucp/core/ucp_context.c (1 hunks)
  • src/uct/sm/self/self.c (1 hunks)
🔇 Additional comments (2)
src/uct/sm/self/self.c (1)

59-60: LGTM! Type correction fixes allocation error.

The change from UCS_CONFIG_TYPE_INT to UCS_CONFIG_TYPE_ULONG correctly aligns the configuration parser with the in-memory size_t representation, resolving the "failed to allocate device resource" error described in the PR.

src/ucp/core/ucp_context.c (1)

444-448: LGTM! Correct type fixes parser error.

The change from UCS_CONFIG_TYPE_TIME_UNITS to UCS_CONFIG_TYPE_UINT is correct. The configuration entry represents a numeric factor/count ("Number of usage tracker rounds"), not a time duration. This fixes the runtime parser error reported in the PR.


