Description
Background information
GPFS headers on Rocky Linux changed between RL9.5 and RL9.6 (5.2.1-1.x86_64 -> 5.2.3-0.x86_64)
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.8, which is btw not listed on the right hand side/"news" list on open-mpi.org. It's "only" listed on the download site.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Using the v5.0.8 tarball, unpacking and configuring essentially with ./configure ... --with-gpfs
Please describe the system on which you are running
- Operating system/version: Rocky Linux 9.6
- Computer hardware: Xeon 8360Y
- Network type: InfiniBand
Details of the problem
At least the gpfs_fcntl.h header file changed. It introduces a new fcntl directive (in a minor/patch update 🤷) and renames a struct member.
--- gpfs_fcntl.h_gpfs.base-5.2.2-1.x86_64 2025-06-25 17:55:10.646268000 +0200
+++ gpfs_fcntl.h_gpfs.base-5.2.3-0.x86_64 2025-06-25 17:55:28.521533000 +0200
@@ -159,6 +159,7 @@
#define GPFS_FCNTL_GET_SNAP_MIGRATION_SUPPORT 3009
/* for use with mmlsattr -D */
#define GPFS_FCNTL_GET_DATA_BLOCK_DISK_NUMS 3010
+#define GPFS_FCNTL_SET_REPLICATIONX 3011
/* Structures for specifying the various gpfs_fcntl hints */
@@ -357,6 +358,7 @@
{
int structLen; /* length of this structure */
int structType; /* directive identifier:
+ GPFS_FCNTL_SET_REPLICATIONX or
GPFS_FCNTL_SET_REPLICATION */
int metadataReplicas; /* Set the number of copies of the file's
indirect blocks. Valid values are 1-3,
@@ -371,10 +373,14 @@
A value of 0 indicates not to change the
current value. */
int dataReplicas; /* Set the number of copies of the file's
- data blocks. Valid values are 1-3,
+ data blocks. Valid values are 0-3,
but cannot be greater than the value of
maxDataReplicas. A value of 0 indicates
- not to change the current value. */
+ not to change the current value under
+ GPFS_FCNTL_SET_REPLICATION.
+ Under GPFS_FCNTL_SET_REPLICATIONX, a valiue
+ of -1 indicates no change to the current
+ values. */
int maxDataReplicas; /* Set the maximum number of copies of a file's
data blocks. Space in the file's inode
and indirect blocks is reserved for the
@@ -386,7 +392,11 @@
Defined below. */
int errValue1; /* returned value depending upon errReason */
int errValue2; /* returned value depending upon errReason */
- int reserved; /* unused, but should be set to 0 */
+ int perfReplicas; /* Under GPFS_FCNTL_SET_REPLICATIONX, set the
+ number of additonal copies of the file's data
+ blocks for performance. Valid values are 0-1.
+ A value of -1 indicates no change to the
+ current value. */
} gpfsSetReplication_t;
Consequently, Open MPI's GPFS plugin fails to compile:
make[2]: Entering directory '/tmp/bzfbchris/build/openmpi-5.0.8/ompi/mca/fs/gpfs'
CC fs_gpfs_component.lo
CC fs_gpfs_file_open.lo
CC fs_gpfs_file_set_info.lo
CC fs_gpfs_file_get_info.lo
CC fs_gpfs.lo
fs_gpfs_file_set_info.c: In function ‘mca_fs_gpfs_file_set_info’:
fs_gpfs_file_set_info.c:316:52: error: ‘gpfsSetReplication_t’ has no member named ‘reserved’
316 | gpfs_hint_SetReplication.gpfsSetReplication.reserved = 0;
|
My hotfix looks like this.
--- ompi/mca/fs/gpfs/fs_gpfs_file_set_info.c 2025-05-30 18:35:13.522618108 +0200
+++ ompi/mca/fs/gpfs/fs_gpfs_file_set_info.c 2025-06-25 18:21:58.703887614 +0200
@@ -313,7 +313,7 @@
gpfs_hint_SetReplication.gpfsSetReplication.maxMetadataReplicas = atoi(token);
gpfs_hint_SetReplication.gpfsSetReplication.dataReplicas = atoi(token);
gpfs_hint_SetReplication.gpfsSetReplication.maxDataReplicas = atoi(token);
- gpfs_hint_SetReplication.gpfsSetReplication.reserved = 0;
+ gpfs_hint_SetReplication.gpfsSetReplication.perfReplicas = 0;
free(info_str_dup);
rc = gpfs_fcntl(gpfs_file_handle, &gpfs_hint_SetReplication);
According to the header comment, GPFS should only consider perfReplicas when the GPFS_FCNTL_SET_REPLICATIONX fcntl is used. Since that fcntl is new, Open MPI does not use it, and setting perfReplicas to 0 should be safe.
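Note that the one-line rename only builds against the new header, since the old one has no perfReplicas member. One way to build against both might be to zero the whole struct first and never name the last member. Below is a minimal sketch of that idea, not a tested patch: it uses a mock struct so it compiles without GPFS installed, and mock_set_replication_t, MOCK_SET_REPLICATION, MOCK_HAVE_PERFREPLICAS and both helper functions are hypothetical names of mine, not part of gpfs_fcntl.h or Open MPI. errReason is inferred from the header comments in the diff above.

```c
#include <assert.h>
#include <string.h>

/* Placeholder directive id. NOT the real GPFS_FCNTL_SET_REPLICATION value. */
#define MOCK_SET_REPLICATION 1

/* Mock of gpfsSetReplication_t (assumption: same members as in the diff).
 * Defining MOCK_HAVE_PERFREPLICAS mimics the 5.2.3 header layout; leaving
 * it undefined mimics the older layout with the 'reserved' member. */
typedef struct {
    int structLen;
    int structType;
    int metadataReplicas;
    int maxMetadataReplicas;
    int dataReplicas;
    int maxDataReplicas;
    int errReason;
    int errValue1;
    int errValue2;
#ifdef MOCK_HAVE_PERFREPLICAS
    int perfReplicas;
#else
    int reserved;
#endif
} mock_set_replication_t;

/* Fill the hint without ever naming the trailing member: memset() zeroes
 * it regardless of whether the header calls it reserved or perfReplicas. */
static void mock_init_set_replication(mock_set_replication_t *hint,
                                      int meta, int max_meta,
                                      int data, int max_data)
{
    assert(hint != NULL);
    memset(hint, 0, sizeof(*hint));
    hint->structLen = (int) sizeof(*hint);
    hint->structType = MOCK_SET_REPLICATION;
    hint->metadataReplicas = meta;
    hint->maxMetadataReplicas = max_meta;
    hint->dataReplicas = data;
    hint->maxDataReplicas = max_data;
}

/* Read back the trailing member under whichever name this build uses.
 * Only needed here to check the result; real code would not need it. */
static int mock_last_member(const mock_set_replication_t *hint)
{
#ifdef MOCK_HAVE_PERFREPLICAS
    return hint->perfReplicas;
#else
    return hint->reserved;
#endif
}
```

Compiling the sketch with and without -DMOCK_HAVE_PERFREPLICAS exercises both layouts. In the real plugin the same effect would presumably come from zeroing gpfs_hint_SetReplication before filling it, or from a configure-time check for the perfReplicas member; I have not verified either against the Open MPI tree.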