@simonpintarelli
Member

No description provided.

@github-actions
preview available: https://docs.tds.cscs.ch/161


#### `"cxil_map: write error"` when doing inter-node GPU-aware MPI communication

The following environment variable can be set to disable gdrcopy:

Member review comment:

How about:

This error is sometimes raised by applications that make GPU-aware MPI calls and hit a bug in gdrcopy (a low-level library for low-latency copies between host and GPU memory).
Setting the following option disables gdrcopy completely.
Note that this reduces performance for small message sizes, so the option should only be set on a case-by-case basis.

You could also mention that this workaround has been used for ICON.

@bcumming bcumming merged commit c3caf9e into main Jun 30, 2025
1 check passed
@bcumming bcumming deleted the cxil_map-write-error branch June 30, 2025 09:44