@prasanna-amd
Contributor

Details

A race condition at shutdown causes a segmentation fault. Instead, register an atexit handler that sets a global shutdown state, so that other threads avoid use-after-free bugs.

Work item: "Internal", or link to GitHub issue (if applicable).
ROCM-1896

What were the changes?
Added an atexit handler that sets a shutdown flag and prevents other threads from running into segmentation faults.

Why were the changes made?
Without these changes, an error condition during an RCCL run (such as no open files, or running out of HIP memory) could result in a segmentation fault instead of a graceful exit with an error reason.

How was the outcome achieved?
By handling exit conditions and preventing other threads from running into a segfault once shutdown has started.
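
For illustration only, here is a minimal sketch of the general pattern described above: an atexit handler sets a global flag, and other threads check the flag before touching shared state. It is not the code from this PR. Every name in it (`g_shuttingDown`, `markShuttingDown`, `installShutdownHandler`, `isShuttingDown`, `reportError`, `Status`) is hypothetical and is not an RCCL symbol.

```cpp
#include <atomic>
#include <cstdio>
#include <cstdlib>

// Hypothetical names throughout; this is not RCCL's actual implementation.

// Process-wide flag, set once normal process termination has begun.
static std::atomic<bool> g_shuttingDown{false};

// Handler passed to std::atexit. It only flips the flag, so it is cheap
// and safe to run more than once.
void markShuttingDown() {
  g_shuttingDown.store(true, std::memory_order_release);
}

// Registers the handler exactly once. Returns true if registration succeeded.
bool installShutdownHandler() {
  static const bool ok = (std::atexit(markShuttingDown) == 0);
  return ok;
}

// Query used by worker threads before they touch shared global state.
bool isShuttingDown() {
  return g_shuttingDown.load(std::memory_order_acquire);
}

enum class Status { Reported, SkippedShutdown };

// Example guard: report an error reason unless shutdown is in progress,
// in which case shared resources may already have been torn down.
Status reportError(const char* reason) {
  if (isShuttingDown()) {
    return Status::SkippedShutdown;
  }
  std::fprintf(stderr, "error: %s\n", reason);
  return Status::Reported;
}
```

A caller would invoke `installShutdownHandler()` once during initialization and check `isShuttingDown()` on paths that can run concurrently with process exit. A flag check on its own narrows the race window rather than closing it, so a real fix may need additional synchronization around the guarded resources.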

Additional Documentation:
Cherry-picked from develop.

@JeniferC99

Manual PSDB triggered: http://rocm-ci.amd.com/job/compute-psdb-rel-7.2/346
