v3.1.x: Add -g compilation flag for all files that are present in the stack when attaching with MPIR #6357
|
Can one of the admins verify this patch? |
|
What is the performance impact of forcing that flag on all (even non-debug) builds? |
|
I'm still a little uncertain for the need for this patch. Don't debuggers offer other / different ways to get out of E.g., set a breakpoint out in main and "continue" until execution gets there. I ask for a few reasons:
|
|
ok to test |
|
@rhc54 The -g flag on its own shouldn't impact runtime. The binary size will increase slightly for these compilation units, but the affected sections of the ELF will only be loaded by debuggers, not during normal execution. Optimisation flags also seem to be appended to the compile line after the debugger flags. So even if

@jsquyres This isn't about symbols exactly; to fix this bug we only really need async unwind information, which is also used for exceptions. I chose to use the

The reason that we can't set a breakpoint in main and run to it is that, as you can see from the stack, GDB doesn't know where in main we started from. The user would normally have to work out which line in main this stack starts from, and then set a breakpoint on the line after. But with this bug, GDB is unable to show this info and stops unwinding at
The exact commands that you would issue to GDB when using MPIR to attach would be:
Which would tell GDB to step out up the stack until frame 0 was reached and bring you back to user code. The user wouldn't have to do any stepping in MPI internals per se.
We're only worried about the stack that leads up to the MPIR holding point, and not everything under MPI_INIT. Do you think that stack would still be platform/environment dependent? |
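
To make the point about unwind information concrete, here is a minimal C sketch that is not taken from the PR. It builds a small call chain that loosely mirrors "user code, init call, runtime init, holding point" and counts how many frames an unwinder can walk from the innermost function. All `sketch_*` names are hypothetical stand-ins, not Open MPI functions. It assumes glibc's `backtrace()` from `<execinfo.h>`, which is a GNU extension and is implemented differently across targets, so how much it depends on `-fasynchronous-unwind-tables` will vary by platform.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace(): glibc extension, not ISO C */

#define SKETCH_MAX_FRAMES 64

#if defined(__GNUC__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#else
#  define SKETCH_NOINLINE
#endif

/* Volatile sink: storing the result after each call gives the caller work
 * to do after the call returns, so the call cannot become a tail call
 * (a tail call would remove the caller's frame from the stack). */
static volatile int sketch_sink;

/* Hypothetical stand-in for the place where a process waits for a debugger. */
SKETCH_NOINLINE static int sketch_holding_point(void)
{
    void *frames[SKETCH_MAX_FRAMES];
    /* Number of return addresses the unwinder managed to recover. */
    int depth = backtrace(frames, SKETCH_MAX_FRAMES);
    assert(depth >= 1);
    sketch_sink = depth;
    return depth;
}

/* Hypothetical stand-in for a runtime initialisation layer. */
SKETCH_NOINLINE static int sketch_runtime_init(void)
{
    int depth = sketch_holding_point();
    sketch_sink = depth;
    return depth;
}

/* Hypothetical stand-in for the init call made from user code. */
SKETCH_NOINLINE int sketch_mpi_init(void)
{
    int depth = sketch_runtime_init();
    sketch_sink = depth;
    return depth;
}
```

When `sketch_mpi_init()` is called from `main`, the returned depth should cover at least the three sketch functions plus the caller if the unwinder can walk every frame. A smaller number suggests the walk stopped early, which is the kind of symptom described above for GDB.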
|
Tagging @shamisp |
|
@rhc54 Is there a way that we can test this patch to make sure that it doesn't impact performance? The |
|
I don't have a way to do it myself, but I also agree with you that I can't see how this would impact application performance. I only raised the question in case someone out there in the MPI team sees something we don't 😄 |
|
What about a different approach: test in I assume we wouldn't want to add that flag everywhere, but we could sprinkle it around in the Right Places (potentially in, or as a supplement to Specifically: |
20b9acf to e55bd3e
|
I've updated the PR to add a configure check for the compiler version and to add only the async unwind flag. I think there are still a few open questions:
|
|
@jsquyres Can we move this forward? |
So sorry for the gigantic delay on replying to this.
|
Thanks for the feedback. I've made the changes to detect which flags work, rather than keying off the compiler name, and also made the other CPPFLAGS -> CFLAGS fixes. I didn't manage to do the flag detection with nested calls to _ORTE_SETUP_DEBUGGER_FLAGS_TRY_CFLAGS; that doesn't seem to be an option. But I did manage to do it with an extra "foundFlags" variable. |
When all is said and done:
- This PR needs to be squashed down to a single commit
- The single commit needs to be applied to master before it can be applied to any release branches (i.e., another PR)
- Our normal way of doing things is to first commit fixes to master and then cherry-pick them to the relevant release branches
- The commit on this PR needs to be marked as a cherry-pick from master
- I.e., use git cherry-pick -x ... so that the commit message carries the (cherry picked from commit xxx...) line. This helps us track to make sure that relevant commits got over from master to release branches.
- Equivalent PRs for the v3.0.x and v4.0.x branches should also be opened.
|
ok to test |
|
Ok, #6527 merged (the master version of this PR). Could you refresh this PR as a cherry-pick from that PR? And then file corresponding PRs for v3.0.x and v4.0.x? Thank you! |
This PR adds the -g compilation flag for all files that are present in the stack from the MPI_Init() call. This is so that when a debugger attaches using MPIR, it can step out of this stack back into main. This cannot be done with certain aggressive optimisations and missing debug information.

This issue appeared when OpenMPI 3.1.2 was built with GCC 7.3.0 on Power8 with the following configuration line:

./configure --enable-mpirun-prefix-by-default

The stack that we get when attaching to the user process is this:

And without this patch, GDB can't unwind because of missing unwind information and an optimisation that puts a branch in the function preamble of the orte_init() function. Looking like this:

The user impact is that after attaching using MPIR, they can't get back to their call to MPI_Init(). This can be fixed minimally by compiling with -fasynchronous-unwind-tables, but the existing pattern of using -g for MPIR files (which implies unwind tables) was copied from the file orte/orted/Makefile.am and applied to other files in this stack.

This issue is occurring on 3.1.x and up, but I'm not sure about the branching model for PRs. Would I also have to submit a PR for future version branches? Or do changes to 3.1.x get automatically pulled in?
Thanks
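
For readers unfamiliar with why an attach lands deep inside the init stack, the following is a rough C sketch of a generic "spin on a gate" holding point. It is only an illustration: the MPIR interface names its gate variable `MPIR_debug_gate`, but the names `sketch_debug_gate` and `sketch_hold_at_gate` and the bounded polling are hypothetical, and Open MPI's actual holding code is not shown in this thread and may work differently.

```c
/* Hypothetical gate that a debugger would set to non-zero after attaching.
 * volatile forces a fresh read on every poll, since the value is changed
 * from outside the program's normal control flow. */
volatile int sketch_debug_gate = 0;

/* Check the gate up to max_polls times.
 * Returns the number of failed polls before the gate was seen open,
 * or -1 if it was still closed after max_polls checks.
 * The bound exists only so that this sketch terminates; a real holding
 * point would typically wait indefinitely. */
long sketch_hold_at_gate(volatile int *gate, long max_polls)
{
    for (long polls = 0; polls < max_polls; ++polls) {
        if (*gate != 0) {
            return polls;   /* gate opened: execution may continue */
        }
        /* A real implementation would sleep or yield here rather than
         * busy-wait; omitted to keep the sketch portable. */
    }
    return -1;
}
```

A debugger that attaches while the process sits in a loop like this sees the holding function as the innermost frame, with the init call chain above it. Getting back to user code then depends on the debugger being able to unwind each of those frames.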