opal/asm: updates to powerpc assembly #2051
This commit contains the following changes:

- There is a bug in the PGI 16.x betas for ppc64 that causes them to emit the incorrect instruction for loading 64-bit operands. If not cast to void * the operands are loaded with lwz (load word and zero) instead of ld. This does not affect optimized mode. The workaround is to cast to void *, implemented similarly to a workaround for an xlc bug.
- Actually implement 64-bit add/sub. These functions were missing and fell back to the less efficient compare-and-swap implementations.

Thanks to @PHHargrove for helping to track this down. With this update the GCC inline assembly works as expected with pgi and ppc64.

Signed-off-by: Nathan Hjelm <[email protected]>
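To illustrate the second change, here is a minimal sketch of the difference between an add built on a compare-and-swap retry loop and a direct atomic add. It is not the code from this patch: the function names are hypothetical, and it uses the GCC/Clang `__atomic` builtins (an assumption) instead of the PowerPC inline assembly so that it runs on any platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fallback: add-and-fetch built only from compare-and-swap.
 * Every failed exchange costs another trip around the loop. */
static inline int64_t sketch_add_fetch_via_cas(volatile int64_t *addr, int64_t delta)
{
    int64_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);

    /* On failure the builtin stores the current value of *addr into `old`,
     * so the next iteration recomputes old + delta from fresh data. */
    while (!__atomic_compare_exchange_n(addr, &old, old + delta, false,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
    }
    return old + delta;
}

/* Hypothetical direct version: a single atomic read-modify-write,
 * which the compiler can lower to one load-reserve/store-conditional loop. */
static inline int64_t sketch_add_fetch_direct(volatile int64_t *addr, int64_t delta)
{
    return __atomic_add_fetch(addr, delta, __ATOMIC_RELAXED);
}
```

Both helpers return the updated value; the difference is only in how much work is done per attempt under contention.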
@PHHargrove Ok, finally got all the ASM working. I went to fix add/sub but noticed they were missing so I implemented them as well. The only failure remaining was ompi_rb_tree and that one is due to opal_init_util not being called by the test. Not sure what is going on with that.
@jjhursey I am sure IBM wants to review this. Target is v2.0.2 if they will take it. v2.1.0 otherwise.
@sjeaugey If @PHHargrove hasn't already filed a bug with PGI can you look into the 64-bit operand issue? The correct assembly is generated for pointers but not 64-bit integers.
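Since the compiler handles pointer operands correctly, the workaround described in the commit message is to hand 64-bit integers to the asm statement as `void *`. The sketch below shows only the cast itself and checks that it preserves the value. The macro and function names are hypothetical, not the ones used in the patch, and it assumes a platform where pointers are 64 bits wide.

```c
#include <stdint.h>

/* Hypothetical macro: present a 64-bit integer as a pointer-typed operand.
 * Going through intptr_t avoids an integer-to-pointer size warning. */
#define SKETCH_ASM_VALUE64(x) ((void *)(intptr_t)(x))

/* Round-trip the value through void * to show the cast is lossless
 * when sizeof(void *) == 8. */
static inline int64_t sketch_roundtrip_64(int64_t val)
{
    void *as_ptr = SKETCH_ASM_VALUE64(val);
    return (int64_t)(intptr_t)as_ptr;
}
```

On a 32-bit target the cast would truncate, so a real implementation would apply it only in the 64-bit build.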
I have filed the bug report. @hjelmn I think your ppc add/sub code is subtly incorrect. Where you have -Paul |
Ok. I can update both the 64-bit and 32-bit atomics to use b instead of r. It is probably a bug if a compiler tries to use a special register, but better safe than sorry given that pgi loads 64-bit operands using 32-bit loads.
What is the pgi bug number? I want to document it in the code like is done in gasnet.
I reported by email, but don't have a response yet.
The thing is that
Makes sense. So on ppc64 'b' means any register but r0. On x86_64 it means *bx. I will make the necessary updates later this weekend.
Yes, the "b" stands for "base". In some cases you still need to read the GCC sources to understand what the constraints really mean. For instance, you'll find the following in
Ah, thanks for the link! The gcc inline assembly manual I usually use has none of this useful information.
I am trying to understand the GPR vs base register issue. The linux kernel uses "r" not "b" but it could be that with ELF r0 is never used with "r". All I can find about GPR vs base is the using instruction. There r0 means no base register. There isn't similar language on add or subf. Do you know where I can find documentation on this feature?
SHORT VERSION: It looks like your code is fine with

LONG VERSION: See page 310 of https://upc-bugs.lbl.gov/~phargrov/Docs/PPC/PPC64_ProgEnvMan_ver3.pdf So, in that instruction the source register cannot be

In GASNet's atomics add we are using more advanced inline asm to optimize for atomic-add involving a small integer constant and therefore may generate either

The GASNet asm for 32-bit add-and-fetch, if you care:

BTW: Your current clobber
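For readers following the "r" versus "b" question, here is a minimal sketch of a 64-bit add-and-fetch using a load-reserve/store-conditional loop. It is an illustration under stated assumptions, not the patch itself: the function name is hypothetical, the asm branch is only compiled on ppc64 with a GCC-compatible compiler, and other platforms fall back to the `__atomic_add_fetch` builtin. In this sketch the address sits in the index (RB) slot of `ldarx`/`stdcx.` with a literal 0 as the base, and `add` treats all of its operands as ordinary general-purpose registers, so the "r" constraint is sufficient here. The "b" constraint (any GPR except r0) matters for instruction fields where register number 0 is read as the value zero.

```c
#include <stdint.h>

/* Hypothetical name; returns the value after the addition. */
static inline int64_t sketch_atomic_add_64(volatile int64_t *addr, int64_t delta)
{
#if defined(__powerpc64__) && defined(__GNUC__)
    int64_t t;

    __asm__ __volatile__(
        "1:  ldarx   %0, 0, %3  \n\t"  /* load *addr and set the reservation   */
        "    add     %0, %2, %0 \n\t"  /* t = delta + t; r0 is a plain GPR here */
        "    stdcx.  %0, 0, %3  \n\t"  /* store t if the reservation still holds */
        "    bne-    1b         \n\t"  /* reservation lost: retry               */
        : "=&r" (t), "+m" (*addr)      /* early-clobber result, memory updated  */
        : "r" (delta), "r" (addr)
        : "cc");                       /* stdcx. writes condition register cr0  */
    return t;
#else
    /* Portable fallback (assumption) so the sketch builds on other targets. */
    return __atomic_add_fetch(addr, delta, __ATOMIC_RELAXED);
#endif
}
```

A call such as `sketch_atomic_add_64(&counter, 1)` increments `counter` and returns the new value. No memory barriers are included; a real implementation would add them wherever ordering is required.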
Ok, thanks @PHHargrove. That makes sense. I will commit this as-is then and work on figuring out how we can improve the clobbers. If you leave the VM up this week I will play around with it tomorrow.
I am not planning to take down the VM myself, but the hosting provider is doing some maintenance on Friday that may take it down. So, you should aim to finish before then if possible.
OOPS - I have no idea why my previous comment was posted 3 times.
I tried PGI I've added PGI to the IBM MTT runs. Tonight's runs are a little off as I was updating the environment while it was kicking off. Tomorrow night's runs should be good to review. When our Jenkins server comes back online I'll add a test there as well.