opal/asm: updates to powerpc assembly #2051

hjelmn · 2016-09-03T05:52:46Z

This commit contains the following changes:

There is a bug in the PGI 16.x betas for ppc64 that causes them to
emit the incorrect instruction for loading 64-bit operands. If not
cast to void * the operands are loaded with lwz (load word and
zero) instead of ld. This does not affect optimized mode. The work
around is to cast to void * and was implemented similar to a
work-around for a xlc bug.
Actually implement 64-bit add/sub. These functions were missing and
fell back to the less efficient compare-and-swap implementations.

Thanks to @PHHargrove for helping to track this down. With this update
the GCC inline assembly works as expected with pgi and ppc64.

Signed-off-by: Nathan Hjelm [email protected]

@PHHargrove

This commit contains the following changes: - There is a bug in the PGI 16.x betas for ppc64 that causes them to emit the incorrect instruction for loading 64-bit operands. If not cast to void * the operands are loaded with lwz (load word and zero) instead of ld. This does not affect optimized mode. The work around is to cast to void * and was implemented similar to a work-around for a xlc bug. - Actually implement 64-bit add/sub. These functions were missing and fell back to the less efficient compare-and-swap implementations. Thanks to @PHHargrove for helping to track this down. With this update the GCC inline assembly works as expected with pgi and ppc64. Signed-off-by: Nathan Hjelm <[email protected]>

hjelmn · 2016-09-03T05:54:12Z

@PHHargrove Ok, finally got all the ASM working. I went to fix add/sub but noticed they were missing so I implemented them as well. The only failure remaining was ompi_rb_tree and that one is due to opal_init_util not being called by the test. Not sure what is going on with that.

hjelmn · 2016-09-03T05:55:14Z

@jjhursey I am sure IBM wants to review this. Target is v2.0.2 if they will take it. v2.1.0 otherwise.

hjelmn · 2016-09-03T05:56:27Z

@sjeaugey If @PHHargrove hasn't already filed a bug with PGI can you look into the 64-bit operand issue? The correct assembly is generated for pointers but not 64-bit integers.

PHHargrove · 2016-09-03T06:19:39Z

If @PHHargrove hasn't already filed a bug with PGI

I have filed the bug report.

@hjelmn I think your ppc add/sub code is subtly incorrect.

Where you have "=&r" (t), I think you need "=&b" (t).
The b ensures a register other than r0 is selected.
This is necessary since r0 in an add (or subf?) instruction would be a zero value.

-Paul

hjelmn · 2016-09-03T13:13:43Z

Ok. I can update both the 64 and 32 bit atomics to use b instead of r. probably a bug if a compiler tries to use a special register but better safe than sorry given pgi loads 64-bit using 32-bit loads.

hjelmn · 2016-09-03T15:03:43Z

What is the pgi bug number? I want to document it in the code like is done in gasnet.

PHHargrove · 2016-09-03T18:14:36Z

What is the pgi bug number? I want to document it in the code like is done in gasnet.

I reported by email, but don't have a response yet.
When I have the TPR number I will pass it along.
I would guess Tue since Mon is Labor Day.

PHHargrove · 2016-09-03T18:19:21Z

probably a bug if a compiler tries to use a special register

The thing is that r0 is not special in most instructions.
It is actually a common temporary.
So, the compiler could choose it w/o being "buggy".
The presence of a b constraint gives programmers a way to identify contexts where r0 would be special.

hjelmn · 2016-09-04T03:10:14Z

Makes sense. So on ppc64 'b' means any register but r0. On x86_64 it means *bx. I will make the necessary updates later this weekend.

PHHargrove · 2016-09-04T03:32:14Z

[...] So on ppc64 'b' means any register but r0. On x86_64 it means *bx. [...]

Yes, the "b" stands for "base".
If you haven't already, you should bookmark https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html

In some cases you still need to read the GCC sources to understand when the constraints really mean. For instance, you'll find the following in gcc/config/rs6000/rs6000.h in relation to the register class corresponding to the b constraint:

   However, r0 is special in that it cannot be used as a base register.
   So make a class for registers valid as base registers.

hjelmn · 2016-09-04T03:33:50Z

Ah, thanks for the link! The gcc inline assembly manual I usually use has none of this useful information.

hjelmn · 2016-09-06T17:24:06Z

I am trying to understand the GPR vs base register issue. The linux kernel uses "r" not "b" but it could be that with ELF r0 is never used with "r".

All I can find about GPR vs base is the using instruction. There r0 means no base register. There isn't similar language on add or subf. Do you know where I can find documentation on this feature?

PHHargrove · 2016-09-06T19:20:07Z

Do you know where I can find documentation on this feature?

SHORT VERSION: It looks like your code is fine with r as the constraint, because I got a couple things confused.

LONG VERSION:

See page 310 of https://upc-bugs.lbl.gov/~phargrov/Docs/PPC/PPC64_ProgEnvMan_ver3.pdf
That page describes the addi instruction with the psuedo code:

if rA = 0 then rD ← EXTS(SIMM)
else rD ← rA + EXTS(SIMM)

So, in that instruction the source register cannot be r0 (li is actually encoded this way).
However, that is not the instruction you are using.

In GASNet's atomics add we are using more advanced inline asm to optimize for atomic-add involving a small integer constant and therefore may generate either add or addi instructions, depending on the operand. So GASNet needs to prevent use of r0, but your current asm does not.

The GASNet asm for 32-bit add-and-fetch, if you care:

      uint32_t _gasneti_atomic32_addfetch(gasneti_atomic32_t *v, uint32_t op) {
        register uint32_t result;
        __asm__ __volatile__ (
          "0:\t"
          "lwarx    %0,0,%2 \n\t"
          "add%I3   %0,%0,%3 \n\t"
          "stwcx.   %0,0,%2 \n\t"
          "bne-     0b \n\t"
          : "=&b"(result), "=m" (v->ctr)        /* constraint b = 'b'ase register (not r0) */
          : "r" (v), "Ir"(op) , "m"(v->ctr)
          : "cr0");
        return result;
      }

BTW: You current clobber cc which is all of the condition code registers, but the cmpw, cmpd, stwcx. and stdcw. instruction each only clobber cr0. So the compiler might be able to preserve values in cr1...7 if you change the clobbers from cc to cr0 in your asm.
Because you are using subfic in opal_atomic_cmpset_64() which is documented as setting the XER[CA] bit, that asm needs an additional clobber (xer, of course).
HOWEVER I am not 100% sure that PGI is going to know the xer clobber (or even cr0).

hjelmn · 2016-09-06T21:13:19Z

Ok, thanks @PHHargrove. That makes sense. I will commit this as-is then and work and figuring out how we can improve the clobbers. If you leave the VM up this week I will play around with it tomorrow.

PHHargrove · 2016-09-06T21:22:54Z

I am not planning to take down the VM my self, but the hosting provider is doing some maintenance on Friday that may take it down. So, you should aim to finish before then if possible.

PHHargrove · 2016-09-06T21:25:14Z

OOPS - I have no idea why my previous comment was posted 3 times.
Removing the duplicates now,

jjhursey · 2016-09-08T03:21:38Z

I tried PGI 16.7-beta on a power8 machine, and it worked fine.

I've added PGI to the IBM MTT runs. Tonight's runs are a little off as I was updating the environment while it was kicking off. Tomorrow night's runs should be good to review.

When our Jenkins server comes back online I'll add a test there as well.

hjelmn added bug Target: v2.x labels Sep 3, 2016

hjelmn added this to the v2.0.2 milestone Sep 3, 2016

jsquyres mentioned this pull request Sep 6, 2016

Compilation fails for powerpc with --enable-mpi-cxx #2055

Closed

hjelmn merged commit 27a2509 into open-mpi:master Sep 6, 2016

opal/asm: updates to powerpc assembly #2051

opal/asm: updates to powerpc assembly #2051

Uh oh!

Conversation

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

PHHargrove commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

PHHargrove commented Sep 3, 2016

Uh oh!

PHHargrove commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 4, 2016

Uh oh!

PHHargrove commented Sep 4, 2016

Uh oh!

hjelmn commented Sep 4, 2016

Uh oh!

hjelmn commented Sep 6, 2016

Uh oh!

PHHargrove commented Sep 6, 2016

Uh oh!

hjelmn commented Sep 6, 2016

Uh oh!

PHHargrove commented Sep 6, 2016

Uh oh!

PHHargrove commented Sep 6, 2016

Uh oh!

jjhursey commented Sep 8, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants