Skip to content

Conversation

@hjelmn
Copy link
Member

@hjelmn hjelmn commented Sep 3, 2016

This commit contains the following changes:

  • There is a bug in the PGI 16.x betas for ppc64 that causes them to
    emit the incorrect instruction for loading 64-bit operands. If not
    cast to void * the operands are loaded with lwz (load word and
    zero) instead of ld. This does not affect optimized mode. The work
    around is to cast to void * and was implemented similar to a
    work-around for a xlc bug.
  • Actually implement 64-bit add/sub. These functions were missing and
    fell back to the less efficient compare-and-swap implementations.

Thanks to @PHHargrove for helping to track this down. With this update
the GCC inline assembly works as expected with pgi and ppc64.

Signed-off-by: Nathan Hjelm [email protected]

This commit contains the following changes:

 - There is a bug in the PGI 16.x betas for ppc64 that causes them to
   emit the incorrect instruction for loading 64-bit operands. If not
   cast to void * the operands are loaded with lwz (load word and
   zero) instead of ld. This does not affect optimized mode. The work
   around is to cast to void * and was implemented similar to a
   work-around for a xlc bug.

 - Actually implement 64-bit add/sub. These functions were missing and
   fell back to the less efficient compare-and-swap implementations.

Thanks to @PHHargrove for helping to track this down. With this update
the GCC inline assembly works as expected with pgi and ppc64.

Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn
Copy link
Member Author

hjelmn commented Sep 3, 2016

@PHHargrove Ok, finally got all the ASM working. I went to fix add/sub but noticed they were missing so I implemented them as well. The only failure remaining was ompi_rb_tree and that one is due to opal_init_util not being called by the test. Not sure what is going on with that.

@hjelmn
Copy link
Member Author

hjelmn commented Sep 3, 2016

@jjhursey I am sure IBM wants to review this. Target is v2.0.2 if they will take it. v2.1.0 otherwise.

@hjelmn hjelmn added this to the v2.0.2 milestone Sep 3, 2016
@hjelmn
Copy link
Member Author

hjelmn commented Sep 3, 2016

@sjeaugey If @PHHargrove hasn't already filed a bug with PGI can you look into the 64-bit operand issue? The correct assembly is generated for pointers but not 64-bit integers.

@PHHargrove
Copy link
Member

If @PHHargrove hasn't already filed a bug with PGI

I have filed the bug report.

@hjelmn I think your ppc add/sub code is subtly incorrect.

Where you have "=&r" (t), I think you need "=&b" (t).
The b ensures a register other than r0 is selected.
This is necessary since r0 in an add (or subf?) instruction would be a zero value.

-Paul

@hjelmn
Copy link
Member Author

hjelmn commented Sep 3, 2016

Ok. I can update both the 64 and 32 bit atomics to use b instead of r. probably a bug if a compiler tries to use a special register but better safe than sorry given pgi loads 64-bit using 32-bit loads.

@hjelmn
Copy link
Member Author

hjelmn commented Sep 3, 2016

What is the pgi bug number? I want to document it in the code like is done in gasnet.

@PHHargrove
Copy link
Member

What is the pgi bug number? I want to document it in the code like is done in gasnet.

I reported by email, but don't have a response yet.
When I have the TPR number I will pass it along.
I would guess Tue since Mon is Labor Day.

@PHHargrove
Copy link
Member

probably a bug if a compiler tries to use a special register

The thing is that r0 is not special in most instructions.
It is actually a common temporary.
So, the compiler could choose it w/o being "buggy".
The presence of a b constraint gives programmers a way to identify contexts where r0 would be special.

@hjelmn
Copy link
Member Author

hjelmn commented Sep 4, 2016

Makes sense. So on ppc64 'b' means any register but r0. On x86_64 it means *bx. I will make the necessary updates later this weekend.

@PHHargrove
Copy link
Member

[...] So on ppc64 'b' means any register but r0. On x86_64 it means *bx. [...]

Yes, the "b" stands for "base".
If you haven't already, you should bookmark https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html

In some cases you still need to read the GCC sources to understand when the constraints really mean. For instance, you'll find the following in gcc/config/rs6000/rs6000.h in relation to the register class corresponding to the b constraint:

   However, r0 is special in that it cannot be used as a base register.
   So make a class for registers valid as base registers.

@hjelmn
Copy link
Member Author

hjelmn commented Sep 4, 2016

Ah, thanks for the link! The gcc inline assembly manual I usually use has none of this useful information.

@hjelmn
Copy link
Member Author

hjelmn commented Sep 6, 2016

I am trying to understand the GPR vs base register issue. The linux kernel uses "r" not "b" but it could be that with ELF r0 is never used with "r".

All I can find about GPR vs base is the using instruction. There r0 means no base register. There isn't similar language on add or subf. Do you know where I can find documentation on this feature?

@PHHargrove
Copy link
Member

Do you know where I can find documentation on this feature?

SHORT VERSION: It looks like your code is fine with r as the constraint, because I got a couple things confused.

LONG VERSION:

See page 310 of https://upc-bugs.lbl.gov/~phargrov/Docs/PPC/PPC64_ProgEnvMan_ver3.pdf
That page describes the addi instruction with the psuedo code:

if rA = 0 then rD ← EXTS(SIMM)
else rD ← rA + EXTS(SIMM)

So, in that instruction the source register cannot be r0 (li is actually encoded this way).
However, that is not the instruction you are using.

In GASNet's atomics add we are using more advanced inline asm to optimize for atomic-add involving a small integer constant and therefore may generate either add or addi instructions, depending on the operand. So GASNet needs to prevent use of r0, but your current asm does not.

The GASNet asm for 32-bit add-and-fetch, if you care:

      uint32_t _gasneti_atomic32_addfetch(gasneti_atomic32_t *v, uint32_t op) {
        register uint32_t result;
        __asm__ __volatile__ (
          "0:\t"
          "lwarx    %0,0,%2 \n\t"
          "add%I3   %0,%0,%3 \n\t"
          "stwcx.   %0,0,%2 \n\t"
          "bne-     0b \n\t"
          : "=&b"(result), "=m" (v->ctr)        /* constraint b = 'b'ase register (not r0) */
          : "r" (v), "Ir"(op) , "m"(v->ctr)
          : "cr0");
        return result;
      }

BTW: You current clobber cc which is all of the condition code registers, but the cmpw, cmpd, stwcx. and stdcw. instruction each only clobber cr0. So the compiler might be able to preserve values in cr1...7 if you change the clobbers from cc to cr0 in your asm.
Because you are using subfic in opal_atomic_cmpset_64() which is documented as setting the XER[CA] bit, that asm needs an additional clobber (xer, of course).
HOWEVER I am not 100% sure that PGI is going to know the xer clobber (or even cr0).

@hjelmn
Copy link
Member Author

hjelmn commented Sep 6, 2016

Ok, thanks @PHHargrove. That makes sense. I will commit this as-is then and work and figuring out how we can improve the clobbers. If you leave the VM up this week I will play around with it tomorrow.

@hjelmn hjelmn merged commit 27a2509 into open-mpi:master Sep 6, 2016
@PHHargrove
Copy link
Member

I am not planning to take down the VM my self, but the hosting provider is doing some maintenance on Friday that may take it down. So, you should aim to finish before then if possible.

@PHHargrove
Copy link
Member

OOPS - I have no idea why my previous comment was posted 3 times.
Removing the duplicates now,

@jjhursey
Copy link
Member

jjhursey commented Sep 8, 2016

I tried PGI 16.7-beta on a power8 machine, and it worked fine.

I've added PGI to the IBM MTT runs. Tonight's runs are a little off as I was updating the environment while it was kicking off. Tomorrow night's runs should be good to review.

When our Jenkins server comes back online I'll add a test there as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants