Looking at #154654, I noticed that the code generated for PPC32PICGOT uses a `bl` instruction. I suspect this could hurt branch prediction performance, since a `bl` with no matching return may unbalance the processor's link-stack (return address) predictor. See rust-lang/rust#145693 for example codegen.
To get a PC-relative address, `bcl 20,31,$+4` is the only documented way to do so with a branch instruction. I'll note that BO=0x20 behavior isn't documented. The usage in PPC32PICGOT would be slightly different: it would look like `bcl 20,31,$+8`. Is that also a performant use of `bcl`?
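To make the comparison concrete, here is a minimal sketch that encodes both forms by hand using the B-form layout (primary opcode 16, then BO, BI, BD, AA, LK). The helper name `encode_bc` is hypothetical and is not taken from LLVM. The `20` in the mnemonic is a decimal BO value (0b10100). The sketch only shows that the two instructions differ in the displacement field; it says nothing about how any particular core treats the `$+8` form in its predictor.

```python
def encode_bc(bo: int, bi: int, offset: int, aa: bool = False, lk: bool = False) -> int:
    """Encode a PowerPC B-form conditional branch (hypothetical helper)."""
    if not 0 <= bo < 32 or not 0 <= bi < 32:
        raise ValueError("BO and BI are 5-bit fields")
    if offset % 4 != 0 or not -0x8000 <= offset < 0x8000:
        raise ValueError("offset must be a 4-byte-aligned signed 16-bit value")
    # Layout from the most significant bit down: opcode(6) BO(5) BI(5) BD(14) AA(1) LK(1)
    return (16 << 26) | (bo << 21) | (bi << 16) | (offset & 0xFFFC) | (int(aa) << 1) | int(lk)


documented = encode_bc(20, 31, 4, lk=True)  # bcl 20,31,$+4
variant = encode_bc(20, 31, 8, lk=True)     # bcl 20,31,$+8

print(f"{documented:#010x} {variant:#010x}")           # 0x429f0005 0x429f0009
print(f"differing bits: {documented ^ variant:#x}")    # differing bits: 0xc
```

BO, BI and LK are identical in both encodings, so the open question is whether the link-stack special case keys on those fields alone or also on the `$+4` displacement.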