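The scalarizeExtExtract combine packs the vector into a single i(N*8) integer and then extracts each lane via lshr/and. The shift amount assumes little-endian lane ordering, which is incorrect for big-endian targets, where lane 0 occupies the most significant bits of the packed integer.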
This causes miscompiles on big-endian targets such as AIX/PowerPC under -O3 -flto.