In several workloads, I am seeing code size increase when Zicond is enabled.
This example of Montgomery multiplication illustrates the scenario quite well:
https://godbolt.org/z/nKEv1dWra
The Zicond version is entirely branchless, which is great, but it comes at the expense of static code size, which should not happen at -Oz.
This function alone has several such cases, and they can be isolated, e.g.: https://godbolt.org/z/33P9szabs
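#include <stdint.h>

/* Assumed definition; the snippet uses uint64 without declaring it. */
typedef uint64_t uint64;

uint64
simple (uint64 uhi, uint64 tlo, uint64 ulo)
{
  if (ulo < tlo)
    uhi = uhi + 1;
  return uhi;
}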
With Zicond enabled, the compiler emits 10 instructions, while the build without Zicond needs only 8 to get the job done.
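
For comparison, here is a minimal sketch of a source-level equivalent that may be handy when triaging. It relies only on the fact that a C comparison evaluates to 0 or 1, so the conditional increment can be written as a plain addition. The name simple_branchless and the uint64 typedef are my own assumptions, not part of the original test case, and I have not checked what either configuration emits for it; one would expect it to map onto a set-less-than-unsigned followed by an add.

```c
#include <stdint.h>

/* Assumed definition; the original snippet uses uint64 without declaring it. */
typedef uint64_t uint64;

/* Conditional increment as in the report, written with an if. */
static uint64 simple(uint64 uhi, uint64 tlo, uint64 ulo)
{
    if (ulo < tlo)
        uhi = uhi + 1;
    return uhi;
}

/* Hypothetical equivalent: (ulo < tlo) is 0 or 1, so add it directly.
   Unsigned arithmetic wraps, exactly as uhi + 1 does above. */
static uint64 simple_branchless(uint64 uhi, uint64 tlo, uint64 ulo)
{
    return uhi + (uint64)(ulo < tlo);
}
```

Both functions return the same value for every input, so any difference in instruction count between them at -Oz would point at how the if-conversion is costed rather than at the source.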