Problem
Atomic compare-and-swap instructions (`amocas`) are part of the experimental `zacas1p0` extension. For more information on the `amocas` instructions, see Chapter 16, "Zacas" Extension for Atomic Compare-and-Swap (CAS) Instructions, Version 1.0.0, in The RISC-V Instruction Set Manual, Volume I.
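For readers unfamiliar with `amocas`, its effect can be sketched as a plain C reference model: load the value at the address in `rs1`, compare it with the value in `rd`, store the value in `rs2` only if they match, and write the loaded value back to `rd`. The function name `amocas_q_model` is hypothetical, the model is deliberately non-atomic (the real instruction does all of this as one atomic operation), and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>

/* Hypothetical, NON-atomic reference model of amocas.q, for illustration only.
 * expected: the 128-bit compare value held in the rd register pair
 * desired:  the 128-bit swap value held in the rs2 register pair
 * Returns the value loaded from memory, i.e. what the instruction writes to rd. */
static unsigned __int128 amocas_q_model(unsigned __int128 *addr,
                                        unsigned __int128 expected,
                                        unsigned __int128 desired)
{
    unsigned __int128 loaded = *addr; /* read the current memory contents */
    if (loaded == expected)           /* compare against rd */
        *addr = desired;              /* on a match, store rs2 */
    return loaded;                    /* rd always receives the old value */
}
```

In the issue's example, `ret` plays the role of `expected` (initialized to `0xcafe`) and `value` plays the role of `desired`.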
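During code generation from inline assembly, the compiler sabotages itself by choosing the wrong registers: for the 128-bit `amocas.q`, it assigns an odd-numbered register to an operand that must be the even (first) register of an even/odd register pair, and then the assembler rejects its own output.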
Steps to reproduce
Suppose I have a file `bug.c`. I am trying to use the 128-bit version of `amocas` from inline assembly, like this:
```c
__int128 amocas_q_aqrl(__int128 *atom, __int128 value)
{
  __int128 ret = 0xcafe;
  __asm__ __volatile__(" amocas.q.aqrl %1, %2, (%0)"
                       : "+r"(atom), "+r"(ret)
                       : "r"(value)
                       : "memory");
  return ret + value;
}
```

I am compiling it like this:

```sh
clang -menable-experimental-extensions --target=riscv64 -march=rv64imafdc_zicsr_zifencei_zacas1p0 -c bug.c --gcc-toolchain=/usr/riscv64-linux-gnu-
```
I am getting the following error:
```
bug.c:4:24: error: register must be even
4 | __asm__ __volatile__(" amocas.q.aqrl %1, %2, (%0)"
| ^
<inline asm>:1:23: note: instantiated into assembly here
1 | amocas.q.aqrl a0, a3, (a2)
| ^
1 error generated.
```
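The rejected operand is the `rs2` register `a3`. Under the standard RISC-V ABI names, `a0` is `x10` and `a3` is `x13`, so `rd` is a valid pair base while `rs2` is odd. The lookup below is a small sketch of that check; the helper names `reg_number` and `valid_pair_base` are hypothetical and are not taken from LLVM's assembler.

```c
#include <stdbool.h>
#include <string.h>

/* Standard RISC-V ABI names for x0..x31, indexed by register number. */
static const char *const abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

/* Hypothetical helper: x-register number for an ABI name, or -1 if unknown. */
static int reg_number(const char *abi_name)
{
    for (int i = 0; i < 32; i++)
        if (strcmp(abi_names[i], abi_name) == 0)
            return i;
    return -1;
}

/* Hypothetical helper: amocas.q on RV64 (and amocas.d on RV32) names a
 * register pair by its first register, which must be even-numbered. */
static bool valid_pair_base(const char *abi_name)
{
    int n = reg_number(abi_name);
    return n >= 0 && n % 2 == 0;
}
```

Applied to the instantiated assembly `amocas.q.aqrl a0, a3, (a2)`, `valid_pair_base("a0")` holds and `valid_pair_base("a3")` does not. The address operand `a2` is a single register, so it is not subject to this check.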
My setup is the following:
- Running on Linux 6.7.12-amd64 SMP PREEMPT_DYNAMIC Debian 6.7.12-1 (2024-04-24) x86_64
- For cross compilation, I have installed gcc-riscv64-linux-gnu version 4:14.1.0-2
- I have built LLVM and clang from source. At the time of opening this issue, I am on branch main, last commit 2190ffa.
- I can also reproduce it with a pre-installed clang 17.0.5.
Note that other variants of amocas work fine, for example:
```c
long amocas_d_aqrl(long *atom, long value)
{
  long ret = 0xcafe;
  __asm__ __volatile__(" amocas.d.aqrl %1, %2, (%0)"
                       : "+r"(atom), "+r"(ret)
                       : "r"(value)
                       : "memory");
  return ret + value;
}
```

The above compiles. So far I have only hit this problem with `amocas.q`, but `amocas.d` might cause similar issues on riscv32, where its 64-bit operands also occupy even/odd register pairs.
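The riscv32 concern follows from when Zacas uses register pairs at all: only when the operand is twice XLEN wide, i.e. `amocas.d` on RV32 and `amocas.q` on RV64 (`amocas.q` is RV64-only). The enum and function below are a hypothetical sketch of that rule, not part of any existing API.

```c
/* Hypothetical encoding of the AMOCAS variants by operand width in bits. */
enum amocas_variant { AMOCAS_W = 32, AMOCAS_D = 64, AMOCAS_Q = 128 };

/* Returns 1 if the rd/rs2 operands of `v` are even/odd register pairs on a
 * hart with the given XLEN, 0 if they are single registers, and -1 if the
 * variant is not available for that XLEN (amocas.q is RV64-only). */
static int amocas_pair_kind(unsigned xlen, enum amocas_variant v)
{
    if (v == AMOCAS_Q && xlen != 64)
        return -1;
    return (unsigned)v == 2 * xlen ? 1 : 0;
}
```

Under this rule, `amocas_pair_kind(32, AMOCAS_D)` and `amocas_pair_kind(64, AMOCAS_Q)` both report a pair, which is why the same odd-register problem could plausibly appear for `amocas.d` on riscv32.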
Current workaround
I can get this to compile by tricking the compiler into not using register `a3`: listing it as clobbered prevents the compiler from assigning it to any operand.
```c
__int128 amocas_q_aqrl(__int128 *atom, __int128 value)
{
  __int128 ret = 0xcafe;
  // must clobber "a3" register due to a compiler bug
  __asm__ __volatile__(" amocas.q.aqrl %1, %2, (%0)"
                       : "+r"(atom), "+r"(ret)
                       : "r"(value)
                       : "memory", "a3");
  return ret + value;
}
```

I'm leaving this here in case somebody runs into a similar issue. I would gladly have looked into it further, but I have no idea where to start.