Commit 3a92341

Initial psABI atomics specification
1 parent cfed71f commit 3a92341

File tree

3 files changed

+186
-0
lines changed

introduction.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ This specification uses the following terms and abbreviations:
3333
| XLEN | The width of an integer register in bits
3434
| FLEN | The width of a floating-point register in bits
3535
| Linker relaxation | A mechanism for optimizing programs at link-time, see <<Linker Relaxation>> for more detail.
36+
| RVWMO | RISC-V Weak Memory Order, as defined in the RISC-V specification.
3637
|===
3738

3839
= Status of ABI

riscv-abi.adoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,3 +12,5 @@ include::riscv-elf.adoc[]
1212
include::riscv-dwarf.adoc[]
1313

1414
include::riscv-rtabi.adoc[]
15+
16+
include::riscv-atomic.adoc[]

riscv-atomic.adoc

Lines changed: 183 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,183 @@
1+
[[riscv-atomics]]
2+
= RISC-V Atomics ABI Specification
3+
ifeval::["{docname}" == "riscv-atomics"]
4+
include::prelude.adoc[]
5+
endif::[]
6+
7+
== RISC-V atomics mappings
8+
9+
This section specifies mappings of C and C\++ atomic operations to RISC-V
10+
machine instructions. Other languages, for example Java, provide similar
11+
facilities that should be implemented in a consistent manner, usually
12+
by applying the mapping for the corresponding C++ primitive.
13+
14+
NOTE: Because different programming languages may be used within the same
15+
process, these mappings must be compatible across programming languages. For
16+
example, Java programmers expect memory ordering guarantees to be enforced even
17+
if some of the actual memory accesses are performed by a library written in
18+
C.
19+
20+
NOTE: Though many mappings are possible, not all of them will interoperate
21+
correctly. In particular, many mapping combinations will not
22+
correctly enforce ordering between a C++ `memory_order_seq_cst`
23+
store and a subsequent `memory_order_seq_cst` load.
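
The store-then-load case above is the classic store-buffering pattern. The following is a minimal C++ sketch of it; the variables `x` and `y` and the functions `thread0` and `thread1` are hypothetical names chosen for illustration, not part of this specification.

```cpp
#include <atomic>

// Hypothetical shared variables for a store-buffering litmus test.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Intended to run on one thread: seq_cst store to x, then seq_cst load of y.
int thread0() {
    x.store(1, std::memory_order_seq_cst);
    return y.load(std::memory_order_seq_cst);
}

// Intended to run on another thread: seq_cst store to y, then seq_cst load of x.
int thread1() {
    y.store(1, std::memory_order_seq_cst);
    return x.load(std::memory_order_seq_cst);
}
```

When the two functions run concurrently, C++ forbids the outcome in which both return 0. If the two functions were compiled with mappings that do not interoperate, the store in one could be reordered after the following load, and that outcome could become observable.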
24+
25+
NOTE: These mappings are very similar to those that originally appeared in the
26+
appendix of the RISC-V "unprivileged" architecture specification as
27+
"Mappings from C/C++ primitives to RISC-V Primitives", which we will
28+
refer to by their 2019 historical label of "Table A.6". That mapping may
29+
be used, _except_ that `atomic_store(memory_order_seq_cst)` must have
30+
an extra trailing fence for compatibility with the "Hypothetical mappings ..."
31+
table in the same section, which we similarly refer to as "Table A.7".
32+
As a result, we allow the "Table A.7" mappings as well.
33+
34+
NOTE: Our primary design goal is to maximize performance of the "Table A.7"
35+
mappings. These require additional load-acquire and store-release instructions,
36+
and are thus not immediately usable. By requiring the extra store fence,
37+
or equivalent, we avoid an ABI break when moving to the "Table A.7"
38+
mappings in the future, in return for a small performance penalty in the
39+
short term.
40+
41+
For each construct, we provide a mapping that assumes only the A extension.
42+
In some cases, we provide additional mappings that assume a future load-acquire
43+
and store-release extension, as denoted by note 1 in the table.
44+
45+
All mappings interoperate correctly, and with the original "Table A.6"
46+
mappings, _except_ that mappings marked with note 3 do not interoperate
47+
with the original "Table A.6" mappings.
48+
49+
We present the mappings as a table in 3 sections. The first
50+
deals with translations for loads, stores, and fences. The next two sections
51+
address mappings for read-modify-write operations like `fetch_add` and
52+
`exchange`. The second section deals with operations that have direct
53+
`amo` instruction equivalents in the RISC-V A extension. The final
54+
section deals with other read-modify-write operations that require
55+
the `lr` and `sc` instructions.
56+
57+
[[tab:c11mappings]]
58+
.Mappings from C/C++ primitives to RISC-V primitives
59+
[cols="<22,<18,<4",options="header",]
60+
|===
61+
|C/C++ Construct |RVWMO Mapping |Notes
62+
63+
|Non-atomic load |`l{b\|h\|w\|d}` |
64+
65+
|`atomic_load(memory_order_relaxed)` |`l{b\|h\|w\|d}` |
66+
67+
|`atomic_load(memory_order_acquire)` |`l{b\|h\|w\|d}; fence r,rw` |
68+
69+
|`atomic_load(memory_order_acquire)` |<RCsc atomic load-acquire> |1, 2
70+
71+
|`atomic_load(memory_order_seq_cst)` |`fence rw,rw; l{b\|h\|w\|d}; fence r,rw` |
72+
73+
|`atomic_load(memory_order_seq_cst)` |<RCsc atomic load-acquire> |1, 3
74+
75+
|Non-atomic store |`s{b\|h\|w\|d}` |
76+
77+
|`atomic_store(memory_order_relaxed)` |`s{b\|h\|w\|d}` |
78+
79+
|`atomic_store(memory_order_release)` |`fence rw,w; s{b\|h\|w\|d}` |
80+
81+
|`atomic_store(memory_order_release)` |<RCsc atomic store-release> |1, 2
82+
83+
|`atomic_store(memory_order_seq_cst)` |`fence rw,w; s{b\|h\|w\|d}; fence rw,rw` |
84+
85+
|`atomic_store(memory_order_seq_cst)` |`amoswap.{w\|d}.rl` |4
86+
87+
|`atomic_store(memory_order_seq_cst)` |<RCsc atomic store-release> |1
88+
89+
|`atomic_thread_fence(memory_order_acquire)` |`fence r,rw` |
90+
91+
|`atomic_thread_fence(memory_order_release)` |`fence rw,w` |
92+
93+
|`atomic_thread_fence(memory_order_acq_rel)` |`fence.tso` |
94+
95+
|`atomic_thread_fence(memory_order_seq_cst)` |`fence rw,rw` |
96+
|===
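
As an illustration of the load and store rows, here is a minimal C++ message-passing sketch. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical, and the instruction sequences in the comments are the ones the table suggests for the A-extension-only rows; actual compiler output may differ.

```cpp
#include <atomic>

int payload = 0;                 // non-atomic data (hypothetical)
std::atomic<bool> ready{false};  // hypothetical flag guarding payload

void publish(int value) {
    payload = value;                               // non-atomic store: s{b|h|w|d}
    ready.store(true, std::memory_order_release);  // fence rw,w; s{b|h|w|d}
}

bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) {  // l{b|h|w|d}; fence r,rw
        return false;
    }
    out = payload;                                 // non-atomic load: l{b|h|w|d}
    return true;
}
```

A consumer that sees `ready == true` through the acquire load is guaranteed to also see the value written to `payload` before the release store.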
97+
98+
[cols="<20,<20,<4",options="header",]
99+
|===
100+
|C/C++ Construct |RVWMO AMO Mapping |Notes
101+
102+
|`atomic_<op>(memory_order_relaxed)` |`amo<op>.{w\|d}` |4
103+
104+
|`atomic_<op>(memory_order_acquire)` |`amo<op>.{w\|d}.aq` |4
105+
106+
|`atomic_<op>(memory_order_release)` |`amo<op>.{w\|d}.rl` |4
107+
108+
|`atomic_<op>(memory_order_acq_rel)` |`amo<op>.{w\|d}.aqrl` |4
109+
110+
|`atomic_<op>(memory_order_seq_cst)` |`amo<op>.{w\|d}.aqrl` |4
111+
112+
|===
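
A short C++ sketch of operations that fall under the AMO rows follows. The `counter` variable and the function names are hypothetical, and the instructions in the comments are what the table suggests for a 32-bit operand rather than guaranteed compiler output.

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::uint32_t> counter{0};  // hypothetical 32-bit counter

std::uint32_t bump_relaxed() {
    return counter.fetch_add(1, std::memory_order_relaxed);  // amoadd.w
}

std::uint32_t bump_seq_cst() {
    return counter.fetch_add(1, std::memory_order_seq_cst);  // amoadd.w.aqrl
}

std::uint32_t swap_in(std::uint32_t v) {
    return counter.exchange(v, std::memory_order_acquire);   // amoswap.w.aq
}
```

Each function returns the value held before the operation, matching the value an AMO instruction writes to its destination register.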
113+
114+
[cols="<16,<24,<4",options="header",]
115+
|===
116+
|C/C++ Construct |RVWMO LR/SC Mapping |Notes
117+
118+
|`atomic_<op>(memory_order_relaxed)` |`loop:lr.{w\|d}; <op>; sc.{w\|d}; bnez loop` |4
119+
120+
|`atomic_<op>(memory_order_acquire)`
121+
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}; bnez loop` |4
122+
123+
|`atomic_<op>(memory_order_release)`
124+
|`loop:lr.{w\|d}; <op>; sc.{w\|d}.rl; bnez loop` |4
125+
126+
|`atomic_<op>(memory_order_acq_rel)`
127+
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop` |4
128+
129+
|`atomic_<op>(memory_order_seq_cst)`
130+
|`loop:lr.{w\|d}.aqrl; <op>; sc.{w\|d}.rl; bnez loop` |4
131+
132+
|`atomic_<op>(memory_order_seq_cst)`
133+
|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop` |3, 4
134+
|===
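
Compare-exchange is the usual example of an operation with no direct AMO equivalent. The sketch below uses hypothetical names (`owner`, `try_claim`); the loop shape in the comment follows the first `memory_order_seq_cst` row of the LR/SC table and is an expectation, not guaranteed compiler output.

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::uint32_t> owner{0};  // hypothetical: 0 means unowned

// Expected shape on the A extension for seq_cst:
//   loop: lr.w.aqrl; <compare with expected, exit on mismatch>; sc.w.rl; bnez loop
bool try_claim(std::uint32_t id) {
    std::uint32_t expected = 0;
    return owner.compare_exchange_strong(expected, id,
                                         std::memory_order_seq_cst);
}
```

Only the first caller that observes `owner == 0` succeeds; later callers fail and leave the stored value unchanged.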
135+
136+
=== Meaning of notes in table
137+
138+
1) Depends on a load instruction with an RCsc acquire annotation,
139+
or a store instruction with an RCsc release annotation. These are currently
140+
under discussion, but the specification has not yet been approved.
141+
142+
2) An RCpc load or store would also suffice, if it were to be introduced
143+
in the future.
144+
145+
3) Incompatible with the original "Table A.6" mapping. Do not combine these
146+
mappings with code generated by a compiler using those older mappings.
147+
(This was mostly used by the initial LLVM implementations for RISC-V.)
148+
149+
4) Currently only directly possible for 32- and 64-bit operands.
150+
151+
=== Other conventions
152+
153+
It is expected that the RVWMO AMO Mappings will be used for atomic read-modify-write
154+
operations that are directly supported by corresponding AMO instructions,
155+
and that LR/SC mappings will be used for the remainder, currently
156+
including compare-exchange operations. Compare-exchange LR/SC sequences
157+
on the containing 32-bit word should be used for shorter operands. Thus,
158+
a `fetch_add` operation on a 16-bit quantity would use a 32-bit LR/SC sequence.
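
The masking arithmetic behind a sub-word operation can be sketched in C++ as follows. This is an illustration only: `fetch_add_u16` is a hypothetical helper, it uses a portable compare-exchange loop on the containing 32-bit word rather than a literal LR/SC sequence, and it assumes the 16-bit lane sits at bit offset 0 or 16 of that word.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically add `delta` to the 16-bit lane at bit
// offset `shift` (0 or 16) of `word`, returning the lane's previous value.
std::uint16_t fetch_add_u16(std::atomic<std::uint32_t>& word, unsigned shift,
                            std::uint16_t delta) {
    const std::uint32_t mask = std::uint32_t{0xFFFF} << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint32_t old_lane = (old_word & mask) >> shift;
        // Wrap within 16 bits so a carry never leaks into the other lane.
        const std::uint32_t new_lane = (old_lane + delta) & 0xFFFF;
        const std::uint32_t new_word = (old_word & ~mask) | (new_lane << shift);
        // On failure, compare_exchange_weak refreshes old_word and we retry.
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
            return static_cast<std::uint16_t>(old_lane);
        }
    }
}
```

The neighbouring 16 bits of the word are read and written back unchanged, which is why the whole update must be a single atomic operation on the containing word.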
159+
160+
It is acceptable, but usually undesirable for performance reasons, to use LR/SC
161+
mappings where an AMO mapping would suffice.
162+
163+
Atomics do not imply any ordering for IO operations. IO operations
164+
should include sufficient fences to prevent them from being visibly
165+
reordered with atomic operations.
166+
167+
Float and double atomic loads and stores should be implemented using
168+
the integer sequences.
169+
170+
Float and double read-modify-write operations should consist of a loop performing
171+
an initial plain load of the value, followed by the floating point
172+
computation, followed by an integer compare-and-swap sequence to try to
173+
store back the updated value. This avoids floating point
174+
instructions between LR and SC instructions. Depending on language requirements,
175+
it may be necessary to save and restore floating-point exception flags in the
176+
case of an operation that is later redone due to a failed SC operation.
177+
178+
NOTE: The "Eventual Success of Store-Conditional Instructions" section
179+
in the ISA specification provides that essential progress guarantee only
180+
if there are no floating point instructions between the LR and matching SC
181+
instruction. By compiling such sequences with an "extra" ordinary load,
182+
and performing the floating point computation before the LR, we preserve
183+
the guarantee.
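
The floating-point read-modify-write convention described above can be sketched in C++ as follows. The helpers `to_bits`, `from_bits`, and `fetch_add_float` are hypothetical names, the sketch assumes a 32-bit `float`, and it does not attempt to save or restore floating-point exception flags.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Hypothetical helpers to move between a float and its bit pattern.
std::uint32_t to_bits(float f) {
    std::uint32_t b;
    std::memcpy(&b, &f, sizeof b);
    return b;
}

float from_bits(std::uint32_t b) {
    float f;
    std::memcpy(&f, &b, sizeof f);
    return f;
}

// Atomically add `delta` to the float stored as a bit pattern in `bits`.
float fetch_add_float(std::atomic<std::uint32_t>& bits, float delta) {
    std::uint32_t old_bits = bits.load(std::memory_order_relaxed);  // initial plain load
    for (;;) {
        const float old_value = from_bits(old_bits);
        const float new_value = old_value + delta;  // FP work stays outside any LR/SC pair
        // Integer compare-and-swap; on failure old_bits is refreshed and we retry.
        if (bits.compare_exchange_weak(old_bits, to_bits(new_value),
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
            return old_value;
        }
    }
}
```

Because the addition happens before the compare-and-swap, the only instructions between the underlying LR and SC are integer ones.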
