| 1 | +[[riscv-atomics]] |
| 2 | += RISC-V Atomics ABI Specification |
| 3 | +ifeval::["{docname}" == "riscv-atomics"] |
| 4 | +include::prelude.adoc[] |
| 5 | +endif::[] |
| 6 | + |
| 7 | +== RISC-V atomics mappings |
| 8 | + |
| 9 | +This document specifies mappings of C and C\++ atomic operations to RISC-V |
| 10 | +machine instructions. Other languages, for example Java, provide similar |
| 11 | +facilities that should be implemented in a consistent manner, usually |
| 12 | +by applying the mapping for the corresponding C++ primitive. |
| 13 | + |
| 14 | +NOTE: Because different programming languages may be used within the same |
| 15 | +process, these mappings must be compatible across programming languages. For |
| 16 | +example, Java programmers expect memory ordering guarantees to be enforced even |
| 17 | +if some of the actual memory accesses are performed by a library written in |
| 18 | +C. |
| 19 | + |
| 20 | +NOTE: Though many mappings are possible, not all of them will interoperate |
| 21 | +correctly. In particular, many mapping combinations will not |
| 22 | +correctly enforce ordering between a C++ `memory_order_seq_cst` |
| 23 | +store and a subsequent `memory_order_seq_cst` load. |
| 24 | + |
| 25 | +NOTE: These mappings are very similar to those that originally appeared in the |
| 26 | +appendix of the RISC-V "unprivileged" architecture specification as |
| 27 | +"Mappings from C/C++ primitives to RISC-V Primitives", which we will |
| 28 | +refer to by their 2019 historical label of "Table A.6". That mapping may |
| 29 | +be used, _except_ that `atomic_store(memory_order_seq_cst)` must have an |
| 30 | +extra trailing fence for compatibility with the "Hypothetical mappings ..." |
| 31 | +table in the same section, which we similarly refer to as "Table A.7". |
| 32 | +As a result, we allow the "Table A.7" mappings as well. |
| 33 | + |
| 34 | +NOTE: Our primary design goal is to maximize performance of the "Table A.7" |
| 35 | +mappings. These require additional load-acquire and store-release instructions, |
| 36 | +and are thus not immediately usable. By requiring the extra store fence |
| 37 | +or equivalent, we avoid an ABI break when moving to the "Table A.7" |
| 38 | +mappings in the future, in return for a small performance penalty in the |
| 39 | +short term. |
| 40 | + |
| 41 | +For each construct, we provide a mapping that assumes only the A extension. |
| 42 | +In some cases, we provide additional mappings that assume a future load-acquire |
| 43 | +and store-release extension, as denoted by note 1 in the table. |
| 44 | + |
| 45 | +All mappings interoperate correctly, and with the original "Table A.6" |
| 46 | +mappings, _except_ that mappings marked with note 3 do not interoperate |
| 47 | +with the original "Table A.6" mappings. |
| 48 | + |
| 49 | +We present the mappings as a table in three sections. The first |
| 50 | +deals with translations for loads, stores, and fences. The next two sections |
| 51 | +address mappings for read-modify-write operations like `fetch_add` and |
| 52 | +`exchange`. The second section deals with operations that have direct |
| 53 | +`amo` instruction equivalents in the RISC-V A extension. The final |
| 54 | +section deals with other read-modify-write operations that require |
| 55 | +the `lr` and `sc` instructions. |
| 56 | + |
| 57 | +[[tab:c11mappings]] |
| 58 | +.Mappings from C/C++ primitives to RISC-V primitives |
| 59 | +[cols="<22,<18,<4",options="header",] |
| 60 | +|=== |
| 61 | +|C/C++ Construct |RVWMO Mapping |Notes |
| 62 | + |
| 63 | +|Non-atomic load |`l{b\|h\|w\|d}` | |
| 64 | + |
| 65 | +|`atomic_load(memory_order_relaxed)` |`l{b\|h\|w\|d}` | |
| 66 | + |
| 67 | +|`atomic_load(memory_order_acquire)` |`l{b\|h\|w\|d}; fence r,rw` | |
| 68 | + |
| 69 | +|`atomic_load(memory_order_acquire)` |<RCsc atomic load-acquire> |1, 2 |
| 70 | + |
| 71 | +|`atomic_load(memory_order_seq_cst)` |`fence rw,rw; l{b\|h\|w\|d}; fence r,rw` | |
| 72 | + |
| 73 | +|`atomic_load(memory_order_seq_cst)` |<RCsc atomic load-acquire> |1, 3 |
| 74 | + |
| 75 | +|Non-atomic store |`s{b\|h\|w\|d}` | |
| 76 | + |
| 77 | +|`atomic_store(memory_order_relaxed)` |`s{b\|h\|w\|d}` | |
| 78 | + |
| 79 | +|`atomic_store(memory_order_release)` |`fence rw,w; s{b\|h\|w\|d}` | |
| 80 | + |
| 81 | +|`atomic_store(memory_order_release)` |<RCsc atomic store-release> |1, 2 |
| 82 | + |
| 83 | +|`atomic_store(memory_order_seq_cst)` |`fence rw,w; s{b\|h\|w\|d}; fence rw,rw;` | |
| 84 | + |
| 85 | +|`atomic_store(memory_order_seq_cst)` |`amoswap.{w\|d}.rl;` |4 |
| 86 | + |
| 87 | +|`atomic_store(memory_order_seq_cst)` |<RCsc atomic store-release> |1 |
| 88 | + |
| 89 | +|`atomic_thread_fence(memory_order_acquire)` |`fence r,rw` | |
| 90 | + |
| 91 | +|`atomic_thread_fence(memory_order_release)` |`fence rw,w` | |
| 92 | + |
| 93 | +|`atomic_thread_fence(memory_order_acq_rel)` |`fence.tso` | |
| 94 | + |
| 95 | +|`atomic_thread_fence(memory_order_seq_cst)` |`fence rw,rw` | |
| 96 | +|=== |
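As a rough illustration of how the load and store rows above line up with source code, the following C11 sketch publishes a value with a release store and reads it with an acquire load. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical and exist only for this example. The comments name the sequence a compiler following the table would be expected to emit; they are not taken from any particular compiler's output.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary, non-atomic data (hypothetical name) */
static atomic_bool ready;    /* publication flag (hypothetical name) */

/* Producer side: write the data, then publish it. */
static void publish(int value)
{
    payload = value;  /* non-atomic store: s{b|h|w|d} */
    /* atomic_store(memory_order_release): fence rw,w; s{b|h|w|d} */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: returns true and fills *out once the data is visible. */
static bool try_consume(int *out)
{
    /* atomic_load(memory_order_acquire): l{b|h|w|d}; fence r,rw */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;   /* non-atomic load: l{b|h|w|d} */
    return true;
}
```

The release store and acquire load pair up so that a consumer that observes `ready` also observes `payload`.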
| 97 | + |
| 98 | +[cols="<20,<20,<4",options="header",] |
| 99 | +|=== |
| 100 | +|C/C++ Construct |RVWMO AMO Mapping |Notes |
| 101 | + |
| 102 | +|`atomic_<op>(memory_order_relaxed)` |`amo<op>.{w\|d}` |4 |
| 103 | + |
| 104 | +|`atomic_<op>(memory_order_acquire)` |`amo<op>.{w\|d}.aq` |4 |
| 105 | + |
| 106 | +|`atomic_<op>(memory_order_release)` |`amo<op>.{w\|d}.rl` |4 |
| 107 | + |
| 108 | +|`atomic_<op>(memory_order_acq_rel)` |`amo<op>.{w\|d}.aqrl` |4 |
| 109 | + |
| 110 | +|`atomic_<op>(memory_order_seq_cst)` |`amo<op>.{w\|d}.aqrl` |4 |
| 111 | + |
| 112 | +|=== |
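A minimal C11 sketch of operations that fall into the AMO section, assuming a 32-bit operand. The variable `hits` and the helper names are hypothetical. The instruction named in each comment is the one the table above suggests for that memory order, not a guarantee about what a given compiler emits.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic int32_t hits;   /* hypothetical shared counter */

/* atomic_<op>(memory_order_relaxed) with <op> = add: amoadd.w */
static int32_t count_hit(void)
{
    return atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* atomic_<op>(memory_order_seq_cst) with <op> = swap: amoswap.w.aqrl */
static int32_t replace_hits(int32_t value)
{
    return atomic_exchange_explicit(&hits, value, memory_order_seq_cst);
}
```

Both helpers return the value held before the operation, matching the C11 `fetch_add` and `exchange` semantics.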
| 113 | + |
| 114 | +[cols="<16,<24,<4",options="header",] |
| 115 | +|=== |
| 116 | +|C/C++ Construct |RVWMO LR/SC Mapping |Notes |
| 117 | + |
| 118 | +|`atomic_<op>(memory_order_relaxed)` |`loop:lr.{w\|d}; <op>; sc.{w\|d}; bnez loop` |4 |
| 119 | + |
| 120 | +|`atomic_<op>(memory_order_acquire)` |
| 121 | +|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}; bnez loop` |4 |
| 122 | + |
| 123 | +|`atomic_<op>(memory_order_release)` |
| 124 | +|`loop:lr.{w\|d}; <op>; sc.{w\|d}.rl; bnez loop` |4 |
| 125 | + |
| 126 | +|`atomic_<op>(memory_order_acq_rel)` |
| 127 | +|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop` |4 |
| 128 | + |
| 129 | +|`atomic_<op>(memory_order_seq_cst)` |
| 130 | +|`loop:lr.{w\|d}.aqrl; <op>; sc.{w\|d}.rl; bnez loop` |4 |
| 131 | + |
| 132 | +|`atomic_<op>(memory_order_seq_cst)` |
| 133 | +|`loop:lr.{w\|d}.aq; <op>; sc.{w\|d}.rl; bnez loop` |3, 4 |
| 134 | +|=== |
| 135 | + |
| 136 | +=== Meaning of notes in table |
| 137 | + |
| 138 | +1) Depends on a load instruction with an RCsc acquire annotation, |
| 139 | +or a store instruction with an RCsc release annotation. These are currently |
| 140 | +under discussion, but the specification has not yet been approved. |
| 141 | + |
| 142 | +2) An RCpc load or store would also suffice, if it were to be introduced |
| 143 | +in the future. |
| 144 | + |
| 145 | +3) Incompatible with the original "Table A.6" mapping. Do not combine these |
| 146 | +mappings with code generated by a compiler using those older mappings. |
| 147 | +(This was mostly used by the initial LLVM implementations for RISC-V.) |
| 148 | + |
| 149 | +4) Currently only directly possible for 32- and 64-bit operands. |
| 150 | + |
| 151 | +=== Other conventions |
| 152 | + |
| 153 | +It is expected that the RVWMO AMO Mappings will be used for atomic read-modify-write |
| 154 | +operations that are directly supported by corresponding AMO instructions, |
| 155 | +and that LR/SC mappings will be used for the remainder, currently |
| 156 | +including compare-exchange operations. Compare-exchange LR/SC sequences |
| 157 | +on the containing 32-bit word should be used for shorter operands. Thus, |
| 158 | +a `fetch_add` operation on a 16-bit quantity would use a 32-bit LR/SC sequence. |
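The idea of operating on the containing 32-bit word can be sketched at the source level as follows. This is only an analogue: it uses a C11 compare-exchange loop on a 32-bit word, which a compiler would in turn lower to an LR/SC sequence, rather than the LR/SC sequence itself. The helper name `fetch_add_u16_in_word` and the `half` parameter (0 selects bits 15:0, 1 selects bits 31:16) are assumptions made for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically add `val` to one 16-bit half of a
   32-bit word and return the previous value of that half. The other
   half of the word is left unchanged. */
static uint16_t fetch_add_u16_in_word(_Atomic uint32_t *word,
                                      unsigned half, uint16_t val)
{
    assert(half < 2u);
    const unsigned shift = half * 16u;
    const uint32_t mask = (uint32_t)0xFFFFu << shift;

    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;
    do {
        /* Compute the new 16-bit value with wraparound, then splice it
           back into the untouched bits of the containing word. */
        uint16_t sum = (uint16_t)((old >> shift) + val);
        desired = (old & ~mask) | ((uint32_t)sum << shift);
        /* On failure, `old` is refreshed with the current word value. */
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, desired,
                 memory_order_seq_cst, memory_order_relaxed));

    return (uint16_t)(old >> shift);
}
```

Masking keeps the neighbouring 16-bit value intact even when the addition wraps around.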
| 159 | + |
| 160 | +It is acceptable, but usually undesirable for performance reasons, to use LR/SC |
| 161 | +mappings where an AMO mapping would suffice. |
| 162 | + |
| 163 | +Atomics do not imply any ordering for IO operations. IO operations |
| 164 | +should include sufficient fences to prevent them from being visibly |
| 165 | +reordered with atomic operations. |
| 166 | + |
| 167 | +Float and double atomic loads and stores should be implemented using |
| 168 | +the integer sequences. |
| 169 | + |
| 170 | +Float and double read-modify-write operations should consist of a loop performing |
| 171 | +an initial plain load of the value, followed by the floating point |
| 172 | +computation, followed by an integer compare-and-swap sequence to try to |
| 173 | +store back the updated value. This avoids floating point |
| 174 | +instructions between LR and SC instructions. Depending on language requirements, |
| 175 | +it may be necessary to save and restore floating-point exception flags in the |
| 176 | +case of an operation that is later redone due to a failed SC operation. |
| 177 | + |
| 178 | +NOTE: The "Eventual Success of Store-Conditional Instructions" section |
| 179 | +in the ISA specification provides that essential progress guarantee only |
| 180 | +if there are no floating point instructions between the LR and matching SC |
| 181 | +instruction. By compiling such sequences with an "extra" ordinary load, |
| 182 | +and performing the floating point computation before the LR, we preserve |
| 183 | +the guarantee. |
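The floating-point read-modify-write convention described above can be sketched in C11 as below, assuming a 32-bit `float` stored in a 32-bit atomic integer. The helper names are hypothetical. The sketch performs an initial plain load, does the floating-point addition outside the atomic update, and then uses an integer compare-exchange to store the result. It does not attempt to save or restore floating-point exception flags, which some languages may require.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Bit-level conversions that avoid aliasing problems. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical helper: atomically add `operand` to the float stored in
   `obj` (as raw bits) and return the previous value. */
static float fetch_add_float(_Atomic uint32_t *obj, float operand)
{
    /* Initial plain load of the current value. */
    uint32_t old_bits = atomic_load_explicit(obj, memory_order_relaxed);
    uint32_t new_bits;
    do {
        /* Floating-point computation happens before the integer
           compare-and-swap, so it never sits between LR and SC. */
        new_bits = float_to_bits(bits_to_float(old_bits) + operand);
        /* On failure, `old_bits` is refreshed and the sum is redone. */
    } while (!atomic_compare_exchange_weak_explicit(
                 obj, &old_bits, new_bits,
                 memory_order_seq_cst, memory_order_relaxed));
    return bits_to_float(old_bits);
}
```

Because the comparison is on the integer bit pattern, the loop retries whenever any bit of the stored value has changed since the load.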