[herd] More efficient computation of atomic load X stores pairs by maranget · Pull Request #1735 · herd/herdtools7

maranget · 2026-03-02T16:50:06Z

We take inspiration from the efficient computation of register read-from (see PR #1704): atomic stores are ordered according to (extended by iico) program order and an atomic load is paired with the closest atomic write that follows it.

Preliminary to PR #1733.

herd/mem.ml

HadrienRenaud

This looks good to me, thanks @maranget. There is an important point about interleaving reads in a comment that I think needs to be addressed before this is merged. Maybe a justification of why this is not a problem is enough.

herd/memUtils.ml

herd/mem.ml

maranget · 2026-03-05T17:26:07Z

For some reason CI is not running, an idea why?

maranget · 2026-03-05T17:39:52Z

The complete test make test-all fails on ./herd/tests/instructions/AArch64.vmsa+mte/A016.litmus...

maranget · 2026-03-05T22:21:34Z

In some cases, such as combining vmsa and mte, there exist (atomic) read effects that are unrelated by generalised po. As a consequence it is not possible to build a set of effects that rely on generalised po.

I have reverted to finding the closest po-after write (and not closest effect) and added a naive non-intervening load test. As long as writes (to a given location and on a given thread) are po-ordered, it works.

HadrienRenaud

Thanks @maranget, this looks very good. Sorry I am coming with other big questions

Should we check that the order is total? I think it would only need a linear pass on the store set. We could probably put this in StoreSet actually:

let check_total set =
  if not (is_empty set) then
    Set.fold
      (fun e' e ->
         if e' != e (* min element *) then
           assert (is_before_strict e e');
         e')
      set (min_elt set)
    |> ignore

let of_list li =
  let set = of_list li in
  let () = check_total set in
  set

herd/mem.ml

HadrienRenaud · 2026-03-06T10:01:06Z

herd/mem.ml

+module
+  EvtSetByPo


Question: should this belong in memUtils?

That's a possibility, I was a bit affraid of the signature but there should be no difficulty,

HadrienRenaud · 2026-03-06T10:29:15Z

herd/memUtils.ml

  and collect_reg_loads_stores es = collect_by_loc2 es E.is_reg_load_any E.is_reg_store_any

+  let accumulate_loc_proc proc loc e =
+  IntMap.update proc @@ function


nit: indentation is wrong

I think you dropped the fix on this one

herd/mem.ml

maranget · 2026-03-06T12:43:21Z

Thanks @maranget, this looks very good. Sorry I am coming with other big questions

Should we check that the order is total? I think it would only need a linear pass on the store set. We could probably put this in StoreSet actually:
let check_total set =
  if not (is_empty set) then
    Set.fold
      (fun e' e ->
         if e' != e (* min element *) then
           assert (is_before_strict e e');
         e')
      set (min_elt set)
    |> ignore

let of_list li =
  let set = of_list li in
  let () = check_total set in
  set

First consider that two write effects performed by different instructions are ordered, provided po is a total order.
Moreover the iico relation cannot have a write effect as its source. As a consequence, if two stores to a given location are performed by the same instruction they are unrelated. Hence our "stores to a given relation are totally orderd by the generalised po" is equivalent to "any instruction performs at most one store to a given location"

In fact this sounds quite right because, if some intruction could perform two or more stores to a given location, then this instruction would be non-determinist (unless the value stored always are identical).

Hence I plan to simplify the ordering relation behind StoreSet and see what happens.

maranget · 2026-03-06T15:18:52Z

Hence I plan to simplify the ordering relation behind StoreSet and see what happens.

I have followed your suggestions. I have also tried to use plain po as the order behind StoreSet.

It does not work because of AArch64+ASL. In that setting, the initialisation of the program counter is iico-before all PC reads and writes in the instruction. This is quite surprising.

HadrienRenaud · 2026-03-06T15:35:12Z

It does not work because of AArch64+ASL. In that setting, the initialisation of the program counter is iico-before all PC reads and writes in the instruction. This is quite surprising.

I do not understand. The initialisation of the PC is a register write, why does it interfere here? Also why would ASL even call this function?

fsestini · 2026-03-06T15:35:44Z

herd/memUtils.mli

+ *  to this set.
+ *)
+  module
+    EvtSetByPo :


FWIW, I would have kept this in mem.ml for now. EvtSetByPo's underlying assumptions seem fairly coupled to its usage within mem.ml, as it relies on a non‑total compare, which make sense in the specific context of mem.ml but seem riskier/easier to miss if exposed as a shared utility.

Ok, I can revert this easily.

maranget · 2026-03-06T16:17:33Z

It does not work because of AArch64+ASL. In that setting, the initialisation of the program counter is iico-before all PC reads and writes in the instruction. This is quite surprising.

I do not understand. The initialisation of the PC is a register write, why does it interfere here? Also why would ASL even call this function?

I do not understand either how a register write can be iico_data before something....

In AArch64_ASL mode, the failing StoreSet is used by match_reg_events, not by make_atomic_pairs. Here is the backtrace:

Not ordered stores (eeid=ev35 action=W1:PCq=200008) and (eeid=ev39 action=W1:PCq=200004)

Fatal: File "./herd/tests/instructions/AArch64/L027.litmus" Adios
Fatal error: exception File "herd/memUtils.ml", line 83, characters 14-20: Assertion failed
Raised at Dune__exe__MemUtils.Make.EvtSetByPo.compare in file "herd/memUtils.ml", line 83, characters 14-26
Called from Stdlib__Set.Make.add in file "set.ml", line 135, characters 18-33
Called from Dune__exe__Mem.Make.match_reg_events.(fun) in file "herd/mem.ml", line 878, characters 19-42
Called from Stdlib__Map.Make.fold in file "map.ml", line 329, characters 19-42
Called from Stdlib__Map.Make.fold in file "map.ml", line 329, characters 26-41
Called from Dune__exe__Mem.Make.do_solve_regs.(fun) in file "herd/mem.ml", lines 981-983, characters 10-18
Called from Dune__exe__Mem.Make.calculate_rf_with_cnstrnts in file "herd/mem.ml", line 2339, characters 12-36
Called from Stdlib__List.fold_left in file "list.ml", line 123, characters 24-34
Called from Dune__exe__Top_herd.Make.run in file "herd/top_herd.ml", line 478, characters 12-53
Re-raised at Dune__exe__Top_herd.Make.run in file "herd/top_herd.ml", line 483, characters 12-19
Called from Dune__exe__RunTest.Make.run in file "herd/runTest.ml", line 92, characters 8-29
Called from Misc.input_protect_gen in file "lib/misc.ml", line 487, characters 8-14
Re-raised at Misc.input_protect_gen in file "lib/misc.ml", line 487, characters 41-48
Called from Dune__exe__Herd.(fun) in file "herd/herd.ml", line 793, characters 12-31
Re-raised at Dune__exe__Herd.(fun) in file "herd/herd.ml", line 817, characters 11-18
Called from Dune__exe__Herd in file "herd/herd.ml", lines 791-818, characters 4-27

maranget · 2026-03-06T16:27:08Z

And here the image of the failing candidate for a much simpler test;

AArch64 L061
(* branch to instruction address with .section name *)

{ 0:X0 = 1; }
  P0            ;
  B .text       ;
  ADD W0, W0, #1;
.text: NOP      ;

A.pdf

This image was produced by the compare function of EvtSetByPo that uses E.po_strict in place of is_before_strict, as this was the case before the last forced push.

HadrienRenaud · 2026-03-06T16:32:30Z

Thank you, this makes sense. I'll have a look soon

maranget · 2026-03-06T20:46:43Z

Commit cb2e0cd changes some of iico_data into iico_order, which may be more appropriate. Hopefully this change is of aesthetic nature. I am not sure we should do it...

maranget · 2026-03-06T21:04:42Z

By the way something has changed with PR #1704. Consider this ambiguous test:

AArch64 LDPSame
{
int t[2]={1,2};
0:X0=t;
}
  P0           ;
LDP W1,W1,[X0] ;
locations [0:X1;]

With current master, we have:

% herd7 LDPSame.litmus 
Not ordered stores (eeid=f action=W0:X1q=S2) and (eeid=g action=W0:X1q=S4)

Fatal: File "LDPSame.litmus" Adios
Fatal error: exception File "herd/mem.ml", line 882, characters 6-12: Assertion failed

While before PR #1704, we had:

% herd7 -version LDPSame.litmus 
7.58+1, Rev: 238a08e757f7c4063dd232294f0df835daaa403d
% herd7 LDPSame.litmus 
Warning: File "LDPSame.litmus": Ambiguous po for register 0:X1

HadrienRenaud · 2026-03-09T09:32:03Z

Thanks @maranget for the bug report. I'll have a look, hopefully I'll have a PR today.

HadrienRenaud · 2026-03-09T12:43:19Z

I've picked a few bits of this PR into #1744, and fixed some problems that you were having in this PR.
I would suggest merging #1744 first, then rebasing this one on top, because #1744 is a fix when this one is an improvement. If you want to merge #1735, or part of it, before #1744, then I can rebase #1744 on top of master.

To summarize the problems found (and fixed):

There was a iico_data starting from a register write because this register write was read-from inside the same instruction. We now require Intra-Instruction Data Dependencies to start from a register read or a memory read.
There was 2 register writes to the same location because we needed to initialize PC inside the instruction if it was needed. We now just get rid of the PC events during the ASL to AArch64 translation.

[herd,asl] Fix ambiguous PC writes This PR cherry picks a commit from #1735 that introduces some checks on the coherence of writes, and fixes the problems highlighted from those checks. ## PC handling in ASL+herd The aarch64 handwritten semantics in herd handle the program counter differently from the ASL code: the ASL code has an explicit PC register, whereas the handwritten semantics rely on po and BCC branching events. We had introduced in #711 an AArch64 PC register in ASL to support the ASL code that uses it. This implementation wrote 2 times to the PC: 1. First at the beginning of the instruction to correctly initialize the PC value 2. Secondly when the branching is committed and we actually increments the PC This poses a few problems, mainly because the current herd algorithm does not handle multiple writes at the same register in a single instruction. We thus remove the translation of the PC accesses, replaced here by a BCC commit event in case of a write. This leaves implicit the PC reads. ## Intra-Instruction Dependencies starting from a register write The PC handling described above had a strange graph, where a `iico_data` was starting at a register write. This is because this register write was read-from in the same instruction. (Precisely, the first write to PC was read from to query the current PC and then increment it, resulting in a graph like [the one](https://github.com/user-attachments/files/25799596/A.pdf) shown by Luc in [a comment](#1735 (comment)) on #1704). We simply add guards on the Intra-Instruction Data dependencies to force them to start at a register read or at a memory read.

maranget · 2026-03-19T15:57:46Z

Hi @HadrienRenaud . At last, we may be ready. Would you review?

HadrienRenaud

This looks very good, thanks @maranget

We take inspiration from the efficient computation of register read-from (see PR #1704): atomic stores by a given thread and to a given location are ordered according to (extended by iico) program order and an atomic load is paired with the closest atomic store that follows it. Additionaly it is checked that there is no atomic load in-between. That is, the next effect in (generalised) po indeed is a write.

HadrienRenaud reviewed Mar 3, 2026

View reviewed changes

herd/mem.ml Outdated Show resolved Hide resolved

maranget commented Mar 3, 2026

View reviewed changes

herd/mem.ml Show resolved Hide resolved

maranget force-pushed the opt-atomic-pairs branch from 02e15d8 to 9a98003 Compare March 3, 2026 14:18

maranget marked this pull request as ready for review March 3, 2026 14:37

maranget force-pushed the opt-atomic-pairs branch from 3676f0e to 2689dce Compare March 3, 2026 14:42

HadrienRenaud reviewed Mar 5, 2026

View reviewed changes

herd/memUtils.ml Show resolved Hide resolved

herd/mem.ml Outdated Show resolved Hide resolved

herd/mem.ml Outdated Show resolved Hide resolved

maranget force-pushed the opt-atomic-pairs branch from 2689dce to eada4fa Compare March 5, 2026 17:16

maranget force-pushed the opt-atomic-pairs branch from eada4fa to 6b2ef26 Compare March 5, 2026 22:16

maranget force-pushed the opt-atomic-pairs branch 2 times, most recently from 399c3ed to c3b869a Compare March 5, 2026 23:18

HadrienRenaud reviewed Mar 6, 2026

View reviewed changes

maranget force-pushed the opt-atomic-pairs branch from d67f208 to b2ceb79 Compare March 6, 2026 15:30

fsestini reviewed Mar 6, 2026

View reviewed changes

HadrienRenaud mentioned this pull request Mar 9, 2026

[herd,asl] Fix ambiguous PC writes #1744

Merged

maranget force-pushed the opt-atomic-pairs branch 2 times, most recently from b4bca44 to 31a0360 Compare March 19, 2026 15:56

HadrienRenaud approved these changes Mar 19, 2026

View reviewed changes

maranget force-pushed the opt-atomic-pairs branch from 31a0360 to d2aec4f Compare March 20, 2026 13:28

maranget force-pushed the opt-atomic-pairs branch from d2aec4f to af44a24 Compare March 20, 2026 13:50

maranget merged commit ce8f8ce into master Mar 20, 2026
3 checks passed

Conversation

maranget commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

HadrienRenaud left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

maranget commented Mar 5, 2026

Uh oh!

maranget commented Mar 5, 2026

Uh oh!

maranget commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HadrienRenaud left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

maranget commented Mar 6, 2026

Uh oh!

maranget commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HadrienRenaud commented Mar 6, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maranget commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maranget commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HadrienRenaud commented Mar 6, 2026

Uh oh!

maranget commented Mar 6, 2026

Uh oh!

maranget commented Mar 6, 2026

Uh oh!

HadrienRenaud commented Mar 9, 2026

Uh oh!

HadrienRenaud commented Mar 9, 2026

Uh oh!

maranget commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HadrienRenaud left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

maranget commented Mar 2, 2026 •

edited

Loading

maranget commented Mar 5, 2026 •

edited

Loading

maranget commented Mar 6, 2026 •

edited

Loading

maranget commented Mar 6, 2026 •

edited

Loading

maranget commented Mar 6, 2026 •

edited

Loading

maranget commented Mar 19, 2026 •

edited

Loading