Skip to content

[herd] Use "static" po indices when rendering execution graphs#1628

Merged
fsestini merged 2 commits intoherd:masterfrom
fsestini:static-poi
Mar 4, 2026
Merged

[herd] Use "static" po indices when rendering execution graphs#1628
fsestini merged 2 commits intoherd:masterfrom
fsestini:static-poi

Conversation

@fsestini
Copy link
Copy Markdown
Collaborator

This PR make some small changes to the way herd7 renders its dot execution graphs, in particular relating to the poi indices.

Motivation

Consider this litmus test:

AArch64 LB+dmb.ld+ctrl
{
0:X0=x; 0:X3=y;
1:X0=y; 1:X3=x;
}
 P0          | P1           ;
 LDR W1,[X0] | LDR W1,[X0]  ;
 DMB LD      | CBNZ W1,LC00 ;
 MOV W2,#1   | LC00:        ;
 STR W2,[X3] | MOV W2,#1    ;
             | STR W2,[X3]  ;
exists (0:X1=1 /\ 1:X1=1)

If we run this test through with herd7 test.litmus -o . -show all, we get a dot file that shows these events for one of the executions of P1:

subgraph cluster_proc1 { rank=sink; label = "Thread 1"; color=magenta; shape=box;
eiid2 [label="c: R[y]=0\lproc:P1 poi:0\lLDR W1,[X0]", shape="box", color="blue"];
eiid15 [label="p: Branching(bcc)\lproc:P1 poi:1\lCBNZ W1,.+4", shape="box", color="blue"];
eiid3 [label="d: W[x]=1\lproc:P1 poi:5\lSTR W2,[X3]", shape="box", color="blue"];
}

You can see that the poi index of STR W2,[X3] is 5, whereas if we look at the program purely syntactically, that instruction actually appears at index 4 (counting from 0), or actually index 3 if we ignore the label:

LDR W1,[X0]       --> 0
CBNZ W1,LC00      --> 1
LC00: MOV W2,#1   --> 2
STR W2,[X3]       --> 3

Indeed we can see that one of the other executions shown in the dot file does assign a poi = 3 to that same instruction:

subgraph cluster_proc1 { rank=sink; label = "Thread 1"; color=magenta; shape=box;
eiid2 [label="c: R[y]=1\lproc:P1 poi:0\lLDR W1,[X0]", shape="box", color="blue"];
eiid15 [label="p: Branching(bcc)\lproc:P1 poi:1\lCBNZ W1,.+4", shape="box", color="blue"];
eiid3 [label="d: W[x]=1\lproc:P1 poi:3\lSTR W2,[X3]", shape="box", color="blue"];
}

My understanding of the rationale behind this is that in the presence of jumps like CBNZ, herd will assign addresses and poi indices (among other things) by considering both branch possibilities in sequence. Therefore, in this example, poi will be assigned as follows:

LDR W1,[X0]       --> 0
CBNZ W1,LC00      --> 1
# jump condition is true
MOV W2,#1         --> 2
STR W2,[X3]       --> 3
# jump condition is false
MOV W2,#1         --> 4
STR W2,[X3]       --> 5

In other words, the poi here is closer to a runtime program counter rather than a static index pointing to the instruction's position in the source code.

There's clearly very good reasons for herd7 to use this "runtime" poi internally. However, from the point of view of a person looking at the execution graph, and comparing it to the source litmus test, I think it would be more useful if they were presented with a po index that is "static", i.e. that corresponds to the position of instructions in the source code.
Note that this is in fact what herd7 already prints in the graph when -showevents all -withbox true are passed as options:

subgraph cluster_proc1_poi3 { color=green; label = ""; labelloc="b"; shape=box;
eiid3 [label="d: W[x]=1\lproc:P1 poi:5\lSTR W2,[X3]", shape="box", color="blue"];
eiid17 [label="r: R1:X3q=x\lproc:P1 poi:5\lSTR W2,[X3]", shape="box", color="blue"];
eiid18 [label="s: R1:X2q=1 (data)\lproc:P1 poi:5\lSTR W2,[X3]", shape="box", color="blue"];
}

You can see that the name of the subgraph, namely cluster_proc1_poi3, shows the instruction's "static" poi (3), vs the actual node label showing the "runtime" poi (5).

This PR slightly modifies the logic that builds herd7's internal representation of the litmus test to analyse, so that information about each instruction's "static" poi can be propagated to the code responsible for generating the execution graphs.

For example, with this PR, the subgraph shown above would look like so:

subgraph cluster_proc1_poi3 { color=green; label = ""; labelloc="b"; shape=box;
eiid3 [label="d: W[x]=1\lproc:P1 poi:3\lSTR W2,[X3]", shape="box", color="blue"];
eiid17 [label="r: R1:X3q=x\lproc:P1 poi:3\lSTR W2,[X3]", shape="box", color="blue"];
eiid18 [label="s: R1:X2q=1 (data)\lproc:P1 poi:3\lSTR W2,[X3]", shape="box", color="blue"];
}

NOTE: this PR changes how graphs are generated, for certain litmus tests. However, running make test on my machine did not show any failing tests, which suggests that either I'm using the wrong command, or this PR does not impact functionality that the current test suite covers. That's not anything remotely alarming per se, as testing this functionality may just not be worth it. However, we might want to take the opportunity to ask ourselves that question.

@fsestini fsestini requested review from maranget and relokin December 10, 2025 11:29
@fsestini fsestini self-assigned this Dec 10, 2025
Copy link
Copy Markdown
Collaborator

@HadrienRenaud HadrienRenaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really speak about functionality, especially interaction with instruction fetching, but I have 3 little code comments

@artkhyzha
Copy link
Copy Markdown
Collaborator

Thank you, @fsestini, this is an interesting idea. A few quick thoughts:

  • Change-wise, one question I have is whether in principle a static poi can replace "addresses" (integers assigned inside of loader.ml to instructions). Alternatively, can we infer static_poi from addresses? Something like, remove the part that gets added to encode the thread number and divide the integer by 4.
  • Would it help to accompany a graph with the sequence of instructions each thread executed?
  • Changes to loader.ml are likely in conflict with [herd,litmus] Initial support for translating addresses of instructions #1518, so a merging strategy might be needed. It could be interesting to consider how to identify "poi" for .pagealign, as it inserts a branch instruction not present in the code of the test.

@fsestini
Copy link
Copy Markdown
Collaborator Author

fsestini commented Feb 6, 2026

Thank you for looking into this @artkhyzha . Your first two questions in particular touch on something I thought about when working on this. I'll address all questions in a moment, but first let me add context on the reasons why I decided to implement this change at the time I did, which might help us understand where my proposal is coming from, and perhaps figure out alternative approaches.

While I think it's useful in its own right to provide, in the execution graphs, a more reliable pointer into the instruction's location in the source litmus test code, the main motivation for this change is to support my work on litmus2desc. This is a tool that aims to generates English descriptions of litmus tests. It does so by describing the test's instructions, and by narrating the effects and candidate executions that would satisfy the test's final condition.

Consider this litmus test:

AArch64 LB+dmb.ld+ctrl
{
0:X0=x; 0:X3=y;
1:X0=y; 1:X3=x;
}
 P0          | P1           ;
 LDR W1,[X0] | LDR W1,[X0]  ;
 DMB LD      | CBNZ W1,LC00 ;
 MOV W2,#1   | LC00:        ;
 STR W2,[X3] | MOV W2,#1    ;
             | STR W2,[X3]  ;
exists (0:X1=1 /\ 1:X1=1)

The expected output from litmus2desc on this test is something like:

This litmus test asks whether the following execution is architecturally allowed:

The first Load instruction on P0, i.e. LDR W1,[X0], generates an Explicit Memory Read Effect of Location x with value 1;
The first Store instruction on P0, i.e. STR W2,[X3], generates an Explicit Memory Write Effect of Location y with value 1;
The second Store instruction on P1, i.e. STR W2,[X3], generates ... etc...
... etc ...

To produce such description, we need:

  1. A way to determine the effects generated by all instructions for any execution that could validate the test's condition (whether it's allowed or forbidden)
  2. A way to unambiguously trace those effects back to the exact instruction that generated them in the litmus test source code. This is crucial, and it allows the tool to generate descriptions like "the first Load instruction", "the second Store instruction", etc., that unambiguously identify an instruction within the litmus program.

Internally, litmus2desc works by running herd7 on the input test and parsing the resulting execution graphs (rendered as json, but that's an orthogonal matter). herd7's execution graphs as for current master are already sufficient to achieve point (1) above.

The difficulty is with point (2), since currenly the instruction-related information herd7 attaches to an Effect is

  1. the "runtime" poi discussed here
  2. the string representation of the instruction itself

I have doubts this information alone is sufficient to reliably trace an Effect back to its source instruction in the litmus code.

I suppose we could, in principle, find a way for litmus2desc to "reverse engineer" the static poi from the runtime poi, but this would rely on knowledge about how herd7 internally constructs runtime pois, and I'm not fond of the idea of baking assumptions about the internal logic of herd7 into tools external to it.

The string representation of instructions as shown in the execution graphs is also not always fully usable by litmus2desc, since the tool needs a structured representation of the instruction (namely AArch64Base.pseudo). In my WIP implementation of litmus2desc (which sits on top of this PR), this is achieved by having the tool parse the input test into its structured representation via lib/genParser.ml, and then using the static poi from the execution graph to index into the code obtained from GenParser. Absent static poi information, an alternative could be to use (a modified version of) GenParser to parse the string representation of instructions from the execution graphs into a structured AArch64Base.pseudo term. I think there are some issues with this approach, mainly to do with the format of labels in pretty-printed instructions as they appear in the graphs. For example, CBNZ W1,LC00 will be rendered as CBNZ W1,.+4 in graphs. As a consequence, if litmus2desc were to derive its knowledge about a test's instructions from the graphs alone, every description of an instruction with labels would end up showing labels as .+N, which would be quite confusing to the user. Note that even assuming execution graphs showed instructions exactly as they appear in the source code, litmus2desc would still also need reliable po indexing information to determine which, among two identical instructions, comes first in program order.

Among all the options that I could think of and that would allow a tool like litmus2desc to work, I favoured adding static poi information to the graphs, considering its relative simplicity (according to my understanding of things) and general usefulness to people looking at herd7 regardless of litmus2desc. However, perhaps there are alternative solutions I haven't considered yet.

Now, to your points:

Change-wise, one question I have is whether in principle a static poi can replace "addresses" (integers assigned inside of loader.ml to instructions). Alternatively, can we infer static_poi from addresses? Something like, remove the part that gets added to encode the thread number and divide the integer by 4.

As for herd7's current implementation of addresses, I think it should be possible and relatively simple to write a mapping from static pois to addresses, and back, precisely as you suggest. I'm not sure if it would still be quite as simple after #1518, as that PR seems to alter the address logic in non-trivial ways (although I haven't looked deeply into it yet, and you are certainly more qualified to answer than me).

I appreciate your question was posed in the context of herd7 alone, and in that context, I don't see particular problems with deriving static pois from addresses or vice versa (pre-#1518).

From the point of view of litmus2desc, I wonder whether having such "static poi <-> address" mapping would help solving the issues I described above. For example, I would have some reservations about an approach where we display addresses in the graphs (instead of static pois) and then try to infer static pois from addresses on litmus2desc's side, since again this would have to rely on baking into litmus2desc assumptions about how herd7 internally constructs addresses; assumptions which can easily break the moment we decide to change herd7 address handling logic.

A alternative approach could be to move the "static poi <-> address" translation logic out of herd/ and into lib/ so that both herd7 and litmus2desc can depend on it as code, and we don't need to worry about litmus2desc's assumptions about herd7 internal logic going out of sync with the actual code. That could work, although I think that from the perspective of a user looking at the execution graphs, it would nevertheless be useful to have some kind of static poi information.

Would it help to accompany a graph with the sequence of instructions each thread executed?

I suppose it would depend on how this sequence is structured, and what kind of information we can derive from it. I would be happy to consider adding different kinds of data to the execution graphs, if that data can be used to reliably trace effects back to their source instructions.

Changes to loader.ml are likely in conflict with #1518, so a merging strategy might be needed.

If this PR ends up reaching consensus and approval for merge well before #1518 does, we indeed might. Otherwise, I wouldn't mind holding back until after #1518 is merged and then rebasing this on top. This PR is quite small, and not super urgent as its mainly in support of litmus2desc.

It could be interesting to consider how to identify "poi" for .pagealign

Agreed. Currently static pois are calculated by incrementing a counter for each Instruction constructor of AArch64Base.pseudo, but in principle, anything that is included in the MiscParser.result structure can take part in static poi calculation. In particular, from #1518, it looks like .pagealign would at least appear as a Pagealign constructor of AArch64Base.pseudo, so the static poi calculation logic could potentially use that.

as it inserts a branch instruction not present in the code of the test.

Could you clarify what you mean by "code" here? Would that be what Loader.load produces as output?

Copy link
Copy Markdown
Member

@relokin relokin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsestini please rebase and merge when you are ready.

@fsestini fsestini merged commit 471350c into herd:master Mar 4, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants