Skip to content

Conversation

@CapZTr
Copy link
Collaborator

@CapZTr CapZTr commented Aug 14, 2025

This PR adds support to detect heap memory safety issues in:

  1. adding new event MemFree with new tag FREE
  2. modifying event MemAlloc (renamed from Alloc), which has new tag ALLOC when it is a heap allocation
  3. enhancing alias analysis to handle the two events above
  4. adding new event Termination with tag TERMINATION which gets executed if and only if the program terminated normally
  5. adding new base relation AllocPtr and AllocMem and derived ones into models c11 and rc11, NOTE: vmm is still WIP

natgavrilenko and others added 30 commits May 13, 2025 14:26
* init alias analysis for alloc

Signed-off-by: Tianrui Zheng <[email protected]>

* add MemFree events to the test

Signed-off-by: Tianrui Zheng <[email protected]>

* add first test for alias analysis

Signed-off-by: Tianrui Zheng <[email protected]>

* add second test for alias analysis

Signed-off-by: Tianrui Zheng <[email protected]>

* improve field insensitive for alloc

Signed-off-by: Tianrui Zheng <[email protected]>

* add third test for alias analysis

Signed-off-by: Tianrui Zheng <[email protected]>

* add alias analysis for free events

Signed-off-by: Tianrui Zheng <[email protected]>

* use alias analysis for alloc and free in relation analysis

Signed-off-by: Tianrui Zheng <[email protected]>

* change native relation analysis

Signed-off-by: Tianrui Zheng <[email protected]>

* change logic of mustAlias

Signed-off-by: Tianrui Zheng <[email protected]>

* Using stack pointer in alloc tests

* change exception type

Signed-off-by: Tianrui Zheng <[email protected]>

* Add full alias analysis

Signed-off-by: Tianrui Zheng <[email protected]>

* Add object alias for alloc -> memory event

Signed-off-by: Tianrui Zheng <[email protected]>

* Add alloc and free to alias graph

Signed-off-by: Tianrui Zheng <[email protected]>

---------

Signed-off-by: Tianrui Zheng <[email protected]>
Co-authored-by: Tianrui Zheng <[email protected]>
Co-authored-by: Natalia Gavrilenko <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Tianrui Zheng and others added 3 commits August 8, 2025 16:19
Signed-off-by: Tianrui Zheng <[email protected]>
Type of pthread mutex is u1 instead of u32.
Remove intrinsics for pthread_mutexattr_setpolicy_np.
@ThomasHaas
Copy link
Collaborator

Hmm, I haven't checked the details, but I think all of this can be done a lot easier.
We do support "generic memory events" and IMO we can make MemFree such an event.
It is not so straightforward to do the same for Alloc (it cannot be a MemoryEvent because it has no address), but a simple workaround is to use a marker-like AllocMarker memory event that takes an expression as input and marks it as allocated.

In IR, it would look like this:

bv64 r <- heapalloc(....)
AllocMarker(r)     // Memory event with address expression "r" and ALLOC tag

In fact, maybe we should just have two different alloc events altogether: "DynamicAlloc" and MemAlloc (naming is up to debate) where the latter one is a memory event. The former is later "compiled" to the latter. Similar to how we have DynamicThreadCreate and ThreadCreate.

Why do all this?
(1) It is conceptually more sane and more principled.
(2) You don't need to mess up the signature of AliasAnalysis.mayAlias etc.
(3) Allocptr becomes obsolete: you can do let allocptr = [ALLOC];loc;[FREE] since you have the standard loc relation between them. Also, I'm still advocating for a sameObj relation which would make allocmem obsolete, is more general, and more cleanly aligns with the new AliasAnalysis.mayObjectAlias methods.
(4) It is easier and requires less changes (e.g. EncodingContext and IRHelper changes need not be done).


Also, do we really need the Termination event? The notion of memory leak is independet of the memory model since it is independent of time and so we could easily encode is as either a separate property or as part of the program specification (maybe with a switch to enable/disable the leak checks). In other words, we move it out of the CAT file and into the program spec.
In fact, we could move more (or all?) time-independent notions into the program spec such as free-without-alloc.

@natgavrilenko
Copy link
Collaborator

Hmm, I haven't checked the details, but I think all of this can be done a lot easier. We do support "generic memory events" and IMO we can make MemFree such an event. It is not so straightforward to do the same for Alloc (it cannot be a MemoryEvent because it has no address), but a simple workaround is to use a marker-like AllocMarker memory event that takes an expression as input and marks it as allocated.

In IR, it would look like this:

bv64 r <- heapalloc(....)
AllocMarker(r)     // Memory event with address expression "r" and ALLOC tag

In fact, maybe we should just have two different alloc events altogether: "DynamicAlloc" and MemAlloc (naming is up to debate) where the latter one is a memory event. The former is later "compiled" to the latter. Similar to how we have DynamicThreadCreate and ThreadCreate.

Why do all this? (1) It is conceptually more sane and more principled. (2) You don't need to mess up the signature of AliasAnalysis.mayAlias etc. (3) Allocptr becomes obsolete: you can do let allocptr = [ALLOC];loc;[FREE] since you have the standard loc relation between them. Also, I'm still advocating for a sameObj relation which would make allocmem obsolete, is more general, and more cleanly aligns with the new AliasAnalysis.mayObjectAlias methods. (4) It is easier and requires less changes (e.g. EncodingContext and IRHelper changes need not be done).

It is a valid question whether we should make MemAlloc and MemFree memory events. I see two potential problems with this approach, but none of them should really prevent the implementation.

Firstly, if MemAlloc and MemFree inherit AbstractMemoryCoreEvent, then they will also inherit the 'M' tag. As far as I see, it shouldn't be a problem for existing models (rc11, c11, and vmm), but I can imagine models where this could cause surprising results that might be hard to troubleshoot. As a workaround, we can remove the 'M' tag in the constructors of MemAlloc and MemFree, though it is a bit ugly.

Secondly, with a straightforward implementation of 'loc', there will be a relation between MemAlloc (or MemFree) and a memory access to the beginning of the allocated region, but not to the other indexes of this region. This can cause very subtle bugs, and I am against defining 'loc' this way. But we can filter out MemAlloc and MemFree in the relation analysis of the loc relation, and keep them all together in alias analysis.

In short, I am fine to make MemAlloc and MemFree a GenericMemoryEvent (alias analysis will be indeed cleaner this way), as long as loc, allocmem, and allocptr are kept as they are.

Also, do we really need the Termination event? The notion of memory leak is independet of the memory model since it is independent of time and so we could easily encode is as either a separate property or as part of the program specification (maybe with a switch to enable/disable the leak checks). In other words, we move it out of the CAT file and into the program spec. In fact, we could move more (or all?) time-independent notions into the program spec such as free-without-alloc.

For the termination event, you are right that it can be implemented as a separate property, but it will make the interface of the end user even more convoluted. Actually, it is Hernan who wants to keep the interface clean, but I also like the idea of keeping all the six properties together in a CAT file.

}

@Override
// TODO: May and must sets of AllocMem can become very large for some programs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do a detailed review a bit later this week, but just a quick comment for now.

This should use LazyEventGraph. ImmutableEventGraph is the same explicit representation as in the native relation analysis, just with an immutable collection. It could be more efficient to use it in derived relations, but it may still require a lot of memory to store a may set of allocmem. With a LazyEventGraph, it will compute a set each time on demand. This might be a bit slower, but will use less memory.

Also, if alias analysis is thread-safe now, could you also change visitSameLocation to use LazyEventGraph? It used to be one of the most problematic relations in terms of the size of a may set. Hopefully, it should get better with a lazy graph.

And could you also remove the TODO here?

@ThomasHaas
Copy link
Collaborator

It is a valid question whether we should make MemAlloc and MemFree memory events. I see two potential problems with this approach, but none of them should really prevent the implementation.

Firstly, if MemAlloc and MemFree inherit AbstractMemoryCoreEvent, then they will also inherit the 'M' tag. As far as I see, it shouldn't be a problem for existing models (rc11, c11, and vmm), but I can imagine models where this could cause surprising results that might be hard to troubleshoot. As a workaround, we can remove the 'M' tag in the constructors of MemAlloc and MemFree, though it is a bit ugly.

If M = R | W ought to hold, we should probably define it like that in the stdlib.cat we have (and get rid of the internal M tag). We could also do let M = M \ (Free | Alloc) without the need to internally get rid of the tag. That being said, seeing both of them as proper memory events seems quite reasonable to me since they behave as such in many ways.

Secondly, with a straightforward implementation of 'loc', there will be a relation between MemAlloc (or MemFree) and a memory access to the beginning of the allocated region, but not to the other indexes of this region. This can cause very subtle bugs, and I am against defining 'loc' this way. But we can filter out MemAlloc and MemFree in the relation analysis of the loc relation, and keep them all together in alias analysis.

I don't understand why this can cause subtle bugs. loc is defined to mean that addresses exactly match, even in standard literature. For example, two mixed-size accesses that do only partially overlap are typically not in loc relation. Only the overlapping parts are in loc if the events get split into multiple subevents.
This is important because even the CPU can out-of-order execute the non-overlapping parts which would violate SC-per-loc if loc was defined to relate overlapping events.

So I would explicitly advocate for having alloc and free in loc relation, at least if we consider them as memory events internally.
Again, stdlib.cat could define let loc = loc \ (Free | Alloc) * (Free | Alloc) or let loc = [M];loc;[M] (assuming M does not contain free/alloc) if for some reason those should not be considered in the standard case.
Indeed, all those new relations could be defined in stdlib.cat if a simple sameObj base relation was available (whose implementation would be identical to allocmem just without the restriction to alloc/free). The user wouldn't know and doesn't really need to care I think.
GenMC has a similar loc-overlap relation sc-genmc (Kater), although they only use it to determine races and double-free it seems.

For the termination event, you are right that it can be implemented as a separate property, but it will make the interface of the end user even more convoluted. Actually, it is Hernan who wants to keep the interface clean, but I also like the idea of keeping all the six properties together in a CAT file.

In #878 I have a dual "Nontermination" event, so I'm not against having those types of events in principle. The implementation needs to be different though.

Regarding the properties in CAT: I can also imagine that having them as CAT spec can be kinda annoying, for example, when checking for data races in test code that doesn't care about cleanup (such as all our existing unit tests). You will end up getting memory leak errors left and right. But we could also provide options to disable/enable certain CAT-defined properties in some other way.

@natgavrilenko
Copy link
Collaborator

It is a valid question whether we should make MemAlloc and MemFree memory events. I see two potential problems with this approach, but none of them should really prevent the implementation.
Firstly, if MemAlloc and MemFree inherit AbstractMemoryCoreEvent, then they will also inherit the 'M' tag. As far as I see, it shouldn't be a problem for existing models (rc11, c11, and vmm), but I can imagine models where this could cause surprising results that might be hard to troubleshoot. As a workaround, we can remove the 'M' tag in the constructors of MemAlloc and MemFree, though it is a bit ugly.

If M = R | W ought to hold, we should probably define it like that in the stdlib.cat we have (and get rid of the internal M tag). We could also do let M = M \ (Free | Alloc) without the need to internally get rid of the tag. That being said, seeing both of them as proper memory events seems quite reasonable to me since they behave as such in many ways.

Secondly, with a straightforward implementation of 'loc', there will be a relation between MemAlloc (or MemFree) and a memory access to the beginning of the allocated region, but not to the other indexes of this region. This can cause very subtle bugs, and I am against defining 'loc' this way. But we can filter out MemAlloc and MemFree in the relation analysis of the loc relation, and keep them all together in alias analysis.

I don't understand why this can cause subtle bugs. loc is defined to mean that addresses exactly match, even in standard literature. For example, two mixed-size accesses that do only partially overlap are typically not in loc relation. Only the overlapping parts are in loc if the events get split into multiple subevents. This is important because even the CPU can out-of-order execute the non-overlapping parts which would violate SC-per-loc if loc was defined to relate overlapping events.

So I would explicitly advocate for having alloc and free in loc relation, at least if we consider them as memory events internally. Again, stdlib.cat could define let loc = loc \ (Free | Alloc) * (Free | Alloc) or let loc = [M];loc;[M] (assuming M does not contain free/alloc) if for some reason those should not be considered in the standard case. Indeed, all those new relations could be defined in stdlib.cat if a simple sameObj base relation was available (whose implementation would be identical to allocmem just without the restriction to alloc/free). The user wouldn't know and doesn't really need to care I think. GenMC has a similar loc-overlap relation sc-genmc (Kater), although they only use it to determine races and double-free it seems.

Please read again to see why it shouldn't be in loc.

For the termination event, you are right that it can be implemented as a separate property, but it will make the interface of the end user even more convoluted. Actually, it is Hernan who wants to keep the interface clean, but I also like the idea of keeping all the six properties together in a CAT file.

In #878 I have a dual "Nontermination" event, so I'm not against having those types of events in principle. The implementation needs to be different though.

I don't see why it needs to be different, we cannot change the code back and forth just because you like it that way.

Regarding the properties in CAT: I can also imagine that having them as CAT spec can be kinda annoying, for example, when checking for data races in test code that doesn't care about cleanup (such as all our existing unit tests). You will end up getting memory leak errors left and right. But we could also provide options to disable/enable certain CAT-defined properties in some other way.

Again, please check the pr first, all the tests have been already taken care of.

@ThomasHaas
Copy link
Collaborator

Please read again to see why it shouldn't be in loc.

I don't get this. Your only argument was this:

Secondly, with a straightforward implementation of 'loc', there will be a relation between MemAlloc (or MemFree) and a memory access to the beginning of the allocated region, but not to the other indexes of this region.

And my argument is that this behavior is exactly how loc is typically defined, i.e., it behaves as one would expect.

In #878 I have a dual "Nontermination" event, so I'm not against having those types of events in principle. The implementation needs to be different though.

I don't see why it needs to be different, we cannot change the code back and forth just because you like it that way.

My point is rather that the Termination event should not define its semantics in the encodeExec method (me might even want to get rid of this method altogether at some point) and instead should be defined in the ProgramEncoder which already provides encodings for termination like "threadHasTerminatedNormally()". Putting a program-wide reasoning into the event-local encoding is ugly and duplicating the already existing encoding for termination is not nice either.

Regarding the properties in CAT: I can also imagine that having them as CAT spec can be kinda annoying, for example, when checking for data races in test code that doesn't care about cleanup (such as all our existing unit tests). You will end up getting memory leak errors left and right. But we could also provide options to disable/enable certain CAT-defined properties in some other way.

Again, please check the pr first, all the tests have been already taken care of.

Ok, that fixes at least the issue on our side.

@natgavrilenko
Copy link
Collaborator

Please read again to see why it shouldn't be in loc.

I don't get this. Your only argument was this:

Secondly, with a straightforward implementation of 'loc', there will be a relation between MemAlloc (or MemFree) and a memory access to the beginning of the allocated region, but not to the other indexes of this region.

And my argument is that this behavior is exactly how loc is typically defined, i.e., it behaves as one would expect.

Where is it 'typically defined'? My point is that loc relation defined this way will have edges to events accessing index 0 but not other indexes. Such definition would be very error-prone, because in reality the effect will be the same for any index within the allocated bounds (unless it is a MemFree event), i.e. either accesses to all indexes should have the relation or none.

In #878 I have a dual "Nontermination" event, so I'm not against having those types of events in principle. The implementation needs to be different though.

I don't see why it needs to be different, we cannot change the code back and forth just because you like it that way.

My point is rather that the Termination event should not define its semantics in the encodeExec method (me might even want to get rid of this method altogether at some point) and instead should be defined in the ProgramEncoder which already provides encodings for termination like "threadHasTerminatedNormally()". Putting a program-wide reasoning into the event-local encoding is ugly and duplicating the already existing encoding for termination is not nice either.

It is the existing encoding moved from PropertyEncoder to the event.

Regarding the properties in CAT: I can also imagine that having them as CAT spec can be kinda annoying, for example, when checking for data races in test code that doesn't care about cleanup (such as all our existing unit tests). You will end up getting memory leak errors left and right. But we could also provide options to disable/enable certain CAT-defined properties in some other way.

Again, please check the pr first, all the tests have been already taken care of.

Ok, that fixes at least the issue on our side.

@ThomasHaas
Copy link
Collaborator

Please read again to see why it shouldn't be in loc.

I don't get this. Your only argument was this:

Secondly, with a straightforward implementation of 'loc', there will be a relation between MemAlloc (or MemFree) and a memory access to the beginning of the allocated region, but not to the other indexes of this region.

And my argument is that this behavior is exactly how loc is typically defined, i.e., it behaves as one would expect.

Where is it 'typically defined'? My point is that loc relation defined this way will have edges to events accessing index 0 but not other indexes. Such definition would be very error-prone, because in reality the effect will be the same for any index within the allocated bounds (unless it is a MemFree event), i.e. either accesses to all indexes should have the relation or none.

That is why I gave the example of mixed-size accesses:

int x[2];
*((long*)&x) = 42; // 64 bit access
x[0]  = 1;             // in loc relation with above access (access to same base address)
x[1] = 2;             // not in loc relation with first access (overlapping, but access to different base address)

Putting the large access into loc relation with the access to x[1] would be wrong according to, e.g., ARM's cat model as then SC-per-loc would prohibit reorderings that are observable on actual hardware. If you treat a MemFree simply as a large memory access over the whole object range (which I think is very reasonable and intuitive), this behavior of loc shouldn't fundametally change: the MemFree is only loc-related to the first index.

If you want to relate accesses to the same region, you instead use the new sameObj (or sameRegion if you prefer that name) relation that I'm talking about.
Since alloc-free pairs need to match on the exact same address, you use loc to check those. But, e.g., a use-after-free does not need to match precisely, so you use the new sameObj to formulate it.
The only real argument against doing it like this is for memory-space reasons, i.e., sameObj might have a too large may-set. That being said. the encoded active set will still only contain events pairs involving frees, i.e., the encoding will likely be identical to what a hardcoded allocmem would produce.

In #878 I have a dual "Nontermination" event, so I'm not against having those types of events in principle. The implementation needs to be different though.

I don't see why it needs to be different, we cannot change the code back and forth just because you like it that way.

My point is rather that the Termination event should not define its semantics in the encodeExec method (me might even want to get rid of this method altogether at some point) and instead should be defined in the ProgramEncoder which already provides encodings for termination like "threadHasTerminatedNormally()". Putting a program-wide reasoning into the event-local encoding is ugly and duplicating the already existing encoding for termination is not nice either.

It is the existing encoding moved from PropertyEncoder to the event.

Well, then move it to the ProgramEncoder I guess. It still stands that program-wide reasoning better fits into the ProgramEncoder rather than the event-local semantics.
In the long term, I would get rid of encodeExec entirely I think. There are only like 2 or 3 usages, all of which the ProgramEncoder could encode instead of spreading the definition of the program semantics all over the place.

Comment on lines 36 to 38
public MemAlloc(Register resultRegister, Type allocType, Expression arraySize, Expression alignment,
boolean isHeapAllocation, boolean doesZeroOutMemory) {
super(resultRegister, "alloc");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not going to work. The address expression (or any expression for that matter) is evaluated before the execution of the event, but at that point in time, resultRegister will not contain the correct address (it is most likely even uninitialized).

That is why I said you probably need to add a new event. Either a dummy event that can be added after the alloc so it can refer to the result register:

p <- alloc(...) // No memory event. Using "p" as address here is wrong.
MemAlloc(p); // Visible allocation marker. Memory event with address "p".

Or, a new MemAlloc that replaces Alloc after the memory allocation pass ran, i.e., after memory objects have been allocated. Because then MemAlloc can use the allocated MemoryObject as address.

The first solution is certainly easier to implement for now (probably like 10 lines of code).
For the second solution, I can also imagine MemAlloc just inheriting from Alloc (but then you need to implement the MemoryCoreEvent interface) and adding the new address field. But I guess quite a bit of existing code would need to get updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give a minimal example of a program where it is not going to work

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I use resultRegister in constructor and later address will be set to allocatedObject when setAllocatedObject is called.

Copy link
Collaborator

@ThomasHaas ThomasHaas Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give a minimal example of a program where it is not going to work

No frontend will generate such a program, but code like this would fail to get optimized

int64 p <- ... // alloc or a local assigment
int64 p <- alloc(...)

The second alloc appears to make use of p which keeps the previous assignment alive and thus it cannot get deleted although it is dead. It will probably also generate some encoding for that, but I don't think it will result in soundness errors. Though I can also imagine it breaking the alias analysis if we ever want to run it before memory allocation (which is reasonable).
Apart from that, the conceptual error is important in itself IMO.

Here I use resultRegister in constructor and later address will be set to allocatedObject when setAllocatedObject is called.

I know, but that only partly addresses the issue. On construction, the alloc event is essentially broken until the memory allocation pass goes through. Creating a conceptually broken object and then adding a workaround will only cause problems in the long run. At the very least, it deserves comments that document the "hacks" and explain why the hack is not problematic in practice. Though I would advice against it, because then you kinda need to understand what the rest of Dartagnan does and anyone who writes passes in the future can easily run into the trap (e.g., alias analysis running before memory allocation).
In 99% of the cases though, the wrong address will only cause optimization obstacles if anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., alias analysis running before memory allocation

I'm confused about it: how can the two solutions suggested by you help with this case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently (and in both of my suggested solutions), the Alloc event is no memory event, so no pass/analysis can accidentally treat it as such. This also means that an alias analysis that uses getAddress() to determine the accessed address of each memory event is going to do the right thing.
In your version, getAddress() gives a wrong value before memory allocation, so the analyses have to special case on Alloc and if they don't, they may produce wrong results without knowing.
And there is no way that, a couple month down the line, everyone remembers that Alloc needs special treatment... Even worse, because it will likely work in many cases (e.g., whenever an analysis runs after memory allocation), it ends up being a time bomb for a very long time.

Btw. the "cheap" solution with the extra allocation marker is somewhat implemented here by a former student of mine. It also uses the sobj (same object) relation to formulate memory errors:

flag ~empty (([M];sobj;[MFR]) \ hb*) as possible-access-after-free
let alloc-free-matches = [MAL];loc;[MFR]
flag ~empty [MFR] \ [range(alloc-free-matches)] as invalid-free
flag ~empty [MAL] \ [domain(alloc-free-matches)] as memory-leak

I have not checked any details of the code base nor the CAT files, so take them with a grain of salt.

Lastly, I think you shouldn't inherit from GenericMemoryEvent. If you want to use that class, you can just create instances of it with the ALLOC/FREE tag. If you create separate classes (which is totally fine), then they should probably extend AbstractMemoryCoreEvent instead.

Signed-off-by: Tianrui Zheng <[email protected]>
Comment on lines +105 to +116
(*
* Heap memory safety.
* Base relations:
* allocptr - relates (ALLOC | FREE) -> (FREE) when both events use the same pointer
* allocmem - relates (ALLOC) -> (M) when the second event accesses the memory allocated by the first event
*)
flag ~empty (((ALLOC * FREE) & allocptr) \ hb) as alloc-race
flag ~empty (((FREE * FREE) \ id) & allocptr) as double-free
flag ~empty ([FREE] \ [range(allocptr & (ALLOC * FREE))]) as free-without-alloc
flag ~empty (allocmem \ hb) as use-before-alloc
flag ~empty ((allocmem^-1 ; (allocptr & (ALLOC * FREE))) \ hb) as use-after-free
flag ~empty [domain(ALLOC * TERMINATION)] \ [domain(allocptr)] as alloc-without-free
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could put this into a separate .cat file and just include it in the other models. I guess it makes sense for any (language-level) memory model that defines a hb relation.

Comment on lines +180 to +182
public static Termination newTerminationEvent() {
return terminationEvent;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't return an existing instance; just create a new one.

Comment on lines +16 to +19
MemFree other = new MemFree(address);
other.setFunction(this.getFunction());
other.copyAllMetadataFrom(this);
return other;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a proper copy constructor. Also, don't set the function of the newly created event: this breaks invariants we try to uphold.

Comment on lines +27 to +39
public BooleanFormula encodeExec(EncodingContext ctx) {
Program program = getThread().getProgram();
final BooleanFormulaManager bmgr = ctx.getBooleanFormulaManager();
final BooleanFormula exitReached = bmgr.and(program.getThreads().stream()
.filter(t -> !t.equals(getThread()))
.map(t -> bmgr.equivalence(ctx.execution(t.getEntry()), ctx.execution(t.getExit())))
.toList());
final BooleanFormula terminated = program.getThreadEventsWithAllTags(Tag.NONTERMINATION).stream()
.map(CondJump.class::cast)
.map(jump -> bmgr.not(ctx.jumpTaken(jump)))
.reduce(bmgr.makeTrue(), bmgr::and);
return bmgr.implication(ctx.execution(this), bmgr.and(ctx.controlFlow(this), exitReached, terminated));
}
Copy link
Collaborator

@ThomasHaas ThomasHaas Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this encoding into ProgramEncoder. To make it accessible from everywhere, I think it would be reasonable to have EncodingContext provide a hasProgramTerminated() function that returns a fresh boolean variable. ProgramEncoder can then encode hasProgramTerminated <=> "actual encoding" and execution(terminatorEvent) <=> hasProgramTerminated.

EDIT: If ctx.hasProgramTerminated() exists, then it is also fine to let Termination.encodeExec encode exec(TermationEvent) <=> ctx.hasProgramTerminated. Basically, the point is to get the definition of what it means for a program to be terminated outside of the Termination event and into a more central place.

.forEach(this::instrumentLoop);
}

program.addTerminationThread();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be here. Indeed, the code will not get reached if only spin loop detection is enabled.
If you want to put it into an existing pass, then ThreadCreation might be a better option. But that one might not run for Litmus code IIRC.
So a dedicated pass right at the end of the processing pipeline seems reasonable.

Comment on lines +208 to +211
case DATA -> intersection(r, getOrCreatePredefinedRelation(IDDTRANS),
addDefinition(new CartesianProduct(newRelation(),
Filter.union(Filter.byTag(Tag.MEMORY), Filter.byTag(Tag.ALLOC)),
Filter.byTag(Tag.MEMORY))));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not a big fan of removing the memory tags from Alloc and Free. It messes with so many things.
Is there any imaginable case where treating Alloc/Free as proper memory events is really wrong?
If so, then it suggests that malloc and calloc are ought to be treated differently, since the latter zero-initializes the memory and so almost acts as a write (and hence a proper memory access).

If it is hard to imagine, I would go with the simpler solution and add the tags (or rather, not remove them in the first place) and see if we ever encounter an issue with it.

Comment on lines +8 to +18
public class AllocPtr extends Definition {

public AllocPtr(Relation r) {
super(r, ALLOCPTR);
}

@Override
public <T> T accept(Visitor<? extends T> v) {
return v.visitAllocPtr(this);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your current iteration, Alloc and Free have loc-edges (which I think they should), so let allocPtr = [ALLOC];loc;[FREE] holds and you can save yourself this whole relation.

Tianrui Zheng added 8 commits August 22, 2025 16:48
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants