-
Notifications
You must be signed in to change notification settings - Fork 34
Add support alloc-free #923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Conversation
* init alias analysis for alloc Signed-off-by: Tianrui Zheng <[email protected]> * add MemFree events to the test Signed-off-by: Tianrui Zheng <[email protected]> * add first test for alias analysis Signed-off-by: Tianrui Zheng <[email protected]> * add second test for alias analysis Signed-off-by: Tianrui Zheng <[email protected]> * improve field insensitive for alloc Signed-off-by: Tianrui Zheng <[email protected]> * add third test for alias analysis Signed-off-by: Tianrui Zheng <[email protected]> * add alias analysis for free events Signed-off-by: Tianrui Zheng <[email protected]> * use alias analysis for alloc and free in relation analysis Signed-off-by: Tianrui Zheng <[email protected]> * change native relation analysis Signed-off-by: Tianrui Zheng <[email protected]> * change logic of mustAlias Signed-off-by: Tianrui Zheng <[email protected]> * Using stack pointer in alloc tests * change exception type Signed-off-by: Tianrui Zheng <[email protected]> * Add full alias analysis Signed-off-by: Tianrui Zheng <[email protected]> * Add object alias for alloc -> memory event Signed-off-by: Tianrui Zheng <[email protected]> * Add alloc and free to alias graph Signed-off-by: Tianrui Zheng <[email protected]> --------- Signed-off-by: Tianrui Zheng <[email protected]> Co-authored-by: Tianrui Zheng <[email protected]> Co-authored-by: Natalia Gavrilenko <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Type of pthread mutex is u1 instead of u32. Remove intrinsics for pthread_mutexattr_setpolicy_np.
|
Hmm, I haven't checked the details, but I think all of this can be done a lot easier. In IR, it would look like this: In fact, maybe we should just have two different alloc events altogether: "DynamicAlloc" and Why do all this? Also, do we really need the Termination event? The notion of memory leak is independet of the memory model since it is independent of time and so we could easily encode is as either a separate property or as part of the program specification (maybe with a switch to enable/disable the leak checks). In other words, we move it out of the CAT file and into the program spec. |
It is a valid question whether we should make MemAlloc and MemFree memory events. I see two potential problems with this approach, but none of them should really prevent the implementation. Firstly, if MemAlloc and MemFree inherit AbstractMemoryCoreEvent, then they will also inherit the 'M' tag. As far as I see, it shouldn't be a problem for existing models (rc11, c11, and vmm), but I can imagine models where this could cause surprising results that might be hard to troubleshoot. As a workaround, we can remove the 'M' tag in the constructors of MemAlloc and MemFree, though it is a bit ugly. Secondly, with a straightforward implementation of 'loc', there will be a relation between MemAlloc (or MemFree) and a memory access to the beginning of the allocated region, but not to the other indexes of this region. This can cause very subtle bugs, and I am against defining 'loc' this way. But we can filter out MemAlloc and MemFree in the relation analysis of the loc relation, and keep them all together in alias analysis. In short, I am fine to make MemAlloc and MemFree a GenericMemoryEvent (alias analysis will be indeed cleaner this way), as long as loc, allocmem, and allocptr are kept as they are.
For the termination event, you are right that it can be implemented as a separate property, but it will make the interface of the end user even more convoluted. Actually, it is Hernan who wants to keep the interface clean, but I also like the idea of keeping all the six properties together in a CAT file. |
| } | ||
|
|
||
| @Override | ||
| // TODO: May and must sets of AllocMem can become very large for some programs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do a detailed review a bit later this week, but just a quick comment for now.
This should use LazyEventGraph. ImmutableEventGraph is the same explicit representation as in the native relation analysis, just with an immutable collection. It could be more efficient to use it in derived relations, but it may still require a lot of memory to store a may set of allocmem. With a LazyEventGraph, it will compute a set each time on demand. This might be a bit slower, but will use less memory.
Also, if alias analysis is thread-safe now, could you also change visitSameLocation to use LazyEventGraph? It used to be one of the most problematic relations in terms of the size of a may set. Hopefully, it should get better with a lazy graph.
And could you also remove the TODO here?
If
I don't understand why this can cause subtle bugs. So I would explicitly advocate for having alloc and free in
In #878 I have a dual "Nontermination" event, so I'm not against having those types of events in principle. The implementation needs to be different though. Regarding the properties in CAT: I can also imagine that having them as CAT spec can be kinda annoying, for example, when checking for data races in test code that doesn't care about cleanup (such as all our existing unit tests). You will end up getting memory leak errors left and right. But we could also provide options to disable/enable certain CAT-defined properties in some other way. |
Please read again to see why it shouldn't be in loc.
I don't see why it needs to be different, we cannot change the code back and forth just because you like it that way.
Again, please check the pr first, all the tests have been already taken care of. |
I don't get this. Your only argument was this:
And my argument is that this behavior is exactly how
My point is rather that the
Ok, that fixes at least the issue on our side. |
Where is it 'typically defined'? My point is that loc relation defined this way will have edges to events accessing index 0 but not other indexes. Such definition would be very error-prone, because in reality the effect will be the same for any index within the allocated bounds (unless it is a MemFree event), i.e. either accesses to all indexes should have the relation or none.
It is the existing encoding moved from PropertyEncoder to the event.
|
That is why I gave the example of mixed-size accesses: Putting the large access into If you want to relate accesses to the same region, you instead use the new
Well, then move it to the |
Signed-off-by: Tianrui Zheng <[email protected]>
| public MemAlloc(Register resultRegister, Type allocType, Expression arraySize, Expression alignment, | ||
| boolean isHeapAllocation, boolean doesZeroOutMemory) { | ||
| super(resultRegister, "alloc"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not going to work. The address expression (or any expression for that matter) is evaluated before the execution of the event, but at that point in time, resultRegister will not contain the correct address (it is most likely even uninitialized).
That is why I said you probably need to add a new event. Either a dummy event that can be added after the alloc so it can refer to the result register:
p <- alloc(...) // No memory event. Using "p" as address here is wrong.
MemAlloc(p); // Visible allocation marker. Memory event with address "p".
Or, a new MemAlloc that replaces Alloc after the memory allocation pass ran, i.e., after memory objects have been allocated. Because then MemAlloc can use the allocated MemoryObject as address.
The first solution is certainly easier to implement for now (probably like 10 lines of code).
For the second solution, I can also imagine MemAlloc just inheriting from Alloc (but then you need to implement the MemoryCoreEvent interface) and adding the new address field. But I guess quite a bit of existing code would need to get updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please give a minimal example of a program where it is not going to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I use resultRegister in constructor and later address will be set to allocatedObject when setAllocatedObject is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please give a minimal example of a program where it is not going to work
No frontend will generate such a program, but code like this would fail to get optimized
int64 p <- ... // alloc or a local assigment
int64 p <- alloc(...)
The second alloc appears to make use of p which keeps the previous assignment alive and thus it cannot get deleted although it is dead. It will probably also generate some encoding for that, but I don't think it will result in soundness errors. Though I can also imagine it breaking the alias analysis if we ever want to run it before memory allocation (which is reasonable).
Apart from that, the conceptual error is important in itself IMO.
Here I use resultRegister in constructor and later address will be set to allocatedObject when setAllocatedObject is called.
I know, but that only partly addresses the issue. On construction, the alloc event is essentially broken until the memory allocation pass goes through. Creating a conceptually broken object and then adding a workaround will only cause problems in the long run. At the very least, it deserves comments that document the "hacks" and explain why the hack is not problematic in practice. Though I would advice against it, because then you kinda need to understand what the rest of Dartagnan does and anyone who writes passes in the future can easily run into the trap (e.g., alias analysis running before memory allocation).
In 99% of the cases though, the wrong address will only cause optimization obstacles if anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g., alias analysis running before memory allocation
I'm confused about it: how can the two solutions suggested by you help with this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently (and in both of my suggested solutions), the Alloc event is no memory event, so no pass/analysis can accidentally treat it as such. This also means that an alias analysis that uses getAddress() to determine the accessed address of each memory event is going to do the right thing.
In your version, getAddress() gives a wrong value before memory allocation, so the analyses have to special case on Alloc and if they don't, they may produce wrong results without knowing.
And there is no way that, a couple month down the line, everyone remembers that Alloc needs special treatment... Even worse, because it will likely work in many cases (e.g., whenever an analysis runs after memory allocation), it ends up being a time bomb for a very long time.
Btw. the "cheap" solution with the extra allocation marker is somewhat implemented here by a former student of mine. It also uses the sobj (same object) relation to formulate memory errors:
flag ~empty (([M];sobj;[MFR]) \ hb*) as possible-access-after-free
let alloc-free-matches = [MAL];loc;[MFR]
flag ~empty [MFR] \ [range(alloc-free-matches)] as invalid-free
flag ~empty [MAL] \ [domain(alloc-free-matches)] as memory-leak
I have not checked any details of the code base nor the CAT files, so take them with a grain of salt.
Lastly, I think you shouldn't inherit from GenericMemoryEvent. If you want to use that class, you can just create instances of it with the ALLOC/FREE tag. If you create separate classes (which is totally fine), then they should probably extend AbstractMemoryCoreEvent instead.
Signed-off-by: Tianrui Zheng <[email protected]>
| (* | ||
| * Heap memory safety. | ||
| * Base relations: | ||
| * allocptr - relates (ALLOC | FREE) -> (FREE) when both events use the same pointer | ||
| * allocmem - relates (ALLOC) -> (M) when the second event accesses the memory allocated by the first event | ||
| *) | ||
| flag ~empty (((ALLOC * FREE) & allocptr) \ hb) as alloc-race | ||
| flag ~empty (((FREE * FREE) \ id) & allocptr) as double-free | ||
| flag ~empty ([FREE] \ [range(allocptr & (ALLOC * FREE))]) as free-without-alloc | ||
| flag ~empty (allocmem \ hb) as use-before-alloc | ||
| flag ~empty ((allocmem^-1 ; (allocptr & (ALLOC * FREE))) \ hb) as use-after-free | ||
| flag ~empty [domain(ALLOC * TERMINATION)] \ [domain(allocptr)] as alloc-without-free |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could put this into a separate .cat file and just include it in the other models. I guess it makes sense for any (language-level) memory model that defines a hb relation.
| public static Termination newTerminationEvent() { | ||
| return terminationEvent; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't return an existing instance; just create a new one.
| MemFree other = new MemFree(address); | ||
| other.setFunction(this.getFunction()); | ||
| other.copyAllMetadataFrom(this); | ||
| return other; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a proper copy constructor. Also, don't set the function of the newly created event: this breaks invariants we try to uphold.
| public BooleanFormula encodeExec(EncodingContext ctx) { | ||
| Program program = getThread().getProgram(); | ||
| final BooleanFormulaManager bmgr = ctx.getBooleanFormulaManager(); | ||
| final BooleanFormula exitReached = bmgr.and(program.getThreads().stream() | ||
| .filter(t -> !t.equals(getThread())) | ||
| .map(t -> bmgr.equivalence(ctx.execution(t.getEntry()), ctx.execution(t.getExit()))) | ||
| .toList()); | ||
| final BooleanFormula terminated = program.getThreadEventsWithAllTags(Tag.NONTERMINATION).stream() | ||
| .map(CondJump.class::cast) | ||
| .map(jump -> bmgr.not(ctx.jumpTaken(jump))) | ||
| .reduce(bmgr.makeTrue(), bmgr::and); | ||
| return bmgr.implication(ctx.execution(this), bmgr.and(ctx.controlFlow(this), exitReached, terminated)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this encoding into ProgramEncoder. To make it accessible from everywhere, I think it would be reasonable to have EncodingContext provide a hasProgramTerminated() function that returns a fresh boolean variable. ProgramEncoder can then encode hasProgramTerminated <=> "actual encoding" and execution(terminatorEvent) <=> hasProgramTerminated.
EDIT: If ctx.hasProgramTerminated() exists, then it is also fine to let Termination.encodeExec encode exec(TermationEvent) <=> ctx.hasProgramTerminated. Basically, the point is to get the definition of what it means for a program to be terminated outside of the Termination event and into a more central place.
| .forEach(this::instrumentLoop); | ||
| } | ||
|
|
||
| program.addTerminationThread(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here. Indeed, the code will not get reached if only spin loop detection is enabled.
If you want to put it into an existing pass, then ThreadCreation might be a better option. But that one might not run for Litmus code IIRC.
So a dedicated pass right at the end of the processing pipeline seems reasonable.
| case DATA -> intersection(r, getOrCreatePredefinedRelation(IDDTRANS), | ||
| addDefinition(new CartesianProduct(newRelation(), | ||
| Filter.union(Filter.byTag(Tag.MEMORY), Filter.byTag(Tag.ALLOC)), | ||
| Filter.byTag(Tag.MEMORY)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not a big fan of removing the memory tags from Alloc and Free. It messes with so many things.
Is there any imaginable case where treating Alloc/Free as proper memory events is really wrong?
If so, then it suggests that malloc and calloc are ought to be treated differently, since the latter zero-initializes the memory and so almost acts as a write (and hence a proper memory access).
If it is hard to imagine, I would go with the simpler solution and add the tags (or rather, not remove them in the first place) and see if we ever encounter an issue with it.
| public class AllocPtr extends Definition { | ||
|
|
||
| public AllocPtr(Relation r) { | ||
| super(r, ALLOCPTR); | ||
| } | ||
|
|
||
| @Override | ||
| public <T> T accept(Visitor<? extends T> v) { | ||
| return v.visitAllocPtr(this); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your current iteration, Alloc and Free have loc-edges (which I think they should), so let allocPtr = [ALLOC];loc;[FREE] holds and you can save yourself this whole relation.
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
Signed-off-by: Tianrui Zheng <[email protected]>
8eebe94 to
dc21c35
Compare
This PR adds support to detect heap memory safety issues in:
MemFreewith new tagFREEMemAlloc(renamed fromAlloc), which has new tagALLOCwhen it is a heap allocationTerminationwith tagTERMINATIONwhich gets executed if and only if the program terminated normallyAllocPtrandAllocMemand derived ones into modelsc11andrc11, NOTE:vmmis still WIP