[WIP][analyzer] Refactor ExplodedGraph::trim()
#139939
base: main
...because its implementation used a complicated two-pass graph traversal, while in fact one pass is sufficient for performing the trimming operation.

Note that this commit changes the interface of `trim`: it no longer accepts null pointers within the array `Sinks`, and `BugReporter.cpp` (the single non-debug caller) was modified accordingly. This also affects the interface of `DumpGraph` (which calls `trim` under the hood), but that is a debug helper which is only called when a developer compiles a tweaked version of the analyzer for local debugging, and it is unlikely that a developer would pass null pointers to it.

WORK IN PROGRESS -- DO NOT MERGE!

After this commit the analyzer will pick different representatives from the bug report equivalence classes (because some iteration order changes), which breaks lots of testcases.
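To illustrate why a single backward pass can be enough, here is a minimal sketch on a toy graph. It is not the code from this PR: `Node`, `TrimmedGraph` and `trimToSinks` are hypothetical names, and the real `ExplodedGraph` stores far more per node. The idea is that every predecessor of a kept node is itself kept, so an edge can be recorded at the moment its target node is visited.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-in for an exploded-graph node (hypothetical): an id plus
// links to its predecessors.
struct Node {
  int Id;
  std::vector<const Node *> Preds;
};

// Trimmed graph, keyed by node id: for every kept node, the ids of its
// predecessors (all of which are kept as well).
using TrimmedGraph = std::unordered_map<int, std::vector<int>>;

// Keeps exactly the nodes from which some sink is reachable, in one
// backward traversal. The worklist is taken by value and consumed.
TrimmedGraph trimToSinks(std::vector<const Node *> Worklist) {
  // Mirrors the interface change described above: null sinks are rejected.
  for (const Node *Sink : Worklist)
    assert(Sink && "null sinks are not accepted");

  TrimmedGraph Result;
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();

    auto [It, Inserted] = Result.try_emplace(N->Id);
    if (!Inserted)
      continue; // Already visited through another path.

    for (const Node *P : N->Preds) {
      It->second.push_back(P->Id); // P is an ancestor of a sink, so it is kept.
      Worklist.push_back(P);
    }
  }
  return Result;
}
```

Because the traversal order determines the order in which nodes and edges are created, a rewrite of this kind can change iteration order in the trimmed graph, which is consistent with the test fallout described above.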
|
This is the cleanup of @steakhal What should I do with this? |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
It not only simplifies it but also makes it faster. There are parts in the analyzer that have more legacy than others. I think to make this sustainable, we need to think about paying back the technical debt even at the expense of convenience. Anyway, we should probably focus first on hardening the stability of the reports and only then on disruptive changes, to mitigate the pain upfront. Because if we landed this, and then more similar changes, we would face this instability again and again with a similar magnitude. On Linux, it says "only" 11 tests failed: This isn't too bad actually, but I also get that the user-facing change would be much more significant. Users would probably see differences in bug reports that had equally long candidate counterparts in the bug eqclass. This usually means that some "true branch taken" notes would turn into "false branch taken" for the conditions that didn't really matter anyway for the bug.
The declaration of this data member was specifying `nullptr` as an initializer, but each constructor of the class initializes it to a non-null value (which is taken as an argument) and nothing changes its value later.
It looks like an iterator, let's clarify its type and use a better name.
Previous call only checked the last few commits from the PR.
TrimGraphWorklist Worklist{Nodes};
std::unique_ptr<ExplodedGraph> TrimmedG(G.trim(Worklist));
Why is Worklist hoisted into a variable?
Ah because it's now an in-out ref parameter. I don't understand why it needs to be an lvalue.
Oh, I see... I can just pass Nodes directly to G.trim and the type conversion would force the materialization of an unnamed temporary object, which is exactly the right thing in this situation. I'll remove the named temporary Worklist -- let's enjoy magic when it's convenient.
Nope, does not work right away:
Non-const lvalue reference to type 'clang::ento::TrimGraphWorklist' (aka 'SmallVector<const clang::ento::ExplodedNode *, 32>') cannot bind to a value of unrelated type 'ArrayRef<const clang::ento::ExplodedNode *>' (clang lvalue_reference_bind_to_unrelated)
One should prefer views at parameter boundaries.
You can take an arrayref parameter type, and then inside your function materialize a small vector out of it to use it as a worklist.
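A minimal sketch of that suggestion, using only the standard library. `ArrayView` is a hypothetical stand-in for `llvm::ArrayRef`, `std::vector` stands in for `SmallVector`, and `countAncestors` is an invented example function, not part of the analyzer. It assumes node ids are valid indices into `Preds`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal non-owning view, standing in for llvm::ArrayRef.
template <typename T> class ArrayView {
public:
  ArrayView(const std::vector<T> &V) : Data(V.data()), Size(V.size()) {}
  const T *begin() const { return Data; }
  const T *end() const { return Data + Size; }

private:
  const T *Data;
  std::size_t Size;
};

// Takes a view at the parameter boundary. The mutable worklist is an
// implementation detail that is materialized inside the function.
// Preds[I] lists the predecessor ids of node I.
std::size_t countAncestors(ArrayView<int> Sinks,
                           const std::vector<std::vector<int>> &Preds) {
  std::vector<int> Worklist(Sinks.begin(), Sinks.end());
  std::vector<bool> Seen(Preds.size(), false);
  std::size_t Count = 0;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    if (Seen[N])
      continue;
    Seen[N] = true;
    ++Count;
    for (int P : Preds[N])
      Worklist.push_back(P);
  }
  return Count;
}
```

With this shape, callers can pass either a named container or a temporary, and their data is never modified. The cost is one copy of the sink list into the internal worklist.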
I pushed yet another small commit where I provide an "easy-to-call" overload for trim that takes an ArrayRef instead of a worklist: e01e8bd
However, I'm also open to reverting the commit where I start to consume the array of nodes if you think that this is premature optimization.
|
I pushed a heap of small commits which do various simple cleanup in this neighborhood.
I agree, this seems to be a good approach: first land one commit that enforces deterministic ordering of the bug reports (which is non-NFC and may disrupt the users once) and then it will be possible to merge this commit and other similar changes without further disruptions. I guess, then this commit is put on hold until you or me or somebody else can harden the ordering of bug reports within an EQclass... I'm not sure that I can do so immediately (my backlog of plans is getting larger each day...) but there is a chance... |
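As a rough illustration of what "deterministic ordering within an equivalence class" could mean, the sketch below picks the report with the shortest path and breaks ties with a key that does not depend on traversal order. Everything here is an assumption made for illustration: `Report`, `StableKey` and `pickRepresentative` are hypothetical, and the thread does not say which tie-breaker the analyzer would actually use.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for one bug report in an equivalence class.
struct Report {
  std::size_t PathLength; // Length of the bug path.
  unsigned StableKey;     // Hypothetical key independent of traversal order.
};

// Picks the shortest report; ties are broken by StableKey, so the choice
// does not depend on the order in which the reports were collected.
// Returns nullptr for an empty class.
const Report *pickRepresentative(const std::vector<Report> &Class) {
  auto It = std::min_element(
      Class.begin(), Class.end(), [](const Report &L, const Report &R) {
        return std::tie(L.PathLength, L.StableKey) <
               std::tie(R.PathLength, R.StableKey);
      });
  return It == Class.end() ? nullptr : &*It;
}
```

The result is only fully deterministic if `StableKey` is unique within a class. With such a tie-breaker in place, later refactorings that change iteration order would no longer change which report is shown.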
I think we are aligned, yes.
|
I started a measurement on open source projects to see the effect of this change on the (total) analysis runtime. I don't expect much but if this turns out to be non-negligible, then I will prioritize this direction. |
RT is not a concern to me in this case. |
I decided to check the runtime because you said "It not only simplifies it but also makes it faster." in an earlier comment. However, my first runtime measurement finished and unfortunately it seems that this PR somehow increases the runtime 😕 on average:
I restarted this measurement to see whether this is a real tendency or just noise... but if this is a real slowdown, then I'll probably abandon or heavily reconsider this direction. On the other hand, I noticed that |
|
I don't believe these numbers. The report processing itself should barely take a measurable time. |
|
You're right, repeating the same measurement produced significantly different results:
At least now I know that our environment is noisy and I will need to repeat measurements multiple times to get usable results 😅 |
|
Thanks for letting me know. |