[WIP][analyzer] Refactor `ExplodedGraph::trim()` #139939

NagyDonat · 2025-05-14T17:52:20Z

...because its implementation was using a complicated two-pass graph traversal while in fact one pass is sufficient for performing that trimming operation.

Note that this commit changes the interface of trim: it no longer accepts nullpointers within the array Sinks and BugReporter.cpp (the single non-debug caller) was modified accordingly. This also affects the interface of DumpGraph (which calls trim under the hood), but that's a debug helper which is only called if some developer compiles a tweaked version of the analyzer for local debugging and it is unlikely that a developer would pass nullpointers to it.

WORK IN PROGRESS -- DO NOT MERGE!

After this commit the analyzer will pick different representatives from the bug report equivalence classes (because some iteration order changes), which breaks lots of testcases.

...because its implementation was using a complicated two-pass graph traversal while in fact one pass is sufficient for performing that trimming operation. Note that this commit changes the interface of `trim`: it no longer accepts nullpointers within the array `Sinks` and `BugReporter.cpp` (the single non-debug caller) was modified accordingly. This also affects the interface of `DumpGraph` (which calls `trim` under the hood), but that's a debug helper which is only called if some developer compiles a tweaked version of the analyzer for local debugging and it is unlikely that a developer would pass nullpointers to it. WORK IN PROGRESS -- DO NOT MERGE! After this commit the analyzer will pick different representants from the bug report equivalence classes (because some iteration order changes), which breaks lots of testcases.

NagyDonat · 2025-05-14T17:53:00Z

This is the cleanup of ExplodedGraph::trim() that I promised at 31e981c. This significantly simplifies the algorithm and should be equivalent in a theoretical sense, but unfortunately it changes the (annoyingly fragile) ordering of reports within a bug report equivalence class 😞

@steakhal What should I do with this?

github-actions · 2025-05-14T17:56:16Z

✅ With the latest revision this PR passed the C/C++ code formatter.

steakhal · 2025-05-15T08:13:03Z

This is the cleanup of ExplodedGraph::trim() that I promised at 31e981c. This significantly simplifies the algorithm and should be equivalent in a theoretical sense, but unfortunately it changes the (annoyingly fragile) ordering of reports within a bug report equivalence class 😞

It not only simplifies it but also makes if faster. There are parts in the analyzer that have more legacy than others.
One of the worst parts are the bug reporting and processing - just like what you touch in the PR.

I think to make this sustainable, we need to think about paying back the technical debt even at the expense of convenience.
If we could harden the bug EQclass part for deterministically selecting the sequence of the bug reports we would try, that would make the outcomes stable no matter how the egraph looks, right?

Anyway, we should probably focus first hardening stability of the reports and then on disruptive changes to mitigate the pain upfront. Because if we would land this, and then more similar changes, then we would face this instability again and again with a similar magnitude.

On linux, it says "only" 11 tests failed:

Failed Tests (11):
  Clang :: Analysis/ArrayBound/verbose-tests.c
  Clang :: Analysis/cast-value-notes.cpp
  Clang :: Analysis/cert/env34-c.c
  Clang :: Analysis/division-by-zero-track-zero.cpp
  Clang :: Analysis/fuchsia_handle.cpp
  Clang :: Analysis/malloc-plist.c
  Clang :: Analysis/operator-calls.cpp
  Clang :: Analysis/osobject-retain-release.cpp
  Clang :: Analysis/retain-release.m
  Clang :: Analysis/smart-ptr-text-output.cpp
  Clang :: Analysis/valist-uninitialized-no-undef.c

This isn't too bad actually, but I also get that the user-facing change would be much more signifficant. They would probably see differences of bug reports that had equal long candidate counterparts in the bug eqclass. This usually means that some "true branch taken" notes would turn into "false branch taken" for the conditions that didn't really matter anyway for the bug.

clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp

clang/lib/StaticAnalyzer/Core/BugReporter.cpp

clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp

The declaration of this data member was specifying `nullptr` as an initializer, but each constructor of the class initializes it to a non-null value (which is taken as an argument) and nothing changes its value later.

It looks like an iterator, let's clarify its type and use a better name.

Previous call only checked the last few commits from the PR.

steakhal · 2025-05-15T13:38:01Z

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp

+  TrimGraphWorklist Worklist{Nodes};
+  std::unique_ptr<ExplodedGraph> TrimmedG(G.trim(Worklist));


Why is Worklist hoised into a variable?

Ah because it's now an in-out ref parameter. I don't understand why it needs to be an lvalue.

Oh, I see... I can just pass Nodes directly to G.trim and the type conversion would force the materialization of an unnamed temporary object, which is exactly the right thing in this situation. I'll remove the named temporary Worklist -- let's enjoy magic when it's convenient.

Nope, does not work right away:

Non-const lvalue reference to type 'clang::ento::TrimGraphWorklist' (aka 'SmallVector<const clang::ento::ExplodedNode *, 32>') cannot bind to a value of unrelated type 'ArrayRef<const clang::ento::ExplodedNode *>' (clang lvalue_reference_bind_to_unrelated)

One should prefer views at parameter boundaries.
You can take an arrayref parameter type, and then inside your function materialize a small vector out of it to use it as a worklist.

I pushed yet another small commit where I provide an "easy-to-call" overload for trim that takes an ArrayRef instead of a worklist: e01e8bd

However I'm also open to reverting the commit where I start to consume the array of nodes if you think that this is premature optimiziation.

NagyDonat · 2025-05-15T13:40:07Z

I pushed a heap of small commits which do various simple cleanup in this neighborhood.

If we could harden the bug EQclass part for deterministically selecting the sequence of the bug reports we would try, that would make the outcomes stable no matter how the egraph looks, right?

Anyway, we should probably focus first hardening stability of the reports and then on disruptive changes to mitigate the pain upfront. Because if we would land this, and then more similar changes, then we would face this instability again and again with a similar magnitude.

I agree, this seems to be a good approach: first land one commit that enforces deterministic ordering of the bug reports (which is non-NFC and may disrupt the users once) and then it will be possible to merge this commit and other similar changes without further disruptions.

I guess, then this commit is put on hold until you or me or somebody else can harden the ordering of bug reports within an EQclass... I'm not sure that I can do so immediately (my backlog of plans is getting larger each day...) but there is a chance...

steakhal · 2025-05-15T13:46:09Z

I pushed a heap of small commits which do various simple cleanup in this neighborhood.

If we could harden the bug EQclass part for deterministically selecting the sequence of the bug reports we would try, that would make the outcomes stable no matter how the egraph looks, right?
Anyway, we should probably focus first hardening stability of the reports and then on disruptive changes to mitigate the pain upfront. Because if we would land this, and then more similar changes, then we would face this instability again and again with a similar magnitude.

I agree, this seems to be a good approach: first land one commit that enforces deterministic ordering of the bug reports (which is non-NFC and may disrupt the users once) and then it will be possible to merge this commit and other similar changes without further disruptions.

I guess, then this commit is put on hold until you or me or somebody else can harden the ordering of bug reports within an EQclass... I'm not sure that I can do so immediately (my backlog of plans is getting larger each day...) but there is a chance...

I think we are aligned. yes.

NagyDonat · 2025-05-16T13:22:53Z

I started a measurement on open source projects to see the effect of this change on the (total) analysis runtime. I don't expect much but if this turns out to be non-negligible, then I will prioritize this direction.

steakhal · 2025-05-16T14:18:34Z

I started a measurement on open source projects to see the effect of this change on the (total) analysis runtime. I don't expect much but if this turns out to be non-negligible, then I will prioritize this direction.

RT is not a concern to me in this case.

NagyDonat · 2025-05-17T15:34:25Z

RT is not a concern to me in this case.

The reason why I decided to check the runtime is because you said that "It not only simplifies it but also makes if faster. " in an earlier comment.

However, my first runtime measurement finished and unfortunately it seems that this PR somehow increases the runtime 😕 on average:

Project	before this PR	this PR	Ratio
memcached	23.78	26.44	111.2%
tmux	74.92	73.93	98.7%
curl	231.98	238.36	102.8%
twin	57.52	54.77	95.2%
vim	179.32	164.70	91.8%
openssl	407.68	407.90	100.1%
sqlite	878.11	835.58	95.2%
ffmpeg	1306.35	1288.38	98.6%
postgres	401.11	397.88	99.2%
tinyxml2	28.74	28.44	98.9%
libwebm	57.17	60.29	105.5%
xerces	204.98	204.06	99.6%
bitcoin	160.97	158.15	98.2%
protobuf	2332.42	2435.38	104.4%
qtbase	3226.83	3851.08	119.3%
openrct2	1154.17	1172.49	101.6%
total	10726.05	11397.83	106.3%

I restarted this measurement to see whether this is a real tendency or just noise... but if this is a real slowdown, then I'll probably abandon or heavily reconsider this direction.

On the other hand, I noticed that CodeChecker report hashing "hides" the disturbances caused by this commit: CodeChecker diff and similar tools can recognize that the different representative of the equivalence class is still equivalent to the old one. This would mean that if we do introduce a commit that disrupts the representative selection (one last time, by enforcing a stable ordering), then it won't be too disruptive for the CodeChecker users.

steakhal · 2025-05-17T17:47:36Z

I don't believe these numbers. The report processing itself should barely take a measurable time.
Improving that shouldn't be visible. I suspect what you measured was noise.
I wouldn't invest into further measurements.

NagyDonat · 2025-05-19T08:37:35Z

You're right, repeating the same measurement produced significantly different results:

Project	before this PR	this PR	Ratio
memcached	25.70	28.12	109.4%
tmux	74.18	74.30	100.2%
curl	219.55	217.06	98.9%
twin	52.27	47.75	91.4%
vim	167.13	163.41	97.8%
openssl	431.63	422.84	98.0%
sqlite	959.32	941.19	98.1%
ffmpeg	1353.04	1348.64	99.7%
postgres	436.02	425.13	97.5%
tinyxml2	31.80	32.39	101.9%
libwebm	67.39	65.77	97.6%
xerces	201.98	225.78	111.8%
bitcoin	175.57	173.28	98.7%
protobuf	2585.72	2367.59	91.6%
qtbase	3988.95	3984.71	99.9%
openrct2	1210.14	1175.08	97.1%
total	11980.39	11693.04	97.6%

At least now I know that our environment is noisy and I will need to repeat measurements multiple times to get usable results 😅

steakhal · 2025-05-19T17:44:19Z

Thanks for letting me know.

NagyDonat requested review from Xazax-hun, haoNoQ and steakhal May 14, 2025 17:52

steakhal reviewed May 15, 2025

View reviewed changes

clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp Outdated Show resolved Hide resolved

clang/lib/StaticAnalyzer/Core/BugReporter.cpp Outdated Show resolved Hide resolved

NagyDonat added 2 commits May 15, 2025 13:57

Simplify map lookup; unbrace

348e740

Get rid of InverseMap because it was unused

300b7ce

steakhal reviewed May 15, 2025

View reviewed changes

clang/lib/StaticAnalyzer/Core/ExplodedGraph.cpp Outdated Show resolved Hide resolved

NagyDonat added 8 commits May 15, 2025 14:18

Eliminate const_cast by using the proper type

7ea7d83

Avoid copying a vector of node pointers

c2a6891

Delete an unused type alias

a8ced53

Capitalize the parameter 'bugReports'

42d3076

Clarify that PathSensitiveBugReport::ErrorNode is never null

f6b4ef5

The declaration of this data member was specifying `nullptr` as an initializer, but each constructor of the class initializes it to a non-null value (which is taken as an argument) and nothing changes its value later.

Fix misleading declaration of a loop variable

fde270e

It looks like an iterator, let's clarify its type and use a better name.

Satisfy git-clang-format

64b117a

Satisfy git-clang-format (really)

fab536f

Previous call only checked the last few commits from the PR.

steakhal reviewed May 15, 2025

View reviewed changes

Provide 'ExplodedGraph::trim(ArrayRef, ...)'

e01e8bd

		TrimGraphWorklist Worklist{Nodes};
		std::unique_ptr<ExplodedGraph> TrimmedG(G.trim(Worklist));

[WIP][analyzer] Refactor ExplodedGraph::trim() #139939

Are you sure you want to change the base?

[WIP][analyzer] Refactor ExplodedGraph::trim() #139939

Uh oh!

Conversation

NagyDonat commented May 14, 2025

Uh oh!

NagyDonat commented May 14, 2025

Uh oh!

github-actions bot commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steakhal commented May 15, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

steakhal May 15, 2025

Choose a reason for hiding this comment

Uh oh!

steakhal May 15, 2025

Choose a reason for hiding this comment

Uh oh!

NagyDonat May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NagyDonat May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal May 15, 2025

Choose a reason for hiding this comment

Uh oh!

NagyDonat May 15, 2025

Choose a reason for hiding this comment

Uh oh!

NagyDonat commented May 15, 2025

Uh oh!

steakhal commented May 15, 2025

Uh oh!

NagyDonat commented May 16, 2025

Uh oh!

steakhal commented May 16, 2025

Uh oh!

NagyDonat commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steakhal commented May 17, 2025

Uh oh!

NagyDonat commented May 19, 2025

Uh oh!

steakhal commented May 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[WIP][analyzer] Refactor `ExplodedGraph::trim()` #139939

[WIP][analyzer] Refactor `ExplodedGraph::trim()` #139939

github-actions bot commented May 14, 2025 •

edited

Loading

NagyDonat May 15, 2025 •

edited

Loading

NagyDonat May 15, 2025 •

edited

Loading

NagyDonat commented May 17, 2025 •

edited

Loading