Skip to content

Conversation

@NagyDonat
Copy link
Contributor

Because this hack is responsible for use-after-free errors that can trigger assertion failures, as reported e.g. in the github issue #105512

Disabling this performance optimization can cause severe slowdowns (up to +200% runtime) on a few specific translation units, but its effects are barely significant when they are averaged over a whole project: even on projects that are "severely affected" the slowdowns are <20 seconds.

If someone happens to be heavily affected by this performance loss (and doesn't fear the crashes) they can re-enable ExplodedNode reclamation by passing -analyzer-option graph-trim-interval=1000 (the old default) to the analyzer.

This commit is a temporary workaround to eliminate the crashes ASAP; in addition to this, we're also working to develop a long-term solution that can hopefully remove 'graph-trim-interval' altogether and replace it with different solutions that guarantee a good runtime without breaking the invariants of the ExplodedGraph. (It would be much easier to reason about the execution paths if we didn't have to think about the case that some nodes can just disappear.)

Because this hack is responsible for use-after-free errors that can
trigger assertion failures, as reported e.g. in the github issue
llvm#105512

Disabling this performance optimization can cause severe slowdowns (up
to +200% runtime) on a few specific translation units, but its effects
are barely significant when they are averaged over a whole project: even
on projects that are "severely affected" the slowdowns are <20 seconds.

If someone happens to be heavily affected by this performance loss (and
doesn't fear the crashes) they can re-enable ExplodedNode reclamation by
passing -analyzer-option graph-trim-interval=1000 (the old default) to
the analyzer.

This commit is a temporary workaround to eliminate the crashes ASAP; in
addition to this, we're also working to develop a long-term solution
that can hopefully remove 'graph-trim-interval' altogether and replace
it with different solutions that guarantee a good runtime without
breaking the invariants of the ExplodedGraph. (It would be much easier
to reason about the execution paths if we didn't have to think about the
case that some nodes can just disappear.)
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Oct 10, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2024

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)

Changes

Because this hack is responsible for use-after-free errors that can trigger assertion failures, as reported e.g. in the github issue #105512

Disabling this performance optimization can cause severe slowdowns (up to +200% runtime) on a few specific translation units, but its effects are barely significant when they are averaged over a whole project: even on projects that are "severely affected" the slowdowns are <20 seconds.

If someone happens to be heavily affected by this performance loss (and doesn't fear the crashes) they can re-enable ExplodedNode reclamation by passing -analyzer-option graph-trim-interval=1000 (the old default) to the analyzer.

This commit is a temporary workaround to eliminate the crashes ASAP; in addition to this, we're also working to develop a long-term solution that can hopefully remove 'graph-trim-interval' altogether and replace it with different solutions that guarantee a good runtime without breaking the invariants of the ExplodedGraph. (It would be much easier to reason about the execution paths if we didn't have to think about the case that some nodes can just disappear.)


Full diff: https://github.com/llvm/llvm-project/pull/111843.diff

2 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def (+5-3)
  • (modified) clang/test/Analysis/analyzer-config.c (+1-1)
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 737bc8e86cfb6a..ab7d696e595bce 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -393,9 +393,11 @@ ANALYZER_OPTION(
 
 ANALYZER_OPTION(
     unsigned, GraphTrimInterval, "graph-trim-interval",
-    "How often nodes in the ExplodedGraph should be recycled to save memory. "
-    "To disable node reclamation, set the option to 0.",
-    1000)
+    "[DEPRECATED] When set to a number N > 0, this enables reclamation of "
+    "'unimportant' nodes in the ExplodedGraph once per N node creation steps. "
+    "This reduces the memory usage, but can cause use-after-free errors or "
+    "crashes, so will be phased out.",
+    0)
 
 ANALYZER_OPTION(
     unsigned, MinCFGSizeTreatFunctionsAsLarge,
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 47594e8317bc79..2b6230f0921e60 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -84,7 +84,7 @@
 // CHECK-NEXT: experimental-enable-naive-ctu-analysis = false
 // CHECK-NEXT: exploration_strategy = unexplored_first_queue
 // CHECK-NEXT: faux-bodies = true
-// CHECK-NEXT: graph-trim-interval = 1000
+// CHECK-NEXT: graph-trim-interval = 0
 // CHECK-NEXT: ignore-bison-generated-files = true
 // CHECK-NEXT: ignore-flex-generated-files = true
 // CHECK-NEXT: inline-lambdas = true

@NagyDonat
Copy link
Contributor Author

Feedback about this topic in the parent issue:

@steakhal at #105512 (comment)

I can stand by disabling node reclamation given that our measurements were correct - so I'd like someone else to repeat the measurements at some scale to gain confidence that our decision is based of real datapoints. That measurements should also include ffmpeg (as a project showcasing the slowdown), and confirm that even in worst case translation unit analysis times behave as we described.

Assuming the observed numbers align with the expectation, we can turn it off by default.

@xazax at #105512 (comment)

Note that this is not just about performance, but about memory. If the change in the memory consumption is significant, that might be the difference whether some users can do -j 12 or -j 8 before running out of memory which would be a huge difference in the throughput. I don't think we can make this decision without looking at the memory consumption.

@steakhal
Copy link
Contributor

If someone happens to be heavily affected by this performance loss (and doesn't fear the crashes) they can re-enable ExplodedNode reclamation by passing -analyzer-option graph-trim-interval=1000 (the old default) to the analyzer.

Correct me if I'm wrong, but this is an assertion failure. That said, these nodes come from a pool that doesn't get deallocated, but all at once (BumpPtrAllocator). Consequently, all pointers point to allocated memory, regardless if the object's destructor was called. In practice, this means that this issue can't cause segfaults. We may get surprising (but valid) states super rarely, but that can't manifest crashes. The issue the user reported underpins this as that refers to the assertion.

So, to me, this isn't an urgent or important issue that would deserve immediate actions. One such equally good option would be removing that assert.
Despite all I said here, I'm still very much looking for the day disabling node reclamation. I just don't get the urgency.

@steakhal
Copy link
Contributor

And the message I wanted to share xD

Is it a possible way forward dropping that assert?

@NagyDonat
Copy link
Contributor Author

Is it a possible way forward dropping that assert?

The function that performs the assertion is not part of the static analyzer, it's a generic graph algorithm from an LLVM support library and the assertion seems to be a really obvious sanity check. I don't think that it's reasonable to remove the assertion months after it was added.

@steakhal
Copy link
Contributor

Is it a possible way forward dropping that assert?

The function that performs the assertion is not part of the static analyzer, it's a generic graph algorithm from an LLVM support library and the assertion seems to be a really obvious sanity check. I don't think that it's reasonable to remove the assertion months after it was added.

Ah, yes. Well, thats makes this choice easier. Somehow I misremembered. Thanks for insisting!

@NagyDonat NagyDonat closed this Nov 21, 2024
@NagyDonat
Copy link
Contributor Author

It turns out that this commit increases the runtime by ~3% on a set of 6-8 open source projects that we used as a benchmark. (The testing was done on our local fork by @gamesh411, he can provide more accurate data if needed.) Based on this, we decided to temporarily disable the assertion on our downstream branch (instead of using this patch), so I'm closing this PR because if we don't use it ourselves, then I won't try to sell it to the open source community.

I hope that eventually there will be a permanent solution for this issue, but I don't know exactly when. This task is assigned to an intern (Gábor Tóthvári) who is capable and diligent, but has other responsibilities (e.g. university studies) so doesn't work on this full time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:static analyzer clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants