TBAA #617

ThomasHaas · 2024-02-16T10:48:28Z

This PR adds a simple type-based alias analysis (TBAA) using LLVM TBAA metadata generated by Clang.
The analysis is enabled by default and runs in addition to the other core alias analysis.

Overall issues:

To generate TBAA metadata in the first place, Clang needs to run with at least -O1. Since we do not want to have -O1 optimizations, we can use clang -O1 -Xclang -disable-llvm-passes to disable the -O1 LLVM passes. However, Clang still seems to produce slightly different code and I'm not sure how to avoid that (-O3 seemed to work better???).
TBAA metadata is only generated for direct accesses in the C-code. In particular, r = atomic_load(&x) does not generate TBAA metadata whereas r = x does, even if both are translated to the same code (assuming x is marked as atomic)
TBAA metadata is quite weak: it currently does not distinguish between unions, arrays, different pointer types, and nested pointers. This is a long standing issue (for over 10 years already), but there seems to be some recent traction: here and here

As a result of the above, TBAA seems to add no more precision than the current analysis (only tested on EBR so far, since our internal benchmarks are generated without TBAA metadata).
A case where it could be more precise is if the code uses complex pointer tagging which confuses our standard analysis. But for code that does not do this, TBAA might be useless.

Side notes:
Running Clang with higher optimization level also generates other metadata, in particular, !range metadata that limits the outcome of some loads from memory. It seems to give bounds for indices that are loaded to access an array.
This could be valuable both for generating smaller encodings (less bitwidth) and for improving aliasing precision for dynamic indexing into arrays.

Added TBAA as combined analysis to other alias analyses (for testing)

More efficient TBAA alias check.

hernanponcedeleon · 2024-02-26T22:37:49Z

I still have not manage to find an example where this new alias pays off: while we have a more precise analysis, the extra optimization makes the program harder for the solver.

Here are the numbers for harris.c with B=2 ... the mayset of rf is better, but the one of co is worse and the overall solving time is terribly worse.

java -jar dartagnan/target/dartagnan.jar cat/aarch64.cat --target=arm8 --solver=yices2 harris-opt.ll --bound=2
...
#Events: 5108
...
        #must|may rf edges: 3|21121
        #must|may rf edges: 3|21121
        #must|may co edges: 46|3346
        #must|may co edges: 46|3346
...
 ======== Summary ========
Number of iterations: 111
Total native solving time(ms): 931171
   -- Bound check time(ms): 369335
Total CAAT solving time(ms): 4341
   -- Model extraction time(ms): 1118
   -- Population time(ms): 1291
   -- Consistency check time(ms): 19
   -- Reason computation time(ms): 1037
   -- Refining time(ms): 3615
   -- #Computed core reasons: 18996
   -- #Computed core reduced reasons: 12353
   -- Min model size (#events): 143
   -- Average model size (#events): 266
   -- Max model size (#events): 321

[26.02.2024] 11:51:09 [DEBUG] RefinementSolver.runInternal - 
 ===== SMT Statistics (after final iteration) ===== 

[26.02.2024] 11:51:09 [INFO] RefinementSolver.runInternal - Verification finished with result UNKNOWN
UNKNOWN
Total verification time(ms): 948712

java -jar dartagnan/target/dartagnan.jar cat/aarch64.cat --target=arm8 --solver=yices2 harris-O3-opt.ll --bound=2
...
#Events: 5330
...
        #must|may rf edges: 3|19946
        #must|may rf edges: 3|19946
        #must|may co edges: 67|3451
        #must|may co edges: 67|3451
...
 ======== Summary ========
Number of iterations: 199
Total native solving time(ms): 29629180
   -- Bound check time(ms): 9406406
Total CAAT solving time(ms): 11840
   -- Model extraction time(ms): 2796
   -- Population time(ms): 3364
   -- Consistency check time(ms): 42
   -- Reason computation time(ms): 3889
   -- Refining time(ms): 6040
   -- #Computed core reasons: 40707
   -- #Computed core reduced reasons: 26772
   -- Min model size (#events): 217
   -- Average model size (#events): 276
   -- Max model size (#events): 383

[26.02.2024] 10:27:52 [DEBUG] RefinementSolver.runInternal - 
 ===== SMT Statistics (after final iteration) ===== 

[26.02.2024] 10:27:52 [INFO] RefinementSolver.runInternal - Verification finished with result UNKNOWN
UNKNOWN
Total verification time(ms): 29669776

harris.zip

ThomasHaas · 2024-02-26T22:53:00Z

As long as we cannot get TBAA without adding further optimizations, I agree that this is pointless. We cannot do fair comparisons this way.

hernanponcedeleon · 2025-03-12T13:24:26Z

can this be closed?

ThomasHaas added 3 commits February 25, 2024 22:06

First commit for TBAA (contains almost all functionality)

1029e06

TBAA implements AliasAnalysis now.

76d01bc

Added TBAA as combined analysis to other alias analyses (for testing)

More cleanup.

3a92996

More efficient TBAA alias check.

hernanponcedeleon force-pushed the tbaa branch from 62cab51 to 3a92996 Compare February 26, 2024 01:06

ThomasHaas closed this Mar 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TBAA #617

TBAA #617

Uh oh!

ThomasHaas commented Feb 16, 2024

Uh oh!

hernanponcedeleon commented Feb 26, 2024

Uh oh!

ThomasHaas commented Feb 26, 2024

Uh oh!

hernanponcedeleon commented Mar 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TBAA #617

TBAA #617

Uh oh!

Conversation

ThomasHaas commented Feb 16, 2024

Uh oh!

hernanponcedeleon commented Feb 26, 2024

Uh oh!

ThomasHaas commented Feb 26, 2024

Uh oh!

hernanponcedeleon commented Mar 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants