-
Notifications
You must be signed in to change notification settings - Fork 1.8k
PoC: Integer range analysis #15342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
PoC: Integer range analysis #15342
Conversation
Lintcheck changes for fe13b7d
This comment will be updated if you push new changes |
This is something I've wanted to add for a while, but other things have been a higher priority. The analysis itself should be done as a MIR analysis rather than working on the HIR tree. See Some notes for implementing this:
|
@Jarcho Is that even possible? Note that I am very new to rustc and clippy development, so what I'm about to say stems from me looking into MIR (mostly reading this and this and playing around with it) and only really knowing about the parts of clippy and the compiler that I have touched so far. I currently don't see how this will work for the following reasons:
Do we have enough control over how the MIR is generated to address these issues? Especially issue 2. If the MIR has overflow checks, then the results of the analysis will be plain wrong for release builds. This would kill the idea.
I have a question about loops. From my (admittedly shaky) understanding, fn foo() -> u8 {
let mut x: u32 = 1;
while x != 0 {
x += 1;
}
x as u8
} This function either returns 0 after Obviously, this is completely unusable. My first thought for fixing this was to do a small fixed number of iterations (maybe 3) and set any locals that haven't converged to the full range of their type, but I'm not sure if this is the best idea. My question is (1) whether my understanding is broadly correct and (2) whether there are better ways of ensuring quick convergence? |
I'm not sure where in the compiler is done. I think it's a MIR optimization and we disable those when running clippy. At some point this will change to running before optimizations.
We run before without MIR inlining so you can treat them as the same. The only way they'll appear is if someone calls an intrinsic directly which can be not a well supported thing.
See previous responses. We don't run with optimizations.
MIR has some span information so you can detect if the
That is correct. A naive implementation would indeed take a long time to run. You can use |
Alrighty, then this seems pretty doable to me now. I should be able to get something working in a few days.
This kind of conflicts with what docs are saying and what I see on rust playground, but maybe even debug builds do some amount of MIR inlining. I'll figure it out. That said, clippy doing no amount of inlining and representing operators as calls to their trait impls in MIR would be my ideal scenario. |
Operators (as in the actual operators, not calls to the trait methods) are not lowered to function calls for primitive types. That's different from inlining. |
When I say treat them as the same I mean you can treat the |
This PR is a proof of concept for using integer range analysis to improve existing rules. I focused on the lints
cast_possible_truncation
,cast_possible_wrap
, andcast_sign_loss
for now, but range analysis is potentially useful in other existing rules as well.The range analysis itself is rather simplistic right now. The possible values of an expression are represented using a closed interval with the
IInterval
type. This representation is applicable to all integer expression and variables, even those we know nothing about except for the type. E.g. a function parameterx: u8
will have the range0..=255
, and the expressionx / 3 + 1
will have the range1..=86
.The interval arithmetic necessary to implement operations like addition, subtractions, and many std integer methods is currently implemented by the
Arithmetic
type. I already implemented most methods that return integers orOption<int_type>
.Example
While only a limited proof of concept, it is already enough to fix false positives in
cast_possible_truncation
such as #7486. In the following, I annotated the computed interval ranges for the code of the issue:As we can see, the range analysis is capable enough to determine that the
result
variable can only hold values in the range0..=u32::MAX-1
despite being of typeu64
. This is enough to eliminate the false positive.Here's a slightly more complex example where I annotated all expressions:
A few details
I want to mention a few details to hopefully make what I implemented more understandable for potential reviewers.
IInterval
doesn't just represent an integer interval, but a typed interval. I omitted types in the above annotations, but allIInterval
s know the underlying integer type.IInterval
s can be empty. This is important for operations that can panic. E.g.1_u32.strict_sub(2_u32)
will always panic, so it will be assigned the empty range (u32).All interval operations guarantee that they return a superset of the actual result. E.g. for
1 + 1
, the actual result interval is1..=1
, but the add implementation is allowed to return any superset of it, e.g.0..=255
. This is both necessary and useful.It's necessary because some operations don't output clean intervals. E.g.$\set{0,255}$ is
u8::wrapping_add(value, 1)
wherevalue
is254
or255
will result in the outputs255
or0
respectively. However, the smallest interval that contains the set0..=255
— a superset.It's useful, because it makes operations easier to implement. Some operations are very complex to implement, so allowing any superset makes it possible to start with a very simple implementation that returns a large superset and then later improve it.
Arithmetic
has both methods and static functions. This is because the dataArithmetic
carries represents compiler flags/options. If an operation depends on any flags/options, it's a method, and a static function otherwise. E.g.abs()
may panic or wrap depending on whether overflow checks are enabled, so it's a method, whilestrict_abs()
always panics, so it's a static function.IntervalCtxt
is the type that implements recursively evaluating expression intervals (inspired byConstEvalCtxt
).I want to stress that this is a proof of concept. This PR is in no state to be merged right now. Before I invest more time into this, I would like to ask (1) whether there is interest in clippy doing this type of range analysis in the first place and (2) whether there are people that could help me with the clippy side of things?
The main thing I need help with is testing (I think). Specifically:
I used property tests and snapshot tests to ensure the correctness of
Arithmetic
andIInterval
, but I'm not sure how to best integrate them. I say "used", because I initially worked onIInterval
andArithmetic
to be their own crate, but then I just copy-pasted the code for the sake of this PR. However, I didn't copy the tests.I don't care whether this code ends up inside clippy or as its own crate, I just want well-tested working code to make clippy smarter. So I would like to ask (1) whether
rinterval
should be its own crate (probably not), and if not (2) what's the best way to add the snapshot tests I had?I want to test the recursive expression evaluation too. I imagine snapshot tests similar to
uitest
, where I can just define a few source files and it will output a snapshot whether the ranges of all expressions are annotated, similar to what I did for the above examples. Is there any existing infrastructure I could use?Of course, other feedback and suggestions are also appreciated.
Before this PR is ready, some work still needs to be done:
1u64 as isize
->u64::MAX as isize
.usize
. I currently userustc_lint::context::LateContext::data_layout().ptr_sized_integer()
to mapusize
to a fixed-width integer type. Same forisize
. This works, but also means that the interval analysis can't be used to determine the intervals for 32-bit AND 64-bit system, only one is possible. This can potentially lead to false positives and false negatives in lints.I also want to point out that I see what I did in this PR as an MVP. In the future, I would like to make the range analysis even smarter (e.g. control-flow-based narrowing) and add floating-point ranges. Of course, I also want to use range analysis to improve other existing rules (which may even be done in this PR, idk).
Please tell me what you think.