Bounds declarations

Overview

Programmers declare bounds for variables and members with _Array_ptr, _Nt_array_ptr, or checked array types as part of variable and member declarations. The bounds for a variable or member are declared by following the declarator for the variable or member with a ':' and a bounds expression.
The bounds declaration precedes any initializer.

There are 4 kinds of bounds expressions:

count(e): the number of array elements that a pointer points to.
byte_count(e): the number of bytes that a pointer points to.
bounds(lower, upper): the lower and upper bound for a pointer. For an _Array_ptr variable, memory at or above _lower and below upper can be accessed. For an _Nt_array_ptr variable, the element beginning at upper can be read or a null value can be stored there.
bounds(unknown): the pointer has unknown bounds and cannot be used to access memory. Count and bytecount are syntactic sugar for bounds.

Bounds expressions are evaluated as part of bounds checking memory accesses, so they are not allowed to modify memory.
They cannot be or contain assignment expressions, pre/post increment/decrement expressions, or function calls.

When a bounds expression is declared for a parameter, it can refer to any other parameter in the parameter list. The entire parameter list is parsed and then type checking of bounds expressions and checking of other well-formedness conditions is done.

Programmers can also declare bounds for the return value of a function. The return bounds are declared by following the function declarator with a ':' and a bounds expression. The return bounds can refer to any parameter. They can also refer to the special symbol _Return_value, which is the return valoe of the function.

Examples

// Function taking pointer to 5 integers.
int f(_Array_ptr<int> p : count(5));

// Equivalent declaration using byte_count.
int f(_Array_ptr<int> p : byte_count(5 * sizeof(int));

// Function taking pointer to len integers.
int g(_Array_ptr<int> p : count(len), int len);

// Equivalent declaration using an array tpe.
// Parameters with array types are treateed as
// having pointer types.
int g(int p _Checked[] : count(len), int len);

// Function returning pointer to len integers.
_Array_ptr<int> h(int len) : count(len);

// Global variable pointing to 5 integers, initialized to
// null.
_Array_ptr<int> r : count(5) = 0;  

// Variable length array declaration.
struct VarArr {
    int len;
    int arr _Checked[] : count(len);
};
z
## Keeping bounds declarations valid

A bounds declaration for a variable or an object member is valid:
* If the declared bounds are within the bounds of an object at runtime.
* Or the variable or object member is null.  Null pointers cannot
  be used to access memory, so the bounds possibly being out-of-range
  of an object does not matter.

A bounds declaration must be valid for the lifetime of its
variable or object member.  All variables with bounds declarations
must be initialized at declaration time.  This also
includes struct variables that have members with bounds.
Struct variables can be initialized by declaring a value
for the first member of the struct.  The remaining values
will be initialized to default values, which for pointers is
null:

VarArr va = { 0 };

Heap-allocated data must be allocated
with calloc or the programmer must ensure that it is initialized
before it is used (we will eventually add checking that enforces this,
but it isn't done yet).

Declared bounds must be valid after the end of every
statement, unless the statement is in `bundled` scope.
Declared bounds must be valid at the end of `bundled` scopes.
If there is an assignment to a variable with a bounds
declaration, or an assignment to a variable used by
an in-scope bounds declaration, the programmer must
ensure the declare bounds are valid at the end of
the statement or `boundled` scope.

The compiler checks statically that declared bounds are
valid.  The static checking algorithm is described
in [Bounds declaration checking](Bounds-declarations#Bounds-declaration-checking).
It is not finished
yet, so currently the compiler issues a warning if it
cannot prove that bounds are valid.  Eventually the
compiler will issue an error.   A programmer can
use a dynamic runtime check to eliminate the warning
(or error).

A program can create a pointer to a variable or member by taking its address.
For all practical purposes, we do not allow pointers to variables or members that
have bounds or are used in bounds to be created.  This prevents bounds declaration
checking from being bypassed via pointers.
The result of an address-of operation for a variable or member 
that has bounds or is used in bounds and that does not have array type
(or is a pointer derived from such an operation), cannot be
used as:
-  the right-hand side of an assignment or as an argument to a call.
The pointer value could escape to an unknown use.
- In an lvalue expression that is assigned to.  This would
modify the variable or member.

_Array_ptr p : count(len) = ... _Array_ptr r : count(len) = *&p; // legal _Ptr<Array_ptr> pp = cond ? &p : &r; // not allowed. _Ptr plen = &len; // not allowed *&p = r; // not allowed

If a programmer uses unchecked pointers to modify 
a variable or member with bounds, it is the programmer's
responsibility to maintain the validity of the
bounds declaration.  For example, a programmer
could take the address of a member of a structure
without bounds or not used in bounds, and use it to reach an
and modify adjacent member in the structure.  This cannot be
done using checked pointers because the write to the
adjacent member will not be allowed (it will fail the
bounds check), but it could be done using unchecked pointers.

## Narrowing and widening of bounds

It is OK to narrow the bounds of a pointer.   This is useful when a function only
operates over a portion of data.  Consider a function that checks that integrity
of data in a buffer.  The checksum is stored in the first 4 bytes of a buffer and the
data is in the remainder of the buffer.

int convert_to_int(Array_ptr start : count(len), int len);

int checksum(_Array_ptr start : bounds(start, end), _Array_ptr end);

int verify_data(_Array_ptr buf : bounds(buf, end), _Array_ptr end) { if (buf + 4 > end) return 0;

int target = convert_to_int(buf, 4); int computed = checksum(buf + 4, end); return computed == target; }

Bounds of _Nt_array_ptr pointers can be widened if the value at the upper bound
is non-null (note that this widening is not yet implemented in the compiler!).

_Nt_array_ptr p : count(0); if (*p) { _Nt_array_ptr r : count(1) = p; ... }


## Bounds declaration checking

Bounds declarations are checked at variable declarations, after
assignments, and at function calls.  The validity of the
bounds declaration must follow from the constructed
bounds for expressions and additional program 
facts gathered by dataflow analysis.  The facts may
include equality facts as well as relational facts.

If the compiler cannot prove the validity of the
bounds, a programmer can insert a dynamic runtime check
to avoid a compiler warning.   The static checking
algorithm that Checked C uses is incomplete, and there
will be facts about bounds that are true but cannot 
be proved at compile-time.

For a variable declaration, the compiler constructs
the bounds for the initializer expression  It then
checks that those bounds imply declared bounds:

_Array_ptr x : count(10) = ... _Array_ptr y : count(5) = x;


For assignments, the compiler checks that the bounds
of variables and members are valid after expression
statements.  In the simplest case, the bounds of
the right-hand side expression of an assignment
imply the bounds for a variable or a member.

_Array_ptr x : count(10) = ... _Array_ptr y : count(5) = ... y = x;

C allows multiple assignments within a single
statement.  In more complex cases, a variable
with a bounds and variables used in bounds are
updated.

_Array_ptr x : count(x_len) = ... _Array_ptr y : count(y_len) = ...

x = y , x_len = y_len;


For function calls, the compiler checks that the bounds of
argument expressions imply parameter bounds.  The compiler
does this by "translating" the parameter bounds into
bounds at the call site.   It substitutes the actual
arguments into the parameter bounds.  If the arguments
have side-effects, the compiler introduces (symbolic)
temporaries.

### Reasoning about bounds expressions.

The bounds declaration checking algorithm reasons about:
1. Expressions that are equivalent (always compute the same value.)
2. Constant-sized ranges.  These are bounds that can be expressed
  in the form (e1 + c, e1 + d), where c and d compile-time constants.

For determining expression equivalence, the compiler starts with a list
of sets of expressions that are known to evaluate to equal
values (for example, given the
assignment `x = y`, the compiler knows after the assignment that x == y).
Two expressions are equivalent if:
1. They are syntactically identical.
2. They are both in the same set of expressions known to be equal.
3. They constant-fold to the same value.
4. The expressions both use the same operator, each of their
operands is equivalent, and the corresponding operators have
matching types.
5. One expression is parenthesized and its subexpression is equivalent
to the other expression.
6. One expression is a value-preserving operation and its subexpression is
equivalent to the other expression.  Value-preserving operations include:
- Parenthesis.
- C-style casts that preserve bitwise identity (such as casts between
  pointer types).   
- `*` and `&` applied to function and pointer types.

To see if one constant-sized range (e1 + c1, e1 + d1) 
implies the validity (or falseness)
of another constant-sized range (e2 + c2 + d2), the compiler checks if the base expressions
are equivalent.  If they are not equivalent, no conclusion can be
drawn about truth or falseness.  If they are equivalent, the
compiler checks that c1 <= c2 and d1 >= d2.

### Bounds declaration checking after assignments

To check bounds declarations after assignments, we begin with
bounds that are true in the program state before the
assignments (the "before state").
We then translate that information to the program state
after the assignments .  That way we have bounds in
the "after" state that we can use to reason about the
validity of bounds declarations there.

This done in 4 steps:
1. Gather the set of required bounds for any variables or members that
are assigned to in the statement.   If an assignment is made
to a variable or member with bounds, the bounds are included
in the set.  Also, if an assignment is made to any variable used
in a in-scope bounds declaration, those bounds are included
in the set.  These bounds must be true about the "after" program
state.
2. Compute the bounds for the right-hand sides of any
assignments.  These are bounds that true in the "before" 
program state.
3. Update the bounds obtained in step 2 based on the
effects of assignments in the statement.  These are now bounds
true in the "after" program state.
4. Check that the updated bounds along with dataflow
facts imply the required bounds.

Step 3 involves two steps:
1. Some assignments have right-hand sides that are
_invertible expressions_.  The prior value of the expression can be
calculated from the new value.  For example, given `x = x + 1`,
the prior value of `x` is `x - 1`. For those assignments,
the original variable is replaced with the inverted expression.
2. For all other assignments, any bounds using a variable
updated by those assignment are changed to `bounds(unknown)`

Checked C Wiki

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bounds declarations

Overview

Examples

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally