Skip to content

Conversation

@resistor
Copy link
Collaborator

These aren't an issue on big CHERI where representable offsets usually extend well beyond bounds, but on CHERIoT our representation limits are quite tight, so this is a common source of bugs.

…out-of-bounds intermediate capabilities.

These aren't an issue on big CHERI where representable offsets usually extend well beyond bounds, but on CHERIoT our representation limits are quite tight, so this is a common source of bugs.
@resistor
Copy link
Collaborator Author

Added a warning for common variants of this in #284

@davidchisnall
Copy link

I expect a lot of false positives from these, which is why I thought they’d make sense as static analysis things: there’s more dataflow and a bit of symbolic execution available to give a bit more precision there.

@resistor
Copy link
Collaborator Author

resistor commented Dec 1, 2025

I expect a lot of false positives from these, which is why I thought they’d make sense as static analysis things: there’s more dataflow and a bit of symbolic execution available to give a bit more precision there.

FWIW, this warning doesn't trigger on the RTOS or the network-stack demos, but perhaps those are written in a CHERI-amenable style already? Is there something else that would be good to evaluate it on?

I agree that a static analyzer check would be more powerful, but I think the value proposition is harder there. I expect fewer people to run clang-tidy than run just clang, and I particularly doubt many people will run it during the initial porting of a code base.

@davidchisnall
Copy link

Can you try https://github.com/pq-code-package/mlkem-c-embedded (mlkem and fips202 directories)?

That codebase contains things that we know are broken, let's see if it finds the broken things without false positives.

@resistor
Copy link
Collaborator Author

resistor commented Dec 1, 2025

Do you have a build system for this?

@davidchisnall
Copy link

Only locally, but you can just point xmake at those files.

@davidchisnall
Copy link

	add_includedirs("mlkem-c-embedded/fips202", ".")
	add_files(
		"mlkem-c-embedded/fips202/fips202.c",
		"mlkem-c-embedded/fips202/keccakf1600.c",
		"mlkem-c-embedded/mlkem/cbd.c",
		"mlkem-c-embedded/mlkem/indcpa.c",
		"mlkem-c-embedded/mlkem/kem.c",
		"mlkem-c-embedded/mlkem/matacc.c",
		"mlkem-c-embedded/mlkem/ntt.c",
		"mlkem-c-embedded/mlkem/poly.c",
		"mlkem-c-embedded/mlkem/polyvec.c",
		"mlkem-c-embedded/mlkem/reduce.c",
		"mlkem-c-embedded/mlkem/symmetric-shake.c",
		"mlkem-c-embedded/mlkem/verify.c")

@davidchisnall
Copy link

randombytes.h:

#pragma once

#include <stdint.h>
#include <stddef.h>

__BEGIN_DECLS

int randombytes(uint8_t *output, size_t n);

__END_DECLS

@resistor
Copy link
Collaborator Author

resistor commented Dec 2, 2025

Good new: no warnings are triggered on mlkem-c-embedded
Bad news: no warnings are triggered on mlkem-c-embedded

The original diagnosis of this code was wrong: the array in question has 256 elements, so the offsetting by -4 does not make it statically obvious that the array would go out of bounds.

@davidchisnall
Copy link

Yup, the original bug was due to base + index - constant, which is always safe as base + (index - constant), but not necessarily safe as either base + index - constant or base - constant + index.

@davidchisnall
Copy link

Further diagnosis:

The mlkem-c-embedded example actually is fine as written. Unfortunately, the compiler does the -4 before the + ctr in the generated code. The evaluation order of expressions like this is implementation defined, so maybe we could just do the thing less likely to go wrong?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants