Skip to content

Conversation

hanno-becker
Copy link
Contributor

  • Change mlk_polyvec back to struct { mlk_poly vec[MLKEM_K]; }
  • Change mlk_polymat to struct { mlk_polyvec vec[MLKEM_K]; }
  • Update all function signatures to use pointer style
  • Fix all implementations to use struct member access
  • Update tests, benchmarks, and CBMC harnesses
  • Add consistent const annotations

Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate why you would like to make this change?

@hanno-becker
Copy link
Contributor Author

hanno-becker commented Jul 10, 2025

@rod-chapman suggested that this unblocks #906. Adjusting it to check if that's true.

@mkannwischer
Copy link
Contributor

@rod-chapman suggested that this unblocks #906. Adjusting it to check if that's true.

Should this be converted to a draft PR in that case?

@hanno-becker
Copy link
Contributor Author

@mkannwischer We could. One could make the argument, though, that the only reason to introduce mlk_polymat/vec as a array was to fix the UB in gen_matrix(), which is now moot given @rod-chapman's #1112.

hanno-becker and others added 3 commits July 25, 2025 08:20
- Change mlk_polyvec back to struct { mlk_poly vec[MLKEM_K]; }
- Change mlk_polymat to struct { mlk_polyvec vec[MLKEM_K]; }
- Update all function signatures to use pointer style
- Fix all implementations to use struct member access
- Update tests, benchmarks, and CBMC harnesses
- Add consistent const annotations

Signed-off-by: Hanno Becker <[email protected]>
Previously, the base multiplication would assume that one of its inputs
is bound by 4096 in absolute value, but make no assumptions about the
other input ("b-input" henceforth) and its mulcache.

This commit refines the bounds slightly, as follows:
- The b-input is assumed to be bound by MLK_NTT_BOUND in absolute value.
  This comes for free since all values for b _are_ results of the NTT.
- The b-cache-input is assumed to be bound by MLKEM_Q in absolute value.

With those additional bounds in place, it can be showed that the result
of the base multiplication is below INT16_MAX/2 in absolute value.
Accordingly, this can be added as a precondition for the inverse NTT.

Those refined bounds can help in subsequent commits to improve the
reduction strategy inside the inverse NTT.

For the native AVX2 backend, the new output bound for the mulcache
forces an explicit zeroization of the mulcache. This is not ideal
since the cache is in fact entirely unused, but the performance
penalty should be marginal (if the compiler can't eliminate the
zeroization in the first place).

Signed-off-by: Hanno Becker <[email protected]>
@rod-chapman
Copy link
Contributor

Does this change improve performance and/or stability of proof on platforms that we care about (by which I really mean - AArch64/macOS, Intel/Linux, EC2 Graviton/Linux, and "whatever our CI runners are running..."?

I assume performance impact on generated code is negligible?

@hanno-becker
Copy link
Contributor Author

Closing this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants