@shuklaayush shuklaayush commented Dec 30, 2025

Closes INT-5017

shuklaayush added a commit that referenced this pull request Jan 2, 2026
- limit threads launched by mulh cuda kernel
- use `inline constexpr` for compile-time constants instead of
`static const` in cuda kernels

these are unrelated changes picked from #2338

codspeed-hq bot commented Jan 5, 2026

CodSpeed Performance Report

Merging #2338 will degrade performance by 82.87%

Comparing develop-new-keccak (7571c7e) with develop-v1.6.0 (6dc3800) [1]

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

❌ 8 regressions
✅ 16 untouched
⏩ 36 skipped [2]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | benchmark_execute_metered[quicksort] | 9 ms | 44.7 ms | -79.78% |
| WallTime | benchmark_execute_metered[bubblesort] | 11.3 ms | 46.7 ms | -75.89% |
| WallTime | benchmark_execute_metered[revm_snailtracer] | 7.3 ms | 42.8 ms | -82.87% |
| WallTime | benchmark_execute_metered[revm_transfer] | 24.9 ms | 59.9 ms | -58.44% |
| WallTime | benchmark_execute_metered[sha256] | 9.3 ms | 44.3 ms | -78.99% |
| WallTime | benchmark_execute_metered[fibonacci_recursive] | 16.9 ms | 52.7 ms | -67.85% |
| WallTime | benchmark_execute_metered[keccak256] | 11.1 ms | 46.8 ms | -76.27% |
| WallTime | benchmark_execute_metered[fibonacci_iterative] | 14.3 ms | 49.1 ms | -70.84% |

Footnotes

  1. No successful run was found on develop-v1.6.0 (3026a3f) during the generation of this report, so 95fdcd5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 36 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

Maillew and others added 3 commits January 5, 2026 05:46
## Overview of new design
Implemented a new design for constraining the SHA-2 family of hash
functions (specifically SHA-256, SHA-512, and SHA-384). The new design adds
incremental hashing, so the hash of a stream of bytes can be computed
piece by piece. More specifically, the new `Sha256`, `Sha512`, and
`Sha384` structs in the SHA-2 guest library provide the
`update(&[u8])` and `finalize() -> [u8; HASH_SIZE]` methods. We can
instantiate a hasher object with `let mut hasher = Sha256::new()` and then
call `hasher.update(data)` on it as many times as we want. The `data`
parameter can be a slice of any size. To retrieve the hash, we call
`hasher.finalize()`.

### Main Idea
The `Sha256` struct in the SHA-2 guest library maintains an array of
bytes that serves as the internal state of the SHA-2 hashing algorithm.
This array is updated using a new opcode: `SHA256_UPDATE dst src input`
which takes in one block of input and pointers to the src/dst hasher
states (the guest library sets `src == dst` for updating the state
in-place). The `Sha256` struct buffers up to one block of input and
calls `SHA256_UPDATE` whenever a full block is ready to be absorbed into
the state.

The `Sha512` and `Sha384` structs are implemented similarly.
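
The buffering described above can be sketched on the host as follows. This is a minimal illustration, not the guest library's code: `BufferedHasher`, `toy_update`, and the `blocks_absorbed` counter are hypothetical names, and `toy_update` is only a stand-in for the `SHA256_UPDATE` opcode (it is not the SHA-256 compression function). Padding and `finalize()` are left out.

```rust
/// Block size in bytes (64 for SHA-256).
const BLOCK_SIZE: usize = 64;
/// Hasher state size in bytes (32 for SHA-256).
const STATE_SIZE: usize = 32;

/// Hypothetical stand-in for the `SHA256_UPDATE dst src input` opcode.
/// This is NOT real SHA-256 compression; it only mixes the block into
/// the state so the buffering logic can be exercised.
fn toy_update(state: &mut [u8; STATE_SIZE], block: &[u8; BLOCK_SIZE]) {
    for (i, byte) in block.iter().enumerate() {
        let j = i % STATE_SIZE;
        state[j] = state[j].rotate_left(3) ^ byte.wrapping_add(i as u8);
    }
}

/// Sketch of a hasher that holds at most one block of pending input.
pub struct BufferedHasher {
    state: [u8; STATE_SIZE],
    buffer: [u8; BLOCK_SIZE],
    buffered: usize,        // bytes currently held in `buffer`
    blocks_absorbed: usize, // number of times the stand-in opcode ran
}

impl BufferedHasher {
    pub fn new() -> Self {
        Self {
            state: [0; STATE_SIZE],
            buffer: [0; BLOCK_SIZE],
            buffered: 0,
            blocks_absorbed: 0,
        }
    }

    /// Accepts a slice of any length and absorbs a block each time the
    /// buffer fills up.
    pub fn update(&mut self, mut data: &[u8]) {
        while !data.is_empty() {
            let take = (BLOCK_SIZE - self.buffered).min(data.len());
            self.buffer[self.buffered..self.buffered + take]
                .copy_from_slice(&data[..take]);
            self.buffered += take;
            data = &data[take..];
            if self.buffered == BLOCK_SIZE {
                // Mirrors `src == dst`: the state is updated in place.
                toy_update(&mut self.state, &self.buffer);
                self.buffered = 0;
                self.blocks_absorbed += 1;
            }
        }
    }

    pub fn state(&self) -> [u8; STATE_SIZE] {
        self.state
    }

    pub fn buffered(&self) -> usize {
        self.buffered
    }

    pub fn blocks_absorbed(&self) -> usize {
        self.blocks_absorbed
    }
}
```

In this sketch the resulting state depends only on the concatenated input, not on how it is split across `update` calls, which is the property an incremental hasher needs.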

### Interoperability
The `Sha256`, `Sha512`, `Sha384` structs implement the `sha2::Digest`
trait, allowing them to be used as a drop-in replacement for the popular
`sha2` crate's `sha2::{Sha256, Sha512, Sha384}` hasher objects.

### Documentation
The OpenVM book, specs, and the crate docs have been updated.
Additionally, a brief justification of soundness for the main
constraints has been added.

### Tests
All the SHA-2 guest library integration tests and the SHA-2 circuit unit
tests pass on CI.
These tests cover both CPU and GPU trace generation.

closes INT-4972 INT-5023 INT-5024 INT-5025 INT-5026

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jonathan Wang <[email protected]>
Co-authored-by: Ayush Shukla <[email protected]>
Co-authored-by: Alexander Golovanov <[email protected]>
Co-authored-by: Arayi Khalatyan <[email protected]>
Co-authored-by: Ayush Shukla <[email protected]>
Co-authored-by: Xinding Wei <[email protected]>
Co-authored-by: Lun-Kai Hsu <[email protected]>
Co-authored-by: Arayi <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: crStiv <[email protected]>
Co-authored-by: HrikB <[email protected]>
Co-authored-by: Yi Sun <[email protected]>
Co-authored-by: stephenh-axiom-xyz <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Teo Kitanovski <[email protected]>
Co-authored-by: Valery Cherepanov <[email protected]>
Co-authored-by: Peyman Jabbarzade <[email protected]>
Co-authored-by: Valery <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: William Lin <[email protected]>
A description of the `HintNonQr` and `HintSqrt` phantom instructions in
the algebra extension was missing from the instruction reference. This PR
adds it.

@shuklaayush shuklaayush changed the base branch from main to develop-v1.6.0 January 5, 2026 09:07
GunaDD and others added 6 commits January 5, 2026 09:11
New Keccak with Xorin and Keccakf chips and opcodes

- [ ] I have performed a self-review of my own code
- [ ] Add negative tests for xorin chip
- [ ] Add negative tests for keccakf chip
- [x] Add unit test to CI
- [x] Add new guest code for E2E test to CI (the keccak example is
updated, but I am thinking of adding another one)
- [ ] Check with Ayush if I implemented the SizedRecord trait correctly
- [ ] Rebase to include Zach's new Plonky3 update and update the keccakf
trace gen so that it no longer has to transpose the data before passing
it in as input
- [ ] Maybe add comments to justify the correctness of the
`constrain_input_read` and `constraint_output_write` functions

To reviewer: I still have to complete the above checklist, but you can
start reviewing if you would like to.

Closes INT-5017, INT-5721, INT-5720, INT-5718, INT-5717, INT-5646,
INT-5018
Closes INT-5779

---------

Co-authored-by: Ayush Shukla <[email protected]>

github-actions bot commented Jan 5, 2026

| group | app.proof_time_ms | app.cycles | app.cells_used | leaf.proof_time_ms | leaf.cycles | leaf.cells_used |
|---|---|---|---|---|---|---|
| verify_fibair | 226 | 322,610 | 2,058,654 | - | - | - |
| fibonacci | 1,062 | 1,500,269 | 2,102,750 | - | - | - |
| regex | 2,441 | 4,137,512 | 17,686,228 | - | - | - |
| ecrecover | 761 | 122,919 | 2,265,656 | - | - | - |
| pairing | 1,478 | 1,745,757 | 25,464,050 | - | - | - |

Commit: 7571c7e

Benchmark Workflow
