-
Notifications
You must be signed in to change notification settings - Fork 84
feat: split keccak chip into xorin + keccakf #2338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop-v1.6.0
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
659cea1 to
01be16e
Compare
This comment has been minimized.
This comment has been minimized.
01be16e to
29ca86b
Compare
This comment has been minimized.
This comment has been minimized.
- limit threads launched by mulh cuda kernel - use `inline constexpr` for compile time constants instead of `static const` in cuda kernels these are unrelated changes picked from #2338
29ca86b to
be4d2d8
Compare
This comment has been minimized.
This comment has been minimized.
be4d2d8 to
4c32ab4
Compare
This comment has been minimized.
This comment has been minimized.
CodSpeed Performance ReportMerging #2338 will degrade performance by 82.87%Comparing
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | WallTime | benchmark_execute_metered[quicksort] |
9 ms | 44.7 ms | -79.78% |
| ❌ | WallTime | benchmark_execute_metered[bubblesort] |
11.3 ms | 46.7 ms | -75.89% |
| ❌ | WallTime | benchmark_execute_metered[revm_snailtracer] |
7.3 ms | 42.8 ms | -82.87% |
| ❌ | WallTime | benchmark_execute_metered[revm_transfer] |
24.9 ms | 59.9 ms | -58.44% |
| ❌ | WallTime | benchmark_execute_metered[sha256] |
9.3 ms | 44.3 ms | -78.99% |
| ❌ | WallTime | benchmark_execute_metered[fibonacci_recursive] |
16.9 ms | 52.7 ms | -67.85% |
| ❌ | WallTime | benchmark_execute_metered[keccak256] |
11.1 ms | 46.8 ms | -76.27% |
| ❌ | WallTime | benchmark_execute_metered[fibonacci_iterative] |
14.3 ms | 49.1 ms | -70.84% |
Footnotes
-
No successful run was found on
develop-v1.6.0(3026a3f) during the generation of this report, so 95fdcd5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩ -
36 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
## Overview of new design
Implemented a new design for constraining the SHA-2 family of hash
functions (specifically SHA-256, SHA-512, SHA-384). The new design adds
incremental hasher functionality, which means we can compute the hash of
a stream of bytes. More specifically, the new `Sha256`, `Sha512`, and
`Sha384` structs provided in the SHA-2 guest library provide the
`update(&[u8])` and `finalize() -> [u8; HASH_SIZE]` methods. We can
instantiate a hasher object, `let hasher = Sha256::new()` and then call
`hasher.update(data)` as many times as we want on it. The `data`
parameter can be a slice of any size. When we would like to retrieve the
hash, we can call `hasher.finalize()`.
### Main Idea
The `Sha256` struct in the SHA-2 guest library maintains an array of
bytes that serves as the internal state of the SHA-2 hashing algorithm.
This array is updated using a new opcode: `SHA256_UPDATE dst src input`
which takes in one block of input and pointers to the src/dst hasher
states (the guest library sets `src == dst` for updating the state
in-place). The `Sha256` struct will buffer up to one block of input, and
it will call `SHA2_UPDATE` when necessary to absorb the input into the
state.
The `Sha512` and `Sha384` structs are implemented similarly.
### Interoperability
The `Sha256`, `Sha512`, `Sha384` structs implement the `sha2::Digest`
trait, allowing them to be used as a drop-in replacement for the popular
`sha2` crate's `sha2::{Sha256, Sha512, Sha384}` hasher objects.
### Documentation
The OpenVM book, specs, and the crate docs have been updated.
Additionally, a brief justification of soundness for the main
constraints has been added.
### Tests
All the SHA-2 guest library integration tests and the SHA-2 circuit unit
tests pass on CI.
Both CPU and GPU trace generation is tested among these tests.
closes INT-4972 INT-5023 INT-5024 INT-5025 INT-5026
---------
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jonathan Wang <[email protected]>
Co-authored-by: Ayush Shukla <[email protected]>
Co-authored-by: Alexander Golovanov <[email protected]>
Co-authored-by: Arayi Khalatyan <[email protected]>
Co-authored-by: Ayush Shukla <[email protected]>
Co-authored-by: Xinding Wei <[email protected]>
Co-authored-by: Lun-Kai Hsu <[email protected]>
Co-authored-by: Arayi <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: crStiv <[email protected]>
Co-authored-by: HrikB <[email protected]>
Co-authored-by: Yi Sun <[email protected]>
Co-authored-by: stephenh-axiom-xyz <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Teo Kitanovski <[email protected]>
Co-authored-by: Valery Cherepanov <[email protected]>
Co-authored-by: Peyman Jabbarzade <[email protected]>
Co-authored-by: Valery <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: William Lin <[email protected]>
A description of the `HintNonQr` and `HintSqrt` phantom instructions in the algebra extension was missing from instruction reference. This PR adds it.
4c32ab4 to
fa309ed
Compare
This comment has been minimized.
This comment has been minimized.
New Keccak with Xorin and Keccakf chip and opcode - [ ] I have performed a self-review of my own code - [ ] Add negative tests for xorin chip - [ ] Add negative tests for keccakf chip - [x] Add unit test to CI - [x] Add new guest code for E2E test to CI (the keccak example is updated, but I am thinking of adding another one) - [ ] Check with Ayush if I implemented the SizedRecord trait correctly - [ ] Rebase to include Zach's new Plonky3 update and update the keccakf trace gen to not have to transpose any more before giving it into the input - [ ] Maybe add comments to justify the correctness of the constrain_input_read and constraint_output_write function To reviewer: I will still have to complete the above checklist. But you can start reviewing if you would like to. Closes INT-5017, INT-5721, INT-5720, INT-5718, INT-5717, INT-5646, INT-5018
Resolves INT-5719
Resolves INT-5722
Closes INT-5779 --------- Co-authored-by: Ayush Shukla <[email protected]>
fa309ed to
0608cf3
Compare
This comment has been minimized.
This comment has been minimized.
Commit: 7571c7e |
3026a3f to
95b9cdf
Compare
Closes INT-5017