Commit 6a0a4a2

Update SHA-256 VM chip to support SHA-512 and SHA-384
1 parent 984952a commit 6a0a4a2

18 files changed, +2561 -0 lines changed

extensions/sha2/circuit/Cargo.toml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
[package]
name = "openvm-sha2-circuit"
version.workspace = true
authors.workspace = true
edition.workspace = true
description = "OpenVM circuit extension for SHA-2"

[dependencies]
openvm-stark-backend = { workspace = true }
openvm-stark-sdk = { workspace = true }
openvm-circuit-primitives = { workspace = true }
openvm-circuit-primitives-derive = { workspace = true }
openvm-circuit-derive = { workspace = true }
openvm-circuit = { workspace = true }
openvm-instructions = { workspace = true }
openvm-sha2-transpiler = { workspace = true }
openvm-rv32im-circuit = { workspace = true }
openvm-sha2-air = { workspace = true }

derive-new.workspace = true
derive_more = { workspace = true, features = ["from"] }
rand.workspace = true
serde.workspace = true
sha2 = { version = "0.10", default-features = false }
ndarray = { workspace = true, default-features = false }

[dev-dependencies]
openvm-stark-sdk = { workspace = true }
openvm-circuit = { workspace = true, features = ["test-utils"] }

[features]
default = ["parallel", "mimalloc"]
parallel = ["openvm-circuit/parallel"]
test-utils = ["openvm-circuit/test-utils"]
# performance features:
mimalloc = ["openvm-circuit/mimalloc"]
jemalloc = ["openvm-circuit/jemalloc"]
jemalloc-prof = ["openvm-circuit/jemalloc-prof"]
nightly-features = ["openvm-circuit/nightly-features"]

[package.metadata.cargo-shear]
ignored = ["ndarray"]

extensions/sha2/circuit/README.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# SHA-2 VM Extension

This crate contains circuits for the SHA-2 family of hash functions.
We support SHA-256, SHA-512, and SHA-384.

## SHA-2 Algorithms Summary

The SHA-256, SHA-512, and SHA-384 algorithms are similar in structure.
We will first describe the SHA-256 algorithm, and then describe the differences between the three algorithms.

See the [FIPS standard](https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf) for reference, in particular Sections 6.2, 6.4, and 6.5.

In short, the SHA-256 algorithm works as follows.
1. Pad the message to a multiple of 512 bits and split it into 512-bit 'blocks'.
2. Initialize a hash state consisting of eight 32-bit words to a specific constant value.
3. For each block,
    1. split the block into sixteen 32-bit words and produce 48 more words based on them. The 16 message words together with the 48 additional words are called the 'message schedule'.
    2. apply a scrambling function 64 times to the hash state to update it based on the message schedule. We call each update a 'round'.
    3. add the previous block's final hash state to the current hash state (word-wise, modulo $2^{32}$).
4. The output is the final hash state.
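
To make step 1 concrete, here is a minimal sketch of SHA-256 message padding and block splitting as described in the FIPS standard. The function names `pad_sha256` and `split_blocks` are made up for this illustration and are not part of this crate.

```rust
/// Pads `msg` for SHA-256: append the byte 0x80, then zero bytes until the
/// length is 56 modulo 64, then the original length in bits as a 64-bit
/// big-endian integer. The result is a multiple of 64 bytes (512 bits).
/// Hypothetical helper, not the crate's API.
pub fn pad_sha256(msg: &[u8]) -> Vec<u8> {
    let bit_len = (msg.len() as u64) * 8;
    let mut out = msg.to_vec();
    out.push(0x80);
    while out.len() % 64 != 56 {
        out.push(0);
    }
    out.extend_from_slice(&bit_len.to_be_bytes());
    out
}

/// Splits an already padded message into 64-byte (512-bit) blocks.
/// Any trailing partial chunk is ignored, so pass the output of `pad_sha256`.
pub fn split_blocks(padded: &[u8]) -> Vec<[u8; 64]> {
    padded
        .chunks_exact(64)
        .map(|chunk| {
            let mut block = [0u8; 64];
            block.copy_from_slice(chunk);
            block
        })
        .collect()
}
```

A 55-byte message still fits in one block, while a 56-byte message needs two, because the 0x80 byte and the 8-byte length must both fit after the message.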

The differences with the SHA-512 algorithm are that:
- SHA-512 uses 64-bit words, 1024-bit blocks, performs 80 rounds, and produces a 512-bit output.
- all the arithmetic is done modulo $2^{64}$.
- the initial hash state is different.

SHA-384 is identical to SHA-512 except that it uses a different initial hash state and truncates the output to 384 bits.
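
The per-variant differences listed above can be summarized as a small table of constants. The sketch below is only an illustration: the `Sha2Params` struct and its field names are hypothetical and do not mirror the config types in this crate.

```rust
/// Hypothetical parameter set for one SHA-2 variant (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Sha2Params {
    pub word_bits: usize,
    pub block_bits: usize,
    pub rounds: usize,
    pub digest_bits: usize,
}

pub const SHA256: Sha2Params = Sha2Params {
    word_bits: 32,
    block_bits: 512,
    rounds: 64,
    digest_bits: 256,
};

pub const SHA512: Sha2Params = Sha2Params {
    word_bits: 64,
    block_bits: 1024,
    rounds: 80,
    digest_bits: 512,
};

/// SHA-384 shares everything with SHA-512 except the initial hash state
/// (not modeled here) and the truncated digest length.
pub const SHA384: Sha2Params = Sha2Params {
    word_bits: 64,
    block_bits: 1024,
    rounds: 80,
    digest_bits: 384,
};

impl Sha2Params {
    /// Number of message words in one block (16 for every variant).
    pub fn words_per_block(&self) -> usize {
        self.block_bits / self.word_bits
    }

    /// Size of the eight-word hash state in bits.
    pub fn state_bits(&self) -> usize {
        8 * self.word_bits
    }
}
```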

## Design Overview

We reuse the same AIR code to produce circuits for all three algorithms.
To achieve this, we parameterize the AIR by constants (such as the word size, number of rounds, and block size) that are specific to each algorithm.

This chip produces an AIR that consists of $R+1$ rows for each block of the message, and no more rows
(for SHA-256, $R = 16$ and for SHA-512 and SHA-384, $R = 20$).
The first $R$ rows of each block are called 'round rows', and each of them constrains four rounds of the hash algorithm.
Each row constrains updates to the working variables on each round, and also constrains the message schedule words based on previous rounds.
The final row of each block is called a 'digest row' and it produces a final hash for the block, computed as the sum of the working variables and the previous block's final hash.

Note that this chip only supports messages of length less than $2^{29}$ bytes.
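
The row counts above follow directly from the number of rounds. As a rough sketch (the function names are hypothetical, and rounding up to a power of two reflects the dummy rows that pad the trace height), the arithmetic looks like this:

```rust
/// Each round row constrains four rounds of the hash algorithm.
pub const ROUNDS_PER_ROW: usize = 4;

/// Rows used per message block: R round rows plus one digest row.
pub fn rows_per_block(rounds: usize) -> usize {
    rounds / ROUNDS_PER_ROW + 1
}

/// Trace height for `num_blocks` blocks, assuming the trace is padded with
/// dummy rows up to the next power of two.
pub fn trace_height(rounds: usize, num_blocks: usize) -> usize {
    (rows_per_block(rounds) * num_blocks).next_power_of_two()
}
```

For example, three SHA-256 blocks occupy 51 rows, so under this assumption the trace would be padded to 64 rows.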

### Storing working variables

One optimization is that we only keep track of the `a` and `e` working variables.
It turns out that if we have their values over four consecutive rounds, we can reconstruct all eight variables at the end of the four rounds.
This is because there is overlap between the values of the working variables in adjacent rounds.
If the state is visualized as an array, `s_0 = [a, b, c, d, e, f, g, h]`, then the new state, `s_1`, after one round is produced by a right-shift and an addition.
More formally,
```
s_1 = (s_0 >> 1) + [T_1 + T_2, 0, 0, 0, T_1, 0, 0, 0]
    = [0, a, b, c, d, e, f, g] + [T_1 + T_2, 0, 0, 0, T_1, 0, 0, 0]
    = [T_1 + T_2, a, b, c, d + T_1, e, f, g]
```
where `T_1` and `T_2` are certain functions of the working variables and message data (see the FIPS spec).
So if `a_i` and `e_i` denote the values of `a` and `e` after the `i`th round, for `0 <= i < 4`, then the state `s_3` after the fourth round can be written as `s_3 = [a_3, a_2, a_1, a_0, e_3, e_2, e_1, e_0]`.
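
The reconstruction claim can be checked with a small standalone sketch of the SHA-256 round function. This is illustrative code, not the AIR: `round` and `four_rounds` are names chosen for this example, and `kw` stands for the sum of the round constant and the message schedule word for that round.

```rust
fn ch(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (!x & z)
}

fn maj(x: u32, y: u32, z: u32) -> u32 {
    (x & y) ^ (x & z) ^ (y & z)
}

fn big_sigma0(x: u32) -> u32 {
    x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22)
}

fn big_sigma1(x: u32) -> u32 {
    x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25)
}

/// One SHA-256 round on the state [a, b, c, d, e, f, g, h].
pub fn round(s: [u32; 8], kw: u32) -> [u32; 8] {
    let [a, b, c, d, e, f, g, h] = s;
    let t1 = h
        .wrapping_add(big_sigma1(e))
        .wrapping_add(ch(e, f, g))
        .wrapping_add(kw);
    let t2 = big_sigma0(a).wrapping_add(maj(a, b, c));
    [t1.wrapping_add(t2), a, b, c, d.wrapping_add(t1), e, f, g]
}

/// Runs four rounds while recording only `a` and `e` after each round.
/// Returns the full final state and the state rebuilt from the recorded values.
pub fn four_rounds(s0: [u32; 8], kw: [u32; 4]) -> ([u32; 8], [u32; 8]) {
    let mut s = s0;
    let mut a = [0u32; 4];
    let mut e = [0u32; 4];
    for i in 0..4 {
        s = round(s, kw[i]);
        a[i] = s[0];
        e[i] = s[4];
    }
    (s, [a[3], a[2], a[1], a[0], e[3], e[2], e[1], e[0]])
}
```

The two arrays returned by `four_rounds` are equal for any starting state and any inputs, which is why storing only `a` and `e` per round is enough.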

### Message schedule constraints

The algorithm for computing the message schedule involves message schedule words from 16 rounds ago.
Since we can only constrain two rows at a time, we cannot access data from more than four rounds ago for the first round in each row.
So, we maintain intermediate values that we call `intermed_4`, `intermed_8` and `intermed_12`, where `intermed_i = w_i + sig_0(w_{i-1})`, `w_i` is the value of `w` from `i` rounds ago, and `sig_0` denotes the `sigma_0` function from the FIPS spec.
Since we can reliably constrain values from four rounds ago, we can build up `intermed_16` from these values, which is needed for computing the message schedule.
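
As a rough software analogy (not the AIR constraints themselves), the sketch below computes the SHA-256 message schedule twice: once directly from the FIPS recurrence, and once by forming the `w + sig_0(w)` term as soon as its two inputs are four and three rounds old, then reusing it twelve rounds later. The function names and the `intermed` array indexed by absolute position are assumptions made for this example.

```rust
fn small_sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

fn small_sigma1(x: u32) -> u32 {
    x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
}

/// Message schedule computed directly from the FIPS 180-4 recurrence.
pub fn schedule_direct(block: [u32; 16]) -> [u32; 64] {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(&block);
    for t in 16..64 {
        w[t] = small_sigma1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(small_sigma0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }
    w
}

/// Same schedule, but the term `w[t-16] + sigma_0(w[t-15])` is computed early
/// and carried forward instead of reading words from 16 rounds ago.
pub fn schedule_with_intermed(block: [u32; 16]) -> [u32; 64] {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(&block);
    // intermed[j] = w[j] + sigma_0(w[j + 1]), stored by absolute index j.
    let mut intermed = [0u32; 64];
    for t in 4..64 {
        // At round t, the words from 4 and 3 rounds ago are already known.
        intermed[t - 4] = w[t - 4].wrapping_add(small_sigma0(w[t - 3]));
        if t >= 16 {
            // intermed[t - 16] was produced at round t - 12.
            w[t] = small_sigma1(w[t - 2])
                .wrapping_add(w[t - 7])
                .wrapping_add(intermed[t - 16]);
        }
    }
    w
}
```

Both functions return the same schedule, since addition modulo $2^{32}$ is associative and commutative.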

### Note about `is_last_block`

The last block of every message should have the `is_last_block` flag set to `1`.
Note that `is_last_block` is not constrained to be true for the last block of every message; instead, it *defines* what the last block of a message is.
For instance, if we produce a trace with 10 blocks and only the last block has `is_last_block = 1` then the constraints will interpret it as a single message of length 10 blocks.
If, however, we also set `is_last_block` to true for the 5th block, the trace will be interpreted as hashing two messages, each of length 5 blocks.

Note that we do constrain, however, that the very last block of the trace has `is_last_block = 1`.
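
The way `is_last_block` partitions a trace into messages can be sketched as follows. The helper `message_lengths` is hypothetical and only models the interpretation of the flags, not any constraint.

```rust
/// Given one `is_last_block` flag per block, returns the length in blocks of
/// each message the trace would be interpreted as containing.
/// Blocks after the final set flag are ignored here; the chip instead
/// constrains the very last block of the trace to have the flag set.
pub fn message_lengths(is_last_block: &[bool]) -> Vec<usize> {
    let mut lengths = Vec::new();
    let mut current = 0;
    for &last in is_last_block {
        current += 1;
        if last {
            lengths.push(current);
            current = 0;
        }
    }
    lengths
}
```

With ten blocks, setting the flag only on the last block yields one message of ten blocks, while also setting it on the 5th block yields two messages of five blocks each.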

### Dummy values

Some constraints already have degree three, so we cannot restrict them to particular rows: gating them with a row selector would raise their degree above the maximum constraint degree.
We must enforce them on all rows, and in order to ensure they hold on the remaining rows we must fill in some cells with appropriate dummy values.
We use this trick in several places in this chip.

### Block index counter variables

There are two "block index" counter variables in each row named `global_block_idx` and `local_block_idx`.
Both of these variables take on the same value on all $R+1$ rows in a block.

The `global_block_idx` is the index of the block in the entire trace.
The very first block in the trace will have `global_block_idx = 1` on each row and the counter will increment by 1 between blocks.
The padding rows will all have `global_block_idx = 0`.
The `global_block_idx` is used in interaction constraints to constrain the value of `hash` between blocks.

The `local_block_idx` is the index of the block in the current message.
It starts at 0 for the first block of each message and increments by 1 for each block.
The `local_block_idx` is reset to 0 after each message.
The padding rows will all have `local_block_idx = 0`.
The `local_block_idx` is used to calculate the length of the message processed so far when the first padding row is encountered.
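
To illustrate how the two counters evolve, here is a minimal sketch that assigns both indices to each block given its `is_last_block` flag. The `BlockIdx` struct and `block_indices` function are hypothetical names for this example and are not taken from the chip's trace generation code.

```rust
/// Counter values shared by all rows of one block (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockIdx {
    /// 1-based index of the block within the whole trace (0 is reserved
    /// for the dummy rows that pad the trace).
    pub global: u32,
    /// 0-based index of the block within its message.
    pub local: u32,
}

/// Assigns counters to each block, given one `is_last_block` flag per block.
pub fn block_indices(is_last_block: &[bool]) -> Vec<BlockIdx> {
    let mut out = Vec::with_capacity(is_last_block.len());
    let mut local = 0;
    for (i, &last) in is_last_block.iter().enumerate() {
        out.push(BlockIdx {
            global: i as u32 + 1,
            local,
        });
        // The local counter restarts after the last block of a message.
        local = if last { 0 } else { local + 1 };
    }
    out
}
```

For a trace holding a three-block message followed by a two-block message, the global indices run from 1 to 5 while the local indices are 0, 1, 2, 0, 1.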

### VM AIR vs SubAir

The SHA-2 VM extension chip uses the `Sha2Air` SubAir to help constrain the appropriate SHA-2 hash algorithm.
The SubAir is also parameterized by the specific SHA-2 variant's constants.
The VM extension AIR constrains the correctness of the message padding, while the SubAir adds all other constraints related to the hash algorithm.
The VM extension AIR also constrains memory reads and writes.

### A gotcha about padding rows

The word 'padding' is used in two senses in the context of this chip, which can be confusing.
First, we use padding to refer to the extra bits added to the message that is input to the hash algorithm in order to make the input's length a multiple of the block size.
So, we may use the term 'padding rows' to refer to round rows that correspond to the padded bits of a message (as in `Sha2VmAir::eval_padding_row`).
Second, the dummy rows that are added to the trace to make the trace height a power of 2 are also called padding rows (see the `is_padding_row` flag).
In the SubAir, padding row probably means dummy row.
In the VM AIR, it probably refers to the message padding.
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
use derive_more::derive::From;
use openvm_circuit::{
    arch::{SystemConfig, VmExtension, VmInventory, VmInventoryBuilder, VmInventoryError},
    system::phantom::PhantomChip,
};
use openvm_circuit_derive::{AnyEnum, InstructionExecutor, VmConfig};
use openvm_circuit_primitives::bitwise_op_lookup::{
    BitwiseOperationLookupBus, SharedBitwiseOperationLookupChip,
};
use openvm_circuit_primitives_derive::{Chip, ChipUsageGetter};
use openvm_instructions::*;
use openvm_rv32im_circuit::{
    Rv32I, Rv32IExecutor, Rv32IPeriphery, Rv32Io, Rv32IoExecutor, Rv32IoPeriphery, Rv32M,
    Rv32MExecutor, Rv32MPeriphery,
};
use openvm_sha2_air::{Sha256Config, Sha384Config, Sha512Config};
use openvm_sha2_transpiler::Rv32Sha2Opcode;
use openvm_stark_backend::p3_field::PrimeField32;
use serde::{Deserialize, Serialize};

use crate::*;

#[derive(Clone, Debug, VmConfig, derive_new::new, Serialize, Deserialize)]
pub struct Sha2Rv32Config {
    #[system]
    pub system: SystemConfig,
    #[extension]
    pub rv32i: Rv32I,
    #[extension]
    pub rv32m: Rv32M,
    #[extension]
    pub io: Rv32Io,
    #[extension]
    pub sha2: Sha2,
}

impl Default for Sha2Rv32Config {
    fn default() -> Self {
        Self {
            system: SystemConfig::default().with_continuations(),
            rv32i: Rv32I,
            rv32m: Rv32M::default(),
            io: Rv32Io,
            sha2: Sha2,
        }
    }
}

#[derive(Clone, Copy, Debug, Default, Serialize, Deserialize)]
pub struct Sha2;

// One executor variant per supported SHA-2 algorithm.
#[derive(ChipUsageGetter, Chip, InstructionExecutor, From, AnyEnum)]
pub enum Sha2Executor<F: PrimeField32> {
    Sha256(Sha2VmChip<F, Sha256Config>),
    Sha512(Sha2VmChip<F, Sha512Config>),
    Sha384(Sha2VmChip<F, Sha384Config>),
}

#[derive(From, ChipUsageGetter, Chip, AnyEnum)]
pub enum Sha2Periphery<F: PrimeField32> {
    BitwiseOperationLookup(SharedBitwiseOperationLookupChip<8>),
    Phantom(PhantomChip<F>),
}

impl<F: PrimeField32> VmExtension<F> for Sha2 {
    type Executor = Sha2Executor<F>;
    type Periphery = Sha2Periphery<F>;

    fn build(
        &self,
        builder: &mut VmInventoryBuilder<F>,
    ) -> Result<VmInventory<Self::Executor, Self::Periphery>, VmInventoryError> {
        let mut inventory = VmInventory::new();
        // Reuse an 8-bit bitwise lookup chip if one is already registered;
        // otherwise create one and add it as a periphery chip.
        let bitwise_lu_chip = if let Some(&chip) = builder
            .find_chip::<SharedBitwiseOperationLookupChip<8>>()
            .first()
        {
            chip.clone()
        } else {
            let bitwise_lu_bus = BitwiseOperationLookupBus::new(builder.new_bus_idx());
            let chip = SharedBitwiseOperationLookupChip::new(bitwise_lu_bus);
            inventory.add_periphery_chip(chip.clone());
            chip
        };

        // Each variant gets its own chip and a freshly allocated bus index.
        let sha256_chip = Sha2VmChip::<F, Sha256Config>::new(
            builder.system_port(),
            builder.system_config().memory_config.pointer_max_bits,
            bitwise_lu_chip.clone(),
            builder.new_bus_idx(),
            builder.system_base().offline_memory(),
        );
        inventory.add_executor(sha256_chip, vec![Rv32Sha2Opcode::SHA256.global_opcode()])?;

        let sha512_chip = Sha2VmChip::<F, Sha512Config>::new(
            builder.system_port(),
            builder.system_config().memory_config.pointer_max_bits,
            bitwise_lu_chip.clone(),
            builder.new_bus_idx(),
            builder.system_base().offline_memory(),
        );
        inventory.add_executor(sha512_chip, vec![Rv32Sha2Opcode::SHA512.global_opcode()])?;

        let sha384_chip = Sha2VmChip::<F, Sha384Config>::new(
            builder.system_port(),
            builder.system_config().memory_config.pointer_max_bits,
            bitwise_lu_chip,
            builder.new_bus_idx(),
            builder.system_base().offline_memory(),
        );
        inventory.add_executor(sha384_chip, vec![Rv32Sha2Opcode::SHA384.global_opcode()])?;

        Ok(inventory)
    }
}

extensions/sha2/circuit/src/lib.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
mod sha2_chip;
pub use sha2_chip::*;

mod extension;
pub use extension::*;
