Commit 5ace424

recover_cells_and_kzg_proofs & matrix refactor (#3788)
* Recover cells and proofs & matrix clean up
* Fix table of contents
* Update reference tests generator
* Update test format
* Remove unused imports
* Fix some minor nits
* Rename MatrixEntry's proof to kzg_proof
* Move RowIndex & ColumnIndex to das-core
1 parent 5633417 commit 5ace424

7 files changed: +255 -164 lines changed

specs/_features/eip7594/das-core.md

Lines changed: 51 additions & 27 deletions
@@ -17,6 +17,7 @@
 - [Custody setting](#custody-setting)
 - [Containers](#containers)
 - [`DataColumnSidecar`](#datacolumnsidecar)
+- [`MatrixEntry`](#matrixentry)
 - [Helper functions](#helper-functions)
 - [`get_custody_columns`](#get_custody_columns)
 - [`compute_extended_matrix`](#compute_extended_matrix)
@@ -53,12 +54,10 @@ The following values are (non-configurable) constants used throughout the specif
 
 ## Custom types
 
-We define the following Python custom types for type hinting and readability:
-
 | Name | SSZ equivalent | Description |
 | - | - | - |
-| `DataColumn` | `List[Cell, MAX_BLOB_COMMITMENTS_PER_BLOCK]` | The data of each column in EIP-7594 |
-| `ExtendedMatrix` | `List[Cell, MAX_CELLS_IN_EXTENDED_MATRIX]` | The full data of one-dimensional erasure coding extended blobs (in row major format). |
+| `RowIndex` | `uint64` | Row identifier in the matrix of cells |
+| `ColumnIndex` | `uint64` | Column identifier in the matrix of cells |
 
 ## Configuration
 
@@ -79,7 +78,7 @@ We define the following Python custom types for type hinting and readability:
 
 | Name | Value | Description |
 | - | - | - |
-| `SAMPLES_PER_SLOT` | `8` | Number of `DataColumn` random samples a node queries per slot |
+| `SAMPLES_PER_SLOT` | `8` | Number of `DataColumnSidecar` random samples a node queries per slot |
 | `CUSTODY_REQUIREMENT` | `1` | Minimum number of subnets an honest node custodies and serves samples from |
 | `TARGET_NUMBER_OF_PEERS` | `70` | Suggested minimum peer count |
 
@@ -90,13 +89,23 @@ We define the following Python custom types for type hinting and readability:
 ```python
 class DataColumnSidecar(Container):
     index: ColumnIndex  # Index of column in extended matrix
-    column: DataColumn
+    column: List[Cell, MAX_BLOB_COMMITMENTS_PER_BLOCK]
     kzg_commitments: List[KZGCommitment, MAX_BLOB_COMMITMENTS_PER_BLOCK]
     kzg_proofs: List[KZGProof, MAX_BLOB_COMMITMENTS_PER_BLOCK]
     signed_block_header: SignedBeaconBlockHeader
     kzg_commitments_inclusion_proof: Vector[Bytes32, KZG_COMMITMENTS_INCLUSION_PROOF_DEPTH]
 ```
 
+#### `MatrixEntry`
+
+```python
+class MatrixEntry(Container):
+    cell: Cell
+    kzg_proof: KZGProof
+    column_index: ColumnIndex
+    row_index: RowIndex
+```
+
 ### Helper functions
 
 #### `get_custody_columns`
@@ -132,37 +141,52 @@ def get_custody_columns(node_id: NodeID, custody_subnet_count: uint64) -> Sequen
 #### `compute_extended_matrix`
 
 ```python
-def compute_extended_matrix(blobs: Sequence[Blob]) -> ExtendedMatrix:
+def compute_extended_matrix(blobs: Sequence[Blob]) -> List[MatrixEntry, MAX_CELLS_IN_EXTENDED_MATRIX]:
     """
     Return the full ``ExtendedMatrix``.
 
     This helper demonstrates the relationship between blobs and ``ExtendedMatrix``.
     The data structure for storing cells is implementation-dependent.
     """
     extended_matrix = []
-    for blob in blobs:
-        extended_matrix.extend(compute_cells(blob))
-    return ExtendedMatrix(extended_matrix)
+    for blob_index, blob in enumerate(blobs):
+        cells, proofs = compute_cells_and_kzg_proofs(blob)
+        for cell_id, (cell, proof) in enumerate(zip(cells, proofs)):
+            extended_matrix.append(MatrixEntry(
+                cell=cell,
+                kzg_proof=proof,
+                row_index=blob_index,
+                column_index=cell_id,
+            ))
+    return extended_matrix
 ```
 
 #### `recover_matrix`
 
 ```python
-def recover_matrix(cells_dict: Dict[Tuple[BlobIndex, CellID], Cell], blob_count: uint64) -> ExtendedMatrix:
+def recover_matrix(partial_matrix: Sequence[MatrixEntry],
+                   blob_count: uint64) -> List[MatrixEntry, MAX_CELLS_IN_EXTENDED_MATRIX]:
     """
-    Return the recovered ``ExtendedMatrix``.
+    Return the recovered extended matrix.
 
-    This helper demonstrates how to apply ``recover_all_cells``.
+    This helper demonstrates how to apply ``recover_cells_and_kzg_proofs``.
     The data structure for storing cells is implementation-dependent.
     """
-    extended_matrix: List[Cell] = []
+    extended_matrix = []
     for blob_index in range(blob_count):
-        cell_ids = [cell_id for b_index, cell_id in cells_dict.keys() if b_index == blob_index]
-        cells = [cells_dict[(BlobIndex(blob_index), cell_id)] for cell_id in cell_ids]
-
-        all_cells_for_row = recover_all_cells(cell_ids, cells)
-        extended_matrix.extend(all_cells_for_row)
-    return ExtendedMatrix(extended_matrix)
+        cell_ids = [e.column_index for e in partial_matrix if e.row_index == blob_index]
+        cells = [e.cell for e in partial_matrix if e.row_index == blob_index]
+        proofs = [e.kzg_proof for e in partial_matrix if e.row_index == blob_index]
+
+        recovered_cells, recovered_proofs = recover_cells_and_kzg_proofs(cell_ids, cells, proofs)
+        for cell_id, (cell, proof) in enumerate(zip(recovered_cells, recovered_proofs)):
+            extended_matrix.append(MatrixEntry(
+                cell=cell,
+                kzg_proof=proof,
+                row_index=blob_index,
+                column_index=cell_id,
+            ))
+    return extended_matrix
 ```
 
 #### `get_data_column_sidecars`
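
Review note: the hunk above swaps the flat `ExtendedMatrix` of bare cells for a flat list of `MatrixEntry` records that carry their own coordinates, so `recover_matrix` can select a row by filtering on `row_index`. The following is a minimal sketch of that layout, not the spec code. It assumes a plain dataclass named `Entry` as a stand-in for the SSZ container, a hypothetical `CELLS_PER_ROW = 4` in place of `CELLS_PER_EXT_BLOB`, and dummy bytes instead of real cells and proofs.

```python
from dataclasses import dataclass
from typing import List, Sequence

CELLS_PER_ROW = 4  # hypothetical stand-in for CELLS_PER_EXT_BLOB


@dataclass(frozen=True)
class Entry:
    """Plain-Python stand-in for the MatrixEntry container (illustration only)."""
    cell: bytes
    kzg_proof: bytes
    column_index: int
    row_index: int


def build_matrix(row_count: int) -> List[Entry]:
    """Build a row-major list of entries filled with dummy cell/proof bytes."""
    return [
        Entry(cell=bytes([row, col]), kzg_proof=bytes([col, row]),
              column_index=col, row_index=row)
        for row in range(row_count)
        for col in range(CELLS_PER_ROW)
    ]


def entries_for_row(matrix: Sequence[Entry], row: int) -> List[Entry]:
    """Select one row by coordinate; this works regardless of list order."""
    return [e for e in matrix if e.row_index == row]


matrix = build_matrix(row_count=2)
# In row-major order, entry (row, col) sits at index row * CELLS_PER_ROW + col.
entry = matrix[1 * CELLS_PER_ROW + 2]
# Filtering by row_index still finds the row after the list is reordered.
row_one = entries_for_row(list(reversed(matrix)), 1)
```

Because each entry names its own row and column, a partial matrix can be passed around in any order, which is what the new `recover_matrix` signature relies on.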
@@ -182,15 +206,15 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock,
     proofs = [cells_and_proofs[i][1] for i in range(blob_count)]
     sidecars = []
     for column_index in range(NUMBER_OF_COLUMNS):
-        column = DataColumn([cells[row_index][column_index]
-                             for row_index in range(blob_count)])
-        kzg_proof_of_column = [proofs[row_index][column_index]
-                               for row_index in range(blob_count)]
+        column_cells = [cells[row_index][column_index]
+                        for row_index in range(blob_count)]
+        column_proofs = [proofs[row_index][column_index]
+                         for row_index in range(blob_count)]
         sidecars.append(DataColumnSidecar(
             index=column_index,
-            column=column,
+            column=column_cells,
             kzg_commitments=block.body.blob_kzg_commitments,
-            kzg_proofs=kzg_proof_of_column,
+            kzg_proofs=column_proofs,
             signed_block_header=signed_block_header,
             kzg_commitments_inclusion_proof=kzg_commitments_inclusion_proof,
         ))
@@ -283,7 +307,7 @@ Such trailing techniques and their analysis will be valuable for any DAS constru
 
 ### Row (blob) custody
 
-In the one-dimension construction, a node samples the peers by requesting the whole `DataColumn`. In reconstruction, a node can reconstruct all the blobs by 50% of the columns. Note that nodes can still download the row via `blob_sidecar_{subnet_id}` subnets.
+In the one-dimension construction, a node samples the peers by requesting the whole `DataColumnSidecar`. In reconstruction, a node can reconstruct all the blobs by 50% of the columns. Note that nodes can still download the row via `blob_sidecar_{subnet_id}` subnets.
 
 The potential benefits of having row custody could include:
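
Review note: the paragraph changed above states that a node can reconstruct all blobs from 50% of the columns. A rough sketch of that threshold check follows. The helper name `can_reconstruct` is hypothetical, and the value `128` for `NUMBER_OF_COLUMNS` is an assumption made for illustration, since this diff does not show the constant's value.

```python
from typing import Iterable

NUMBER_OF_COLUMNS = 128  # assumed value, for illustration only


def can_reconstruct(available_columns: Iterable[int],
                    number_of_columns: int = NUMBER_OF_COLUMNS) -> bool:
    """Return True when at least half of the distinct, in-range columns are held."""
    unique = {c for c in available_columns if 0 <= c < number_of_columns}
    # Integer comparison avoids float division: 2 * held >= total.
    return 2 * len(unique) >= number_of_columns


enough = can_reconstruct(range(64))
too_few = can_reconstruct(range(63))
duplicates_only = can_reconstruct([0] * 100)
```

Duplicates and out-of-range indices are ignored here, which matches the spirit of the duplicate and bounds assertions in `recover_cells_and_kzg_proofs`, although that function rejects such input rather than filtering it.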

specs/_features/eip7594/polynomial-commitments-sampling.md

Lines changed: 33 additions & 16 deletions
@@ -1,4 +1,4 @@
-# EIP-7594 -- Polynomial Commitments
+# EIP-7594 -- Polynomial Commitments Sampling
 
 ## Table of contents
 
@@ -46,7 +46,7 @@
 - [`construct_vanishing_polynomial`](#construct_vanishing_polynomial)
 - [`recover_shifted_data`](#recover_shifted_data)
 - [`recover_original_data`](#recover_original_data)
-- [`recover_all_cells`](#recover_all_cells)
+- [`recover_cells_and_kzg_proofs`](#recover_cells_and_kzg_proofs)
 
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
 <!-- /TOC -->
@@ -67,9 +67,7 @@ Public functions MUST accept raw bytes as input and perform the required cryptog
 | `Coset` | `Vector[BLSFieldElement, FIELD_ELEMENTS_PER_CELL]` | The evaluation domain of a cell |
 | `CosetEvals` | `Vector[BLSFieldElement, FIELD_ELEMENTS_PER_CELL]` | The internal representation of a cell (the evaluations over its Coset) |
 | `Cell` | `ByteVector[BYTES_PER_FIELD_ELEMENT * FIELD_ELEMENTS_PER_CELL]` | The unit of blob data that can come with its own KZG proof |
-| `CellID` | `uint64` | Cell identifier |
-| `RowIndex` | `uint64` | Row identifier |
-| `ColumnIndex` | `uint64` | Column identifier |
+| `CellID` | `uint64` | Validation: `x < CELLS_PER_EXT_BLOB` |
 
 ## Constants
 
@@ -660,32 +658,39 @@ def recover_original_data(eval_shifted_extended_evaluation: Sequence[BLSFieldEle
     return reconstructed_data
 ```
 
-### `recover_all_cells`
+### `recover_cells_and_kzg_proofs`
 
 ```python
-def recover_all_cells(cell_ids: Sequence[CellID], cells: Sequence[Cell]) -> Sequence[Cell]:
+def recover_cells_and_kzg_proofs(cell_ids: Sequence[CellID],
+                                 cells: Sequence[Cell],
+                                 proofs_bytes: Sequence[Bytes48]) -> Tuple[
+        Vector[Cell, CELLS_PER_EXT_BLOB],
+        Vector[KZGProof, CELLS_PER_EXT_BLOB]]:
     """
-    Recover all of the cells in the extended blob from FIELD_ELEMENTS_PER_EXT_BLOB evaluations,
-    half of which can be missing.
-    This algorithm uses FFTs to recover cells faster than using Lagrange implementation, as can be seen here:
+    Given at least 50% of cells/proofs for a blob, recover all the cells/proofs.
+    This algorithm uses FFTs to recover cells faster than using Lagrange
+    implementation, as can be seen here:
     https://ethresear.ch/t/reed-solomon-erasure-code-recovery-in-n-log-2-n-time-with-ffts/3039
 
     A faster version thanks to Qi Zhou can be found here:
     https://github.com/ethereum/research/blob/51b530a53bd4147d123ab3e390a9d08605c2cdb8/polynomial_reconstruction/polynomial_reconstruction_danksharding.py
 
     Public method.
     """
-    assert len(cell_ids) == len(cells)
+    assert len(cell_ids) == len(cells) == len(proofs_bytes)
     # Check we have enough cells to be able to perform the reconstruction
     assert CELLS_PER_EXT_BLOB / 2 <= len(cell_ids) <= CELLS_PER_EXT_BLOB
     # Check for duplicates
     assert len(cell_ids) == len(set(cell_ids))
-    # Check that each cell is the correct length
-    for cell in cells:
-        assert len(cell) == BYTES_PER_CELL
     # Check that the cell ids are within bounds
     for cell_id in cell_ids:
         assert cell_id < CELLS_PER_EXT_BLOB
+    # Check that each cell is the correct length
+    for cell in cells:
+        assert len(cell) == BYTES_PER_CELL
+    # Check that each proof is the correct length
+    for proof_bytes in proofs_bytes:
+        assert len(proof_bytes) == BYTES_PER_PROOF
 
     # Get the extended domain
     roots_of_unity_extended = compute_roots_of_unity(FIELD_ELEMENTS_PER_EXT_BLOB)
@@ -716,9 +721,21 @@ def recover_all_cells(cell_ids: Sequence[CellID], cells: Sequence[Cell]) -> Sequ
         end = (cell_id + 1) * FIELD_ELEMENTS_PER_CELL
         assert reconstructed_data[start:end] == coset_evals
 
-    reconstructed_data_as_cells = [
+    recovered_cells = [
         coset_evals_to_cell(reconstructed_data[i * FIELD_ELEMENTS_PER_CELL:(i + 1) * FIELD_ELEMENTS_PER_CELL])
         for i in range(CELLS_PER_EXT_BLOB)]
+
+    polynomial_eval = reconstructed_data[:FIELD_ELEMENTS_PER_BLOB]
+    polynomial_coeff = polynomial_eval_to_coeff(polynomial_eval)
+    recovered_proofs = [None] * CELLS_PER_EXT_BLOB
+    for i, cell_id in enumerate(cell_ids):
+        recovered_proofs[cell_id] = bytes_to_kzg_proof(proofs_bytes[i])
+    for i in range(CELLS_PER_EXT_BLOB):
+        if recovered_proofs[i] is None:
+            coset = coset_for_cell(CellID(i))
+            proof, ys = compute_kzg_proof_multi_impl(polynomial_coeff, coset)
+            assert coset_evals_to_cell(ys) == recovered_cells[i]
+            recovered_proofs[i] = proof
 
-    return reconstructed_data_as_cells
+    return recovered_cells, recovered_proofs
 ```
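
Review note: the added proof-recovery tail follows a "keep what was supplied, compute only what is missing" pattern. The sketch below isolates that pattern with no cryptography. The names `fill_missing` and `fake_proof` are hypothetical, and the callback merely stands in for `compute_kzg_proof_multi_impl`.

```python
from typing import Callable, List, Optional, Sequence


def fill_missing(known_ids: Sequence[int],
                 known_values: Sequence[bytes],
                 total: int,
                 compute: Callable[[int], bytes]) -> List[bytes]:
    """Place known values at their indices, then compute the remaining slots."""
    assert len(known_ids) == len(known_values)
    assert len(known_ids) == len(set(known_ids))  # no duplicate indices
    slots: List[Optional[bytes]] = [None] * total
    for i, idx in enumerate(known_ids):
        assert idx < total  # index must be within bounds
        slots[idx] = known_values[i]
    for idx in range(total):
        if slots[idx] is None:
            # Only the gaps trigger the (expensive) computation.
            slots[idx] = compute(idx)
    return slots


computed_for: List[int] = []


def fake_proof(idx: int) -> bytes:
    """Toy stand-in for proof generation; records which indices were computed."""
    computed_for.append(idx)
    return b"computed-%d" % idx


result = fill_missing([2, 0], [b"given-2", b"given-0"], total=4, compute=fake_proof)
```

In the spec function the supplied proofs are reused as-is, so the number of proofs that must be computed is the number of missing cells, which is at most half of them given the 50% precondition.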

tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py

Lines changed: 24 additions & 27 deletions
@@ -9,6 +9,11 @@
 )
 
 
+def chunks(lst, n):
+    """Helper that splits a list into N sized chunks."""
+    return [lst[i:i + n] for i in range(0, len(lst), n)]
+
+
 @with_eip7594_and_later
 @spec_test
 @single_phase
@@ -20,15 +25,15 @@ def test_compute_extended_matrix(spec):
     extended_matrix = spec.compute_extended_matrix(input_blobs)
     assert len(extended_matrix) == spec.CELLS_PER_EXT_BLOB * blob_count
 
-    rows = [extended_matrix[i:(i + spec.CELLS_PER_EXT_BLOB)]
-            for i in range(0, len(extended_matrix), spec.CELLS_PER_EXT_BLOB)]
+    rows = chunks(extended_matrix, spec.CELLS_PER_EXT_BLOB)
     assert len(rows) == blob_count
-    assert len(rows[0]) == spec.CELLS_PER_EXT_BLOB
+    for row in rows:
+        assert len(row) == spec.CELLS_PER_EXT_BLOB
 
     for blob_index, row in enumerate(rows):
         extended_blob = []
-        for cell in row:
-            extended_blob.extend(spec.cell_to_coset_evals(cell))
+        for entry in row:
+            extended_blob.extend(spec.cell_to_coset_evals(entry.cell))
         blob_part = extended_blob[0:len(extended_blob) // 2]
         blob = b''.join([spec.bls_field_to_bytes(x) for x in blob_part])
         assert blob == input_blobs[blob_index]
@@ -43,27 +48,19 @@ def test_recover_matrix(spec):
     # Number of samples we will be recovering from
     N_SAMPLES = spec.CELLS_PER_EXT_BLOB // 2
 
+    # Compute an extended matrix with two blobs
     blob_count = 2
-    cells_dict = {}
-    original_cells = []
-    for blob_index in range(blob_count):
-        # Get the data we will be working with
-        blob = get_sample_blob(spec, rng=rng)
-        # Extend data with Reed-Solomon and split the extended data in cells
-        cells = spec.compute_cells(blob)
-        original_cells.append(cells)
-        cell_ids = []
-        # First figure out just the indices of the cells
-        for _ in range(N_SAMPLES):
-            cell_id = rng.randint(0, spec.CELLS_PER_EXT_BLOB - 1)
-            while cell_id in cell_ids:
-                cell_id = rng.randint(0, spec.CELLS_PER_EXT_BLOB - 1)
-            cell_ids.append(cell_id)
-            cell = cells[cell_id]
-            cells_dict[(blob_index, cell_id)] = cell
-        assert len(cell_ids) == N_SAMPLES
+    blobs = [get_sample_blob(spec, rng=rng) for _ in range(blob_count)]
+    extended_matrix = spec.compute_extended_matrix(blobs)
+
+    # Construct a matrix with some entries missing
+    partial_matrix = []
+    for blob_entries in chunks(extended_matrix, spec.CELLS_PER_EXT_BLOB):
+        rng.shuffle(blob_entries)
+        partial_matrix.extend(blob_entries[:N_SAMPLES])
+
+    # Given the partial matrix, recover the missing entries
+    recovered_matrix = spec.recover_matrix(partial_matrix, blob_count)
 
-    # Recover the matrix
-    recovered_matrix = spec.recover_matrix(cells_dict, blob_count)
-    flatten_original_cells = [cell for cells in original_cells for cell in cells]
-    assert recovered_matrix == flatten_original_cells
+    # Ensure that the recovered matrix matches the original matrix
+    assert recovered_matrix == extended_matrix
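
Review note: the rewritten test builds its partial matrix by chunking the flat matrix into rows, shuffling each chunk, and keeping a prefix. The standalone sketch below shows why the final equality check against `extended_matrix` is still meaningful when the matrix is a plain Python list: slicing copies, so shuffling a chunk leaves the original order untouched. Integers stand in for matrix entries, and the seed and sizes are arbitrary choices for illustration.

```python
import random


def chunks(lst, n):
    """Split a list into consecutive chunks of length n (the last may be shorter)."""
    return [lst[i:i + n] for i in range(0, len(lst), n)]


rng = random.Random(5566)  # arbitrary fixed seed for reproducibility
matrix = list(range(8))    # two "rows" of four stand-in entries
keep_per_row = 2

partial = []
for row in chunks(matrix, 4):
    rng.shuffle(row)                    # shuffles the slice copy only
    partial.extend(row[:keep_per_row])  # keep a random half of each row
```

Each row contributes exactly `keep_per_row` distinct entries, which mirrors the `N_SAMPLES = CELLS_PER_EXT_BLOB // 2` threshold used by the test.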

tests/core/pyspec/eth2spec/test/eip7594/unittests/polynomial_commitments/test_polynomial_commitments.py

Lines changed: 8 additions & 6 deletions
@@ -64,7 +64,7 @@ def test_verify_cell_kzg_proof_batch(spec):
 @with_eip7594_and_later
 @spec_test
 @single_phase
-def test_recover_all_cells(spec):
+def test_recover_cells_and_kzg_proofs(spec):
     rng = random.Random(5566)
 
     # Number of samples we will be recovering from
@@ -74,7 +74,7 @@ def test_recover_all_cells(spec):
     blob = get_sample_blob(spec)
 
     # Extend data with Reed-Solomon and split the extended data in cells
-    cells = spec.compute_cells(blob)
+    cells, proofs = spec.compute_cells_and_kzg_proofs(blob)
 
     # Compute the cells we will be recovering from
     cell_ids = []
@@ -84,19 +84,21 @@ def test_recover_all_cells(spec):
         while j in cell_ids:
             j = rng.randint(0, spec.CELLS_PER_EXT_BLOB - 1)
         cell_ids.append(j)
-    # Now the cells themselves
+    # Now the cells/proofs themselves
     known_cells = [cells[cell_id] for cell_id in cell_ids]
+    known_proofs = [proofs[cell_id] for cell_id in cell_ids]
 
-    # Recover all of the cells
-    recovered_cells = spec.recover_all_cells(cell_ids, known_cells)
+    # Recover the missing cells and proofs
+    recovered_cells, recovered_proofs = spec.recover_cells_and_kzg_proofs(cell_ids, known_cells, known_proofs)
     recovered_data = [x for xs in recovered_cells for x in xs]
 
     # Check that the original data match the non-extended portion of the recovered data
     blob_byte_array = [b for b in blob]
     assert blob_byte_array == recovered_data[:len(recovered_data) // 2]
 
-    # Check that the recovered cells match the original cells
+    # Check that the recovered cells/proofs match the original cells/proofs
     assert cells == recovered_cells
+    assert proofs == recovered_proofs
 
 
 @with_eip7594_and_later
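
Review note: this test still picks its unique cell ids with a draw-and-retry loop around `rng.randint`. That is not changed by this commit. As a possible alternative, `random.Random.sample` draws without replacement in one call. The sketch below assumes `128` cells per extended blob purely for illustration, and it would select different ids than the existing loop for the same seed.

```python
import random

CELLS_PER_EXT_BLOB = 128  # assumed value, for illustration only
N_SAMPLES = CELLS_PER_EXT_BLOB // 2

rng = random.Random(5566)
# sample() draws without replacement, so no duplicate check is needed.
cell_ids = rng.sample(range(CELLS_PER_EXT_BLOB), N_SAMPLES)
```

The resulting list satisfies the same preconditions that `recover_cells_and_kzg_proofs` asserts: enough ids, no duplicates, and every id within bounds.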

tests/formats/kzg_7594/recover_all_cells.md

Lines changed: 0 additions & 23 deletions
This file was deleted.
