Commit 0a49572

Merge pull request #3782 from cskiraly/peer-das-sampling
PeerDAS sampling clarifications
2 parents 258c2c9 + 1ad381d commit 0a49572

2 files changed: 147 additions, 1 deletion

specs/_features/eip7594/das-core.md

Lines changed: 68 additions & 1 deletion
@@ -23,6 +23,7 @@
 - [`compute_extended_matrix`](#compute_extended_matrix)
 - [`recover_matrix`](#recover_matrix)
 - [`get_data_column_sidecars`](#get_data_column_sidecars)
+- [`get_extended_sample_count`](#get_extended_sample_count)
 - [Custody](#custody)
 - [Custody requirement](#custody-requirement)
 - [Public, deterministic selection](#public-deterministic-selection)
@@ -31,6 +32,8 @@
 - [Column gossip](#column-gossip)
 - [Parameters](#parameters)
 - [Peer sampling](#peer-sampling)
+- [Sample selection](#sample-selection)
+- [Sample queries](#sample-queries)
 - [Peer scoring](#peer-scoring)
 - [Reconstruction and cross-seeding](#reconstruction-and-cross-seeding)
 - [DAS providers](#das-providers)
@@ -221,6 +224,48 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock,
     return sidecars
 ```
 
+#### `get_extended_sample_count`
+
+```python
+def get_extended_sample_count(allowed_failures: uint64) -> uint64:
+    """
+    Return the sample count if allowing failures.
+
+    This helper demonstrates how to calculate the number of columns to query per slot when
+    allowing a given number of failures, assuming uniform random selection without replacement.
+    Nested functions are direct replacements of Python library functions math.comb and
+    scipy.stats.hypergeom.cdf, with the same signatures.
+    """
+    assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2
+
+    def math_comb(n: int, k: int) -> int:
+        if not 0 <= k <= n:
+            return 0
+        r = 1
+        for i in range(min(k, n - k)):
+            r = r * (n - i) // (i + 1)
+        return r
+
+    def hypergeom_cdf(k: uint64, M: uint64, n: uint64, N: uint64) -> float:
+        # NOTE: It contains floating-point computations.
+        # Convert uint64 to Python integers before computations.
+        k = int(k)
+        M = int(M)
+        n = int(n)
+        N = int(N)
+        return sum([math_comb(n, i) * math_comb(M - n, N - i) / math_comb(M, N)
+                    for i in range(k + 1)])
+
+    worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1
+    false_positive_threshold = hypergeom_cdf(0, NUMBER_OF_COLUMNS,
+                                             worst_case_missing, SAMPLES_PER_SLOT)
+    for sample_count in range(SAMPLES_PER_SLOT, NUMBER_OF_COLUMNS + 1):
+        if hypergeom_cdf(allowed_failures, NUMBER_OF_COLUMNS,
+                         worst_case_missing, sample_count) <= false_positive_threshold:
+            break
+    return sample_count
+```
+
 ## Custody
 
 ### Custody requirement
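To see where the reference table in the peer sampling section comes from, the helper's calculation can be run outside the spec harness. The sketch below is illustrative and not part of the commit: it assumes `NUMBER_OF_COLUMNS = 128` and `SAMPLES_PER_SLOT = 16` (the values used for the table), and the names `comb` and `extended_sample_count` are hypothetical stand-ins for the nested `math_comb` and for `get_extended_sample_count`. It uses the standard library's `math.comb` behind a guard that returns 0 for out-of-range arguments (as `math_comb` does) and keeps the same floating-point summation order, so its results should agree with the spec helper. Like the helper, it treats `NUMBER_OF_COLUMNS // 2 + 1` missing columns as the worst case for an unrecoverable block, takes the probability that `SAMPLES_PER_SLOT` draws all avoid the missing columns as the false positive threshold, and returns the smallest sample count for which seeing at most `allowed_failures` missing columns is no more likely than that.

```python
import math

# Assumed parameters: the values used for the reference table.
NUMBER_OF_COLUMNS = 128
SAMPLES_PER_SLOT = 16


def comb(n: int, k: int) -> int:
    # math.comb raises ValueError for negative k; return 0 instead, like the
    # spec's nested math_comb does for any k outside [0, n].
    return math.comb(n, k) if 0 <= k <= n else 0


def hypergeom_cdf(k: int, M: int, n: int, N: int) -> float:
    # P[X <= k] for X ~ Hypergeometric(M columns, n of them missing, N draws),
    # same argument order as scipy.stats.hypergeom.cdf(k, M, n, N).
    return sum([comb(n, i) * comb(M - n, N - i) / comb(M, N) for i in range(k + 1)])


def extended_sample_count(allowed_failures: int) -> int:
    assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2
    # Worst case for an unrecoverable block: just over half the columns missing.
    worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1
    # Target: chance that SAMPLES_PER_SLOT draws all avoid the missing columns.
    threshold = hypergeom_cdf(0, NUMBER_OF_COLUMNS, worst_case_missing, SAMPLES_PER_SLOT)
    for sample_count in range(SAMPLES_PER_SLOT, NUMBER_OF_COLUMNS + 1):
        # Chance that at most `allowed_failures` draws land on missing columns,
        # i.e. that an unavailable block still passes sampling.
        if hypergeom_cdf(allowed_failures, NUMBER_OF_COLUMNS,
                         worst_case_missing, sample_count) <= threshold:
            return sample_count
    return NUMBER_OF_COLUMNS  # not reached: sampling every column gives probability 0


for allowed in range(9):
    print(allowed, extended_sample_count(allowed))
```

Under these assumptions, each additional tolerated miss costs roughly two to four extra samples, which may be worth paying when some peers are expected to be unresponsive.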
@@ -263,7 +308,29 @@ Verifiable samples from their respective column are distributed on the assigned
 
 ## Peer sampling
 
-A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries. At each slot, a node makes `SAMPLES_PER_SLOT` queries for samples from their peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) to request from. If a node has enough good/honest peers across all rows and columns, this has a high chance of success.
+### Sample selection
+
+At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns.
+
+Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by the `SAMPLES_PER_SLOT` parameter. If using uniform random selection without replacement, a node can use the `get_extended_sample_count(allowed_failures) -> sample_count` helper function to determine the sample count (number of unique column IDs) for any selected number of allowed failures. Sampling is then considered successful if any `sample_count - allowed_failures` columns are retrieved successfully.
+
+For reference, the table below shows the number of samples and the number of allowed missing columns assuming `NUMBER_OF_COLUMNS = 128` and `SAMPLES_PER_SLOT = 16`.
+
+| Allowed missing | 0| 1| 2| 3| 4| 5| 6| 7| 8|
+|-----------------|--|--|--|--|--|--|--|--|--|
+| Sample count    |16|20|24|27|29|32|35|37|40|
+
+### Sample queries
+
+A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries.
+
+A node SHOULD query for samples from selected peers via `DataColumnSidecarsByRoot` requests. A node uses the `get_custody_columns` helper to determine which peer(s) it could request from, identifying a list of candidate peers for each selected column.
+
+If more than one candidate peer is found for a given column, a node SHOULD randomize its peer selection to distribute sample query load in the network. Nodes MAY use peer scoring to tune this selection (for example, by using weighted selection or by using a cut-off threshold). If possible, it is also recommended to avoid requesting many columns from the same peer in order to avoid relying on and exposing the sample selection to a single peer.
+
+If a node already has a column because of custody, it is not required to send out queries for that column.
+
+If a node has enough good/honest peers across all columns, and the data is being made available, the above procedure has a high chance of success.
 
 ## Peer scoring
 
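The selection and query rules in the peer sampling section can be illustrated with a small sketch. Everything in it is hypothetical and not part of the spec: `select_sample_columns`, `assign_peers`, and `is_sampling_successful` are illustrative names, each peer's custody set is passed in as a plain dict instead of being derived with `get_custody_columns`, and no networking is modeled (a real node would send `DataColumnSidecarsByRoot` requests). It draws distinct column IDs uniformly at random from local randomness, skips columns already held in custody, picks randomly among the least-loaded candidate peers for each column, and treats sampling as successful when at most `allowed_failures` selected columns are unavailable.

```python
import random
from collections import defaultdict


def select_sample_columns(sample_count: int, number_of_columns: int,
                          rng: random.Random) -> list[int]:
    # Uniform random selection without replacement, driven by local randomness.
    return rng.sample(range(number_of_columns), sample_count)


def assign_peers(columns: list[int], custody_columns: set[int],
                 peer_custody: dict[str, set[int]],
                 rng: random.Random) -> dict[int, str]:
    # Map each selected column to one peer to query.
    load: dict[str, int] = defaultdict(int)  # columns assigned per peer so far
    assignment: dict[int, str] = {}
    for column in columns:
        if column in custody_columns:
            continue  # already held because of custody: no query needed
        candidates = [p for p, cols in peer_custody.items() if column in cols]
        if not candidates:
            continue  # no known candidate peer; the column will count as missing
        # Randomize among the least-loaded candidates to spread query load and
        # avoid exposing much of the selection to any single peer.
        fewest = min(load[p] for p in candidates)
        peer = rng.choice([p for p in candidates if load[p] == fewest])
        load[peer] += 1
        assignment[column] = peer
    return assignment


def is_sampling_successful(columns: list[int], retrieved: set[int],
                           custody_columns: set[int], allowed_failures: int) -> bool:
    # Successful if at most `allowed_failures` selected columns are unavailable.
    missing = [c for c in columns if c not in retrieved and c not in custody_columns]
    return len(missing) <= allowed_failures


# Example run with made-up peers and custody sets (hypothetical values).
rng = random.Random(42)
columns = select_sample_columns(20, 128, rng)
custody = {0, 1, 2, 3}
peer_custody = {
    "peer-a": set(range(0, 64)),
    "peer-b": set(range(64, 128)),
    "peer-c": set(range(0, 128, 2)),
}
plan = assign_peers(columns, custody, peer_custody, rng)
# Pretend every queried peer answered; 1 allowed failure (20 samples, per the table).
ok = is_sampling_successful(columns, set(plan), custody, allowed_failures=1)
```

Picking the least-loaded candidate is only one way to follow the recommendation to avoid requesting many columns from the same peer; the text also allows weighting the choice by peer score or applying a score cut-off.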
tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py

Lines changed: 79 additions & 0 deletions
@@ -1,7 +1,9 @@
 import random
 from eth2spec.test.context import (
+    expect_assertion_error,
     spec_test,
     single_phase,
+    with_config_overrides,
     with_eip7594_and_later,
 )
 from eth2spec.test.helpers.sharding import (
from eth2spec.test.helpers.sharding import (
@@ -64,3 +66,80 @@ def test_recover_matrix(spec):
 
     # Ensure that the recovered matrix matches the original matrix
     assert recovered_matrix == extended_matrix
+
+
+@with_eip7594_and_later
+@spec_test
+@single_phase
+def test_get_extended_sample_count__1(spec):
+    rng = random.Random(1111)
+    allowed_failures = rng.randint(0, spec.config.NUMBER_OF_COLUMNS // 2)
+    spec.get_extended_sample_count(allowed_failures)
+
+
+@with_eip7594_and_later
+@spec_test
+@single_phase
+def test_get_extended_sample_count__2(spec):
+    rng = random.Random(2222)
+    allowed_failures = rng.randint(0, spec.config.NUMBER_OF_COLUMNS // 2)
+    spec.get_extended_sample_count(allowed_failures)
+
+
+@with_eip7594_and_later
+@spec_test
+@single_phase
+def test_get_extended_sample_count__3(spec):
+    rng = random.Random(3333)
+    allowed_failures = rng.randint(0, spec.config.NUMBER_OF_COLUMNS // 2)
+    spec.get_extended_sample_count(allowed_failures)
+
+
+@with_eip7594_and_later
+@spec_test
+@single_phase
+def test_get_extended_sample_count__lower_bound(spec):
+    allowed_failures = 0
+    spec.get_extended_sample_count(allowed_failures)
+
+
+@with_eip7594_and_later
+@spec_test
+@single_phase
+def test_get_extended_sample_count__upper_bound(spec):
+    allowed_failures = spec.config.NUMBER_OF_COLUMNS // 2
+    spec.get_extended_sample_count(allowed_failures)
+
+
+@with_eip7594_and_later
+@spec_test
+@single_phase
+def test_get_extended_sample_count__upper_bound_exceed(spec):
+    allowed_failures = spec.config.NUMBER_OF_COLUMNS // 2 + 1
+    expect_assertion_error(lambda: spec.get_extended_sample_count(allowed_failures))
+
+
+@with_eip7594_and_later
+@spec_test
+@with_config_overrides({
+    'NUMBER_OF_COLUMNS': 128,
+    'SAMPLES_PER_SLOT': 16,
+})
+@single_phase
+def test_get_extended_sample_count__table_in_spec(spec):
+    table = dict(
+        # (allowed_failures, expected_extended_sample_count)
+        {
+            0: 16,
+            1: 20,
+            2: 24,
+            3: 27,
+            4: 29,
+            5: 32,
+            6: 35,
+            7: 37,
+            8: 40,
+        }
+    )
+    for allowed_failures, expected_extended_sample_count in table.items():
+        assert spec.get_extended_sample_count(allowed_failures=allowed_failures) == expected_extended_sample_count
