
Commit 651732d

docs: blob storage documentation (#19194)
## Summary

- Add comprehensive blob storage documentation for node operators
- Rename `BLOB_SINK_ARCHIVE_API_URL` → `BLOB_ARCHIVE_API_URL` (cleanup after BlobSink removal)
- Remove dead environment variables `BLOB_SINK_PORT` and `BLOB_SINK_URL`

## Description

Following the removal of the BlobSink HTTP server (#19143), this PR:

1. **Adds new documentation** ([blob_storage.md](docs/docs-network/setup/blob_storage.md)) explaining how Aztec nodes store and retrieve blob data, including:
   - Overview of blob sources (FileStore, L1 Consensus, Archive API)
   - PeerDAS and supernode requirements for L1 consensus
   - Configuration examples for GCS, S3, and Cloudflare R2
   - Authentication setup
   - Troubleshooting guide
2. **Adds blob upload documentation** ([blob_upload.md](docs/docs-network/setup/blob_upload.md)) for node operators who want to contribute to the network by hosting a blob file store, including:
   - Upload configuration with `BLOB_FILE_STORE_UPLOAD_URL`
   - How to expose public HTTP endpoints for GCS, S3, and R2
   - Authentication with write permissions
3. **Cleans up legacy naming** by renaming `BLOB_SINK_ARCHIVE_API_URL` to `BLOB_ARCHIVE_API_URL` - the "sink" terminology is no longer accurate since the HTTP server was removed
4. **Removes dead code** - `BLOB_SINK_PORT` and `BLOB_SINK_URL` env vars that were left behind after BlobSink removal

---

Fixes A-389
2 parents 629a41b + 556a9a9

File tree: 6 files changed, +418 -5 lines
docs/docs-network/setup/blob_storage.md: 211 additions, 0 deletions
---
id: blob_storage
sidebar_position: 4
title: Blob retrieval
description: Learn how Aztec nodes retrieve blob data for L1 transactions.
---

## Overview
Aztec uses EIP-4844 blobs to publish transaction data to Ethereum Layer 1. Since blob data is only available on L1 for a limited period (~18 days / 4,096 epochs), nodes need reliable ways to store and retrieve blob data for synchronization and historical access.

Aztec nodes can be configured to retrieve blobs from L1 consensus (beacon nodes), file stores (S3, GCS, R2), and archive services.

:::tip Automatic Configuration
When using `--network [NETWORK_NAME]`, blob file stores are automatically configured for you. Most users don't need to configure blob storage manually.
:::

:::warning Override Behavior
Setting the `BLOB_FILE_STORE_URLS` environment variable overrides the file store configuration from the network config.
:::
## Understanding blob sources

The blob client can retrieve blobs from multiple sources, tried in order:

1. **File Store**: Fast retrieval from configured storage (S3, GCS, R2, local files, HTTPS)
2. **L1 Consensus**: Beacon node API, ideally backed by a (semi-)supernode, for recent blobs (within ~18 days)
3. **Archive API**: Services like Blobscan for historical blob data

For near-tip synchronization, the client retries file stores with backoff to handle eventual consistency while blobs are still being uploaded by other validators.
### L1 consensus and blob availability

If your beacon node has access to [supernodes or semi-supernodes](https://ethereum.org/roadmap/fusaka/peerdas/), L1 consensus alone may be sufficient for retrieving blobs within the ~18 day retention period. With the Fusaka upgrade and [PeerDAS (Peer Data Availability Sampling)](https://eips.ethereum.org/EIPS/eip-7594), Ethereum uses erasure coding to split blobs into 128 columns, enabling robust data availability:

- **Supernodes** (validators with ≥4,096 ETH staked): Custody all 128 columns and all blob data for the full ~18 day retention period. These nodes form the backbone of the network and continuously heal data gaps.
- **Semi-supernodes** (≥1,824 ETH staked, i.e. 57 validators): Handle at least 64 columns, enough to reconstruct complete blob data.
- **Regular nodes**: Only download 1/16th of the data (8 of 128 columns) to verify availability. This is **not sufficient** to serve complete blob data.

:::warning Supernodes
If L1 consensus is your only blob source, your beacon node must be a supernode or semi-supernode (or connected to one) to retrieve complete blobs. A regular node cannot reconstruct full blob data from its partial columns alone.
:::

This means that for recent blobs, configuring `L1_CONSENSUS_HOST_URLS` to point at a well-connected supernode or semi-supernode may be all you need. However, file stores and archive APIs are still recommended for:

- Faster retrieval (file stores are typically faster than L1 consensus queries)
- Historical access (blobs older than ~18 days are pruned from L1)
- Redundancy (multiple sources improve reliability)
## Configuring blob sources

### Environment variables

Configure blob sources using environment variables:

| Variable | Description | Example |
|----------|-------------|---------|
| `BLOB_FILE_STORE_URLS` | Comma-separated URLs to read blobs from | `gs://bucket/,s3://bucket/` |
| `L1_CONSENSUS_HOST_URLS` | Beacon node URLs (comma-separated) | `https://beacon.example.com` |
| `L1_CONSENSUS_HOST_API_KEYS` | API keys for beacon nodes | `key1,key2` |
| `L1_CONSENSUS_HOST_API_KEY_HEADERS` | Header names for API keys | `Authorization` |
| `BLOB_ARCHIVE_API_URL` | Archive API URL (e.g., Blobscan) | `https://api.blobscan.com` |
| `BLOB_ALLOW_EMPTY_SOURCES` | Allow no blob sources (default: `false`) | `false` |
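The `L1_CONSENSUS_HOST_*` variables take comma-separated lists. A sketch pairing one API key and header per beacon URL, on the assumption (based on the examples above) that the lists line up positionally; all endpoints and keys are placeholders:

```bash
# Two beacon nodes, each sending its key in the Authorization header.
L1_CONSENSUS_HOST_URLS=https://beacon1.example.com,https://beacon2.example.com
L1_CONSENSUS_HOST_API_KEYS=key1,key2
L1_CONSENSUS_HOST_API_KEY_HEADERS=Authorization,Authorization
```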
:::tip
If you want to contribute to the network by hosting a blob file store, see the [Blob upload guide](./blob_upload.md).
:::
### Supported storage backends

The blob client supports the same storage backends as snapshots:

- **Google Cloud Storage** - `gs://bucket-name/path/`
- **Amazon S3** - `s3://bucket-name/path/`
- **Cloudflare R2** - `s3://bucket-name/path/?endpoint=https://[ACCOUNT_ID].r2.cloudflarestorage.com`
- **HTTP/HTTPS** (read-only) - `https://host/path`
- **Local filesystem** - `file:///absolute/path`
### Storage path format

Blobs are stored using the following path structure:

```
{base_url}/aztec-{l1ChainId}-{rollupVersion}-{rollupAddress}/blobs/{versionedBlobHash}.data
```

For example:

```
gs://my-bucket/aztec-1-1-0x1234abcd.../blobs/0x01abc123...def.data
```
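To sanity-check a file store, you can list the blobs prefix directly. A hypothetical check against GCS (the bucket name and rollup identifiers are placeholders; substitute your own):

```bash
# Should print stored blob files for your rollup namespace.
gsutil ls "gs://my-bucket/aztec-1-1-0xROLLUP_ADDRESS/blobs/" | head
```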
## Configuration examples

### Basic file store configuration

```bash
# Read blobs from GCS
BLOB_FILE_STORE_URLS=gs://my-snapshots/
```

### Multiple read sources with L1 fallback

```bash
# Try multiple sources in order
BLOB_FILE_STORE_URLS=gs://primary-bucket/,s3://backup-bucket/

# L1 consensus fallback
L1_CONSENSUS_HOST_URLS=https://beacon1.example.com,https://beacon2.example.com

# Archive fallback for historical blobs
BLOB_ARCHIVE_API_URL=https://api.blobscan.com
```
### Cloudflare R2 configuration

```bash
BLOB_FILE_STORE_URLS=s3://my-bucket/?endpoint=https://[ACCOUNT_ID].r2.cloudflarestorage.com
```

Replace `[ACCOUNT_ID]` with your Cloudflare account ID.

### Local filesystem (for testing)

```bash
BLOB_FILE_STORE_URLS=file:///data/blobs
```
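The directory should exist and be readable by the node process; for example:

```bash
# Create the local blob directory before starting the node.
mkdir -p /data/blobs
```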
## Authentication

### Google Cloud Storage

Set up [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials):

```bash
gcloud auth application-default login
```

Or use a service account key:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```
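Either way, you can confirm that the credentials grant read access before starting the node (bucket name is a placeholder):

```bash
# Should list objects (or print nothing for an empty bucket)
# rather than failing with a permissions error.
gcloud storage ls gs://my-snapshots/
```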
### Amazon S3 / Cloudflare R2

Set AWS credentials as environment variables:

```bash
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
```

For R2, these credentials come from your Cloudflare R2 API tokens.
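A quick way to confirm the credentials work is to list the bucket with the AWS CLI. For R2, pass the account endpoint explicitly (account ID and bucket name are placeholders):

```bash
# Plain S3:
aws s3 ls s3://my-bucket/
# Cloudflare R2 via its S3-compatible endpoint:
aws s3 ls s3://my-bucket/ --endpoint-url "https://ACCOUNT_ID.r2.cloudflarestorage.com"
```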
## How blob retrieval works

When a node needs blobs for a block, the blob client follows this retrieval order:

### During historical sync

1. **File Store** - Quick lookup in configured file stores
2. **L1 Consensus** - Query beacon nodes using the slot number
3. **Archive API** - Fall back to Blobscan or a similar service

### During near-tip sync

1. **File Store** - Quick lookup (no retries)
2. **L1 Consensus** - Query beacon nodes
3. **File Store with retries** - Retry with backoff for eventual consistency
4. **Archive API** - Final fallback
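This fallback can also be reproduced by hand when debugging a missing blob. A rough sketch assuming an HTTPS file store and the standard beacon API; the URLs, rollup identifiers, blob hash, and slot number are all placeholders:

```bash
# 1) Try the file store first (path format described above);
# 2) fall back to the beacon node's blob sidecars endpoint for the slot.
curl -fsS "https://files.example.com/aztec-1-1-0xROLLUP/blobs/0xBLOB_HASH.data" -o blob.data ||
  curl -fsS "https://beacon.example.com/eth/v1/beacon/blob_sidecars/SLOT_NUMBER" -o sidecars.json
```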
## Troubleshooting

### No blob sources configured

**Issue**: The node starts with a warning that no blob sources are configured.

**Solutions**:
- Configure at least one of: `BLOB_FILE_STORE_URLS`, `L1_CONSENSUS_HOST_URLS`, or `BLOB_ARCHIVE_API_URL`
- Set `BLOB_ALLOW_EMPTY_SOURCES=true` only if you understand the implications (the node may fail to sync)

### Blob retrieval fails

**Issue**: The node cannot retrieve blobs for a block.

**Solutions**:
- Verify your file store URLs are accessible
- Check L1 consensus host connectivity
- Ensure authentication credentials are configured
- Try using multiple file store URLs for redundancy
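For HTTPS-backed file stores, a plain `curl` is often enough to spot DNS, TLS, or permission problems (URL is a placeholder):

```bash
# Fetch headers only; any HTTP response means the host is reachable,
# while DNS or TLS failures surface as curl errors.
curl -sSI "https://files.example.com/"
```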
### L1 consensus host errors

**Issue**: Cannot connect to beacon nodes.

**Solutions**:
- Verify beacon node URLs are correct and accessible
- Check whether API keys are required and correctly configured
- Ensure the beacon node is synced
- Try multiple beacon node URLs for redundancy
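The standard beacon API exposes health and sync endpoints you can probe directly (URL is a placeholder):

```bash
# 200 = synced, 206 = syncing, 503 = not ready.
curl -s -o /dev/null -w "%{http_code}\n" "https://beacon.example.com/eth/v1/node/health"
# Reports sync distance and head slot.
curl -s "https://beacon.example.com/eth/v1/node/syncing"
```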
## Best practices

- **Configure multiple sources**: Use multiple file store URLs and L1 consensus hosts for redundancy
- **Use file stores for production**: File stores provide faster, more reliable blob retrieval than L1 consensus
- **Use an archive API for historical access**: Configure `BLOB_ARCHIVE_API_URL` for accessing blobs older than ~18 days. Even with PeerDAS supernodes providing robust data availability, blob data is pruned from L1 after 4,096 epochs. Archive services like [Blobscan](https://blobscan.com/) store historical blob data indefinitely

## Next Steps

- Learn how to [host a blob file store](./blob_upload.md) to contribute to the network
- Learn about [using snapshots](./syncing_best_practices.md) for faster node synchronization
- Set up [monitoring](../operation/monitoring.md) to track your node's blob retrieval
- Check the [CLI reference](../reference/cli_reference.md) for additional blob-related options
- Join the [Aztec Discord](https://discord.gg/aztec) for support
