Commit 52f3374

docs(02-04): complete block proxy plan — upstream wiring, fallback chain, routes, Docker verified
- 02-04-SUMMARY.md: execution summary with deviations and Docker verification results
- STATE.md: Phase 2 position updated to 4/4 COMPLETE, decisions and metrics added
- ROADMAP.md: Phase 2 marked complete (4/4 plans)
- REQUIREMENTS.md: PRXY-01, PRXY-02, ENDP-02, ENDP-03 marked complete
1 parent fd2d916 commit 52f3374

4 files changed: +497 -0 lines changed


.planning/REQUIREMENTS.md

Lines changed: 119 additions & 0 deletions

# Requirements: NearBlocks Block Proxy

**Defined:** 2026-03-01
**Core Value:** All indexers can reliably fetch block data from a single proxy service with full fault tolerance across multiple upstream sources

## v1 Requirements

### Core Proxy

- [x] **PRXY-01**: Proxy serves block data via `GET /block/:block_height` returning JSON
- [x] **PRXY-02**: Concurrent requests for the same block height are deduplicated via in-memory singleflight — only one upstream fetch occurs
- [x] **PRXY-03**: Block data is cached to local filesystem with atomic write-then-rename
- [x] **PRXY-04**: Cached blocks are evicted after configurable TTL (recent blocks only)
- [x] **PRXY-05**: Upstream sources are checked in configurable order: local cache -> S3 -> fastnear -> NEAR Lake
- [x] **PRXY-06**: Each upstream source can be enabled/disabled via environment variables
- [x] **PRXY-07**: Upstream sources have aggressive per-source timeouts (5-8s) to prevent cascading stalls
- [ ] **PRXY-08**: Circuit breaker auto-disables upstream after consecutive failures with cooldown period
- [x] **PRXY-09**: Block data is normalized to a canonical JSON format regardless of which upstream served it
- [x] **PRXY-10**: Block data is passed through without deserialization where possible (serde_json RawValue)
- [x] **PRXY-11**: Cached blocks are compressed with zstd to reduce disk usage
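PRXY-02's deduplication can be shown without an async runtime. The sketch below is a thread-based analogue of singleflight that uses only `std` (`Mutex`, `Condvar`, `Arc`). The service itself uses the `async_singleflight` crate, and this is not that crate's API. `SingleFlight` and `work` are hypothetical names. The first caller for a height becomes the leader and runs the fetch. Callers that arrive while that fetch is in flight block until the leader publishes a shared result.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

/// One in-flight fetch. Followers wait on `ready` until the leader stores `result`.
struct Call<V> {
    result: Mutex<Option<V>>,
    ready: Condvar,
}

/// Hypothetical thread-based singleflight keyed by block height.
pub struct SingleFlight<V> {
    inflight: Mutex<HashMap<u64, Arc<Call<V>>>>,
}

impl<V: Clone> SingleFlight<V> {
    pub fn new() -> Self {
        SingleFlight { inflight: Mutex::new(HashMap::new()) }
    }

    /// Runs `fetch` once for all callers that arrive while a fetch for `height` is in flight.
    /// Sketch only: if the leader's `fetch` panics, followers wait forever.
    pub fn work<F: FnOnce() -> V>(&self, height: u64, fetch: F) -> V {
        // Decide leader vs follower under the map lock, then release it before any slow work.
        let (call, is_leader) = {
            let mut inflight = self.inflight.lock().unwrap();
            match inflight.get(&height).cloned() {
                Some(call) => (call, false),
                None => {
                    let call = Arc::new(Call { result: Mutex::new(None), ready: Condvar::new() });
                    inflight.insert(height, Arc::clone(&call));
                    (call, true)
                }
            }
        };

        if is_leader {
            let value = fetch();
            *call.result.lock().unwrap() = Some(value.clone());
            call.ready.notify_all();
            // Drop the entry so later requests start a fresh fetch (or hit the cache first).
            self.inflight.lock().unwrap().remove(&height);
            value
        } else {
            let mut result = call.result.lock().unwrap();
            // Loop guards against spurious wakeups.
            while result.is_none() {
                result = call.ready.wait(result).unwrap();
            }
            result.as_ref().unwrap().clone()
        }
    }
}
```

The map lock is held only to decide between leader and follower, never during the fetch. This matches the STATE.md rationale for not holding map references across `.await`.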

### Endpoints

- [x] **ENDP-01**: `GET /healthz` returns health status for Docker/K8s liveness probes
- [x] **ENDP-02**: `GET /last_block/final` proxies to fastnear for chain tip (NOT cached, real-time)
- [x] **ENDP-03**: Responses include `X-Upstream-Source` header indicating which source served the data
- [ ] **ENDP-04**: `GET /stats` returns JSON with cache hit rate, dedup saves, cache size, upstream latencies

### Admin & Observability

- [ ] **ADMN-01**: Web admin dashboard displays upstream source status and allows toggling
- [ ] **ADMN-02**: `POST /admin/upstreams/{source}/enable|disable` toggles sources at runtime without restart
- [ ] **ADMN-03**: Prometheus `/metrics` endpoint exposes request counts, cache hit/miss, dedup saves, upstream latencies
- [ ] **ADMN-04**: Admin routes are isolated from data-plane routes (separate auth/access)
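ADMN-02 needs per-source flags that the admin route can flip while block requests keep reading them. One minimal approach uses one `AtomicBool` per source. It assumes a fixed set of upstream names known at startup, with initial values taken from the env-var switches of PRXY-06. `UpstreamToggles` and its methods are hypothetical, and the HTTP routing around `apply` is omitted.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical runtime on/off switches, one per upstream source.
pub struct UpstreamToggles {
    flags: HashMap<&'static str, AtomicBool>,
}

impl UpstreamToggles {
    /// `initial` holds (source name, enabled) pairs, e.g. as read from env vars at startup.
    pub fn new(initial: &[(&'static str, bool)]) -> Self {
        let flags = initial
            .iter()
            .map(|&(name, enabled)| (name, AtomicBool::new(enabled)))
            .collect();
        UpstreamToggles { flags }
    }

    /// Applies an `enable` or `disable` action for `source`; returns the new state.
    pub fn apply(&self, source: &str, action: &str) -> Result<bool, String> {
        let flag = self
            .flags
            .get(source)
            .ok_or_else(|| format!("unknown upstream: {source}"))?;
        let enabled = match action {
            "enable" => true,
            "disable" => false,
            other => return Err(format!("unknown action: {other}")),
        };
        // Flags are independent of each other, so Relaxed ordering is sufficient here.
        flag.store(enabled, Ordering::Relaxed);
        Ok(enabled)
    }

    /// Read on every block request; unknown sources count as disabled.
    pub fn is_enabled(&self, source: &str) -> bool {
        self.flags
            .get(source)
            .map_or(false, |flag| flag.load(Ordering::Relaxed))
    }
}
```

The flag set is fixed at construction, so the map itself never changes and needs no lock. Only the individual flags are mutated.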

### TypeScript Integration

- [ ] **TSIN-01**: nb-neardata package updated to fetch from `BLOCK_PROXY_URL` instead of direct fastnear
- [ ] **TSIN-02**: nb-blocks package updated to fetch from `BLOCK_PROXY_URL` instead of direct S3/MinIO
- [ ] **TSIN-03**: nb-neardata-raw (if applicable) updated to fetch from proxy
- [ ] **TSIN-04**: Indexer apps require zero code changes — only package-level updates
- [ ] **TSIN-05**: "Find final block" logic uses proxy's `/last_block/final` endpoint
- [ ] **TSIN-06**: No fallback to direct upstream in packages — proxy is the single source

### Operations

- [ ] **OPER-01**: Graceful shutdown completes in-flight requests before process exit
- [x] **OPER-02**: Structured logging via tracing crate with JSON output
- [x] **OPER-03**: Docker image with deployment configuration (Dockerfile, compose)
- [ ] **OPER-04**: Startup cache pre-scan repopulates index from existing filesystem cache
- [x] **OPER-05**: Cold-start handling with readiness probe (don't accept traffic until ready)
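STATE.md (decision 01-01) describes OPER-02's JSON output as pino-compatible, with numeric levels instead of strings. The std-only sketch below shows that level mapping and the resulting line shape. The `Level` enum is a stand-in for `tracing::Level`. `pino_line` is a hypothetical helper, not the project's `PinoFormat`/`FormatEvent` implementation. The numbers are pino's standard levels (trace 10 through error 50).

```rust
/// Hypothetical mirror of tracing's five levels (tracing::Level itself is not used here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// pino's numeric levels; pino's fatal (60) has no tracing equivalent.
pub fn pino_level(level: Level) -> u8 {
    match level {
        Level::Trace => 10,
        Level::Debug => 20,
        Level::Info => 30,
        Level::Warn => 40,
        Level::Error => 50,
    }
}

/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// One pino-style line: numeric `level`, epoch-millisecond `time`, and `msg`.
/// Real pino output also carries `pid` and `hostname`; they are omitted here.
pub fn pino_line(level: Level, time_ms: u128, msg: &str) -> String {
    format!(
        "{{\"level\":{},\"time\":{},\"msg\":\"{}\"}}",
        pino_level(level),
        time_ms,
        escape_json(msg)
    )
}
```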

## v2 Requirements

### Enhanced Observability

- **OBSV-01**: Per-upstream P50/P95 latency tracking with historical graphs
- **OBSV-02**: Cache usage visualization in admin dashboard
- **OBSV-03**: Block height range validation (reject invalid block numbers)

### Resilience

- **RESL-01**: Distributed proxy / HA mode for multi-instance deployment
- **RESL-02**: Cache warming from historical block range on demand

## Out of Scope

| Feature | Reason |
|---------|--------|
| Redis for caching/dedup | Explicitly excluded — single-process proxy; in-memory is lower latency with no external dependency |
| WebSocket streaming push API | Breaks existing pull-model indexer contracts; indexers request blocks, not subscribe |
| Distributed proxy / HA | Singleflight breaks across processes; single instance with upstream fallback is sufficient for v1 |
| Full archival cache | 100TB+ to cache all of NEAR mainnet; recent blocks only with upstream fallback for older blocks |
| Fallback to direct upstream in packages | Proxy is the single source; no bypass logic in TypeScript packages |
| Mobile or public-facing UI | Admin dashboard is internal only |

## Traceability

| Requirement | Phase | Status |
|-------------|-------|--------|
| PRXY-01 | Phase 2 | Complete |
| PRXY-02 | Phase 2 | Complete |
| PRXY-03 | Phase 2 | Complete |
| PRXY-04 | Phase 2 | Complete |
| PRXY-05 | Phase 2 | Complete |
| PRXY-06 | Phase 2 | Complete |
| PRXY-07 | Phase 2 | Complete |
| PRXY-08 | Phase 5 | Pending |
| PRXY-09 | Phase 2 | Complete |
| PRXY-10 | Phase 2 | Complete |
| PRXY-11 | Phase 2 | Complete |
| ENDP-01 | Phase 1 | Complete |
| ENDP-02 | Phase 2 | Complete |
| ENDP-03 | Phase 2 | Complete |
| ENDP-04 | Phase 4 | Pending |
| ADMN-01 | Phase 4 | Pending |
| ADMN-02 | Phase 4 | Pending |
| ADMN-03 | Phase 4 | Pending |
| ADMN-04 | Phase 4 | Pending |
| TSIN-01 | Phase 3 | Pending |
| TSIN-02 | Phase 3 | Pending |
| TSIN-03 | Phase 3 | Pending |
| TSIN-04 | Phase 3 | Pending |
| TSIN-05 | Phase 3 | Pending |
| TSIN-06 | Phase 3 | Pending |
| OPER-01 | Phase 5 | Pending |
| OPER-02 | Phase 1 | Complete |
| OPER-03 | Phase 1 | Complete |
| OPER-04 | Phase 5 | Pending |
| OPER-05 | Phase 1 | Complete |

**Coverage:**
- v1 requirements: 30 total
- Mapped to phases: 30
- Unmapped: 0

---
*Requirements defined: 2026-03-01*
*Last updated: 2026-03-01 after plan 02-04 completion: PRXY-01, PRXY-02, ENDP-02, ENDP-03 marked complete*

.planning/ROADMAP.md

Lines changed: 103 additions & 0 deletions

# Roadmap: NearBlocks Block Proxy

## Overview

Five phases deliver a Rust-based block data proxy that replaces hosted S3 as the central data source
for all NEAR blockchain indexers. Phase 1 establishes a compilable, deployable service shell.
Phase 2 builds the core proxy logic — singleflight deduplication, filesystem cache, and upstream
fallback chain. Phase 3 redirects all TypeScript indexer packages at the proxy, validating format
compatibility end-to-end. Phase 4 adds the admin dashboard, Prometheus metrics, and runtime
upstream toggling. Phase 5 hardens the system with circuit breakers, graceful shutdown, and
startup cache pre-scan.

## Phases

**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

- [x] **Phase 1: Foundation** - Compilable axum skeleton with Docker, health check, and structured logging (completed 2026-03-01)
- [x] **Phase 2: Core Proxy** - Block serving endpoint with singleflight dedup, filesystem cache, and upstream fallback chain (completed 2026-03-01)
- [ ] **Phase 3: TypeScript Integration** - nb-neardata and nb-blocks packages updated to route through proxy; zero indexer code changes
- [ ] **Phase 4: Observability and Admin** - Admin dashboard, Prometheus metrics, runtime upstream toggle, and stats endpoint
- [ ] **Phase 5: Hardening and Operations** - Circuit breaker, graceful shutdown, startup cache pre-scan

## Phase Details

### Phase 1: Foundation
**Goal**: A deployable Rust service skeleton is running and reachable in Docker with health probes passing
**Depends on**: Nothing (first phase)
**Requirements**: OPER-02, OPER-03, OPER-05, ENDP-01
**Success Criteria** (what must be TRUE):
1. `GET /healthz` returns 200 from the running container
2. Docker compose brings the proxy up and it is reachable from other containers on the network
3. Structured JSON logs are visible in container output for every request
4. Readiness probe blocks traffic until service initialization is complete (returns non-200 until ready)
5. All env-var config values load at startup and log their effective values
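Criterion 4 depends on the ordering recorded in STATE.md decision 01-01: the ready flag flips only after `TcpListener::bind()` succeeds. Below is a std-only sketch of that ordering. The names `Readiness` and `bind_then_ready` are hypothetical, and the 503 status for the not-ready case is an assumption.

```rust
use std::io;
use std::net::TcpListener;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical readiness flag shared between startup code and the readiness handler.
#[derive(Clone, Default)]
pub struct Readiness(Arc<AtomicBool>);

impl Readiness {
    pub fn set_ready(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Status a readiness probe would return (503 while not ready is an assumption).
    pub fn probe_status(&self) -> u16 {
        if self.0.load(Ordering::Acquire) {
            200
        } else {
            503
        }
    }
}

/// Binds first and marks the service ready only after the bind succeeds.
pub fn bind_then_ready(addr: &str, readiness: &Readiness) -> io::Result<TcpListener> {
    let listener = TcpListener::bind(addr)?; // on error, readiness stays false
    readiness.set_ready();
    Ok(listener)
}
```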

**Plans**: 2 plans
- [x] 01-01-PLAN.md — Rust service skeleton with config, logging, state, and health/readiness endpoints
- [x] 01-02-PLAN.md — Docker deployment with cargo-chef Dockerfile and mainnet/testnet compose files

### Phase 2: Core Proxy
**Goal**: Indexers can request any block by height and receive correct JSON, with concurrent deduplication, local caching, and transparent upstream fallback
**Depends on**: Phase 1
**Requirements**: PRXY-01, PRXY-02, PRXY-03, PRXY-04, PRXY-05, PRXY-06, PRXY-07, PRXY-09, PRXY-10, PRXY-11, ENDP-02, ENDP-03
**Success Criteria** (what must be TRUE):
1. `GET /block/:height` returns valid NEAR block JSON; 10 simultaneous requests for the same height produce exactly 1 upstream fetch
2. A block fetched once is served from local filesystem cache on subsequent requests (observable via `X-Upstream-Source: cache` response header)
3. Disabling S3 via env var causes requests to fall through to fastnear without error; disabling fastnear causes fallback to NEAR Lake
4. `GET /last_block/final` returns the current chain tip in real time (no cached value)
5. A single upstream source timing out does not stall the request beyond the per-source timeout window; the next source in the chain is tried
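Criteria 3 and 5 describe an ordered chain: disabled sources are skipped, and a slow source is abandoned after its own timeout. The sketch below models this with blocking threads and `mpsc::Receiver::recv_timeout`, standing in for the async fetchers and `tokio::time::timeout` described in STATE.md. The `Upstream` trait, `Source`, and `fetch_with_fallback` are hypothetical. The `(bytes, source label)` return value follows the `BlockResult` decision, so the label can feed the `X-Upstream-Source` header.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical blocking upstream interface (the real fetchers are async).
pub trait Upstream: Send + Sync {
    fn name(&self) -> &'static str;
    fn fetch(&self, height: u64) -> Result<Vec<u8>, String>;
}

/// One link in the fallback chain, with its enable flag and per-source timeout.
pub struct Source {
    pub upstream: Arc<dyn Upstream>,
    pub enabled: bool,
    pub timeout: Duration,
}

/// Tries enabled sources in order; returns the block bytes and the source label.
pub fn fetch_with_fallback(
    chain: &[Source],
    height: u64,
) -> Result<(Vec<u8>, &'static str), Vec<String>> {
    let mut errors = Vec::new();
    for source in chain.iter().filter(|s| s.enabled) {
        let name = source.upstream.name();
        let upstream = Arc::clone(&source.upstream);
        let (tx, rx) = mpsc::channel();
        // Fetch on a worker thread so a hung upstream cannot stall the whole chain.
        // A timed-out worker is abandoned, not cancelled; its late result is discarded.
        thread::spawn(move || {
            let _ = tx.send(upstream.fetch(height));
        });
        match rx.recv_timeout(source.timeout) {
            Ok(Ok(bytes)) => return Ok((bytes, name)),
            Ok(Err(err)) => errors.push(format!("{name}: {err}")),
            Err(RecvTimeoutError::Timeout) => {
                errors.push(format!("{name}: timed out after {:?}", source.timeout))
            }
            Err(RecvTimeoutError::Disconnected) => errors.push(format!("{name}: worker panicked")),
        }
    }
    Err(errors)
}
```

A timed-out worker thread keeps running until its fetch returns, and its late result is then dropped. An async implementation can drop the timed-out future instead.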

**Plans**: 4 plans
- [x] 02-01-PLAN.md — Dependencies, Config extension, AppError, and AppState foundation
- [x] 02-02-PLAN.md — Filesystem cache with zstd compression, atomic writes, and background eviction
- [x] 02-03-PLAN.md — Upstream fetcher modules (S3/MinIO, fastnear, NEAR Lake with shard assembly)
- [x] 02-04-PLAN.md — Singleflight dedup, fallback chain orchestrator, and route handlers

### Phase 3: TypeScript Integration
**Goal**: All TypeScript indexer packages fetch blocks exclusively from the proxy; at least one indexer (indexer-events) runs against the proxy for 500+ consecutive blocks with zero deserialization errors
**Depends on**: Phase 2
**Requirements**: TSIN-01, TSIN-02, TSIN-03, TSIN-04, TSIN-05, TSIN-06
**Success Criteria** (what must be TRUE):
1. Setting `BLOCK_PROXY_URL` in the environment redirects nb-neardata and nb-blocks to the proxy without any changes to indexer app code
2. indexer-events processes 500+ consecutive blocks through the proxy with no errors or deserialization failures
3. The canonical block JSON format is documented and an integration test asserts the proxy response matches what nb-neardata/nb-blocks expect
4. Removing direct S3/fastnear credentials from an indexer environment does not cause errors — the proxy is the only required endpoint

**Plans**: TBD

### Phase 4: Observability and Admin
**Goal**: Operators can view proxy health and cache stats at a glance, toggle upstream sources at runtime without a restart, and Prometheus metrics are being scraped
**Depends on**: Phase 3
**Requirements**: ADMN-01, ADMN-02, ADMN-03, ADMN-04, ENDP-04
**Success Criteria** (what must be TRUE):
1. The web admin dashboard shows current upstream source status (enabled/disabled) and cache hit rate without requiring any CLI access
2. `POST /admin/upstreams/s3/disable` disables S3 and subsequent block requests skip it immediately, without a service restart
3. `GET /metrics` returns Prometheus-formatted counters including request count, cache hit/miss, dedup saves, and per-upstream latency
4. `GET /stats` returns a JSON snapshot of cache hit rate, dedup saves, cache size, and upstream latencies
5. Admin endpoints are not reachable on the same port/path as data-plane endpoints
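The counters behind criteria 3 and 4 can live in one structure. `/stats` reads a derived hit rate, and `/metrics` renders the same counters in the Prometheus text exposition format (`# HELP`, `# TYPE`, then `name value`). The sketch assumes plain atomic counters. `ProxyStats` and every metric name are hypothetical, and per-upstream latency histograms are omitted.

```rust
use std::fmt::Write as _;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters shared by the `/stats` and `/metrics` handlers.
#[derive(Default)]
pub struct ProxyStats {
    pub requests: AtomicU64,
    pub cache_hits: AtomicU64,
    pub cache_misses: AtomicU64,
    pub dedup_saves: AtomicU64,
}

impl ProxyStats {
    /// Cache hit rate in [0, 1]; 0.0 before any cache lookups.
    pub fn hit_rate(&self) -> f64 {
        let hits = self.cache_hits.load(Ordering::Relaxed) as f64;
        let misses = self.cache_misses.load(Ordering::Relaxed) as f64;
        if hits + misses == 0.0 {
            0.0
        } else {
            hits / (hits + misses)
        }
    }

    /// Renders every counter as HELP/TYPE comment lines followed by one sample line.
    pub fn render_prometheus(&self) -> String {
        let counters = [
            ("block_proxy_requests_total", "Block requests received.", &self.requests),
            ("block_proxy_cache_hits_total", "Requests served from the local cache.", &self.cache_hits),
            ("block_proxy_cache_misses_total", "Requests that missed the local cache.", &self.cache_misses),
            ("block_proxy_dedup_saves_total", "Upstream fetches avoided by singleflight.", &self.dedup_saves),
        ];
        let mut out = String::new();
        for (name, help, counter) in counters {
            // Writing into a String cannot fail, so the fmt::Result is ignored.
            let _ = writeln!(out, "# HELP {name} {help}");
            let _ = writeln!(out, "# TYPE {name} counter");
            let _ = writeln!(out, "{name} {}", counter.load(Ordering::Relaxed));
        }
        out
    }
}
```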

**Plans**: TBD

### Phase 5: Hardening and Operations
**Goal**: The proxy handles upstream failures automatically, shuts down cleanly under load, and restarts without a cold-start thundering herd from an empty cache
**Depends on**: Phase 4
**Requirements**: PRXY-08, OPER-01, OPER-04
**Success Criteria** (what must be TRUE):
1. An upstream that fails 3 consecutive times is auto-disabled for a cooldown period; it re-enables automatically after cooldown without operator intervention
2. Sending SIGTERM to the proxy allows in-flight block requests to complete before the process exits (no 5xx errors during graceful shutdown)
3. Restarting the proxy with an existing cache directory repopulates the in-memory cache index at startup, serving cache hits immediately rather than treating all blocks as misses
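Criterion 1 describes a consecutive-failure circuit breaker. A minimal sketch follows. It assumes one breaker per upstream and a caller-supplied clock (`now`), which keeps the state machine deterministic to test. `CircuitBreaker` is a hypothetical name. The half-open probing many breakers use is deliberately left out: once the cooldown expires, the breaker simply closes.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-upstream breaker: opens after `threshold` consecutive failures
/// and closes again automatically once `cooldown` has elapsed.
pub struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    open_until: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker { threshold, cooldown, consecutive_failures: 0, open_until: None }
    }

    /// Whether the fallback chain may try this upstream at time `now`.
    pub fn allows(&mut self, now: Instant) -> bool {
        match self.open_until {
            Some(until) if now < until => false,
            Some(_) => {
                // Cooldown over: re-enable without operator action and count afresh.
                self.open_until = None;
                self.consecutive_failures = 0;
                true
            }
            None => true,
        }
    }

    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.threshold {
            self.open_until = Some(now + self.cooldown);
        }
    }
}
```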

**Plans**: TBD

## Progress

**Execution Order:**
Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Foundation | 2/2 | Complete | 2026-03-01 |
| 2. Core Proxy | 4/4 | Complete | 2026-03-01 |
| 3. TypeScript Integration | 0/? | Not started | - |
| 4. Observability and Admin | 0/? | Not started | - |
| 5. Hardening and Operations | 0/? | Not started | - |

.planning/STATE.md

Lines changed: 103 additions & 0 deletions

---
gsd_state_version: 1.0
milestone: v1.0
milestone_name: milestone
status: unknown
last_updated: "2026-03-01T10:36:23.434Z"
progress:
  total_phases: 2
  completed_phases: 2
  total_plans: 6
  completed_plans: 6
---

# Project State

## Project Reference

See: .planning/PROJECT.md (updated 2026-03-01)

**Core value:** All indexers can reliably and efficiently fetch block data from a single proxy service with full fault tolerance — if any upstream source fails, the system transparently falls back to alternatives without indexer downtime.
**Current focus:** Phase 2 — Core Proxy

## Current Position

Phase: 2 of 5 (Core Proxy)
Plan: 4 of 4 in current phase (COMPLETE — Phase 2 done)
Status: Phase 2 complete
Last activity: 2026-03-01 — Completed Plan 02-04: Wire upstream clients, fallback chain orchestrator, block/last_block route handlers; Docker verified with cache, fastnear, structured logs

Progress: [██████░░░░] 60%

## Performance Metrics

**Velocity:**
- Total plans completed: 6
- Average duration: 9.7 min
- Total execution time: 1.0 hours

**By Phase:**

| Phase | Plans | Total | Avg/Plan |
|-------|-------|-------|----------|
| 1. Foundation | 2 | 11 min | 5.5 min |
| 2. Core Proxy | 4 | 47 min | 11.8 min |

**Recent Trend:**
- Last 5 plans: 01-02 (5 min), 02-01 (5 min), 02-02 (3 min), 02-03 (3 min), 02-04 (36 min)
- Trend: 02-04 took longer because of Docker verification and 2 runtime bug fixes

*Updated after each plan completion*

## Accumulated Context

### Decisions

Decisions are logged in PROJECT.md Key Decisions table.
Recent decisions affecting current work:

- [Pre-phase]: Rust for proxy service — high-concurrency in-memory locking, no GC pauses
- [Pre-phase]: async_singleflight over manual DashMap — DashMap references held across .await cause deadlocks
- [Pre-phase]: Filesystem cache over database — simple, bounded, no extra infra
- [Pre-phase]: Fallback chain local -> S3 -> fastnear -> NEAR Lake — ordered by latency/reliability
- [Pre-phase]: Package-level changes only — indexers require zero app code changes
- [01-01]: PinoFormat custom FormatEvent — numeric level integers (not strings) to match pino consumers
- [01-01]: JsonVisitor with serde_json::Map — proper typed field collection (not all-string Debug output)
- [01-01]: set_ready() called after TcpListener::bind() succeeds, never before — prevents premature traffic
- [01-01]: Cargo.lock committed — binary crate convention for reproducible builds
- [01-02]: cargo-chef three-stage build: planner recipe.json -> builder cook+compile -> debian:bookworm-slim runtime
- [01-02]: debian:bookworm-slim runtime base (not alpine) — Rust glibc linking; musl cross-compilation not needed
- [01-02]: curl installed in runtime image for Docker healthcheck CMD — matches monorepo pattern
- [01-02]: Port 3015 (mainnet) and 3016 (testnet) — avoids conflicts with existing services on 3003-3013
- [02-core-proxy]: Arc<Config> in AppState: config is immutable after startup, Arc enables cheap Clone without data copy
- [02-core-proxy]: AppError pattern: blanket From<E: Into<anyhow::Error>> so ? operator works in any axum handler automatically
- [02-core-proxy]: BlockResult = (Bytes, &'static str): decouples block data from source label for X-Upstream-Source header injection
- [02-02]: block_height_to_path sharding: dir1=first 6 digits, dir2=next 3 digits — max 1000 files per leaf, handles 200M+ blocks
- [02-02]: Temp file for atomic write must be in same directory as target — cross-filesystem rename is not atomic
- [02-02]: write_background is NOT async — spawns and returns immediately so cache writes never add latency to response path
- [02-02]: Historical blocks (outside recent_block_window of tip) are never evicted — only recent blocks with expired TTL
- [02-03]: tokio::time::timeout() wraps entire S3/NEAR Lake async block — AWS SDK has no per-request timeout API
- [02-03]: FastnearUpstream uses per-request .timeout() on reqwest RequestBuilder, not client-level timeout
- [02-03]: NearLakeUpstream::new() is async (aws_config::load_defaults is async) — Plan 04 must await constructor
- [02-03]: get_object_owned(String) pattern: owned key avoids Rust E0515 lifetime issue in try_join_all futures
- [02-03]: NEAR Lake shard count defaults to 4 when chunks_included field missing (NEAR mainnet has 4 shards)
- [02-04]: async_singleflight DefaultGroup::work() returns Result<T, Option<E>>: leader gets full errors, followers get Err(None) — handled with synthetic UpstreamError
- [02-04]: axum 0.8 path syntax requires {param} not :param — runtime panic not compile error, Docker verified
- [02-04]: cargo-chef:latest-rust-1-bookworm pinned for builder/runtime glibc compatibility (trixie=2.41 vs bookworm=2.36)
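Two of the 02-02 decisions, height-based directory sharding and a same-directory temp file for atomic writes, fit in a few lines of `std`. The sketch assumes heights are zero-padded to 12 digits, which is what makes "first 6 digits / next 3 digits" give at most 1000 files per leaf. It also assumes a `.json.zst` file name. Both are assumptions, as are the function bodies. zstd compression itself is left out, so the bytes are written as given.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// `<root>/<digits 1-6>/<digits 7-9>/<12-digit height>.json.zst` (padding and name assumed).
/// Heights below 10^12 keep every leaf directory at 1000 files or fewer.
pub fn block_height_to_path(root: &Path, height: u64) -> PathBuf {
    let padded = format!("{height:012}");
    root.join(&padded[0..6])
        .join(&padded[6..9])
        .join(format!("{padded}.json.zst"))
}

/// Write-then-rename: readers see either no file or the complete file, never a partial one.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let dir = path
        .parent()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no parent"))?;
    fs::create_dir_all(dir)?;
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    // Temp file in the SAME directory: rename() is only atomic within one filesystem.
    let tmp = dir.join(format!(".{}.{}.tmp", file_name.to_string_lossy(), std::process::id()));
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush data to disk before it becomes visible under the final name
    fs::rename(&tmp, path)
}
```

The temp name includes only the process id, so two concurrent writers for the same height in one process would collide. Singleflight dedup is what rules that case out here.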

### Pending Todos

None yet.

### Blockers/Concerns

- [Phase 2]: Upstream endpoint URLs for fastnear and NEAR Lake must be confirmed against live endpoints before implementing upstream fetcher
- [Phase 2]: Cache directory sharding strategy (subdirectory by height prefix) must be decided before filesystem write path is implemented — retrofitting is painful
- [Phase 2]: Disk usage cap / eviction trigger parameter must be defined before eviction task is implemented
- [Phase 3]: Canonical block JSON format (unified schema across S3 split format and fastnear unified format) must be formally specified before integration tests can be written
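The eviction rule from decision 02-02 can be written as a pure predicate, which keeps the background eviction task easy to test. Under that rule, only recent blocks expire by TTL and historical blocks stay. Parameter names are hypothetical. A disk-usage cap, as the blocker above notes, would be a separate trigger and is not modeled here.

```rust
use std::time::{Duration, SystemTime};

/// Only blocks within `recent_block_window` of the chain tip are eligible, and only
/// once their cache age exceeds `ttl`; historical blocks are never evicted.
pub fn should_evict(
    height: u64,
    tip: u64,
    recent_block_window: u64,
    cached_at: SystemTime,
    now: SystemTime,
    ttl: Duration,
) -> bool {
    // saturating_sub: a block at or above the known tip counts as recent.
    let is_recent = tip.saturating_sub(height) <= recent_block_window;
    if !is_recent {
        return false;
    }
    // If the clock moved backwards, duration_since fails; treat the entry as not expired.
    now.duration_since(cached_at).map_or(false, |age| age > ttl)
}
```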

## Session Continuity

Last session: 2026-03-01
Stopped at: Completed 02-04-PLAN.md — Wire upstream clients, fallback chain orchestrator, block/last_block route handlers, Docker verified; Phase 2 complete
Resume file: None
