| 1 | +# Roadmap: NearBlocks Block Proxy |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Five phases deliver a Rust-based block data proxy that replaces hosted S3 as the central data source |
| 6 | +for all NEAR blockchain indexers. Phase 1 establishes a compilable, deployable service shell. |
| 7 | +Phase 2 builds the core proxy logic — singleflight deduplication, filesystem cache, and upstream |
| 8 | +fallback chain. Phase 3 points all TypeScript indexer packages at the proxy, validating format |
| 9 | +compatibility end-to-end. Phase 4 adds the admin dashboard, Prometheus metrics, and runtime |
| 10 | +upstream toggling. Phase 5 hardens the system with circuit breakers, graceful shutdown, and |
| 11 | +startup cache pre-scan. |
| 12 | + |
| 13 | +## Phases |
| 14 | + |
| 15 | +**Phase Numbering:** |
| 16 | +- Integer phases (1, 2, 3): Planned milestone work |
| 17 | +- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED) |
| 18 | + |
| 19 | +Decimal phases appear between their surrounding integers in numeric order. |
| 20 | + |
| 21 | +- [x] **Phase 1: Foundation** - Compilable axum skeleton with Docker, health check, and structured logging (completed 2026-03-01) |
| 22 | +- [x] **Phase 2: Core Proxy** - Block serving endpoint with singleflight dedup, filesystem cache, and upstream fallback chain (completed 2026-03-01) |
| 23 | +- [ ] **Phase 3: TypeScript Integration** - nb-neardata and nb-blocks packages updated to route through proxy; zero indexer code changes |
| 24 | +- [ ] **Phase 4: Observability and Admin** - Admin dashboard, Prometheus metrics, runtime upstream toggle, and stats endpoint |
| 25 | +- [ ] **Phase 5: Hardening and Operations** - Circuit breaker, graceful shutdown, startup cache pre-scan |
| 26 | + |
| 27 | +## Phase Details |
| 28 | + |
| 29 | +### Phase 1: Foundation |
| 30 | +**Goal**: A deployable Rust service skeleton is running and reachable in Docker with health probes passing |
| 31 | +**Depends on**: Nothing (first phase) |
| 32 | +**Requirements**: OPER-02, OPER-03, OPER-05, ENDP-01 |
| 33 | +**Success Criteria** (what must be TRUE): |
| 34 | + 1. `GET /healthz` returns 200 from the running container |
| 35 | + 2. Docker compose brings the proxy up and it is reachable from other containers on the network |
| 36 | + 3. Structured JSON logs are visible in container output for every request |
| 37 | + 4. Readiness probe blocks traffic until service initialization is complete (returns non-200 until ready) |
| 38 | + 5. All env-var config values load at startup and log their effective values |
| 39 | +**Plans**: 2 plans |
| 40 | + - [x] 01-01-PLAN.md — Rust service skeleton with config, logging, state, and health/readiness endpoints |
| 41 | + - [x] 01-02-PLAN.md — Docker deployment with cargo-chef Dockerfile and mainnet/testnet compose files |
| 42 | + |
| 43 | +### Phase 2: Core Proxy |
| 44 | +**Goal**: Indexers can request any block by height and receive correct JSON, with concurrent deduplication, local caching, and transparent upstream fallback |
| 45 | +**Depends on**: Phase 1 |
| 46 | +**Requirements**: PRXY-01, PRXY-02, PRXY-03, PRXY-04, PRXY-05, PRXY-06, PRXY-07, PRXY-09, PRXY-10, PRXY-11, ENDP-02, ENDP-03 |
| 47 | +**Success Criteria** (what must be TRUE): |
| 48 | + 1. `GET /block/:height` returns valid NEAR block JSON; 10 simultaneous requests for the same height produce exactly 1 upstream fetch |
| 49 | + 2. A block fetched once is served from local filesystem cache on subsequent requests (observable via `X-Upstream-Source: cache` response header) |
| 50 | + 3. Disabling S3 via env var causes requests to fall through to fastnear without error; disabling fastnear causes fallback to NEAR Lake |
| 51 | + 4. `GET /last_block/final` returns the current chain tip in real time (no cached value) |
| 52 | + 5. A single upstream source timing out does not stall the request beyond the per-source timeout window; the next source in the chain is tried |
| 53 | +**Plans**: 4 plans |
| 54 | + - [x] 02-01-PLAN.md — Dependencies, Config extension, AppError, and AppState foundation |
| 55 | + - [x] 02-02-PLAN.md — Filesystem cache with zstd compression, atomic writes, and background eviction |
| 56 | + - [x] 02-03-PLAN.md — Upstream fetcher modules (S3/MinIO, fastnear, NEAR Lake with shard assembly) |
| 57 | + - [x] 02-04-PLAN.md — Singleflight dedup, fallback chain orchestrator, and route handlers |
| 58 | + |
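Success criterion 1 of Phase 2 (10 simultaneous requests for one height, exactly 1 upstream fetch) can be illustrated with a small standard-library sketch. It is not the project's implementation: `Coalescer`, `forget`, and `demo_upstream_fetches` are hypothetical names, the fetch is a synchronous closure rather than an async upstream call, and failures are ignored. Each height maps to a shared `OnceLock`, so the first caller runs the fetch and concurrent callers block until its result is available.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier, Mutex, OnceLock};
use std::thread;
use std::time::Duration;

/// Hypothetical per-height request coalescer (illustration only).
#[derive(Default)]
pub struct Coalescer {
    slots: Mutex<HashMap<u64, Arc<OnceLock<String>>>>,
}

impl Coalescer {
    /// Returns the block for `height`, running `fetch` at most once per slot.
    pub fn get<F: FnOnce() -> String>(&self, height: u64, fetch: F) -> String {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(height).or_default().clone()
        };
        // The first caller runs `fetch`; concurrent callers wait for its result.
        slot.get_or_init(fetch).clone()
    }

    /// Drops the slot, e.g. once the block has been written to the on-disk cache.
    pub fn forget(&self, height: u64) {
        self.slots.lock().unwrap().remove(&height);
    }
}

/// Fires `callers` concurrent requests for one height and returns
/// how many upstream fetches actually ran.
pub fn demo_upstream_fetches(callers: usize) -> usize {
    let coalescer = Arc::new(Coalescer::default());
    let fetches = Arc::new(AtomicUsize::new(0));
    let start = Arc::new(Barrier::new(callers));
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let (c, f, b) = (coalescer.clone(), fetches.clone(), start.clone());
            thread::spawn(move || {
                b.wait(); // release all callers at the same moment
                c.get(42, || {
                    f.fetch_add(1, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(20)); // stand-in for upstream latency
                    String::from("{\"height\":42}")
                })
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "{\"height\":42}");
    }
    fetches.load(Ordering::SeqCst)
}
```

In this sketch a slot stays in the map until `forget` is called, which keeps the example deterministic; a real singleflight layer would typically drop the entry as soon as the result has been handed to the waiters and persisted to the cache.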
| 59 | +### Phase 3: TypeScript Integration |
| 60 | +**Goal**: All TypeScript indexer packages fetch blocks exclusively from the proxy; at least one indexer (indexer-events) runs against the proxy for 500+ consecutive blocks with zero deserialization errors |
| 61 | +**Depends on**: Phase 2 |
| 62 | +**Requirements**: TSIN-01, TSIN-02, TSIN-03, TSIN-04, TSIN-05, TSIN-06 |
| 63 | +**Success Criteria** (what must be TRUE): |
| 64 | + 1. Setting `BLOCK_PROXY_URL` in the environment redirects nb-neardata and nb-blocks to the proxy without any changes to indexer app code |
| 65 | + 2. indexer-events processes 500+ consecutive blocks through the proxy with no errors or deserialization failures |
| 66 | + 3. The canonical block JSON format is documented and an integration test asserts the proxy response matches what nb-neardata/nb-blocks expect |
| 67 | + 4. Removing direct S3/fastnear credentials from an indexer environment does not cause errors — the proxy is the only required endpoint |
| 68 | +**Plans**: TBD |
| 69 | + |
| 70 | +### Phase 4: Observability and Admin |
| 71 | +**Goal**: Operators can view proxy health and cache stats at a glance, toggle upstream sources at runtime without a restart, and Prometheus metrics are being scraped |
| 72 | +**Depends on**: Phase 3 |
| 73 | +**Requirements**: ADMN-01, ADMN-02, ADMN-03, ADMN-04, ENDP-04 |
| 74 | +**Success Criteria** (what must be TRUE): |
| 75 | + 1. The web admin dashboard shows current upstream source status (enabled/disabled) and cache hit rate without requiring any CLI access |
| 76 | + 2. `POST /admin/upstreams/s3/disable` disables S3 and subsequent block requests skip it immediately, without a service restart |
| 77 | + 3. `GET /metrics` returns Prometheus-formatted counters including request count, cache hit/miss, dedup saves, and per-upstream latency |
| 78 | + 4. `GET /stats` returns a JSON snapshot of cache hit rate, dedup saves, cache size, and upstream latencies |
| 79 | + 5. Admin endpoints are not reachable on the same port/path as data-plane endpoints |
| 80 | +**Plans**: TBD |
| 81 | + |
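The runtime toggle in criterion 2 of Phase 4 interacts with the fallback chain from Phase 2: a disabled source must be skipped on the very next request. One way to sketch that, under stated assumptions, is an `AtomicBool` per upstream that the request path reads on every call. The `Upstream` struct, `fetch_block`, and the stub fetchers below are hypothetical and synchronous; real fetchers would be async clients with per-source timeouts.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical upstream entry; `fetch` stands in for a real S3, fastnear,
/// or NEAR Lake client.
pub struct Upstream {
    pub name: &'static str,
    pub enabled: AtomicBool,
    pub fetch: fn(u64) -> Result<String, String>,
}

/// Tries enabled upstreams in order and returns the source name and block
/// JSON from the first success, or the last error seen.
pub fn fetch_block(chain: &[Upstream], height: u64) -> Result<(&'static str, String), String> {
    let mut last_err = String::from("no upstream enabled");
    for up in chain {
        // The flag is read on every request, so a toggle takes effect immediately.
        if !up.enabled.load(Ordering::Relaxed) {
            continue;
        }
        match (up.fetch)(height) {
            Ok(body) => return Ok((up.name, body)),
            Err(e) => last_err = format!("{}: {}", up.name, e),
        }
    }
    Err(last_err)
}

// Stub fetchers used only for the demonstration chain.
fn stub_ok(height: u64) -> Result<String, String> {
    Ok(format!("{{\"height\":{height}}}"))
}

fn stub_timeout(_height: u64) -> Result<String, String> {
    Err(String::from("timeout"))
}

/// Builds a three-source chain in the order described by the roadmap.
pub fn demo_chain() -> Vec<Upstream> {
    vec![
        Upstream { name: "s3", enabled: AtomicBool::new(true), fetch: stub_ok },
        Upstream { name: "fastnear", enabled: AtomicBool::new(true), fetch: stub_timeout },
        Upstream { name: "near_lake", enabled: AtomicBool::new(true), fetch: stub_ok },
    ]
}
```

An admin handler would only need to call `enabled.store(false, Ordering::Relaxed)` on the matching entry; no lock or restart is involved because the chain itself is never mutated.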
| 82 | +### Phase 5: Hardening and Operations |
| 83 | +**Goal**: The proxy handles upstream failures automatically, shuts down cleanly under load, and restarts without a cold-start thundering herd from an empty cache |
| 84 | +**Depends on**: Phase 4 |
| 85 | +**Requirements**: PRXY-08, OPER-01, OPER-04 |
| 86 | +**Success Criteria** (what must be TRUE): |
| 87 | + 1. An upstream that fails 3 consecutive times is auto-disabled for a cooldown period; it re-enables automatically after cooldown without operator intervention |
| 88 | + 2. Sending SIGTERM to the proxy allows in-flight block requests to complete before the process exits (no 5xx errors during graceful shutdown) |
| 89 | + 3. Restarting the proxy with an existing cache directory repopulates the in-memory cache index at startup, serving cache hits immediately rather than treating all blocks as misses |
| 90 | +**Plans**: TBD |
| 91 | + |
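The circuit-breaker behavior in criterion 1 of Phase 5 (auto-disable after 3 consecutive failures, automatic re-enable after a cooldown) can be sketched as a small state machine. This is an illustration under assumptions, not the planned design: the `Breaker` type and its method names are hypothetical, and the cooldown length is left to the caller because the roadmap does not specify one. Time is passed in explicitly so the logic can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-upstream circuit breaker (illustration only).
pub struct Breaker {
    threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    open_until: Option<Instant>,
}

impl Breaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Breaker { threshold, cooldown, consecutive_failures: 0, open_until: None }
    }

    /// Returns true if the upstream may be tried at `now`.
    pub fn allows(&mut self, now: Instant) -> bool {
        match self.open_until {
            // Still cooling down: skip this upstream.
            Some(until) if now < until => false,
            // Cooldown elapsed: re-enable without operator intervention.
            Some(_) => {
                self.open_until = None;
                self.consecutive_failures = 0;
                true
            }
            None => true,
        }
    }

    /// Any success resets the consecutive-failure count.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// Counts a failure and opens the breaker once the threshold is reached.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.threshold {
            self.open_until = Some(now + self.cooldown);
        }
    }
}
```

With `Breaker::new(3, Duration::from_secs(30))` (the 30-second value is an arbitrary example), three `record_failure` calls in a row make `allows` return false until 30 seconds after the third failure, while a `record_success` in between resets the count.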
| 92 | +## Progress |
| 93 | + |
| 94 | +**Execution Order:** |
| 95 | +Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 |
| 96 | + |
| 97 | +| Phase | Plans Complete | Status | Completed | |
| 98 | +|-------|----------------|--------|-----------| |
| 99 | +| 1. Foundation | 2/2 | Complete | 2026-03-01 | |
| 100 | +| 2. Core Proxy | 4/4 | Complete | 2026-03-01 | |
| 101 | +| 3. TypeScript Integration | 0/? | Not started | - | |
| 102 | +| 4. Observability and Admin | 0/? | Not started | - | |
| 103 | +| 5. Hardening and Operations | 0/? | Not started | - | |