Commit 3b1997b

docs: add end-to-end oracle flow diagram (#54)
Added mermaid diagram showing complete system architecture from scheduler trigger through data processing to blockchain submission. Includes visual representation of circuit breaker, cache logic, BigQuery pipeline, eligibility criteria, RPC failover, and monitoring systems.
1 parent dda3c24 commit 3b1997b

1 file changed

+175
-15
lines changed

docs/technical-design.md

Lines changed: 175 additions & 15 deletions
@@ -1,6 +1,168 @@
11
# Technical Design & Architecture
22

3-
This document outlines key architectural decisions and data flows within the Rewards Eligibility Oracle.
3+
This document visually represents the Rewards Eligibility Oracle codebase, as a more approachable alternative to reading the code directly.
4+
5+
## End-to-End Oracle Flow
6+
7+
The Rewards Eligibility Oracle operates as a daily scheduled service that evaluates indexer performance and updates on-chain rewards eligibility via function calls to the RewardsEligibilityOracle contract. The diagram below illustrates the complete execution flow from scheduler trigger through data processing to blockchain submission and error handling.
8+
9+
The Oracle is designed to be resilient to transient network issues and RPC provider failures. It uses a multi-layered approach involving internal retries, provider rotation, and a circuit breaker to prevent costly infinite restart loops that needlessly burn through BigQuery requests.
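
As a rough illustration of the circuit breaker check, the sketch below counts failure timestamps inside a rolling window and halts with exit code 0 when the limit is exceeded. The thresholds (more than 3 failures in 60 minutes) come from the flow diagram in this section; the function names and the in-memory list of timestamps are assumptions for illustration, and the real oracle reads its failure log from a mounted volume.

```python
import sys
from datetime import datetime, timedelta, timezone

# Thresholds as shown in the diagram's decision node.
MAX_FAILURES = 3
WINDOW = timedelta(minutes=60)


def should_halt(failure_times, now, max_failures=MAX_FAILURES, window=WINDOW):
    """Return True when more than `max_failures` failures fall inside `window`."""
    recent = [t for t in failure_times if now - t <= window]
    return len(recent) > max_failures


def check_circuit_breaker(failure_times, now=None):
    """Hypothetical entry point: exit cleanly if the breaker is open."""
    now = now or datetime.now(timezone.utc)
    if should_halt(failure_times, now):
        # Exit code 0 stops the container without triggering a restart,
        # so a human has to intervene before the oracle runs again.
        sys.exit(0)
```

Keeping the decision in a pure function (`should_halt`) separate from the exit call makes the threshold logic easy to test without terminating the process.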
10+
11+
```mermaid
12+
---
13+
title: Rewards Eligibility Oracle - End-to-End Flow
14+
---
15+
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#fee2e2', 'primaryTextColor':'#7f1d1d', 'primaryBorderColor':'#ef4444', 'lineColor':'#6b7280'}}}%%
16+
17+
graph TB
18+
%% Docker Container - Contains all oracle logic
19+
subgraph DOCKER["Docker Container"]
20+
Scheduler["Python Scheduler"]
21+
Oracle["Rewards Eligibility Oracle"]
22+
23+
subgraph CIRCUIT_BREAKER["Circuit Breaker Logic"]
24+
CB["Circuit Breaker"]
25+
CBCheck{"Has there been more<br/>than 3 failures in the <br/>last 60 minutes?"}
26+
end
27+
28+
Scheduler -.->|"Phase 1: Schedule daily run"| Oracle
29+
30+
%% Data Pipeline
31+
subgraph PIPELINE["Data Pipeline"]
32+
CacheCheck{"Do we have recent cached<br/>BigQuery results available?<br/>(< 30 min old)"}
33+
34+
subgraph BIGQUERY["BigQuery Analysis"]
35+
FetchData["Fetch Indexer Performance Data<br/>over last 28 days<br/>(from BigQuery)"]
36+
SQLQuery["- Daily query metrics<br/>- Days online calculation<br/>- Subgraph coverage"]
37+
end
38+
39+
subgraph PROCESSING["Eligibility Processing"]
40+
ApplyCriteria["Apply Criteria e.g.<br/>5+ days online<br/>Latency < 5000ms<br/>Blocks behind < 50000<br/>1+ subgraph served"]
41+
FilterData["Filter Eligible<br/>vs Ineligible"]
42+
GenArtifacts["Generate CSV Artifacts:<br/>- eligible_indexers.csv<br/>- ineligible_indexers.csv<br/>- full_metrics.csv"]
43+
end
44+
end
45+
46+
%% Blockchain Layer
47+
subgraph BLOCKCHAIN["Blockchain Submission"]
48+
Batch["Consume series of Eligible<br/>Indexers from CSV.<br/>Batch indexer addresses<br/>into groups of 125 indexers."]
49+
50+
subgraph RPC["RPC Failover System"]
51+
TryRPC["Try to establish connection<br/>with RPC provider"]
52+
RPCError["Rotate to next RPC provider"]
53+
end
54+
55+
BuildTx["Build Transaction:<br/>- Estimate gas<br/>- Get nonce<br/>- Sign with key"]
56+
SubmitTx["Submit Batch to Contract<br/>call function:<br/>renewIndexerEligibility()"]
57+
WaitReceipt["Wait for Receipt<br/>30s timeout"]
58+
MoreBatches{"More<br/>Batches?"}
59+
end
60+
61+
%% Monitoring
62+
subgraph MONITOR["Monitoring & Notifications"]
63+
SlackSuccess["Slack Success:<br/>- Eligible count<br/>- Execution time<br/>- Transaction links"]
64+
SlackFailCircuitBreaker["Stop container sys.exit(0)<br/>Container will not restart<br/>Manual intervention needed<br/>Send notification to team<br/>Slack channel for debugging"]
65+
SlackFailRPC["Stop container sys.exit(1)<br/>Container will restart<br/>Send notification to Slack"]
66+
SlackRotate["Send Slack notification"]
67+
end
68+
end
69+
70+
%% External Systems - Define after Docker subgraph
71+
RPCProviders["Pool of 4 RPC providers<br/>(External Infrastructure)"]
72+
BQ["Google BigQuery<br/>Indexer Performance Data"]
73+
74+
subgraph FailureLogStorage["Data Storage<br/>(mounted volume)"]
75+
CBLog["Failure log"]
76+
end
77+
78+
subgraph HistoricalDataStorage["Data Storage<br/>(mounted volume)"]
79+
HistoricalData["Historical archive of<br/>eligible and ineligible<br/>indexers by date<br/>YYYY-MM-DD"]
80+
end
81+
82+
END_NO_RESTART["FAILURE<br/>Container Stopped<br/>No Restart<br/>Manual Intervention Required"]
83+
END_WITH_RESTART["FAILURE<br/>Container Stopped<br/>Restart Container<br/>Will retry entire loop again"]
84+
SUCCESS["SUCCESS<br/>Wait for next<br/>scheduled trigger"]
85+
86+
%% Main Flow - Start with Docker container to anchor it left
87+
Oracle -->|"Phase 1.1: Check if oracle<br/>should run"| CB
88+
CB -->|"Phase 1.2: Read log"| CBLog
89+
CBLog -->|"Phase 1.3: Return log"| CB
90+
CB -->|"Phase 1.4: Provides failure<br/>timestamps (if they exist)"| CBCheck
91+
CBCheck -->|"Phase 2:<br/>(Regular Path)<br/>No"| CacheCheck
92+
CacheCheck -->|"Phase 2.1: Check for<br/>recent cached data"| HistoricalData
93+
HistoricalData -->|"Phase 2.2: Return recent eligible indexers<br/>from eligible_indexers.csv<br/>(if they exist)"| CacheCheck
94+
CBCheck -.->|"Phase 2:<br/>(Alternative Path)<br/>Yes"| SlackFailCircuitBreaker
95+
SlackFailCircuitBreaker -.-> END_NO_RESTART
96+
97+
CacheCheck -->|"Phase 3:<br/>(Alternative Path)<br/>Yes"| Batch
98+
CacheCheck -->|"Phase 3:<br/>(Regular Path)<br/>No"| FetchData
99+
100+
FetchData -->|"Phase 3.1: Query data<br/>from BigQuery"| BQ
101+
BQ -->|"Phase 3.2: Returns metrics"| SQLQuery
102+
SQLQuery -->|"Phase 3.3: Process results"| ApplyCriteria
103+
ApplyCriteria --> FilterData
104+
FilterData -->|"Phase 3.4: Generate CSVs"| GenArtifacts
105+
GenArtifacts -->|"Phase 3.5: Save data"| HistoricalData
106+
GenArtifacts --> Batch
107+
108+
Batch -->|"Phase 4.1: For each batch"| TryRPC
109+
TryRPC -->|"Phase 4.2: Connect"| RPCProviders
110+
RPCProviders -->|"Phase 4.3:<br/>(Regular Path)<br/>RPC connection established"| BuildTx
111+
RPCProviders -.->|"Phase 4.3:<br/>(Alternative Path)<br/>RPC connection failed<br/>Multiple connection attempts<br/>Not possible to connect"| RPCError
112+
RPCError -.->|"Notify"| SlackRotate
113+
RPCError -->|"All exhausted"| SlackFailRPC
114+
SlackFailRPC --> END_WITH_RESTART
115+
RPCError -->|"Connection successful"| BuildTx
116+
117+
BuildTx --> SubmitTx
118+
SubmitTx --> WaitReceipt
119+
120+
WaitReceipt -->|"Phase 4.4: Batch confirmed"| MoreBatches
121+
122+
MoreBatches -->|"Yes<br/>Back to phase 4 loop<br/>Process next batch"| Batch
123+
MoreBatches -->|"Phase 5: No<br/>All complete"| SlackSuccess
124+
SlackSuccess --> SUCCESS
125+
126+
%% Styling
127+
classDef schedulerStyle fill:#fee2e2,stroke:#ef4444,stroke-width:3px,color:#7f1d1d
128+
classDef oracleStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
129+
classDef dataStyle fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
130+
classDef processingStyle fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
131+
classDef blockchainStyle fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
132+
classDef monitorStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
133+
classDef infraStyle fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
134+
classDef contractStyle fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a
135+
classDef decisionStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
136+
classDef endStyle fill:#7f1d1d,stroke:#991b1b,stroke-width:3px,color:#fee2e2
137+
classDef endStyleOrange fill:#ea580c,stroke:#c2410c,stroke-width:3px,color:#ffedd5
138+
classDef successStyle fill:#14532d,stroke:#166534,stroke-width:3px,color:#f0fdf4
139+
140+
class Scheduler schedulerStyle
141+
class Oracle,CB oracleStyle
142+
class FetchData,SQLQuery,BQ dataStyle
143+
class ApplyCriteria,FilterData,GenArtifacts processingStyle
144+
class Batch,TryRPC,BuildTx,SubmitTx,WaitReceipt,RPCError blockchainStyle
145+
class SlackSuccess,SlackFailCircuitBreaker,SlackFailRPC,SlackRotate monitorStyle
146+
class RPCProviders,HistoricalData,CBLog infraStyle
147+
class Contract contractStyle
148+
class CacheCheck,MoreBatches,CBCheck decisionStyle
149+
class END_NO_RESTART endStyle
150+
class END_WITH_RESTART endStyleOrange
151+
class SUCCESS successStyle
152+
153+
style DOCKER fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a
154+
style CIRCUIT_BREAKER fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
155+
style PIPELINE fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
156+
style BIGQUERY fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d
157+
style PROCESSING fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
158+
style BLOCKCHAIN fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
159+
style RPC fill:#fecaca,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
160+
style MONITOR fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
161+
style FailureLogStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
162+
style HistoricalDataStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
163+
```
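
The eligibility and batching steps in the diagram can be sketched in a few lines of Python. The thresholds and the batch size of 125 are the example values shown in the diagram. The `IndexerMetrics` record, its field names, and both function names are hypothetical and are not taken from the codebase.

```python
from dataclasses import dataclass

# Example thresholds from the "Apply Criteria" node; actual values may differ.
MIN_DAYS_ONLINE = 5
MAX_LATENCY_MS = 5000
MAX_BLOCKS_BEHIND = 50000
MIN_SUBGRAPHS = 1
BATCH_SIZE = 125


@dataclass
class IndexerMetrics:
    """Hypothetical per-indexer row produced by the BigQuery step."""
    address: str
    days_online: int
    avg_latency_ms: float
    avg_blocks_behind: float
    subgraphs_served: int


def is_eligible(m):
    """Apply all four criteria; an indexer must pass every one."""
    return (
        m.days_online >= MIN_DAYS_ONLINE
        and m.avg_latency_ms < MAX_LATENCY_MS
        and m.avg_blocks_behind < MAX_BLOCKS_BEHIND
        and m.subgraphs_served >= MIN_SUBGRAPHS
    )


def batch_addresses(addresses, size=BATCH_SIZE):
    """Split a list of addresses into consecutive groups of at most `size`."""
    return [addresses[i:i + size] for i in range(0, len(addresses), size)]
```

With 300 eligible indexers, `batch_addresses` would yield three batches of 125, 125, and 50 addresses, each submitted in its own transaction.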
164+
165+
---
4166

5167
## RPC Provider Failover and Circuit Breaker Logic
6168

@@ -21,24 +183,22 @@ sequenceDiagram
21183
22184
# Describe failure loop inside the blockchain_client module
23185
activate blockchain_client
24-
alt RPC Loop (for each provider)
186+
loop For each provider in pool
25187
26-
# Attempt RPC call
27-
blockchain_client->>blockchain_client: _execute_rpc_call() with provider A
28-
note right of blockchain_client: Fails after 5 retries
188+
# Attempt RPC call
189+
blockchain_client->>blockchain_client: _execute_rpc_call() with next provider
190+
note right of blockchain_client: Fails after 3 attempts
29191
30-
# Log failure
192+
# Log failure and rotate
31193
blockchain_client-->>blockchain_client: raises ConnectionError
32-
note right of blockchain_client: Catches error, logs rotation
194+
note right of blockchain_client: Catches error, rotates to next provider
33195
34-
# Retry RPC call
35-
blockchain_client->>blockchain_client: _execute_rpc_call() with provider B
36-
note right of blockchain_client: Fails after 5 retries
196+
# Send rotation notification
197+
blockchain_client->>slack_notifier: send_info_notification()
198+
note right of slack_notifier: RPC provider rotation alert
37199
38-
# Log final failure
39-
blockchain_client-->>blockchain_client: raises ConnectionError
40-
note right of blockchain_client: All providers tried and failed
41200
end
201+
note right of blockchain_client: All providers exhausted
42202
43203
# Raise error back to main_oracle and exit blockchain_client module
44204
blockchain_client-->>main_oracle: raises Final ConnectionError
@@ -51,6 +211,6 @@ sequenceDiagram
51211
main_oracle->>slack_notifier: send_failure_notification()
52212
53213
# Document restart process
54-
note right of main_oracle: sys.exit(1)
55-
note right of main_oracle: Docker will restart. CircuitBreaker can halt via sys.exit(0)
214+
note right of main_oracle: sys.exit(1) triggers Docker restart
215+
note right of main_oracle: Circuit breaker uses sys.exit(0) to prevent restart
56216
```
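
The failover loop in the sequence diagram can be approximated as follows. This is a minimal sketch under stated assumptions: `call_with_failover`, `rpc_call`, and `notify` are hypothetical names (the diagram only names `_execute_rpc_call()` and `send_info_notification()`), and any delay or backoff between attempts is omitted.

```python
def call_with_failover(providers, rpc_call, notify, attempts_per_provider=3):
    """Try each provider in order, rotating after repeated failures.

    `rpc_call(provider)` is expected to raise ConnectionError on failure.
    `notify(message)` stands in for the Slack rotation notification.
    """
    for provider in providers:
        for _ in range(attempts_per_provider):
            try:
                return rpc_call(provider)
            except ConnectionError:
                continue  # retry the same provider
        # All attempts on this provider failed: report it and move on.
        notify(f"RPC provider {provider} failed, rotating")
    # Every provider was tried; the caller is expected to notify and exit(1).
    raise ConnectionError("All RPC providers exhausted")
```

A caller in the position of `main_oracle` would catch the final `ConnectionError`, send the failure notification, and call `sys.exit(1)` so that Docker restarts the container.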
