
Indexer Architecture

godex implements a high-performance producer-consumer pipeline with concurrent fetching, ordered processing, and fault-tolerant storage designed for production blockchain indexing.

System Overview

```mermaid
flowchart LR
  subgraph SDKCore["SDK Core"]
    subgraph Processor["Processor (per chain)"]
      F["Fetchers\n- concurrent RPC\n- batch requests\n- rate limiting"]
      A["Arbiter\n- ordered processing\n- reorg detection\n- LRU hash cache"]
      DEC["Decoder\n- ABI-based\n- event transformation"]
      RPC["RPC Client\n(HTTP + retry)"]
      F -->|"GetLogs/GetBlocks"| RPC
    end

    SINK_IF["Sink interface\n(Store, Rollback, LoadCursor)"]
  end

  subgraph Adapters["Adapters / Implementations"]
    PSINK["PostgresSink\n- atomic storage\n- tx rollback\n- cursor persistence"]
    METRICS["Metrics (Prometheus)\n- counters/gauges/histograms"]
  end

  subgraph UserApp["User Application"]
    APP["Application\n- configures chains\n- provides decoder\n- handles events"]
  end

  %% Data flow
  APP -->|"NewProcessor(metrics, sink)"| Processor
  APP -->|"AddChain(chain, opts, decoder)"| Processor
  F -->|"raw logs + timestamps"| A
  A -->|"ordered logs"| DEC
  DEC -->|"structured events"| SINK_IF
  SINK_IF --> PSINK

  %% Metrics flow
  F -->|"ObservedBlockFetchDuration"| METRICS
  A -->|"IncReorgs"| METRICS
  PSINK -->|"IncSinkWrites/Errors\nObservedSinkWriteDuration\nSetIndexedHeight"| METRICS

  %% External
  RPC ---|"JSON-RPC calls"| CHAIN["Blockchain node(s)"]
```

Core Components

| Component | Responsibility | Implementation |
| --- | --- | --- |
| Processor | Orchestrates the multi-chain indexing lifecycle with error isolation | Concurrent per-chain processing with shared resources |
| Fetchers | Concurrent RPC workers fetching logs and timestamps | Rate-limited batch requests with individual timeouts and retry logic |
| Arbiter | Maintains block order and coordinates the processing pipeline | LRU cache for reorg detection, bounded buffering with context cancellation |
| Decoder Router | Routes events to the appropriate decoders | Match-based routing with support for multiple contract types |
| Sink | Persistent storage with atomic rollback support | Transactional writes with reorg recovery and cursor management |

Processing Flow

  1. Initialization: Load cursor from sink or use StartBlock configuration
  2. Head Determination: Fetch latest block height via RPC.Head() with retry logic
  3. Range Planning: Divide work into windows of RangeSize blocks
  4. Concurrent Fetching: FetcherConcurrency workers process ranges in parallel
  5. Ordered Processing: Arbiter processes fetch results sequentially for reorg safety
  6. Event Decoding: Router routes logs to appropriate decoders
  7. Storage: Decoded events stored atomically via sink with transaction rollback support
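
The range-planning step (3) can be sketched as follows; `planRanges` and its types are illustrative stand-ins, not godex APIs, and `rangeSize` corresponds to the `RangeSize` option:

```go
package main

import "fmt"

const rangeSize = 100 // stand-in for the RangeSize option

// planRanges divides [from, head] into inclusive windows of at most
// rangeSize blocks, which the fetcher workers then process in parallel.
func planRanges(from, head uint64) [][2]uint64 {
	var ranges [][2]uint64
	for start := from; start <= head; start += rangeSize {
		end := start + rangeSize - 1
		if end > head {
			end = head
		}
		ranges = append(ranges, [2]uint64{start, end})
	}
	return ranges
}

func main() {
	// Cursor loaded from the sink (step 1), head fetched via RPC (step 2).
	cursor, head := uint64(250), uint64(612)
	for _, r := range planRanges(cursor+1, head) {
		fmt.Printf("fetch window [%d, %d]\n", r[0], r[1])
	}
}
```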

Core Interfaces

Sink Interface

Persistent storage abstraction with transactional rollback support.

```go
type Sink interface {
    Store(ctx context.Context, events []types.Event) error
    Rollback(ctx context.Context, chainId string, toBlock uint64, blockHash string) error
    LoadCursor(ctx context.Context, chainId string) (blockNum uint64, blockHash string, err error)
    UpdateCursor(ctx context.Context, chainId string, newBlock uint64, blockHash string) error
}
```

Decoder Interface

Event transformation with pluggable ABI support.

```go
type Decoder interface {
    Decode(name string, chainId string, log types.Log) (*types.Event, error)
    DecodeBatch(logs []types.Log) (*[]types.Event, error)
}
```

DecoderRouter Interface

Intelligent routing of logs to appropriate decoders based on configurable conditions.

```go
type DecoderRouter struct {
    routes []DecoderRoute
}

func NewDecoderRouter() *DecoderRouter
func (r *DecoderRouter) Register(match MatchFunc, abiName string, dec Decoder) *DecoderRouter
func (r *DecoderRouter) Decode(chainId string, log types.Log) (*types.Event, error)
```

RPC Interface

Blockchain node communication with batching and rate limiting.

```go
type RPC interface {
    Head(ctx context.Context) (string, error)
    GetBlock(ctx context.Context, blockNumber string) (types.Block, error)
    GetBlocks(ctx context.Context, blockNumbers []string) (map[string]types.Block, error)
    GetLogs(ctx context.Context, filter types.Filter) ([]types.Log, error)
    GetBlockReceipts(ctx context.Context, blockNumber string) ([]types.Receipt, error)
}
```

Metrics Interface

Observability and monitoring interface.

```go
type Metrics interface {
    IncBlocksProcessed(chainId string, n uint64)
    ObservedBlockLag(chainId string, lag uint64)
    ObservedBlockFetchDuration(chainId string, d time.Duration, success bool)
    SetIndexedHeight(chainId string, height uint64)
    IncSinkWrites(chainId string, n uint64)
    IncSinkErrors(chainId string)
    ObservedSinkWriteDuration(chainId string, d time.Duration, success bool)
    SetProcessorConcurrency(chainId string, n uint64)
    IncReorgs(chainId string)
}
```

Concurrency Model

Producer-Consumer Pattern

Producer Layer (Fetchers):

  • Concurrent workers fetch logs and timestamps using batch RPC requests
  • Rate-limited and retry-enabled communication with blockchain nodes
  • Individual context timeouts prevent indefinite blocking
  • Bounded buffering provides natural backpressure

Consumer Layer (Arbiter):

  • Single-threaded coordinator ensures in-order processing for reorg safety
  • LRU cache maintains block hash history for efficient reorg detection
  • Direct integration with decoder router and sink for atomic event processing
  • Context-aware processing with graceful cancellation support

Multi-Chain Processing

Chain Isolation:

  • Each chain maintains independent state and cursor tracking
  • Individual chain failures do not affect other chains
  • Shared resources (decoder, sink) accessed safely via appropriate synchronization

Configuration Flexibility:

  • Different options per chain (range size, concurrency, confirmation depth)
  • Chain-specific retry configurations and rate limits
  • Independent progress tracking and metrics collection

Reorganization Handling

Blockchain reorganizations are automatically detected and resolved with minimal data loss and efficient recovery.

Detection Mechanism:

  1. Maintains LRU cache of processed block hashes (BlockHashCache)
  2. Verifies parent hash continuity during sequential window processing
  3. Detects divergence when block.ParentHash != cachedHash[block.Number-1]

Recovery Process:

  1. Ancestor Search: Binary search backward through cached hashes to find common ancestor
  2. Sink Rollback: Call sink.Rollback(ctx, chainId, ancestor, ancestorHash) to remove orphaned events atomically
  3. State Reset: Update cursor to ancestor block and clear future hash cache entries
  4. Resume Processing: Restart from ancestor + 1 with fresh batch processing
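
The detection check and ancestor search can be sketched as follows. The types and the plain-map hash cache are illustrative (godex uses an LRU), and the linear scan stands in for the backward search through cached hashes:

```go
package main

import "fmt"

type header struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// findAncestor walks back from `from` to the highest cached block whose
// hash still agrees with the canonical chain.
func findAncestor(cache, canonical map[uint64]string, from uint64) uint64 {
	for n := from; n > 0; n-- {
		if h, ok := cache[n]; ok && h == canonical[n] {
			return n
		}
	}
	return 0
}

func main() {
	// Blocks 1-3 were indexed; the chain then reorged at block 3.
	cache := map[uint64]string{1: "0xa1", 2: "0xa2", 3: "0xa3"}
	canonical := map[uint64]string{1: "0xa1", 2: "0xa2", 3: "0xb3", 4: "0xb4"}

	next := header{Number: 4, Hash: "0xb4", ParentHash: "0xb3"}
	if next.ParentHash != cache[next.Number-1] { // divergence detected
		ancestor := findAncestor(cache, canonical, next.Number-1)
		fmt.Println("reorg: rollback to block", ancestor)
		// Here the indexer would roll back the sink to `ancestor`,
		// reset the cursor, and resume from ancestor+1.
	}
}
```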

Performance Characteristics:

  • O(1) hash lookups via LRU cache
  • Bounded memory usage with configurable cache size
  • Minimal RPC overhead (only fetches headers during reorg detection)

Error Handling Architecture

Error Classification

  • Transient Errors: Network timeouts, RPC rate limits - automatic retry with exponential backoff
  • Permanent Errors: Invalid configuration, authentication failures - immediate chain termination
  • Reorg Errors: Blockchain reorganizations - graceful rollback and recovery from ancestor
  • Non-Recoverable Errors: Corrupted data, schema mismatches - chain isolation and logging

Failure Isolation

  • Chain Independence: Individual chain failures do not affect concurrent chain processing
  • Resource Cleanup: Context cancellation ensures prompt cleanup of goroutines and connections
  • Graceful Degradation: Failed chains log errors while others continue processing
  • Atomic Operations: Sink operations use transactions with automatic rollback on failures

Performance Architecture

Batch Processing

Events are processed in batches rather than individually:

  • Reduces transaction overhead
  • Enables bulk operations (COPY mode in PostgreSQL)
  • Better throughput
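
The batching idea reduces to splitting a stream of events into fixed-size chunks and flushing each chunk in one sink call. `chunk` is a hypothetical helper, not a godex API:

```go
package main

import "fmt"

// chunk splits items into batches of at most `size` elements, so each
// batch can be written in a single transaction (or one COPY in Postgres).
func chunk[T any](items []T, size int) [][]T {
	var batches [][]T
	for size < len(items) {
		batches = append(batches, items[:size])
		items = items[size:]
	}
	return append(batches, items)
}

func main() {
	events := make([]int, 10)
	for _, b := range chunk(events, 4) {
		fmt.Println("flush batch of", len(b)) // e.g. one bulk INSERT per batch
	}
}
```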

Connection Pooling

The PostgreSQL adapter uses pgxpool for connection management:

  • Connection reuse reduces overhead
  • Automatic connection lifecycle management
  • Configurable pool size based on workload
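
A pool might be tuned along these lines; this is a configuration sketch only, and the DSN, pool sizes, and sizing heuristic in the comments are placeholders to adapt to your workload:

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	cfg, err := pgxpool.ParseConfig("postgres://user:pass@localhost:5432/godex")
	if err != nil {
		log.Fatal(err)
	}
	cfg.MaxConns = 8 // placeholder: scale with chains x concurrent writers
	cfg.MinConns = 2 // keep warm connections to avoid dial latency

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()
}
```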

Memory Management

  • Bounded Caching: LRU block hash cache prevents unbounded memory growth
  • Window Processing: Natural backpressure limits concurrent memory usage
  • Channel Buffering: Sized channels provide backpressure without excessive queuing
  • Resource Cleanup: Context cancellation ensures prompt resource release

Design Principles

  1. Atomicity: All operations are transactional
  2. Pluggability: Interface-based design allows custom implementations
  3. Performance: Adaptive strategies optimize for different workloads
  4. Consistency: Cursors and events always consistent
  5. Extensibility: Handler pattern and migration system enable customization

For detailed documentation on specific components, see: