feat(docs): add Agent Guide for indexer-core usage and integration (#113)

daithihearn · web-flow · commit 51e6bde70522 · 2026-03-12T00:31:23.000+01:00
* feat(docs): add Agent Guide for `indexer-core` usage and integration

* fix(docs): update links in AGENTS.md and CLAUDE.md for consistency
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
@@ -1,135 +1,5 @@
 # CLAUDE.md
 
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+Use [`AGENTS.md`](../AGENTS.md) as the canonical instruction file for this repository.
 
-## Project Overview
-
-This is a Kotlin library for building blockchain indexers for VeChainThor. It provides parallel processing, dependency management, and automatic retry logic for indexing blockchain data.
-
-## Build Commands
-
-```bash
-# Build the project
-./gradlew build
-
-# Run tests
-./gradlew test
-
-# Run a specific test class
-./gradlew test --tests "org.vechain.indexer.utils.IndexerOrderUtilsTest"
-
-# Check code coverage
-./gradlew jacocoTestReport
-# Coverage report: build/reports/jacoco/test/html/index.html
-
-# Format code (ktfmt with Google style)
-./gradlew spotlessApply
-
-# Check formatting
-./gradlew spotlessCheck
-
-# Clean build artifacts
-./gradlew clean
-```
-
-## Core Architecture
-
-### Indexer Types
-
-The library provides two main indexer implementations via `IndexerFactory.build()`:
-
-1. **BlockIndexer**: Full block-by-block processing
-   - Used when `includeFullBlock()` is set or when `dependsOn()` is configured
-   - Can inspect transaction call data via `callDataClauses()`
-   - Processes reverted transactions
-   - Required for dependent indexers
-
-2. **LogsIndexer**: Fast event-based syncing
-   - Used by default when no dependencies and full block not required
-   - Fetches only event logs and transfer logs via Thor API
-   - More efficient for event-driven indexing
-   - Implements `fastSync()` to quickly catch up to finalized blocks
-
-### Dependency Management & Sequential Processing
-
-The `IndexerRunner` orchestrates multiple indexers using topological sorting (`IndexerOrderUtils.topologicalOrder()`):
-
-- All indexers placed in single group, ordered by dependencies (dependencies before dependents)
-- Indexers process same block **sequentially** within group, honoring dependency order
-- IndexerRunner uses channels to buffer blocks and coordinate processing
-- Example: If `IndexerA` depends on `IndexerB`, then `IndexerB` processes block N before `IndexerA` processes block N
-- Only single-dependency chains supported (each indexer can depend on at most one other)
-
-### Lifecycle & States
-
-Indexer states (defined in `Status` enum):
-- `NOT_INITIALISED` → `INITIALISED` → `FAST_SYNCING` → `SYNCING` → `FULLY_SYNCED`
-- `SHUT_DOWN`: Terminal state
-
-Initialization flow:
-1. `initialise()`: Determines starting block, calls `rollback()` on processor
-2. `fastSync()`: (LogsIndexer only) Catches up to finalized block using log events
-3. `processBlock()`: Main processing loop with reorg detection
-
-### Reorg Detection
-
-Reorg detection in `BlockIndexer.checkForReorg()`:
-- Compares `block.parentID` with `previousBlock.id`
-- On detection: logs error, calls `rollback()`, throws `ReorgException`
-- Only checks when `currentBlockNumber > startBlock` and `previousBlock != null`
-
-### Event Processing
-
-Event processing pipeline (`CombinedEventProcessor`):
-1. **ABI Events**: Configured via `abis()` - loads JSON ABI files
-2. **Business Events**: Configured via `businessEvents()` - custom event definitions with conditional logic
-3. **VET Transfers**: Included by default unless `excludeVetTransfers()` is called
-
-Events are decoded and returned as `IndexedEvent` objects to the `IndexerProcessor.process()` method.
-
-### IndexerProcessor Interface
-
-Implementations must provide:
-- `getLastSyncedBlock()`: Returns last successfully processed block (or null)
-- `rollback(blockNumber)`: Reverts data for specified block
-- `process(entry)`: Handles `IndexingResult.BlockResult` (full block) or `IndexingResult.LogResult` (log batch)
-
-## Code Style
-
-- **Formatting**: ktfmt with Google style, 4-space indents (enforced by Spotless)
-- **Language**: Kotlin with Java 21 target
-- **Testing**: JUnit 5, MockK for mocking, Strikt for assertions
-
-## Important Implementation Details
-
-### IndexerFactory Configuration
-
-The factory uses a builder pattern. Key methods:
-- `name()`, `thorClient()`, `processor()`: Required
-- `startBlock()`: Default is 0
-- `dependsOn()`: Forces BlockIndexer (needed for dependency coordination). Single-parent only.
-- `includeFullBlock()`: Forces BlockIndexer (enables access to gas, reverted txs)
-- `blockBatchSize()`: For LogsIndexer, controls log fetch batch size (default 100). For IndexerRunner, controls channel buffer (default 1).
-- `logFetchLimit()`: Pagination limit for Thor API calls (default 1000)
-
-### Retry Logic
-
-`IndexerRunner.retryUntilSuccess()` wraps:
-- Indexer initialization
-- Block fetching
-- Block processing
-
-On failure: logs error, waits 1 second, retries indefinitely (until success or cancellation).
-
-## Testing Notes
-
-- Mock Thor client for unit tests
-- Use `TestableLogsIndexer` pattern to test internal sync logic
-- Verify topological ordering for dependency chains in `IndexerOrderUtilsTest`
-- Test reorg scenarios by providing blocks with mismatched `parentID`
-
-## Preferences
-Be extremely concise. Sacrifice grammar for the sake of concision.
-Always prefer simple solution over complex ones.
-When unsure, ask for clarification.
-run `make format` after making code changes to ensure proper formatting.
+Do not duplicate or extend project guidance here unless Claude-specific behavior genuinely requires it.
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,144 @@
+# Agent Guide
+
+This file is the canonical instruction entry point for coding agents working with `indexer-core`.
+
+It is intentionally lean. Use it to build a correct mental model quickly, then open the linked repo docs instead of inferring behavior from scattered source files.
+
+## What This Library Is
+
+`indexer-core` is a Kotlin library for building VeChainThor indexers.
+
+At a high level it provides:
+
+- `IndexerProcessor` as the application persistence boundary
+- `IndexerFactory` as the only supported way to configure and build indexers
+- `IndexerRunner` to initialise, fast-sync when possible, coordinate dependencies, and keep indexers running through retries and reorg recovery
+- two runtime modes:
+  - `LogsIndexer` for fast log-based catch-up when you only need decoded events / transfers
+  - `BlockIndexer` when you need full block context or dependency ordering
+
+Do not ask users to construct indexers manually from implementation classes unless they are working on the library internals themselves. For normal usage, all indexers should be built with `IndexerFactory`.
+
+## Who This Guide Is For
+
+This guide is for both:
+
+- agents changing `indexer-core` itself
+- agents helping a consumer integrate `indexer-core` into another service
+
+If the task is library maintenance, preserve public behavior documented in the repo docs unless the change explicitly updates that behavior.
+
+If the task is consumer guidance, optimize for correct mode selection and integration advice before discussing internals.
+
+## Required Onboarding Path
+
+Before making claims about library behavior, read in this order:
+
+1. [`README.md`](README.md)
+2. [`docs/README.md`](docs/README.md)
+3. one targeted guide based on the task:
+   - runtime model and lifecycle: [`docs/IndexerOverview.md`](docs/IndexerOverview.md)
+   - log-based mode and fast sync: [`docs/LogsIndexerOverview.md`](docs/LogsIndexerOverview.md)
+   - ABI loading and decoded events: [`docs/EventsAndABIHandling.md`](docs/EventsAndABIHandling.md)
+   - business event design: [`docs/BusinessEvents.md`](docs/BusinessEvents.md)
+   - upgrade / compatibility questions: [`docs/MIGRATION-8.0.0.md`](docs/MIGRATION-8.0.0.md)
+
+The repo markdown docs are the source of truth. Prefer them over memory, ad hoc code reading, or external copies.
+
+## Mental Model To Keep In Mind
+
+- `IndexerProcessor` is where consumers persist progress and domain data.
+- The runtime may emit either `IndexingResult.LogResult` or `IndexingResult.BlockResult`; processors should handle every variant their configuration can produce.
+- Startup rollback is intentional. It is a data-integrity feature, not a bug.
+- Reorg recovery is part of the runtime contract. Consumers are expected to implement deterministic rollback behavior.
+- Dependencies affect execution semantics, not just throughput. Adding `dependsOn(...)` changes how the runtime must coordinate indexers.
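+
+To make "deterministic rollback" concrete, here is a minimal, self-contained Kotlin sketch of processor-side storage. `InMemoryBlockStore` and its methods are hypothetical: they only mirror the roles of a processor's last-synced-block query and rollback hook, and they are not the `indexer-core` `IndexerProcessor` signatures. The sketch assumes `rollback(n)` should discard everything stored for block `n` and above; confirm the real contract in [`docs/IndexerOverview.md`](docs/IndexerOverview.md) before copying the pattern.
+
+```kotlin
+// Hypothetical processor-side store; NOT the indexer-core IndexerProcessor API.
+class InMemoryBlockStore {
+    // One entry per processed block (possibly empty), kept sorted by block number.
+    private val rowsByBlock = sortedMapOf<Long, List<String>>()
+
+    // Highest block with persisted progress, or null if nothing has been stored yet.
+    fun lastSyncedBlock(): Long? = if (rowsByBlock.isEmpty()) null else rowsByBlock.lastKey()
+
+    // Replaces (never appends) a block's rows, so replaying a block yields identical state.
+    fun persist(blockNumber: Long, rows: List<String>) {
+        rowsByBlock[blockNumber] = rows.toList()
+    }
+
+    // Assumption for this sketch: rolling back to n removes block n and every later block.
+    fun rollback(blockNumber: Long) {
+        rowsByBlock.tailMap(blockNumber).clear()
+    }
+
+    fun rowsAt(blockNumber: Long): List<String> = rowsByBlock[blockNumber].orEmpty()
+}
+```
+
+Because writes are keyed by block number and rollback is a pure range delete, calling `rollback(11)` twice, or rolling back and then re-processing blocks 11 and 12, leaves the store in the same state each time. That repeatability is the property the mental model above asks consumers to provide.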
+
+## Mode Selection Checklist
+
+Use this checklist before recommending or editing indexer configuration.
+
+Choose the factory's default log-based mode when:
+
+- the consumer only needs decoded ABI events, business events, or VET transfers
+- fastest catch-up is the priority
+- there is no same-block dependency on another indexer
+
+Choose `includeFullBlock()` when the consumer needs:
+
+- full block contents
+- reverted transaction visibility
+- gas / fee metadata from full block processing
+- clause inspection results from `callDataClauses(...)`
+
+Choose `dependsOn(...)` when:
+
+- one indexer must finish a block before another processes that same block
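+
+The same-block ordering rule can be illustrated with a depth-first topological ordering in which every dependency is placed before its dependents. This is a hypothetical, self-contained Kotlin sketch: `dependencyOrder` and its `Map<String, String?>` input are illustrative, not library API, and the map shape assumes each indexer names at most one parent.
+
+```kotlin
+// Hypothetical sketch: order indexer names so each dependency precedes its dependents.
+// Input maps each indexer name to the single indexer it depends on, or null if none.
+fun dependencyOrder(parentOf: Map<String, String?>): List<String> {
+    val ordered = mutableListOf<String>()
+    val visiting = mutableSetOf<String>() // names on the current DFS path, for cycle detection
+    val done = mutableSetOf<String>()
+
+    fun visit(name: String) {
+        if (name in done) return
+        require(name !in visiting) { "Dependency cycle involving $name" }
+        visiting += name
+        parentOf[name]?.let { parent ->
+            require(parent in parentOf) { "Unknown dependency $parent for $name" }
+            visit(parent) // the parent must be ordered first
+        }
+        visiting -= name
+        done += name
+        ordered += name
+    }
+
+    parentOf.keys.forEach { visit(it) }
+    return ordered
+}
+```
+
+For example, `dependencyOrder(mapOf("A" to "B", "B" to null, "C" to "A"))` returns a list equal to `listOf("B", "A", "C")`: for any given block, `B` would receive the block before `A`, and `A` before `C`. A cycle or an unknown parent fails fast with `IllegalArgumentException`.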
+
+Important:
+
+- `LogsIndexer` and `BlockIndexer` are not interchangeable modes.
+- `dependsOn(...)` forces block-based execution semantics.
+- `includeFullBlock()` forces block-based execution semantics.
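+
+The two "forces block-based execution" rules above reduce to a single predicate. The sketch below is a hypothetical, self-contained Kotlin model (`IndexerChoices`, `RuntimeMode`, and `expectedMode` are illustrative names, not `indexer-core` types) that can help when reasoning about why an indexer runs in block mode.
+
+```kotlin
+// Hypothetical model of the documented rule; not the indexer-core API.
+enum class RuntimeMode { LOGS, BLOCK }
+
+// Hypothetical summary of the IndexerFactory options that affect mode selection.
+data class IndexerChoices(
+    val dependsOnAnother: Boolean = false, // dependsOn(...) was configured
+    val includeFullBlock: Boolean = false, // includeFullBlock() was configured
+)
+
+// Either option forces block-based execution; otherwise the default log mode applies.
+fun expectedMode(choices: IndexerChoices): RuntimeMode =
+    if (choices.dependsOnAnother || choices.includeFullBlock) RuntimeMode.BLOCK
+    else RuntimeMode.LOGS
+```
+
+With this model, `expectedMode(IndexerChoices(includeFullBlock = true))` evaluates to `RuntimeMode.BLOCK`, while the all-defaults case evaluates to `RuntimeMode.LOGS`.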
+
+Choose business events when:
+
+- downstream consumers care about higher-level actions rather than every raw event
+
+Choose raw ABI events when:
+
+- the consumer needs each decoded event individually
+- there is no stable semantic grouping worth encoding as a business event
+
+## Guardrails
+
+- Build indexers through `IndexerFactory`, not by manually wiring implementation classes in application code.
+- Do not describe `LogsIndexer` and `BlockIndexer` as equivalent choices with different performance profiles. They expose different runtime behavior and different data.
+- Do not treat startup rollback as suspicious behavior. It is part of the library’s safety model.
+- Do not rely on stale documentation copies. The repo docs are authoritative.
+- Do not present internal implementation details as stable public API unless they are explicitly documented as such.
+
+## Common Agent Tasks
+
+Optimize guidance for these common tasks:
+
+- explaining how to integrate `indexer-core` into another service
+- changing the library itself
+- debugging behavior differences between `LogsIndexer` and `BlockIndexer`
+- designing ABI-driven or business-event-driven indexing setups
+
+Documentation updates matter, but they are secondary to preserving correct runtime behavior and public guidance.
+
+## Verification Expectations
+
+When changing this library:
+
+- run targeted tests for the touched behavior as a minimum
+- run broader `./gradlew test` when the change is cross-cutting or affects shared runtime behavior
+- run `./gradlew spotlessCheck` when Kotlin code changes, and `./gradlew spotlessApply` to fix any violations
+
+Minimum standard before claiming completion:
+
+- the changed behavior is covered by tests or an existing test path was exercised
+- any affected public guidance remains consistent with the repo docs
+- the response states clearly if full verification was not run
+
+Useful commands:
+
+```bash
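+./gradlew test                                          # full test suite
+./gradlew test --tests "org.vechain.indexer.SomeTest"   # single test class (placeholder name)
+./gradlew spotlessCheck                                 # verify ktfmt formatting
+./gradlew spotlessApply                                 # apply ktfmt formatting fixes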
+```
+
+## When Working From Source
+
+The codebase is useful for confirmation, but agents should not need to reverse-engineer the library from source just to understand its purpose.
+
+Read source after the docs when you need to:
+
+- confirm an implementation detail
+- debug a behavioral discrepancy
+- update internals while preserving the documented contract
+
+If source and docs appear to disagree, call that out explicitly instead of silently choosing one.