Gemini Architecture

Introduction

Gemini is designed around the concept of Jobs. Each job performs a specific function, such as applying mutations or validating data consistency between clusters.

Jobs

Gemini has two primary job types:

- MutationJob (pkg/jobs/mutation.go): Applies mutations (INSERT, DELETE) to both clusters. Each mutation worker continuously generates and executes statements until the stop condition is met. The job tracks success/failure metrics and respects error budgets.
- ValidationJob (pkg/jobs/validation.go): Reads rows from both clusters and compares them. When differences are detected, an error is raised. Validation supports configurable retry attempts with stabilization delays to handle eventual consistency.

Note: UPDATE statements are currently disabled and converted to INSERTs internally. DDL operations (ALTER) are also temporarily disabled pending v2 stabilization.
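
The retry behaviour described for the validation job can be sketched as follows. This is a minimal illustration, not the code in pkg/jobs/validation.go: the `fetchRow` type, the `validateWithRetry` function, and its parameters are hypothetical names, and rows are reduced to plain strings.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fetchRow is a hypothetical stand-in for reading one row from a cluster.
type fetchRow func() (string, error)

// validateWithRetry compares the rows returned by the two clusters. It tries
// up to attempts times and sleeps for delay between tries, so that replicas
// have time to converge before a mismatch is reported.
func validateWithRetry(oracle, test fetchRow, attempts int, delay time.Duration) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			time.Sleep(delay) // stabilization delay before the next attempt
		}
		want, err := oracle()
		if err != nil {
			lastErr = err
			continue
		}
		got, err := test()
		if err != nil {
			lastErr = err
			continue
		}
		if want == got {
			return nil
		}
		lastErr = fmt.Errorf("row mismatch: oracle=%q test=%q", want, got)
	}
	if lastErr == nil {
		lastErr = errors.New("no validation attempts configured")
	}
	return lastErr
}

func main() {
	calls := 0
	oracle := func() (string, error) { return "v2", nil }
	// The test cluster returns stale data twice, then catches up.
	test := func() (string, error) {
		calls++
		if calls < 3 {
			return "v1", nil
		}
		return "v2", nil
	}
	fmt.Println(validateWithRetry(oracle, test, 5, time.Millisecond)) // <nil>
}
```

With five attempts the comparison succeeds on the third read; with only two attempts the same setup would report a mismatch.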

Modes of Operation

| Mode | Description |
|------|-------------|
| `mixed` | (Default) Runs both mutation and validation workers concurrently |
| `write` | Only mutation workers, no validation |
| `read` | Only validation workers on existing data |
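
As a rough illustration of how a mode could translate into worker counts, the sketch below maps each mode name to the number of mutation and validation workers to start. The `workerPlan` type and `planWorkers` function are hypothetical, and treating an empty mode as `mixed` is an assumption made here to reflect the default.

```go
package main

import "fmt"

// workerPlan says how many workers of each kind to start.
type workerPlan struct {
	Mutation   int
	Validation int
}

// planWorkers maps a mode name to a worker plan. mutationConc and readConc
// are the configured worker counts for mutations and reads.
func planWorkers(mode string, mutationConc, readConc int) (workerPlan, error) {
	switch mode {
	case "mixed", "": // assumption: an empty mode falls back to the default
		return workerPlan{Mutation: mutationConc, Validation: readConc}, nil
	case "write":
		return workerPlan{Mutation: mutationConc}, nil
	case "read":
		return workerPlan{Validation: readConc}, nil
	default:
		return workerPlan{}, fmt.Errorf("unknown mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"mixed", "write", "read"} {
		plan, err := planWorkers(mode, 4, 2)
		fmt.Printf("%-5s -> %+v (err: %v)\n", mode, plan, err)
	}
}
```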

Concurrency Model

Gemini uses separate concurrency settings for mutations and reads:

- `--concurrency` / `--mutation-concurrency`: Number of mutation workers
- `--read-concurrency`: Number of validation workers

In mixed mode, both mutation and validation workers run concurrently. Each worker operates on a shared partition pool but uses distribution functions to select partitions, avoiding direct conflicts.

Workers are managed using Go's errgroup for coordinated startup and shutdown. When the error budget is exhausted or the configured duration expires, a soft stop is signaled and the workers terminate gracefully.
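
The soft-stop behaviour can be approximated with the standard library alone. The sketch below deliberately swaps errgroup for `sync.WaitGroup` plus a cancellable `context.Context` so that it runs without external modules; `runWorkers`, `errFailed`, and the parameter names are hypothetical and do not mirror Gemini's actual code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// errFailed is a placeholder error for a failed operation.
var errFailed = errors.New("operation failed")

// runWorkers starts n workers that call op until the context is cancelled.
// Cancellation happens when duration elapses or when the number of failed
// operations reaches maxErrors (the error budget). It returns the number of
// operations attempted and the number of failures observed.
func runWorkers(n int, duration time.Duration, maxErrors int64, op func() error) (ops, errs int64) {
	ctx, cancel := context.WithTimeout(context.Background(), duration)
	defer cancel()

	var wg sync.WaitGroup
	var opCount, errCount atomic.Int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Soft stop: the flag is checked between operations, so an
			// operation that is already running is allowed to finish.
			for ctx.Err() == nil {
				opCount.Add(1)
				if err := op(); err != nil && errCount.Add(1) >= maxErrors {
					cancel() // budget exhausted: signal every worker
				}
			}
		}()
	}
	wg.Wait()
	return opCount.Load(), errCount.Load()
}

func main() {
	ops, errs := runWorkers(4, 50*time.Millisecond, 3, func() error { return errFailed })
	fmt.Printf("stopped after %d ops and %d errors\n", ops, errs)
}
```

With several workers the final error count can slightly exceed the budget, because other workers may fail before they observe the cancellation.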

Partitions System

The partition system (pkg/partitions/) is the core component that manages partition key generation and tracking.

Partitions Structure

type Partitions struct {
    table   *typedef.Table                 // Table definition
    idxFunc distributions.DistributionFunc // Distribution for partition selection
    deleted *deletedPartitions             // Tracks recently deleted partitions
    r       random.GoRoutineSafeRandom     // Thread-safe random source
    config  typedef.PartitionRangeConfig   // Value range configuration
    parts   partitions                     // Slice of partition values
}

Key Operations

| Method | Description |
|--------|-------------|
| `New()` | Creates a partition pool with pre-generated partition keys |
| `Get(idx)` | Returns partition values at the given index |
| `Next()` | Returns the next partition using the distribution function |
| `Extend()` | Adds a new partition to the pool |
| `Replace(idx)` | Generates new values for a partition, tracking the old ones as deleted |
| `Deleted()` | Returns a channel of recently deleted partition keys for validation |
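
To make these operations concrete, here is a toy pool that follows the same shape. It is a simplified sketch under several assumptions: each partition is a single `int64` key, selection in `Next` is uniform, and deleted keys are kept in a plain slice rather than emitted on a channel. The `pool` type and `newPool` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// pool is a toy partition pool holding one int64 key per partition.
type pool struct {
	mu      sync.Mutex // guards every field; rand.Rand is not goroutine-safe
	rnd     *rand.Rand
	parts   []int64
	deleted []int64 // keys replaced so far, oldest first
}

// newPool pre-generates size partition keys from a seeded random source.
func newPool(seed int64, size int) *pool {
	p := &pool{rnd: rand.New(rand.NewSource(seed))}
	for i := 0; i < size; i++ {
		p.parts = append(p.parts, p.rnd.Int63())
	}
	return p
}

// Get returns the key stored at idx.
func (p *pool) Get(idx int) int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.parts[idx]
}

// Next picks a partition uniformly at random.
func (p *pool) Next() int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.parts[p.rnd.Intn(len(p.parts))]
}

// Extend appends a freshly generated key and returns it.
func (p *pool) Extend() int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	key := p.rnd.Int63()
	p.parts = append(p.parts, key)
	return key
}

// Replace swaps the key at idx for a new one and records the old key as
// deleted. It returns the old key.
func (p *pool) Replace(idx int) int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	old := p.parts[idx]
	p.parts[idx] = p.rnd.Int63()
	p.deleted = append(p.deleted, old)
	return old
}

func main() {
	p := newPool(42, 4)
	old := p.Replace(1)
	p.Extend()
	fmt.Println(len(p.parts), len(p.deleted), p.deleted[0] == old) // 5 1 true
}
```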

Partition Key Generation

Partition keys are generated by delegating to the column types defined in the table schema:

func generateValue(r utils.Random, table *typedef.Table, config typedef.RangeConfig) []any {
    values := make([]any, 0, table.PartitionKeys.LenValues())
    for _, pk := range table.PartitionKeys {
        values = pk.Type.GenValueOut(values, r, config)
    }
    return values
}

Each partition key column type is responsible for generating its own random values within the configured ranges.

Deleted Partitions Tracking

When a partition is deleted, its keys are tracked in a time-bucketed heap structure (deletedPartitions). This allows validation workers to verify that deleted data is actually removed from both clusters after a configurable delay.

Key features:

- Min-heap sorted by "ready at" time for efficient processing
- Background goroutine emits ready partitions via channel
- Configurable time buckets control when deleted partitions become eligible for validation
- Memory optimized with pre-allocated backing arrays
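
The min-heap idea can be sketched with the standard `container/heap` package. This is an illustration only: the `entry`, `readyHeap`, and `tracker` names are hypothetical, time is modelled as an offset from the start of the run, and the background goroutine and channel are left out so the example stays deterministic.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// entry is one deleted partition key and the offset (from the start of the
// run) at which it becomes eligible for validation.
type entry struct {
	key     string
	readyAt time.Duration
}

// readyHeap is a min-heap ordered by readyAt.
type readyHeap []entry

func (h readyHeap) Len() int           { return len(h) }
func (h readyHeap) Less(i, j int) bool { return h[i].readyAt < h[j].readyAt }
func (h readyHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *readyHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *readyHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// tracker records deleted keys and hands them back once their delay passed.
type tracker struct{ h readyHeap }

// Delete records that key was deleted at now and may be validated after delay.
func (t *tracker) Delete(key string, now, delay time.Duration) {
	heap.Push(&t.h, entry{key: key, readyAt: now + delay})
}

// Ready removes and returns every key whose readyAt is at or before now,
// earliest first.
func (t *tracker) Ready(now time.Duration) []string {
	var ready []string
	for t.h.Len() > 0 && t.h[0].readyAt <= now {
		ready = append(ready, heap.Pop(&t.h).(entry).key)
	}
	return ready
}

func main() {
	t := &tracker{}
	t.Delete("pk-c", 0, 30*time.Second)
	t.Delete("pk-a", 0, 10*time.Second)
	t.Delete("pk-b", 5*time.Second, 10*time.Second)
	fmt.Println(t.Ready(15 * time.Second)) // [pk-a pk-b]
	fmt.Println(t.Ready(time.Minute))      // [pk-c]
}
```

Because the earliest entry is always at the root, checking whether anything is ready costs a single comparison, and each removal is logarithmic in the number of tracked keys.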

Distribution Functions

Partition selection supports multiple distributions (pkg/distributions/):

| Distribution | Description |
|--------------|-------------|
| `uniform` | Equal probability for all partitions |
| `zipf` | Power-law distribution - some partitions accessed more frequently |
| `normal` | Gaussian distribution around a mean |
| `lognormal` | Log-normal distribution |
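
The effect of the distribution choice on partition selection can be seen with a small experiment built on `math/rand`. The `indexFunc` type and the constructors below are hypothetical helpers, not the API of pkg/distributions; the Zipf parameters and the mean and standard deviation chosen for the normal case are assumptions picked for illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// indexFunc returns a partition index in [0, n).
type indexFunc func() int

// seeded returns a deterministic random source.
func seeded(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// uniform gives every index the same probability.
func uniform(r *rand.Rand, n int) indexFunc {
	return func() int { return r.Intn(n) }
}

// zipf skews selection toward low indexes; s must be greater than 1.
func zipf(r *rand.Rand, n int, s float64) indexFunc {
	z := rand.NewZipf(r, s, 1, uint64(n-1))
	return func() int { return int(z.Uint64()) }
}

// normal centres selection on the middle of the pool and clamps the tails.
func normal(r *rand.Rand, n int) indexFunc {
	mean, stddev := float64(n-1)/2, float64(n)/6
	return func() int {
		v := math.Round(r.NormFloat64()*stddev + mean)
		return int(math.Max(0, math.Min(float64(n-1), v)))
	}
}

// histogram counts how often each index is selected over the given draws.
func histogram(next indexFunc, n, draws int) []int {
	counts := make([]int, n)
	for i := 0; i < draws; i++ {
		counts[next()]++
	}
	return counts
}

func main() {
	const n, draws = 10, 10000
	fmt.Println("uniform:", histogram(uniform(seeded(1), n), n, draws))
	fmt.Println("zipf:   ", histogram(zipf(seeded(1), n, 1.5), n, draws))
	fmt.Println("normal: ", histogram(normal(seeded(1), n), n, draws))
}
```

A skewed distribution such as Zipf concentrates traffic on a few hot partitions, while the uniform distribution spreads it evenly across the pool.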

Statement Generation

The statements.Generator (pkg/statements/) creates CQL statements using partition values:

type Generator struct {
    generator       partitions.Interface // Partition value source
    random          utils.Random
    table           *typedef.Table
    ratioController *RatioController     // Controls statement type ratios
    // ...
}

Statement types include:

- SELECT: Single partition, multiple partition, clustering range, index queries
- INSERT: Regular and JSON format
- DELETE: Whole partition, single row, single column, multiple partitions

The RatioController manages the probability distribution of different statement types based on user configuration.
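
One common way to turn configured ratios into statement choices is cumulative-weight selection, sketched below. This is not the actual RatioController; the `weighted` type, its constructor, and the example weights are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// weighted picks statement types in proportion to their configured weights.
type weighted struct {
	names      []string
	cumulative []float64 // running sum of the weights
}

// newWeighted builds the cumulative table; names and weights must have the
// same, non-zero length and the weights must be positive.
func newWeighted(names []string, weights []float64) *weighted {
	w := &weighted{names: names}
	total := 0.0
	for _, x := range weights {
		total += x
		w.cumulative = append(w.cumulative, total)
	}
	return w
}

// pick maps u, a uniform value in [0, 1), onto a statement type.
func (w *weighted) pick(u float64) string {
	target := u * w.cumulative[len(w.cumulative)-1]
	for i, c := range w.cumulative {
		if target < c {
			return w.names[i]
		}
	}
	return w.names[len(w.names)-1]
}

func main() {
	w := newWeighted([]string{"insert", "delete"}, []float64{0.75, 0.25})
	r := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[w.pick(r.Float64())]++
	}
	fmt.Println(counts) // roughly three inserts for every delete
}
```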

Data Structures

Schema (`pkg/typedef/schema.go`)

Top-level structure containing the keyspace definition and list of tables:

type Schema struct {
    Keyspace Keyspace     `json:"keyspace"`
    Tables   []*Table     `json:"tables"`
    Config   SchemaConfig `json:"-"`
}
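
The struct tags determine what ends up in a serialized schema: `json:"-"` keeps `Config` out of the output. The sketch below demonstrates this with simplified stand-in types; the fields of `keyspace`, `table`, and `schemaConfig` are assumptions, and only the tags on `schema` are taken from the definition above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the real typedef types (fields are assumptions).
type keyspace struct {
	Name string `json:"name"`
}

type table struct {
	Name string `json:"name"`
}

type schemaConfig struct {
	MaxTables int
}

// schema mirrors the tags of the Schema struct shown above.
type schema struct {
	Keyspace keyspace     `json:"keyspace"`
	Tables   []*table     `json:"tables"`
	Config   schemaConfig `json:"-"` // never serialized
}

// encode renders the schema as compact JSON.
func encode(s schema) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := encode(schema{
		Keyspace: keyspace{Name: "ks1"},
		Tables:   []*table{{Name: "table1"}},
		Config:   schemaConfig{MaxTables: 3},
	})
	fmt.Println(out, err)
	// {"keyspace":{"name":"ks1"},"tables":[{"name":"table1"}]} <nil>
}
```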

Table (`pkg/typedef/table.go`)

Represents a CQL table with all its components:

type Table struct {
    Name              string
    PartitionKeys     Columns
    ClusteringKeys    Columns
    Columns           Columns
    Indexes           []IndexDef
    MaterializedViews []MaterializedView
    KnownIssues       KnownIssues
    TableOptions      []string
}

Columns (`pkg/typedef/columns.go`)

A slice of ColumnDef representing a set of columns (partition keys, clustering keys, or regular columns):

type Columns []ColumnDef

type ColumnDef struct {
    Name string
    Type Type
}
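
Because partition keys and clustering keys are both plain column slices, helpers over them stay small. As an illustration, the sketch below renders a CQL `PRIMARY KEY` clause from two such slices. The `Names` method and `primaryKey` function are hypothetical helpers written for this example, and `Type` is reduced to a string.

```go
package main

import (
	"fmt"
	"strings"
)

// columnDef is a simplified column: the real Type field is not a string.
type columnDef struct {
	Name string
	Type string
}

type columns []columnDef

// Names returns the column names in declaration order.
func (c columns) Names() []string {
	names := make([]string, 0, len(c))
	for _, col := range c {
		names = append(names, col.Name)
	}
	return names
}

// primaryKey renders the CQL PRIMARY KEY clause. The partition key columns
// are grouped in their own parentheses, followed by any clustering columns.
func primaryKey(partitionKeys, clusteringKeys columns) string {
	pk := "(" + strings.Join(partitionKeys.Names(), ", ") + ")"
	if len(clusteringKeys) == 0 {
		return "PRIMARY KEY (" + pk + ")"
	}
	return "PRIMARY KEY (" + pk + ", " + strings.Join(clusteringKeys.Names(), ", ") + ")"
}

func main() {
	pk := columns{{Name: "pk0", Type: "int"}, {Name: "pk1", Type: "text"}}
	ck := columns{{Name: "ck0", Type: "timestamp"}}
	fmt.Println(primaryKey(pk, ck)) // PRIMARY KEY ((pk0, pk1), ck0)
}
```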

Types (`pkg/typedef/types.go`)

Types are responsible for generating random values. There are two categories:

- Simple Types: int, bigint, text, boolean, decimal, uuid, timestamp, etc.
- Complex Types: MapType, ListType, SetType, TupleType, UDTType - composed of simple types

Each type implements value generation:

type Type interface {
    GenValue(r Random, config RangeConfig) []any
    GenValueOut(out []any, r Random, config RangeConfig) []any
    LenValue() int
    // ...
}
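
The append-style `GenValueOut` method lets a caller allocate one output slice, sized from `LenValue`, and have every type write into it. The sketch below shows the pattern with a reduced interface. Everything in it is a stand-in: `valueType`, `rangeConfig`, and the three example types are hypothetical, `*rand.Rand` replaces the project's `Random` type, and it is an assumption that a tuple contributes one value per element.

```go
package main

import (
	"fmt"
	"math/rand"
)

// rangeConfig is a simplified stand-in for RangeConfig. MaxInt must not be
// smaller than MinInt.
type rangeConfig struct {
	MinInt, MaxInt int64
}

// valueType is a reduced version of the Type interface shown above.
type valueType interface {
	GenValueOut(out []any, r *rand.Rand, cfg rangeConfig) []any
	LenValue() int
}

// bigintType is a simple type producing one int64 in [MinInt, MaxInt].
type bigintType struct{}

func (bigintType) GenValueOut(out []any, r *rand.Rand, cfg rangeConfig) []any {
	return append(out, cfg.MinInt+r.Int63n(cfg.MaxInt-cfg.MinInt+1))
}
func (bigintType) LenValue() int { return 1 }

// boolType is a simple type producing one bool.
type boolType struct{}

func (boolType) GenValueOut(out []any, r *rand.Rand, _ rangeConfig) []any {
	return append(out, r.Intn(2) == 1)
}
func (boolType) LenValue() int { return 1 }

// tupleType is a complex type that delegates to its element types.
type tupleType struct{ elems []valueType }

func (t tupleType) GenValueOut(out []any, r *rand.Rand, cfg rangeConfig) []any {
	for _, e := range t.elems {
		out = e.GenValueOut(out, r, cfg)
	}
	return out
}

func (t tupleType) LenValue() int {
	n := 0
	for _, e := range t.elems {
		n += e.LenValue()
	}
	return n
}

// generate sizes the output slice from LenValue and fills it in one pass.
func generate(t valueType, seed int64, cfg rangeConfig) []any {
	r := rand.New(rand.NewSource(seed))
	return t.GenValueOut(make([]any, 0, t.LenValue()), r, cfg)
}

func main() {
	tuple := tupleType{elems: []valueType{bigintType{}, boolType{}}}
	fmt.Println(generate(tuple, 7, rangeConfig{MinInt: 1, MaxInt: 100}))
}
```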

Package Overview

| Package | Purpose |
|---------|---------|
| `pkg/jobs` | Job definitions (mutation, validation) |
| `pkg/partitions` | Partition key management and deleted tracking |
| `pkg/statements` | CQL statement generation |
| `pkg/typedef` | Core type definitions (Schema, Table, Columns, Types) |
| `pkg/schema` | Schema loading and generation |
| `pkg/distributions` | Partition selection distributions |
| `pkg/store` | Database interaction layer (oracle + test clusters) |
| `pkg/status` | Global status tracking (ops counters, errors) |
| `pkg/stop` | Graceful shutdown coordination |
| `pkg/metrics` | Prometheus metrics |
| `pkg/stmtlogger` | Statement logging for debugging |
