Gemini is designed around the concept of Jobs. Each job performs a specific function, such as applying mutations or validating data consistency between clusters.
Gemini has two primary job types:
- MutationJob (pkg/jobs/mutation.go): Applies mutations (INSERT, DELETE) to both clusters. Each mutation worker continuously generates and executes statements until the stop condition is met. The job tracks success/failure metrics and respects error budgets.
- ValidationJob (pkg/jobs/validation.go): Reads rows from both clusters and compares them. When differences are detected, an error is raised. Validation supports configurable retry attempts with stabilization delays to handle eventual consistency.
Note: UPDATE statements are currently disabled and converted to INSERTs internally. DDL operations (ALTER) are also temporarily disabled pending v2 stabilization.
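The retry behaviour described for ValidationJob can be pictured with a small sketch. This is not Gemini's implementation: `Row`, `fetchFunc`, `ErrMismatch` and `validateWithRetry` are hypothetical names, and the comparison is a plain `reflect.DeepEqual` chosen for brevity. The idea it illustrates is only that a mismatch is not reported until the configured number of attempts, separated by a stabilization delay, has been used up.

```go
package main

import (
	"errors"
	"reflect"
	"time"
)

// Row is a hypothetical stand-in for one result row.
type Row map[string]any

// fetchFunc is a hypothetical callback that reads rows from one cluster.
type fetchFunc func() ([]Row, error)

// ErrMismatch is returned when both clusters still disagree after all attempts.
var ErrMismatch = errors.New("validation: rows differ between clusters")

// validateWithRetry compares the rows returned by both clusters. On a mismatch
// it sleeps for delay and reads again, up to maxAttempts reads in total.
func validateWithRetry(oracle, test fetchFunc, maxAttempts int, delay time.Duration) error {
	if maxAttempts < 1 {
		maxAttempts = 1 // always compare at least once
	}
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		want, err := oracle()
		if err != nil {
			return err
		}
		got, err := test()
		if err != nil {
			return err
		}
		if reflect.DeepEqual(want, got) {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay) // give the lagging cluster time to converge
		}
	}
	return ErrMismatch
}
```

A caller would pass one fetch function per cluster, for example `validateWithRetry(readOracle, readTest, 5, 100*time.Millisecond)`, where both function names are placeholders.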
| Mode | Description |
|---|---|
| mixed | (Default) Runs both mutation and validation workers concurrently |
| write | Only mutation workers, no validation |
| read | Only validation workers on existing data |
Gemini uses separate concurrency settings for mutations and reads:
- --concurrency/--mutation-concurrency: Number of mutation workers
- --read-concurrency: Number of validation workers
In mixed mode, both mutation and validation workers run concurrently. Each worker operates on a shared partition pool but uses distribution functions to select partitions, avoiding direct conflicts.
Workers are managed using Go's errgroup for coordinated startup and shutdown. When the error budget is reached or duration expires, a soft stop is signaled and workers gracefully terminate.
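The soft-stop behaviour can be sketched with the standard library alone. The example below deliberately swaps errgroup for `sync.WaitGroup` plus a cancellable `context` so that it has no external dependency. `runWorkers` and its parameters are hypothetical and do not reflect Gemini's actual API. Each worker finishes its current operation before it notices the stop signal, which is what makes the stop "soft".

```go
package main

import (
	"context"
	"sync"
	"sync/atomic"
	"time"
)

// runWorkers starts n workers that call op in a loop until either duration
// elapses or the number of failed operations reaches errorBudget.
// It returns the total number of operations and errors observed.
func runWorkers(n int, duration time.Duration, errorBudget int64, op func(worker int) error) (ops, errs int64) {
	// The context doubles as the soft-stop signal: it is cancelled on timeout
	// or explicitly when the error budget is spent.
	ctx, stop := context.WithTimeout(context.Background(), duration)
	defer stop()

	var wg sync.WaitGroup
	var opCount, errCount atomic.Int64

	for w := 0; w < n; w++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for ctx.Err() == nil { // checked between operations only
				opCount.Add(1)
				if err := op(worker); err != nil && errCount.Add(1) >= errorBudget {
					stop() // budget reached: ask every worker to wind down
					return
				}
			}
		}(w)
	}
	wg.Wait() // coordinated shutdown: return only after all workers exit
	return opCount.Load(), errCount.Load()
}
```

Because workers that are mid-operation when the budget is hit may still record one more failure each, the final error count can slightly exceed the budget.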
The partition system (pkg/partitions/) is the core component that manages partition key generation and tracking.
```go
type Partitions struct {
	table   *typedef.Table                 // Table definition
	idxFunc distributions.DistributionFunc // Distribution for partition selection
	deleted *deletedPartitions             // Tracks recently deleted partitions
	r       random.GoRoutineSafeRandom     // Thread-safe random source
	config  typedef.PartitionRangeConfig   // Value range configuration
	parts   partitions                     // Slice of partition values
}
```

| Method | Description |
|---|---|
| New() | Creates a partition pool with pre-generated partition keys |
| Get(idx) | Returns partition values at the given index |
| Next() | Returns the next partition using the distribution function |
| Extend() | Adds a new partition to the pool |
| Replace(idx) | Generates new values for a partition, tracking the old ones as deleted |
| Deleted() | Returns a channel of recently deleted partition keys for validation |
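To make the roles of these methods concrete, here is a heavily reduced, single-threaded sketch of a partition pool. Everything in it is hypothetical: the `pool` type, the seed-based constructor, the single `int64` key per partition, and the plain slice that stands in for the channel returned by Deleted(). It only illustrates how Get, Next, Extend and Replace relate to one another.

```go
package main

import "math/rand"

// pool is a hypothetical, single-threaded reduction of the partition pool.
type pool struct {
	parts   [][]any      // one slice of key values per partition
	deleted [][]any      // replaced key values, kept for later validation
	gen     func() []any // produces fresh partition key values
	idx     func() int   // picks the index of the next partition
}

// newPool pre-generates count partitions using a seeded random source.
func newPool(count int, seed int64) *pool {
	r := rand.New(rand.NewSource(seed))
	p := &pool{gen: func() []any { return []any{r.Int63()} }}
	p.idx = func() int { return r.Intn(len(p.parts)) } // uniform selection
	for i := 0; i < count; i++ {
		p.Extend()
	}
	return p
}

// Get returns the key values stored at index i.
func (p *pool) Get(i int) []any { return p.parts[i] }

// Next returns the key values of a partition chosen by the index function.
func (p *pool) Next() []any { return p.parts[p.idx()] }

// Extend appends a newly generated partition and returns its key values.
func (p *pool) Extend() []any {
	v := p.gen()
	p.parts = append(p.parts, v)
	return v
}

// Replace swaps in new key values at index i and remembers the old ones.
func (p *pool) Replace(i int) []any {
	p.deleted = append(p.deleted, p.parts[i])
	p.parts[i] = p.gen()
	return p.parts[i]
}
```

The sketch is not safe for concurrent use and requires `count` to be at least 1 before Next is called.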
Partition keys are generated by delegating to the column types defined in the table schema:
```go
func generateValue(r utils.Random, table *typedef.Table, config typedef.RangeConfig) []any {
	values := make([]any, 0, table.PartitionKeys.LenValues())
	for _, pk := range table.PartitionKeys {
		values = pk.Type.GenValueOut(values, r, config)
	}
	return values
}
```

Each partition key column type is responsible for generating its own random values within the configured ranges.
When a partition is deleted, its keys are tracked in a time-bucketed heap structure (deletedPartitions). This allows validation workers to verify that deleted data is actually removed from both clusters after a configurable delay.
Key features:
- Min-heap sorted by "ready at" time for efficient processing
- Background goroutine emits ready partitions via channel
- Configurable time buckets control when deleted partitions become eligible for validation
- Memory optimized with pre-allocated backing arrays
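The min-heap ordering can be illustrated with Go's `container/heap`. This is a sketch under assumptions, not the deletedPartitions implementation: `deletedEntry`, `readyHeap` and `deletedTracker` are hypothetical names, and the background goroutine and channel are left out. A caller would instead invoke `PopReady` periodically, for example from a ticker, and forward the results.

```go
package main

import (
	"container/heap"
	"time"
)

// deletedEntry is a hypothetical record of one deleted partition.
type deletedEntry struct {
	keys    []any     // partition key values that were deleted
	readyAt time.Time // earliest time the deletion may be validated
}

// readyHeap is a min-heap ordered by readyAt; it implements heap.Interface.
type readyHeap []deletedEntry

func (h readyHeap) Len() int           { return len(h) }
func (h readyHeap) Less(i, j int) bool { return h[i].readyAt.Before(h[j].readyAt) }
func (h readyHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *readyHeap) Push(x any)        { *h = append(*h, x.(deletedEntry)) }
func (h *readyHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// deletedTracker collects deleted partitions and releases them once ready.
type deletedTracker struct {
	h readyHeap
}

// newDeletedTracker pre-allocates the backing array for capacity entries.
func newDeletedTracker(capacity int) *deletedTracker {
	return &deletedTracker{h: make(readyHeap, 0, capacity)}
}

// Track records keys as deleted at time now, eligible for validation after delay.
func (t *deletedTracker) Track(keys []any, now time.Time, delay time.Duration) {
	heap.Push(&t.h, deletedEntry{keys: keys, readyAt: now.Add(delay)})
}

// PopReady removes and returns, earliest first, every entry whose readyAt
// is not after now.
func (t *deletedTracker) PopReady(now time.Time) [][]any {
	var ready [][]any
	for t.h.Len() > 0 && !t.h[0].readyAt.After(now) {
		ready = append(ready, heap.Pop(&t.h).(deletedEntry).keys)
	}
	return ready
}
```

Because the earliest entry is always at the root, `PopReady` stops at the first entry that is not yet due and never scans the rest of the heap. The sketch is not safe for concurrent use without external locking.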
Partition selection supports multiple distributions (pkg/distributions/):
| Distribution | Description |
|---|---|
| uniform | Equal probability for all partitions |
| zipf | Power-law distribution - some partitions accessed more frequently |
| normal | Gaussian distribution around a mean |
| lognormal | Log-normal distribution |
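As a rough illustration of how such distributions can turn into partition indexes, the sketch below builds an index selector on top of `math/rand`. It is not the code in pkg/distributions: `indexFunc`, `newIndexFunc` and `clamp` are hypothetical, and the parameters (Zipf exponent 1.5, mean n/2, standard deviation n/6, log-normal scale n/8) are arbitrary choices made for the example.

```go
package main

import (
	"math"
	"math/rand"
)

// indexFunc is a hypothetical stand-in for a distribution function: each call
// returns a partition index in [0, n).
type indexFunc func() int

// newIndexFunc builds a selector over n partitions (n must be at least 1).
// All distribution parameters below are illustrative assumptions.
func newIndexFunc(name string, n int, seed int64) indexFunc {
	r := rand.New(rand.NewSource(seed))
	switch name {
	case "zipf":
		// rand.NewZipf yields values in [0, imax], skewed toward 0.
		z := rand.NewZipf(r, 1.5, 1, uint64(n-1))
		return func() int { return int(z.Uint64()) }
	case "normal":
		mean, stddev := float64(n)/2, float64(n)/6
		return func() int { return clamp(r.NormFloat64()*stddev+mean, n) }
	case "lognormal":
		scale := float64(n) / 8
		return func() int { return clamp(math.Exp(r.NormFloat64())*scale, n) }
	default: // "uniform"
		return func() int { return r.Intn(n) }
	}
}

// clamp truncates v to an integer and forces it into [0, n).
func clamp(v float64, n int) int {
	i := int(v)
	if i < 0 {
		return 0
	}
	if i >= n {
		return n - 1
	}
	return i
}
```

A `*rand.Rand` created with `rand.New` is not safe for concurrent use, so in this sketch each worker would need its own selector.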
The statements.Generator (pkg/statements/) creates CQL statements using partition values:
```go
type Generator struct {
	generator       partitions.Interface // Partition value source
	random          utils.Random
	table           *typedef.Table
	ratioController *RatioController // Controls statement type ratios
	// ...
}
```

Statement types include:
- SELECT: Single partition, multiple partition, clustering range, index queries
- INSERT: Regular and JSON format
- DELETE: Whole partition, single row, single column, multiple partitions
The RatioController manages the probability distribution of different statement types based on user configuration.
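One common way to turn configured ratios into a random choice is a cumulative-weight lookup, sketched below. This is an assumption about the general technique, not a description of how RatioController is written. The `ratio` and `weightedChoice` types and their methods are hypothetical.

```go
package main

import (
	"errors"
	"math/rand"
	"sort"
)

// ratio pairs a statement type with its relative weight (hypothetical type).
type ratio struct {
	Statement string
	Weight    float64
}

// weightedChoice selects statement types in proportion to their weights.
type weightedChoice struct {
	names      []string
	cumulative []float64 // running sum of weights, aligned with names
}

// newWeightedChoice builds a chooser; entries with non-positive weight are skipped.
func newWeightedChoice(ratios []ratio) (*weightedChoice, error) {
	w := &weightedChoice{}
	total := 0.0
	for _, r := range ratios {
		if r.Weight <= 0 {
			continue
		}
		total += r.Weight
		w.names = append(w.names, r.Statement)
		w.cumulative = append(w.cumulative, total)
	}
	if len(w.names) == 0 {
		return nil, errors.New("ratios: at least one positive weight is required")
	}
	return w, nil
}

// pick maps u in [0, 1) to a statement type via binary search over the
// cumulative weights.
func (w *weightedChoice) pick(u float64) string {
	target := u * w.cumulative[len(w.cumulative)-1]
	i := sort.Search(len(w.cumulative), func(i int) bool { return w.cumulative[i] > target })
	if i == len(w.names) {
		i-- // guards against u == 1 or rounding at the upper edge
	}
	return w.names[i]
}

// Next draws a statement type using the given random source.
func (w *weightedChoice) Next(r *rand.Rand) string { return w.pick(r.Float64()) }
```

With weights of 70, 20 and 10 for insert, delete and select, values of `u` below 0.7 map to insert, values from 0.7 up to 0.9 map to delete, and the remainder map to select.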
Schema is the top-level structure containing the keyspace definition and the list of tables:

```go
type Schema struct {
	Keyspace Keyspace     `json:"keyspace"`
	Tables   []*Table     `json:"tables"`
	Config   SchemaConfig `json:"-"`
}
```

Table represents a CQL table with all its components:
```go
type Table struct {
	Name              string
	PartitionKeys     Columns
	ClusteringKeys    Columns
	Columns           Columns
	Indexes           []IndexDef
	MaterializedViews []MaterializedView
	KnownIssues       KnownIssues
	TableOptions      []string
}
```

Columns is a slice of ColumnDef representing a set of columns (partition keys, clustering keys, or regular columns):
```go
type Columns []ColumnDef

type ColumnDef struct {
	Name string
	Type Type
}
```

Types are responsible for generating random values. There are two categories:
- Simple Types: int, bigint, text, boolean, decimal, uuid, timestamp, etc.
- Complex Types: MapType, ListType, SetType, TupleType, UDTType - composed of simple types
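The relationship between the two categories can be sketched with a reduced interface. The names `valueType`, `intType`, `listType` and `generateRow` are hypothetical, the range configuration is replaced by a single `max` field, and the way the list is represented (one nested slice per column) is an assumption made for the example, not a statement about Gemini's types.

```go
package main

import "math/rand"

// valueType is a hypothetical, reduced form of a type that can generate values.
type valueType interface {
	// GenValueOut appends the generated value(s) to out and returns the result.
	GenValueOut(out []any, r *rand.Rand) []any
	// LenValue reports how many values GenValueOut appends.
	LenValue() int
}

// intType is a simple type that produces one integer in [0, max); max must be > 0.
type intType struct{ max int }

func (t intType) GenValueOut(out []any, r *rand.Rand) []any {
	return append(out, r.Intn(t.max))
}
func (t intType) LenValue() int { return 1 }

// listType is a complex type: a fixed-size list built from its element type.
type listType struct {
	elem valueType
	size int
}

func (t listType) GenValueOut(out []any, r *rand.Rand) []any {
	list := make([]any, 0, t.size*t.elem.LenValue())
	for i := 0; i < t.size; i++ {
		list = t.elem.GenValueOut(list, r) // delegate to the element type
	}
	return append(out, list) // the whole list counts as a single value
}
func (t listType) LenValue() int { return 1 }

// generateRow asks every column type to append its value to one shared slice,
// sized up front from LenValue to avoid reallocation.
func generateRow(seed int64, cols ...valueType) []any {
	r := rand.New(rand.NewSource(seed))
	n := 0
	for _, c := range cols {
		n += c.LenValue()
	}
	row := make([]any, 0, n)
	for _, c := range cols {
		row = c.GenValueOut(row, r)
	}
	return row
}
```

Appending into a caller-supplied slice lets one allocation serve all columns of a row, which is presumably why an "out" variant of value generation exists alongside the plain one.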
Each type implements value generation:
```go
type Type interface {
	GenValue(r Random, config RangeConfig) []any
	GenValueOut(out []any, r Random, config RangeConfig) []any
	LenValue() int
	// ...
}
```

| Package | Purpose |
|---|---|
| pkg/jobs | Job definitions (mutation, validation) |
| pkg/partitions | Partition key management and deleted tracking |
| pkg/statements | CQL statement generation |
| pkg/typedef | Core type definitions (Schema, Table, Columns, Types) |
| pkg/schema | Schema loading and generation |
| pkg/distributions | Partition selection distributions |
| pkg/store | Database interaction layer (oracle + test clusters) |
| pkg/status | Global status tracking (ops counters, errors) |
| pkg/stop | Graceful shutdown coordination |
| pkg/metrics | Prometheus metrics |
| pkg/stmtlogger | Statement logging for debugging |