Gemini can either generate a random schema automatically or use a custom schema defined in a JSON file. This guide covers both approaches.
UPDATE statements are currently disabled in Gemini. When the statement ratio includes updates, they are internally converted to INSERT statements. Full UPDATE support will return in v2.1.0 once Gemini v2 is fully stable.
To work around this, focus on INSERT and DELETE ratios:
```bash
--statement-ratios='{"mutation":{"insert":0.95,"update":0.0,"delete":0.05}}'
```

By default, Gemini generates a random schema based on CLI parameters:

```bash
./gemini \
  --max-tables=2 \
  --max-partition-keys=4 \
  --min-partition-keys=1 \
  --max-clustering-keys=3 \
  --min-clustering-keys=0 \
  --max-columns=10 \
  --min-columns=3 \
  --test-cluster=... \
  --oracle-cluster=...
```

| Parameter | Default | Description |
|---|---|---|
| `--max-tables` | 1 | Maximum number of tables to generate |
| `--max-partition-keys` | 8 | Maximum partition key columns |
| `--min-partition-keys` | 1 | Minimum partition key columns |
| `--max-clustering-keys` | 5 | Maximum clustering key columns |
| `--min-clustering-keys` | 0 | Minimum clustering key columns |
| `--max-columns` | 12 | Maximum regular columns |
| `--min-columns` | 5 | Minimum regular columns |
| `--dataset-size` | large | Size preset: `small` or `large` |
| `--cql-features` | normal | Feature set: `basic`, `normal`, or `all` |
`--cql-features` presets:

- `basic` - Simple types only (int, text, boolean, etc.)
- `normal` - Adds collections (list, set, map) and tuples
- `all` - Adds UDTs (User-Defined Types) and complex nested types
Use `--schema-seed` to generate the same schema across runs:

```bash
# First run - note the schema seed from the output
./gemini --test-cluster=... --oracle-cluster=...

# Later - reproduce the exact schema
./gemini --schema-seed=12345 --test-cluster=... --oracle-cluster=...
```

For precise control, provide a JSON schema file with `--schema`:
```bash
./gemini --schema=my_schema.json --test-cluster=... --oracle-cluster=...
```

Example schema file:

```json
{
  "keyspace": {
    "name": "my_keyspace",
    "replication": {
      "class": "NetworkTopologyStrategy",
      "replication_factor": 3
    },
    "oracle_replication": {
      "class": "NetworkTopologyStrategy",
      "replication_factor": 1
    }
  },
  "tables": [
    {
      "name": "users",
      "partition_keys": [
        {"name": "user_id", "type": "uuid"}
      ],
      "clustering_keys": [
        {"name": "created_at", "type": "timestamp"}
      ],
      "columns": [
        {"name": "name", "type": "text"},
        {"name": "age", "type": "int"}
      ]
    }
  ]
}
```

Simple types:
```json
{"name": "user_id", "type": "int"}
```

Set collection:
```json
{
  "name": "tags",
  "type": {
    "complex_type": "set",
    "value_type": "text",
    "frozen": false
  }
}
```

Map type:
```json
{
  "name": "metadata",
  "type": {
    "complex_type": "map",
    "key_type": "text",
    "value_type": "int",
    "frozen": false
  }
}
```

List type:
```json
{
  "name": "scores",
  "type": {
    "complex_type": "list",
    "value_type": "double",
    "frozen": true
  }
}
```

Tuple type:
```json
{
  "name": "coordinates",
  "type": {
    "complex_type": "tuple",
    "value_types": ["double", "double"],
    "frozen": false
  }
}
```

UDT (User-Defined Type):
```json
{
  "name": "address",
  "type": {
    "complex_type": "udt",
    "type_name": "address_type",
    "frozen": true,
    "value_types": {
      "street": "text",
      "city": "text",
      "zip": "int"
    }
  }
}
```

Gemini supports CQL data types that are compatible with the Go driver (gocql). The following tables describe what's supported and what isn't.
Simple types with full support:

| Type | Description | Partition Key | Clustering Key | Map Key |
|---|---|---|---|---|
| `ascii` | ASCII string | ✓ | ✓ | ✓ |
| `bigint` | 64-bit signed integer | ✓ | ✓ | ✓ |
| `boolean` | True/false | ✓ | ✓ | ✓ |
| `date` | Date without time | ✓ | ✓ | ✓ |
| `double` | 64-bit floating point | ✓ | ✓ | ✓ |
| `float` | 32-bit floating point | ✓ | ✓ | ✓ |
| `inet` | IP address | ✓ | ✓ | ✓ |
| `int` | 32-bit signed integer | ✓ | ✓ | ✓ |
| `smallint` | 16-bit signed integer | ✓ | ✓ | ✓ |
| `text` | UTF-8 string | ✓ | ✓ | ✓ |
| `time` | Time without date | ✓ | ✓ | ✓ |
| `timestamp` | Date and time | ✓ | ✓ | ✓ |
| `timeuuid` | Type 1 UUID (time-based) | ✓ | ✓ | ✓ |
| `tinyint` | 8-bit signed integer | ✓ | ✓ | ✓ |
| `uuid` | UUID | ✓ | ✓ | ✓ |
| `varchar` | UTF-8 string (alias for text) | ✓ | ✓ | ✓ |
Simple types with restrictions:

| Type | Description | Partition Key | Clustering Key | Map Key | Reason |
|---|---|---|---|---|---|
| `blob` | Binary data | ✓ | ✓ | ✗ | Go maps cannot use byte slices as keys (slices are not comparable) |
| `decimal` | Variable-precision decimal | ✓ | ✓ | ✗ | Uses the `*inf.Dec` pointer type, which Go maps compare by address rather than by value |
| `duration` | Time duration | ✗ | ✗ | ✗ | CQL restriction: duration cannot be used in primary keys or as map keys |
| `varint` | Arbitrary-precision integer | ✓ | ✓ | ✗ | Uses the `*big.Int` pointer type, which Go maps compare by address rather than by value |
Complex and counter types:

| Type | Description | As Column | As Partition Key |
|---|---|---|---|
| `list<T>` | Ordered collection | ✓ | Only if frozen |
| `set<T>` | Unique unordered collection | ✓ | Only if frozen |
| `map<K,V>` | Key-value pairs | ✓ | Only if frozen |
| `tuple<T1,T2,...>` | Fixed-length typed sequence | ✓ | Only if frozen |
| `udt` | User-defined type | ✓ | Only if frozen |
| `counter` | Distributed counter | ✓ | ✗ |
The following ScyllaDB 2025.1 types are NOT supported by Gemini:

| Type | Reason |
|---|---|
| `vector<T, N>` | Vector type for ML/AI workloads; not implemented in the gocql driver |
| `frozen<T>` (standalone) | Frozen is a modifier, not a standalone type |
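The support tables above can be encoded as a small lookup, for example to sanity-check a hand-written schema file before a run. This is a hypothetical Go sketch that only restates the documented rules. `CanBePartitionKey` and `CanBeMapKey` are illustrative names, not part of Gemini's API. The tables do not state clustering-key or map-key rules for complex types, so the sketch does not attempt them.

```go
package main

import "fmt"

// Simple types marked ✓ in every key column of the fully supported table (hypothetical encoding).
var fullySupported = map[string]bool{
	"ascii": true, "bigint": true, "boolean": true, "date": true,
	"double": true, "float": true, "inet": true, "int": true,
	"smallint": true, "text": true, "time": true, "timestamp": true,
	"timeuuid": true, "tinyint": true, "uuid": true, "varchar": true,
}

// Simple types allowed in partition and clustering keys but not as map keys.
var keyButNotMapKey = map[string]bool{"blob": true, "decimal": true, "varint": true}

// Complex types that may be part of a partition key only when frozen.
var complexTypes = map[string]bool{"list": true, "set": true, "map": true, "tuple": true, "udt": true}

// CanBePartitionKey reports whether typ may appear in a partition key.
// frozen only matters for complex types; duration, counter, vector and
// unknown types are rejected.
func CanBePartitionKey(typ string, frozen bool) bool {
	switch {
	case fullySupported[typ], keyButNotMapKey[typ]:
		return true
	case complexTypes[typ]:
		return frozen
	default:
		return false
	}
}

// CanBeMapKey reports whether a simple type may be used as a map key.
// Complex types are outside the tables' scope and are reported as false.
func CanBeMapKey(typ string) bool {
	return fullySupported[typ]
}

func main() {
	checks := []struct {
		typ    string
		frozen bool
	}{
		{"uuid", false}, {"blob", false}, {"duration", false},
		{"list", false}, {"list", true}, {"counter", false},
	}
	for _, c := range checks {
		fmt.Printf("%-8s frozen=%-5v partition key=%-5v map key=%v\n",
			c.typ, c.frozen, CanBePartitionKey(c.typ, c.frozen), CanBeMapKey(c.typ))
	}
}
```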
Gemini uses Go maps internally to track partition keys and compare results between the oracle and test clusters. Go maps require keys to be comparable types, and equal CQL values must also produce equal keys. The following types cannot be used as map keys:

- `blob` - Represented as `[]byte` (a byte slice) in Go. Slices are not comparable in Go, so they cannot be map keys at all.
- `decimal` - Represented as `*inf.Dec` (a pointer to a decimal). Pointers compare by address, not value, making them unsuitable for map keys.
- `varint` - Represented as `*big.Int` (a pointer to an arbitrary-precision integer). Same pointer-comparison issue as `decimal`.
- `duration` - CQL itself prohibits `duration` in primary keys and map keys due to its complex internal representation (months, days, nanoseconds).
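The difference between pointer-keyed and value-keyed maps can be shown with the standard library alone. In this sketch, equal `*big.Int` values land in separate map entries because their addresses differ. Keying on a value encoding instead (the decimal string, or `string(b)` for a byte slice) collapses duplicates. This is one common Go workaround, shown for illustration. It does not describe how Gemini itself handles these types.

```go
package main

import (
	"fmt"
	"math/big"
)

// pointerKeyedLen stores n separately allocated *big.Int values, all equal to v,
// in a pointer-keyed map. Each pointer is a distinct key, so the result is n.
func pointerKeyedLen(v int64, n int) int {
	m := map[*big.Int]bool{}
	for i := 0; i < n; i++ {
		m[big.NewInt(v)] = true
	}
	return len(m)
}

// valueKeyedLen stores the same values keyed by their decimal string form,
// so equal numbers share one entry.
func valueKeyedLen(v int64, n int) int {
	m := map[string]bool{}
	for i := 0; i < n; i++ {
		m[big.NewInt(v).String()] = true
	}
	return len(m)
}

// blobKeyedLen counts distinct byte slices. map[[]byte]bool would not compile,
// so each slice is converted to a string, which copies its bytes into a comparable value.
func blobKeyedLen(blobs [][]byte) int {
	m := map[string]bool{}
	for _, b := range blobs {
		m[string(b)] = true
	}
	return len(m)
}

func main() {
	fmt.Println("pointer keys:", pointerKeyedLen(42, 3))                                    // 3
	fmt.Println("value keys:", valueKeyedLen(42, 3))                                        // 1
	fmt.Println("blob keys:", blobKeyedLen([][]byte{{0xca, 0xfe}, {0xca, 0xfe}, {0x00}})) // 2
}
```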
Gemini can use these types for partition keys:

- All simple types except `duration`
- Frozen complex types (frozen list, frozen set, frozen map, frozen tuple, frozen UDT)
Gemini can use these types for clustering keys:

- All simple types except `duration`
- `blob` is allowed (unlike partition keys in some configurations)
These limitations have three sources:

- Go language limitations: Go's map type requires comparable keys. Slices (`[]byte`) cannot be compared at all, and pointer types (`*big.Int`, `*inf.Dec`) compare by address rather than by value.
- CQL restrictions: some restrictions come from CQL itself. `duration` cannot be part of a primary key because it lacks a natural total ordering.
- Driver limitations: the gocql driver maps CQL types to Go types, so Gemini inherits Go's type-system constraints.
For multiple datacenters:

```json
{
  "class": "NetworkTopologyStrategy",
  "datacenter1": 3,
  "datacenter2": 2
}
```

On the command line:

```bash
# NetworkTopologyStrategy
--replication-strategy=network

# Custom strategy, passed inline as a single-quoted map
--replication-strategy="{'class':'NetworkTopologyStrategy','dc1':3, 'replication_factor': 3}"
--oracle-replication-strategy="{'class': 'NetworkTopologyStrategy', 'dc1': 1, 'replication_factor': 1}"
```

Complete schema example:

```json
{
  "keyspace": {
    "name": "ecommerce",
    "replication": {
      "class": "NetworkTopologyStrategy",
      "datacenter1": 3
    },
    "oracle_replication": {
      "class": "SimpleStrategy",
      "replication_factor": 1
    }
  },
  "tables": [
    {
      "name": "orders",
      "partition_keys": [
        {"name": "customer_id", "type": "uuid"}
      ],
      "clustering_keys": [
        {"name": "order_date", "type": "timestamp"},
        {"name": "order_id", "type": "timeuuid"}
      ],
      "columns": [
        {"name": "total", "type": "decimal"},
        {"name": "status", "type": "text"},
        {
          "name": "items",
          "type": {
            "complex_type": "list",
            "value_type": "text",
            "frozen": false
          }
        }
      ]
    }
  ]
}
```