Troubleshooting Guide

Common issues and how to resolve them.

Connection Issues

Cannot connect to cluster

Symptoms:

failed to connect to cluster: dial tcp: connection refused

Solutions:

Verify cluster is running: cqlsh <IP>
Check firewall rules (port 9042)
For Docker, use the host network or the correct container IP
Increase connect timeout: --connect-timeout=60s
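
Before changing any Gemini flags, it can help to confirm that the CQL port is reachable from the machine that runs Gemini. The following is a minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available. check_port is a hypothetical helper name and is not part of Gemini.

```shell
#!/usr/bin/env bash
# check_port HOST [PORT]: succeed if a TCP connection can be opened.
# Hypothetical helper; relies on bash's /dev/tcp and coreutils `timeout`.
check_port() {
  local host="$1" port="${2:-9042}"
  # Open (and immediately close) a TCP connection, giving up after 3 seconds.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (replace <IP> with a node address):
# check_port <IP> 9042 && echo reachable || echo unreachable
```

A refused or timed-out connection here points at the firewall, the Docker network, or the cluster itself rather than at Gemini.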

Authentication failed

Symptoms:

authentication failed: invalid credentials

Solutions:

Verify username/password

Use correct flags:

--test-username=user --test-password=pass
--oracle-username=user --oracle-password=pass

Runtime Errors

Timeouts during test

Symptoms:

request timeout: operation timed out

Solutions:

Increase timeout: --request-timeout=60s
Reduce concurrency: --concurrency=5
Check cluster load and resources
Verify network latency between Gemini and clusters

Out of memory

Symptoms:

runtime: out of memory

Solutions:

Reduce partition count: --partition-count=500000
Use smaller dataset: --dataset-size=small
Reduce concurrency: --concurrency=5
Reduce the number of tables: --max-tables=1
Simplify schema: fewer partition keys and clustering keys use less memory

Too many open files

Symptoms:

too many open files

Solutions:

Increase ulimit: ulimit -n 65535
Reduce concurrency
Check for connection leaks in cluster
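
Note that ulimit only affects the current shell and the processes started from it, and that an unprivileged user cannot raise the soft limit above the hard limit. A minimal sketch of checking and raising the limit under those constraints; raise_nofile is a hypothetical helper name, not part of Gemini.

```shell
#!/usr/bin/env bash
# raise_nofile TARGET: raise the soft open-file limit toward TARGET,
# capped at the hard limit. Hypothetical helper, not part of Gemini.
raise_nofile() {
  local target="$1" hard
  hard="$(ulimit -Hn)"
  # An unprivileged process cannot exceed the hard limit.
  if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
    target="$hard"
  fi
  ulimit -Sn "$target"
}

echo "soft limit: $(ulimit -Sn), hard limit: $(ulimit -Hn)"
# raise_nofile 65535   # then start Gemini from this same shell
```

If the hard limit itself is too low, it has to be raised by an administrator or in the container runtime configuration.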

Validation Errors

False positives during startup

Symptoms: Errors during the first few seconds of the test.

Solutions:

Add warmup period: --warmup=2m
Ensure clusters are fully synchronized before testing
Check for ongoing compactions or repairs

Consistency errors

Symptoms: Intermittent mismatches that resolve on retry.

Solutions:

Use stronger consistency: --consistency=ALL
Increase retry attempts: --max-mutation-retries=20
Add delay between mutations: --minimum-delay=100ms
Check cluster replication status

Row count differences

Symptoms:

oracle returned 5 rows, test returned 3 rows

Causes:

Failed mutations on one cluster
Replication lag
Data corruption

Investigation:

Check statement logs for failed operations
Query both clusters manually
See Investigation Guide
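
Querying both clusters manually can be scripted so that the two results are compared directly. This is a rough sketch, assuming cqlsh is on the PATH and needs no extra connection or authentication options. compare_query is a hypothetical helper name, and the hosts and query are placeholders to fill in.

```shell
#!/usr/bin/env bash
# compare_query ORACLE_HOST TEST_HOST QUERY: run the same CQL statement on
# both clusters and diff the output. Hypothetical helper, not part of Gemini.
compare_query() {
  local oracle_host="$1" test_host="$2" query="$3"
  local oracle_out test_out status=0
  oracle_out="$(mktemp)"
  test_out="$(mktemp)"
  cqlsh -e "$query" "$oracle_host" > "$oracle_out"
  cqlsh -e "$query" "$test_host" > "$test_out"
  # diff exits 0 when the results match and 1 when they differ.
  diff "$oracle_out" "$test_out" || status=$?
  rm -f "$oracle_out" "$test_out"
  return "$status"
}

# Example (placeholders, not real names):
# compare_query <ORACLE_IP> <TEST_IP> "SELECT * FROM <keyspace>.<table> WHERE <pk> = <value>;"
```

Lines prefixed with < in the diff come from the oracle cluster and lines prefixed with > come from the test cluster.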

Schema Issues

Schema file parsing error

Symptoms:

cannot parse schema file: invalid JSON

Solutions:

Validate JSON syntax
Check for trailing commas
Ensure all required fields are present
See Schema Guide for format
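
A quick syntax check can rule out malformed JSON, including trailing commas, before looking at the schema contents. A minimal sketch, assuming jq is installed; validate_json is a hypothetical helper name. It checks JSON syntax only and does not verify that the fields Gemini requires are present.

```shell
#!/usr/bin/env bash
# validate_json FILE: succeed only if FILE parses as JSON.
# Hypothetical helper; assumes jq is installed.
validate_json() {
  # `jq empty` parses the input and prints nothing; on invalid JSON it
  # reports where parsing failed and exits non-zero.
  jq empty "$1"
}

# Example: validate_json schema.json && echo "syntax OK"
```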

Unsupported type

Symptoms:

unsupported column type

Solutions:

Check CQL feature level: --cql-features=all
Verify type is supported by your Scylla version
Use simple types for partition keys

Performance Issues

Slow throughput

Symptoms: Low operations per second.

Solutions:

Increase concurrency: --concurrency=50
Increase IO worker pool: --io-worker-pool=256
Use token-aware policy (default)
Check cluster resource utilization
Simplify schema - some schema choices cause significant overhead:
- Reduce partition key count (--max-partition-keys=2)
- Reduce clustering key count (--max-clustering-keys=2)
- Avoid large column types (blobs, large text)
- Use --cql-features=basic to avoid expensive collection types
Give the Gemini runner more CPU - Gemini itself can become CPU-bound at high concurrency

High latency

Symptoms: High response times from clusters.

Solutions: Reduce the amount of data per operation. Large batches increase latency, and Gemini has no direct batch size flag, so reduce the data per operation by:

Using simpler column types (avoid large blobs/text)
Reducing the number of columns per table (--max-columns=5)
Using smaller partition sizes

Troubleshooting steps:

Check network latency between Gemini and clusters
Monitor cluster metrics for overload (CPU, memory, disk I/O)
Use local datacenter if multi-DC: --host-selection-policy=token-aware

Statement Logger Issues

Large log files

Solutions:

Enable compression: --statement-log-file-compression=gzip
Use shorter test duration
Keep in mind that only error context is written to the files, not every statement, so a large file usually reflects a large number of errors

Cannot read compressed logs

Solutions: Decompress to stdout and pipe the result into jq:

# For gzip
zcat test.json.gz | jq '.'

# For zstd
zstd -d -c test.json.zst | jq '.'
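
When a test run leaves a mix of compressed and uncompressed logs, a small wrapper can pick the decompressor from the file extension. A minimal sketch, assuming gzip and (for .zst files) zstd are installed; read_log is a hypothetical helper name, not part of Gemini.

```shell
#!/usr/bin/env bash
# read_log FILE: write FILE to stdout, decompressing based on its extension.
# Hypothetical helper; assumes gzip and (for .zst files) zstd are installed.
read_log() {
  case "$1" in
    *.gz)  gzip -dc "$1" ;;
    *.zst) zstd -dc "$1" ;;
    *)     cat "$1" ;;
  esac
}

# Example: read_log test.json.gz | jq '.'
```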

Missing statement logs

Causes:

Logs are only created when errors occur
File path not writable

Solutions:

Ensure directory exists and is writable
Statement logs contain error context, not all statements

Query ScyllaDB logs table directly:

cqlsh -e "SELECT * FROM ks_logs.table1_statements LIMIT 10;"
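
The directory check from the solutions above can be run before the test starts, so that an unwritable path is caught early. A minimal sketch; ensure_log_dir is a hypothetical helper name, and the example path is only an illustration.

```shell
#!/usr/bin/env bash
# ensure_log_dir FILE: create the parent directory of FILE if needed and
# succeed only if that directory is writable. Hypothetical helper.
ensure_log_dir() {
  local dir
  dir="$(dirname "$1")"
  mkdir -p "$dir" && [ -w "$dir" ]
}

# Example:
# ensure_log_dir ./logs/test.json || echo "cannot write statement logs" >&2
```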

Docker Issues

Container cannot reach clusters

Solutions:

Use host network:

docker run --network=host scylladb/gemini:latest ...

Use container IP addresses
Ensure clusters are accessible from Docker network

Logs not persisted

Solutions: Mount a volume:

docker run -v "$(pwd)/logs:/logs" scylladb/gemini:latest \
  --test-statement-log-file=/logs/test.json \
  ...

Getting Help

Check logs: jq 'select(.level == "error")' gemini.log
Enable debug logging: --level=debug
Reproduce with specific seed: --seed=... --schema-seed=...
File an issue: https://github.com/scylladb/gemini/issues

Include:

Gemini version: ./gemini --version
Command used
Error message
Seed values from output
Scylla/Cassandra versions
