Common issues and how to resolve them.
Symptoms:
failed to connect to cluster: dial tcp: connection refused
Solutions:
- Verify cluster is running:
cqlsh <IP>
- Check firewall rules (port 9042)
- For Docker, use host network or correct IP
- Increase connect timeout:
--connect-timeout=60s
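Before changing Gemini flags, it can help to confirm that the CQL port is reachable from the machine running Gemini. The following is a minimal sketch, not part of Gemini: `check_port` is a hypothetical helper, and it assumes `bash` and the coreutils `timeout` command are installed.

```shell
#!/bin/sh
# Sketch: probe the CQL native port before starting Gemini.
# check_port is a hypothetical helper, not a Gemini command.
check_port() {
    host=$1
    port=${2:-9042}   # 9042 is the CQL port mentioned above
    # bash's /dev/tcp pseudo-device attempts a TCP connect;
    # `timeout` bounds the wait so a filtered port does not hang.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage (replace <IP> with a node address):
#   check_port <IP> 9042 && echo "port open" || echo "port closed or filtered"
```

A failure here points at networking (firewall, Docker network, wrong IP) rather than at Gemini itself.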
Symptoms:
authentication failed: invalid credentials
Solutions:
- Verify username/password
- Use correct flags:
--test-username=user --test-password=pass --oracle-username=user --oracle-password=pass
Symptoms:
request timeout: operation timed out
Solutions:
- Increase timeout:
--request-timeout=60s
- Reduce concurrency:
--concurrency=5
- Check cluster load and resources
- Verify network latency between Gemini and clusters
Symptoms:
runtime: out of memory
Solutions:
- Reduce partition count:
--partition-count=500000
- Use smaller dataset:
--dataset-size=small
- Reduce concurrency:
--concurrency=5
- Reduce the number of tables:
--max-tables=1
- Simplify schema: fewer partition keys and clustering keys use less memory
Symptoms:
too many open files
Solutions:
- Increase ulimit:
ulimit -n 65535
- Reduce concurrency
- Check for connection leaks in cluster
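The `ulimit` change only affects the current shell and the processes it starts, and it cannot exceed the hard limit. A minimal sketch of a guard that raises the soft limit only when needed is shown here; `raise_nofile` is a hypothetical helper name, not something Gemini provides.

```shell
#!/bin/sh
# Sketch: raise the soft open-file limit for this shell, capped at the hard limit.
# raise_nofile is a hypothetical helper, not a Gemini command.
raise_nofile() {
    want=$1
    soft=$(ulimit -S -n)
    hard=$(ulimit -H -n)
    # Nothing to do if the soft limit is already high enough.
    if [ "$soft" = "unlimited" ]; then return 0; fi
    if [ "$soft" -ge "$want" ]; then return 0; fi
    # The soft limit cannot exceed the hard limit without extra privileges.
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
        echo "hard limit $hard is below $want; using $hard instead" >&2
        want=$hard
    fi
    ulimit -S -n "$want"
}

# Usage: call it in the same shell that launches Gemini, for example:
#   raise_nofile 65535
```

If the hard limit itself is too low, it has to be raised by the system administrator (for example in the service or container configuration) before this helper can reach the target value.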
Symptoms: Errors during the first few seconds of the test.
Solutions:
- Add warmup period:
--warmup=2m
- Ensure clusters are fully synchronized before testing
- Check for ongoing compactions or repairs
Symptoms: Intermittent mismatches that resolve on retry.
Solutions:
- Use stronger consistency:
--consistency=ALL
- Increase retry attempts:
--max-mutation-retries=20
- Add delay between mutations:
--minimum-delay=100ms
- Check cluster replication status
Symptoms:
oracle returned 5 rows, test returned 3 rows
Causes:
- Failed mutations on one cluster
- Replication lag
- Data corruption
Investigation:
- Check statement logs for failed operations
- Query both clusters manually
- See Investigation Guide
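Querying both clusters manually can be scripted so the two result sets are compared line by line. The following is a rough sketch under stated assumptions: `compare_query` and the `CQLSH` variable are hypothetical names introduced here, and the host and query arguments are placeholders you supply. It relies only on `cqlsh <host> -e "<query>"`, `mktemp`, and `diff`.

```shell
#!/bin/sh
# Sketch: run the same CQL query on the oracle and test clusters and diff the output.
# compare_query and CQLSH are hypothetical names, not part of Gemini.
CQLSH=${CQLSH:-cqlsh}

compare_query() {
    oracle_host=$1
    test_host=$2
    query=$3
    tmp=$(mktemp -d) || return 2
    # Capture stderr too, so a failed query shows up as a difference.
    "$CQLSH" "$oracle_host" -e "$query" >"$tmp/oracle.txt" 2>&1
    "$CQLSH" "$test_host" -e "$query" >"$tmp/test.txt" 2>&1
    if diff -u "$tmp/oracle.txt" "$tmp/test.txt"; then
        echo "results match"
        rc=0
    else
        rc=1   # diff has already printed the differing rows
    fi
    rm -rf "$tmp"
    return "$rc"
}

# Usage (placeholders, fill in your own hosts and query):
#   compare_query <ORACLE_IP> <TEST_IP> "SELECT * FROM <keyspace>.<table> WHERE ..."
```

Restricting the query to the partition reported in the mismatch keeps the output small and makes the diff easier to read.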
Symptoms:
cannot parse schema file: invalid JSON
Solutions:
- Validate JSON syntax
- Check for trailing commas
- Ensure all required fields are present
- See Schema Guide for format
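A quick syntax check catches trailing commas and similar mistakes before Gemini reads the file. This is a minimal sketch that assumes `jq` is installed; `validate_schema` is a hypothetical helper name, and the check covers JSON syntax only, not the fields Gemini requires.

```shell
#!/bin/sh
# Sketch: check that a schema file is syntactically valid JSON.
# validate_schema is a hypothetical helper, not a Gemini command.
validate_schema() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # jq parses the whole file and reports the first syntax error on stderr.
    if jq . "$file" >/dev/null; then
        echo "$file: valid JSON"
    else
        echo "$file: invalid JSON (see jq error above)" >&2
        return 1
    fi
}

# Usage (the path is a placeholder):
#   validate_schema path/to/schema.json
```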
Symptoms:
unsupported column type
Solutions:
- Check CQL feature level:
--cql-features=all
- Verify type is supported by your Scylla version
- Use simple types for partition keys
Symptoms: Low operations per second.
Solutions:
- Increase concurrency:
--concurrency=50
- Increase IO worker pool:
--io-worker-pool=256
- Use token-aware policy (default)
- Check cluster resource utilization
- Simplify schema. Some schema choices cause significant overhead:
  - Reduce partition key count (--max-partition-keys=2)
  - Reduce clustering key count (--max-clustering-keys=2)
  - Avoid large column types (blobs, large text)
  - Use --cql-features=basic to avoid expensive collection types
- Give the Gemini runner more CPU. Gemini itself can become CPU-bound at high concurrency
Symptoms: High response times from clusters.
Solution: Reduce batch sizes. Large batches increase latency. Gemini doesn't have a direct batch size flag, but you can reduce the amount of data per operation by:
- Using simpler column types (avoid large blobs/text)
- Reducing the number of columns per table (--max-columns=5)
- Using smaller partition sizes
Troubleshooting steps:
- Check network latency between Gemini and clusters
- Monitor cluster metrics for overload (CPU, memory, disk I/O)
- Use local datacenter if multi-DC:
--host-selection-policy=token-aware
Solutions:
- Enable compression:
--statement-log-file-compression=gzip
- Use shorter test duration
- Only error context is written to files (not all statements)
# For gzip
zcat test.json.gz | jq '.'
# For zstd
zstd -d -c test.json.zst | jq '.'
Causes:
- Logs are only created when errors occur
- File path not writable
Solutions:
- Ensure directory exists and is writable
- Statement logs contain error context, not all statements
- Query ScyllaDB logs table directly:
cqlsh -e "SELECT * FROM ks_logs.table1_statements LIMIT 10;"
Solutions:
- Use host network:
docker run --network=host scylladb/gemini:latest ...
- Use container IP addresses
- Ensure clusters are accessible from Docker network
Solutions: Mount a volume:
docker run -v $(pwd)/logs:/logs scylladb/gemini:latest \
--test-statement-log-file=/logs/test.json \
...

- Check logs:
cat gemini.log | jq 'select(.level == "error")'
- Enable debug logging:
--level=debug
- Reproduce with specific seed:
--seed=... --schema-seed=...
- File an issue: https://github.com/scylladb/gemini/issues
Include:
- Gemini version:
./gemini --version
- Command used
- Error message
- Seed values from output
- Scylla/Cassandra versions