This project demonstrates how the same real-world dataset, the 2018 Formula 1 season,
can be modelled, loaded, queried and analyzed across three fundamentally different database systems:
- PostgreSQL (relational, ACID, structured schema, advanced SQL).
- Redis (in-memory key-value store, Lua server-side compute, no schema).
- Cassandra (distributed NoSQL column store, tunable consistency, partition-oriented data modelling).
The goal is to compare the data modelling philosophies, ETL workflows, and
analytical capabilities of each system by implementing:
- Dockerized database environments.
- Dataset initialization scripts.
- Querying and analytics logic.
- Fully reproducible infrastructure.
- A unified domain model based on Formula 1 Teams, Drivers, Cars, Races and Race Results.
Across all three databases, the project compares data modelling, ETL workflows, and analytical queries:
- Relational schema in PostgreSQL.
- Hash-based keyspace modelling in Redis.
- Partition + clustering keys in Cassandra.
- PostgreSQL: CSV → staging → validated tables via PL/pgSQL procedures.
- Redis: manual dataset creation via command scripts (no file access available).
- Cassandra: CQL COPY-based ingestion (requires files inside the container).
- PostgreSQL: expressive multi-join SQL queries.
- Redis: analytical pipelines written in Lua (`EVAL` scripts).
- Cassandra: wide-row scans using CQL + `ALLOW FILTERING` for demonstration.
Each system has:
- `.env` for configuration.
- `docker-compose.yml` for infrastructure.
- `run.sh` and `stop.sh` scripts for lifecycle management.
postgresql-redis-cassandra-docker-comparison/
├── postgresql/ # Full relational F1 database with ETL + SQL analytics
├── redis/ # In-memory key-value F1 model + Lua analytical queries
├── cassandra/ # Wide-column F1 schema + CQL COPY ingestion + analytics
├── data/ # CSV source data used across all systems
└── README.md # Top-level documentation (this file)
Each subproject has its own README.md describing its architecture in detail
as well as its own scripts, queries and Docker environment.
A full production-style relational model using:
- Strict schema (`PK`, `FK`, `CHECK`, `UNIQUE`).
- PL/pgSQL ETL pipeline.
- CSV → `COPY` → staging → validated inserts.
- Analytical SQL queries with `JOIN`s, CTEs, aggregates.
This part demonstrates:
- Traditional normalized database design.
- Constraint-based data integrity.
- Powerful SQL analytics.
- Dockerized reproducibility.
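
The staging-then-validate pattern can be tried out without a PostgreSQL server. The sketch below is only an illustration of the idea: it swaps in Python's built-in `sqlite3` module for PostgreSQL and plain SQL for the PL/pgSQL procedures, and the table names, columns, and CSV rows are hypothetical rather than taken from the project's `data/` files.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract; the project's real files live in data/ and may differ.
RAW_CSV = """driver_id,name,height_cm
1,Driver A,174
2,Driver B,175
3,Driver C,not-a-number
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging: every column is text, so any row can be loaded.
    CREATE TABLE staging_drivers (driver_id TEXT, name TEXT, height_cm TEXT);
    -- Target: typed columns guarded by constraints.
    CREATE TABLE drivers (
        driver_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE,
        height_cm INTEGER NOT NULL CHECK (height_cm BETWEEN 100 AND 250)
    );
""")

# Step 1: bulk-load the raw CSV into staging (the role COPY plays in PostgreSQL).
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
conn.executemany(
    "INSERT INTO staging_drivers VALUES (:driver_id, :name, :height_cm)", rows
)

# Step 2: promote only rows whose numeric fields contain digits and nothing else.
conn.execute("""
    INSERT INTO drivers (driver_id, name, height_cm)
    SELECT CAST(driver_id AS INTEGER), name, CAST(height_cm AS INTEGER)
    FROM staging_drivers
    WHERE driver_id NOT GLOB '*[^0-9]*'
      AND height_cm NOT GLOB '*[^0-9]*'
""")

loaded = conn.execute(
    "SELECT driver_id, name, height_cm FROM drivers ORDER BY driver_id"
).fetchall()
staged = conn.execute("SELECT COUNT(*) FROM staging_drivers").fetchone()[0]
rejected = staged - len(loaded)
print(loaded, rejected)
```

Keeping the staging table untyped means a malformed row never aborts the bulk load; it is simply left behind when the validated insert runs.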
Full documentation is inside postgresql/README.md.
Because Redis has:
- no schema.
- no `JOIN`s.
- no CSV loading.
- no relational constraints.

The entire dataset is modelled manually using:
- Redis Hashes (e.g., `drivers:1`, `race_results:10:2`).
- Key naming conventions.
- Script-based initialization (hundreds of `HSET` commands).
- Lua-based analytical queries recreated from scratch.
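
To make the key naming convention concrete, here is a minimal Python sketch that builds `HSET` commands of the kind an initialization script would contain. The helper `hset_command`, the field names, and the reading of `race_results:10:2` as race id followed by driver id are all assumptions made for illustration; the project's actual scripts may be organized differently.

```python
def hset_command(entity, key_parts, fields):
    """Build one HSET command for a hash keyed as entity:part1:part2..."""
    key = ":".join([entity, *map(str, key_parts)])
    args = " ".join(f'{name} "{value}"' for name, value in fields.items())
    return f"HSET {key} {args}"


# Hypothetical rows; field names are illustrative only.
driver = {"name": "Driver A", "team_id": 3}
result = {"position": 1, "points": 25}

commands = [
    hset_command("drivers", [1], driver),
    # Assumed composite key: race id 10, driver id 2.
    hset_command("race_results", [10, 2], result),
]
print("\n".join(commands))
```

Because Redis enforces no schema, the key pattern is the only structure the dataset has, so every script and query has to agree on it.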
Analytics include:
- `SCAN`-based iteration.
- Aggregations in Lua tables.
- Sorting, grouping, filtering.
- Server-side Lua execution for performance.
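
The shape of such a query can be sketched in Python with a plain dictionary standing in for the Redis keyspace. This is not the project's Lua code: the data, the key layout (`race_results:<race>:<driver>`), and the function names are hypothetical, and the point is only to show iteration by key pattern, a manual lookup in place of a join, and aggregation in a local table.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# In-memory stand-in for the Redis keyspace; all values are hypothetical.
store = {
    "drivers:1": {"name": "Driver A", "team_id": "1"},
    "drivers:2": {"name": "Driver B", "team_id": "2"},
    "teams:1": {"name": "Team X"},
    "teams:2": {"name": "Team Y"},
    "race_results:10:1": {"points": "25"},
    "race_results:10:2": {"points": "18"},
    "race_results:11:1": {"points": "15"},
    "race_results:11:2": {"points": "25"},
}


def scan(pattern):
    """Yield keys matching a glob pattern, similar in spirit to SCAN with MATCH."""
    return (key for key in store if fnmatchcase(key, pattern))


def team_points():
    """Total points per team, lowest-scoring team first."""
    totals = defaultdict(int)
    for key in scan("race_results:*"):
        driver_id = key.split(":")[2]
        # Manual "join": follow the driver's team_id by building the key.
        team_id = store[f"drivers:{driver_id}"]["team_id"]
        totals[team_id] += int(store[key]["points"])
    return sorted(
        ((store[f"teams:{tid}"]["name"], pts) for tid, pts in totals.items()),
        key=lambda item: item[1],
    )


ranking = team_points()
print(ranking)
```

In the Redis version the same loop would run inside a Lua script on the server, so only the final ranking crosses the network.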
This demonstrates Redis as:
- An ultra-fast in-memory compute layer.
- A store where all query logic must be implemented manually.
Full documentation is inside redis/README.md.
Cassandra is a distributed, highly scalable NoSQL store where:
- `PRIMARY KEY` = partition key + clustering key.
- No `JOIN`s, no `FK`, no `CHECK`, no `UNIQUE`.
- Queries must align with the data model.
- `ALLOW FILTERING` is acceptable only for small datasets (like this project).
The project demonstrates:
- Wide-row modelling (`Race_Results` partitioned by track).
- Keyspace creation.
- CSV ingestion via CQL COPY.
- Analytical queries using native Cassandra capabilities.
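
The difference between a partition read and a filtered scan can be illustrated with a small Python model. It is a toy, not Cassandra: the class `WideRowTable`, the choice of finishing position as clustering key, and the rows are assumptions made for this sketch, and only the partitioning by track comes from the project description.

```python
class WideRowTable:
    """Toy model of a table with a partition key and one clustering key."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self._partitions = {}

    def insert(self, partition_key, clustering_key, row):
        # Writing the same full primary key again overwrites the row (an upsert).
        self._partitions.setdefault(partition_key, {})[clustering_key] = row

    def read_partition(self, partition_key):
        """Touch one partition only; rows come back in clustering order."""
        partition = self._partitions.get(partition_key, {})
        return [partition[ck] for ck in sorted(partition)]

    def filter_all(self, predicate):
        """Visit every partition, which is what a filtered scan has to do."""
        matches = []
        for partition in self._partitions.values():
            for ck in sorted(partition):
                if predicate(partition[ck]):
                    matches.append(partition[ck])
        return matches


# Hypothetical rows: partitioned by track, clustered by finishing position.
results = WideRowTable()
results.insert("Track A", 2, {"driver": "Driver B", "weather": "dry"})
results.insert("Track A", 1, {"driver": "Driver A", "weather": "dry"})
results.insert("Track B", 1, {"driver": "Driver B", "weather": "rain"})

track_a = [row["driver"] for row in results.read_partition("Track A")]
rainy = results.filter_all(lambda row: row["weather"] == "rain")
print(track_a, rainy)
```

A query that names the track reads a single partition in sorted order, while a query on any other column has to walk all of them, which is why filtering is tolerable here only because the dataset is small.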
This highlights Cassandra's strengths and constraints:
- Optimized for high write throughput and horizontal scalability.
- Analytics require careful modelling or external engines.
Full documentation is inside cassandra/README.md.
All three systems implement logically equivalent analytical queries, including:
- Rainy & hot races.
- Teams with lowest scores.
- Driver physical statistics / averages.
- Engine performance analysis.
- Race record discovery.
But each system executes them in a completely different way:
| Task | PostgreSQL | Redis | Cassandra |
|---|---|---|---|
| Filtering | `WHERE` | Lua conditions | `ALLOW FILTERING` |
| Aggregation | `SUM`, `AVG`, `COUNT` | Lua summation | Built-in aggregates |
| Joins | `JOIN` | Manual lookups by key | Cannot `JOIN` |
| Input data | Automatic CSV `COPY` → staging | Manual `HSET` population | CQL `COPY` |
| Integrity | Strict constraints | None | None |
| Query language | SQL | Lua + Redis API | CQL |
This demonstrates the trade-offs of each system and how the same problem
must often be solved with radically different techniques.
Every database runs fully isolated in its own container, using:
- Persistent named volumes.
- Injected environment variables.
- Simple startup/shutdown automation.
./run.sh
./stop.sh

This ensures:
- Clean reproducibility.
- No local installation required.
- No configuration drift.
This project is a comprehensive multi-database study showing how a single dataset can be
approached through three entirely different database technologies, each requiring its own
data modelling, ingestion strategy, and analytical techniques.
Ability to work with relational, in-memory, and distributed NoSQL systems.
PostgreSQL: normalized relational schema.
Redis: manual key-value modelling.
Cassandra: partitioning and clustering strategy.
PL/pgSQL procedures, Redis scripting, Cassandra COPY ingestion.
Implementing equivalent analytics using completely different toolchains.
Fully Dockerized, reproducible, environment-isolated, easy to run.
Performance, scalability, flexibility, complexity, consistency.