🏎️ PostgreSQL · Redis · Cassandra — Formula 1 2018 DB Comparison 📊

🔎 A Multi-Database Analytics & Modelling Project

This project demonstrates how the same real-world dataset — the 2018 Formula 1 season —
can be modelled, loaded, queried and analyzed across three fundamentally different database systems:

PostgreSQL (relational, ACID, structured schema, advanced SQL).
Redis (in-memory key–value store, Lua server-side compute, no schema).
Cassandra (distributed NoSQL column store, tunable consistency, partition-oriented data modeling).

The goal is to compare the data modelling philosophies, ETL workflows, and
analytical capabilities of each system by implementing:

Dockerized database environments.
Dataset initialization scripts.
Querying and analytics logic.
Fully reproducible infrastructure.
A unified domain model based on Formula 1 Teams, Drivers, Cars, Races and Race Results.

🎯 Project Objectives

Across all three databases, the project aims to:

✔️ Model the same F1 dataset using different paradigms

Relational schema in PostgreSQL.
Hash-based keyspace modelling in Redis.
Partition + clustering keys in Cassandra.

✔️ Implement ETL pipelines unique to each system

PostgreSQL: CSV → staging → validated tables via PL/pgSQL procedures.
Redis: manual dataset creation via command scripts (no file access available).
Cassandra: CQL COPY-based ingestion (requires files inside the container).

✔️ Provide analytical capabilities in native style

PostgreSQL: expressive multi-join SQL queries.
Redis: analytical pipelines written in Lua (EVAL scripts).
Cassandra: wide-row scans using CQL + ALLOW FILTERING for demonstration.

✔️ Establish a clean, reproducible Docker-based environment

Each system has:

.env for configuration.
docker-compose.yml for infrastructure.
run.sh and stop.sh scripts for lifecycle management.

🗂️ Project Structure

postgresql-redis-cassandra-docker-comparison/
├── postgresql/        # Full relational F1 database with ETL + SQL analytics
├── redis/             # In-memory key-value F1 model + Lua analytical queries
├── cassandra/         # Wide-column F1 schema + CQL COPY ingestion + analytics
├── data/              # CSV source data used across all systems
└── README.md          # Top-level documentation (this file)

Each subproject has its own README.md describing its architecture in detail
as well as its own scripts, queries and Docker environment.

🧱 PostgreSQL Module Overview

A full production-style relational model using:

Strict schema (PK, FK, CHECK, UNIQUE).
PL/pgSQL ETL pipeline.
CSV → COPY → staging → validated inserts.
Analytical SQL queries with JOINs, CTEs, aggregates.

This part demonstrates:

Traditional normalized database design.
Constraint-based data integrity.
Powerful SQL analytics.
Dockerized reproducibility.

Full documentation is inside postgresql/README.md.

🌿 Redis Module Overview

Because Redis has:

no schema.
no JOINs.
no CSV loading.
no relational constraints.

The entire dataset is modelled manually using:

Redis Hashes (e.g., drivers:1, race_results:10:2).
Key naming conventions.
Script-based initialization (hundreds of HSET commands).
Lua-based analytical queries recreated from scratch.

Analytics include:

SCAN-based iteration.
Aggregations in Lua tables.
Sorting, grouping, filtering.
Server-side Lua execution for performance.

This demonstrates Redis as:

An ultra-fast in-memory compute layer.
A store where all query logic must be implemented manually.

Full documentation is inside redis/README.md.

👁 Cassandra Module Overview

Cassandra is a distributed, highly scalable NoSQL store where:

PRIMARY KEY = partition key + clustering key.
No JOINs, no FK, no CHECK, no UNIQUE.
Queries must align with the data model.
ALLOW FILTERING is permitted only for small datasets (like this project).

The project demonstrates:

Wide-row modelling (Race_Results partitioned by track)
Keyspace creation.
CSV ingestion via CQL COPY.
Analytical queries using native Cassandra capabilities.

This highlights Cassandra's strengths and constraints:

Optimized for high write throughput and horizontal scalability
Analytics require careful modelling or external engines

Full documentation is inside cassandra/README.md.

📊 Analytical Comparisons Across Databases

All three systems implement logically equivalent analytical queries, including:

Rainy & hot races.
Teams with lowest scores.
Driver physical statistics / averages.
Engine performance analysis.
Race record discovery.

But each system executes them in a completely different way:

Task	PostgreSQL	Redis	Cassandra
Filtering	`WHERE`	Lua conditions	`ALLOW FILTERING`
Aggregation	`SUM`, `AVG`, `COUNT`	Lua summation	Built-in aggregates
Joins	`JOIN`	Manual lookups by key	Cannot `JOIN`
Input data	Automatic CSV `COPY` → staging	Manual `HSET` population	CQL `COPY`
Integrity	Strict constraints	None	None
Query language	SQL	Lua + Redis API	CQL

This demonstrates the trade-offs of each system and how the same problem
must often be solved with radically different techniques.

🐳 Dockerized Multi-Database Infrastructure

Every database runs fully isolated in its own container, using:

Persistent named volumes.
Injected environment variables.
Simple startup/shutdown automation.

./run.sh
./stop.sh

This ensures:

Clean reproducibility.
No local installation required.
Zero configuration drifting.

🏁 Summary & Key Takeaways

This project is a comprehensive multi-database study showing how a single dataset can be
approached through three entirely different database technologies, each requiring its own
data modelling, ingestion strategy, and analytical techniques.

🚀 What the project demonstrates

✔️ Cross-technology Data Engineering skills

Ability to work with relational, in-memory, and distributed NoSQL systems.

✔️ Schema design and modelling expertise

PostgreSQL: normalized relational schema.

Redis: manual key-value modelling.

Cassandra: partitioning and clustering strategy.

✔️ ETL mastery across database paradigms

PL/pgSQL procedures, Redis scripting, Cassandra COPY ingestion.

✔️ Analytical processing in SQL, Lua, and CQL

Implementing equivalent analytics using completely different toolchains.

✔️ Production-grade DevOps setup

Fully Dockerized, reproducible, environment-isolated, easy to run.

✔️ Understanding of trade-offs in modern database systems

Performance, scalability, flexibility, complexity, consistency.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🏎️ PostgreSQL · Redis · Cassandra — Formula 1 2018 DB Comparison 📊

🔎 A Multi-Database Analytics & Modelling Project

🎯 Project Objectives

✔️ Model the same F1 dataset using different paradigms

✔️ Implement ETL pipelines unique to each system

✔️ Provide analytical capabilities in native style

✔️ Establish a clean, reproducible Docker-based environment

🗂️ Project Structure

🧱 PostgreSQL Module Overview

🌿 Redis Module Overview

👁 Cassandra Module Overview

📊 Analytical Comparisons Across Databases

🐳 Dockerized Multi-Database Infrastructure

🏁 Summary & Key Takeaways

🚀 What the project demonstrates

✔️ Cross-technology Data Engineering skills

✔️ Schema design and modelling expertise

✔️ ETL mastery across database paradigms

✔️ Analytical processing in SQL, Lua, and CQL

✔️ Production-grade DevOps setup

✔️ Understanding of trade-offs in modern database systems

About

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
cassandra		cassandra
data		data
postgresql		postgresql
redis		redis
README.md		README.md
report.pdf		report.pdf

bursasha/postgresql-redis-cassandra-docker-comparison

Folders and files

Latest commit

History

Repository files navigation

🏎️ PostgreSQL · Redis · Cassandra — Formula 1 2018 DB Comparison 📊

🔎 A Multi-Database Analytics & Modelling Project

🎯 Project Objectives

✔️ Model the same F1 dataset using different paradigms

✔️ Implement ETL pipelines unique to each system

✔️ Provide analytical capabilities in native style

✔️ Establish a clean, reproducible Docker-based environment

🗂️ Project Structure

🧱 PostgreSQL Module Overview

🌿 Redis Module Overview

👁 Cassandra Module Overview

📊 Analytical Comparisons Across Databases

🐳 Dockerized Multi-Database Infrastructure

🏁 Summary & Key Takeaways

🚀 What the project demonstrates

✔️ Cross-technology Data Engineering skills

✔️ Schema design and modelling expertise

✔️ ETL mastery across database paradigms

✔️ Analytical processing in SQL, Lua, and CQL

✔️ Production-grade DevOps setup

✔️ Understanding of trade-offs in modern database systems

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Languages