Skip to content

Multi-database comparison project: PostgreSQL vs Redis vs Cassandra using the same Formula 1 dataset (modelling, ETL, analytics, Docker) πŸ—„οΈ

Notifications You must be signed in to change notification settings

bursasha/postgresql-redis-cassandra-docker-comparison

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🏎️ PostgreSQL Β· Redis Β· Cassandra β€” Formula 1 2018 DB Comparison πŸ“Š


πŸ”Ž A Multi-Database Analytics & Modelling Project

This project demonstrates how the same real-world dataset β€” the 2018 Formula 1 season β€”
can be modelled, loaded, queried and analyzed across three fundamentally different database systems:

  • PostgreSQL (relational, ACID, structured schema, advanced SQL).
  • Redis (in-memory key–value store, Lua server-side compute, no schema).
  • Cassandra (distributed NoSQL column store, tunable consistency, partition-oriented data modeling).

The goal is to compare the data modelling philosophies, ETL workflows, and
analytical capabilities of each system by implementing:

  • Dockerized database environments.
  • Dataset initialization scripts.
  • Querying and analytics logic.
  • Fully reproducible infrastructure.
  • A unified domain model based on Formula 1 Teams, Drivers, Cars, Races and Race Results.

🎯 Project Objectives

Across all three databases, the project aims to:

βœ”οΈ Model the same F1 dataset using different paradigms

  • Relational schema in PostgreSQL.
  • Hash-based keyspace modelling in Redis.
  • Partition + clustering keys in Cassandra.

βœ”οΈ Implement ETL pipelines unique to each system

  • PostgreSQL: CSV β†’ staging β†’ validated tables via PL/pgSQL procedures.
  • Redis: manual dataset creation via command scripts (no file access available).
  • Cassandra: CQL COPY-based ingestion (requires files inside the container).

βœ”οΈ Provide analytical capabilities in native style

  • PostgreSQL: expressive multi-join SQL queries.
  • Redis: analytical pipelines written in Lua (EVAL scripts).
  • Cassandra: wide-row scans using CQL + ALLOW FILTERING for demonstration.

βœ”οΈ Establish a clean, reproducible Docker-based environment

Each system has:

  • .env for configuration.
  • docker-compose.yml for infrastructure.
  • run.sh and stop.sh scripts for lifecycle management.

πŸ—‚οΈ Project Structure

postgresql-redis-cassandra-docker-comparison/
β”œβ”€β”€ postgresql/        # Full relational F1 database with ETL + SQL analytics
β”œβ”€β”€ redis/             # In-memory key-value F1 model + Lua analytical queries
β”œβ”€β”€ cassandra/         # Wide-column F1 schema + CQL COPY ingestion + analytics
β”œβ”€β”€ data/              # CSV source data used across all systems
└── README.md          # Top-level documentation (this file)

Each subproject has its own README.md describing its architecture in detail
as well as its own scripts, queries and Docker environment.


🧱 PostgreSQL Module Overview

A full production-style relational model using:

  • Strict schema (PK, FK, CHECK, UNIQUE).
  • PL/pgSQL ETL pipeline.
  • CSV β†’ COPY β†’ staging β†’ validated inserts.
  • Analytical SQL queries with JOINs, CTEs, aggregates.

This part demonstrates:

  • Traditional normalized database design.
  • Constraint-based data integrity.
  • Powerful SQL analytics.
  • Dockerized reproducibility.

Full documentation is inside postgresql/README.md.


🌿 Redis Module Overview

Because Redis has:

  • no schema.
  • no JOINs.
  • no CSV loading.
  • no relational constraints.

The entire dataset is modelled manually using:

  • Redis Hashes (e.g., drivers:1, race_results:10:2).
  • Key naming conventions.
  • Script-based initialization (hundreds of HSET commands).
  • Lua-based analytical queries recreated from scratch.

Analytics include:

  • SCAN-based iteration.
  • Aggregations in Lua tables.
  • Sorting, grouping, filtering.
  • Server-side Lua execution for performance.

This demonstrates Redis as:

  • An ultra-fast in-memory compute layer.
  • A store where all query logic must be implemented manually.

Full documentation is inside redis/README.md.


πŸ‘ Cassandra Module Overview

Cassandra is a distributed, highly scalable NoSQL store where:

  • PRIMARY KEY = partition key + clustering key.
  • No JOINs, no FK, no CHECK, no UNIQUE.
  • Queries must align with the data model.
  • ALLOW FILTERING is permitted only for small datasets (like this project).

The project demonstrates:

  • Wide-row modelling (Race_Results partitioned by track)
  • Keyspace creation.
  • CSV ingestion via CQL COPY.
  • Analytical queries using native Cassandra capabilities.

This highlights Cassandra's strengths and constraints:

  • Optimized for high write throughput and horizontal scalability
  • Analytics require careful modelling or external engines

Full documentation is inside cassandra/README.md.


πŸ“Š Analytical Comparisons Across Databases

All three systems implement logically equivalent analytical queries, including:

  1. Rainy & hot races.
  2. Teams with lowest scores.
  3. Driver physical statistics / averages.
  4. Engine performance analysis.
  5. Race record discovery.

But each system executes them in a completely different way:

Task PostgreSQL Redis Cassandra
Filtering WHERE Lua conditions ALLOW FILTERING
Aggregation SUM, AVG, COUNT Lua summation Built-in aggregates
Joins JOIN Manual lookups by key Cannot JOIN
Input data Automatic CSV COPY β†’ staging Manual HSET population CQL COPY
Integrity Strict constraints None None
Query language SQL Lua + Redis API CQL

This demonstrates the trade-offs of each system and how the same problem
must often be solved with radically different techniques.


🐳 Dockerized Multi-Database Infrastructure

Every database runs fully isolated in its own container, using:

  • Persistent named volumes.
  • Injected environment variables.
  • Simple startup/shutdown automation.
./run.sh
./stop.sh

This ensures:

  • Clean reproducibility.
  • No local installation required.
  • Zero configuration drifting.

🏁 Summary & Key Takeaways

This project is a comprehensive multi-database study showing how a single dataset can be
approached through three entirely different database technologies, each requiring its own
data modelling, ingestion strategy, and analytical techniques.

πŸš€ What the project demonstrates

βœ”οΈ Cross-technology Data Engineering skills

Ability to work with relational, in-memory, and distributed NoSQL systems.

βœ”οΈ Schema design and modelling expertise

PostgreSQL: normalized relational schema.

Redis: manual key-value modelling.

Cassandra: partitioning and clustering strategy.

βœ”οΈ ETL mastery across database paradigms

PL/pgSQL procedures, Redis scripting, Cassandra COPY ingestion.

βœ”οΈ Analytical processing in SQL, Lua, and CQL

Implementing equivalent analytics using completely different toolchains.

βœ”οΈ Production-grade DevOps setup

Fully Dockerized, reproducible, environment-isolated, easy to run.

βœ”οΈ Understanding of trade-offs in modern database systems

Performance, scalability, flexibility, complexity, consistency.

About

Multi-database comparison project: PostgreSQL vs Redis vs Cassandra using the same Formula 1 dataset (modelling, ETL, analytics, Docker) πŸ—„οΈ

Topics

Resources

Stars

Watchers

Forks