This project implements an end-to-end real-time fraud detection pipeline using:
- Kafka → Streaming ingestion
- Spark Structured Streaming → Real-time ML inference
- LightGBM model → Fraud classification
- Cassandra → Fast operational storage
- MinIO (S3) → Raw micro-batch storage
- Superset → BI dashboards

The system continuously consumes transactions from Kafka, enriches them, applies a trained LightGBM model, labels each transaction as fraudulent or not, and writes results to Cassandra and MinIO. Superset connects to Cassandra (via Trino) for dashboard visualization.
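
The per-batch flow described above (enrich, score, label, then write to two sinks) can be sketched in plain Python. This is only an illustration of the data flow, not the project's code: the real pipeline runs on Spark Structured Streaming with a trained LightGBM model. The `Transaction` fields, the derived features, the `FRAUD_THRESHOLD` value, the `stub_score` function, and the in-memory lists standing in for Cassandra and MinIO are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Iterable, List

FRAUD_THRESHOLD = 0.5  # assumed cut-off, not taken from the project


@dataclass
class Transaction:
    # Hypothetical schema; the real Kafka message format may differ.
    tx_id: str
    amount: float
    hour: int  # hour of day, 0-23


def enrich(tx: Transaction) -> dict:
    """Add derived features to a raw transaction (illustrative features only)."""
    row = asdict(tx)
    row["is_night"] = int(tx.hour < 6)
    row["is_large"] = int(tx.amount > 1000.0)
    return row


def stub_score(row: dict) -> float:
    """Stand-in for the trained model's fraud probability; NOT LightGBM."""
    return 0.45 * row["is_night"] + 0.45 * row["is_large"]


def process_batch(
    batch: Iterable[Transaction],
    score: Callable[[dict], float],
    sinks: List[Callable[[List[dict]], None]],
) -> List[dict]:
    """Enrich, score, and label one micro-batch, then hand it to every sink."""
    labelled = []
    for tx in batch:
        row = enrich(tx)
        row["fraud_score"] = score(row)
        row["is_fraud"] = row["fraud_score"] >= FRAUD_THRESHOLD
        labelled.append(row)
    for write in sinks:
        write(labelled)  # each sink receives the same labelled rows
    return labelled


# Two in-memory lists play the role of the Cassandra and MinIO writers.
cassandra_rows: List[dict] = []
minio_rows: List[dict] = []
batch = [Transaction("t1", 25.0, 14), Transaction("t2", 5000.0, 3)]
process_batch(batch, stub_score, [cassandra_rows.extend, minio_rows.extend])
```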

- Start entire system: `make all`
- Start/stop producer only: `make run` / `make stop`

| Component | URL |
|---|---|
| Superset UI | http://localhost:8088 |
| MinIO Console | http://localhost:9001 |
| Spark UI | http://localhost:4040 |
| Trino UI | http://localhost:8085 |
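
After startup, it can be handy to confirm that each UI in the table is reachable. The following is a minimal sketch using only the Python standard library; the helper names `is_up` and `check_all` are hypothetical and not part of the project. It treats any HTTP response (including a redirect to a login page or an error status) as "up", since that still shows the service is listening. Calling `check_all()` returns a dict that maps each component name to `True` or `False`.

```python
import http.client
import urllib.error
import urllib.request

# URLs copied from the table above.
SERVICES = {
    "Superset UI": "http://localhost:8088",
    "MinIO Console": "http://localhost:9001",
    "Spark UI": "http://localhost:4040",
    "Trino UI": "http://localhost:8085",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url` within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with a non-2xx status, so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False


def check_all(services=SERVICES, probe=is_up) -> dict:
    """Probe every service and return a {name: reachable} mapping."""
    return {name: probe(url) for name, url in services.items()}
```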

Superset login:
- username: `admin`
- password: `admin`
