
Camph0r/bigdata-fraud-pipeline


Fraud Detection Pipeline

This project implements an end-to-end real-time fraud detection pipeline using:

  • Kafka → Streaming ingestion
  • Spark Structured Streaming → Real-time ML inference
  • LightGBM model → Fraud classification
  • Cassandra → Fast operational storage
  • MinIO (S3) → Raw micro-batch storage
  • Superset → BI dashboards

The system continuously consumes transactions from Kafka, enriches them, applies a trained LightGBM model, labels each transaction as fraudulent or not, and writes results to Cassandra and MinIO. Superset connects to Cassandra (via Trino) for dashboard visualization.
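The per-transaction flow described above (parse, enrich, score, label) can be sketched in plain Python. This is a minimal, dependency-free sketch and not the project's code. The message fields (`transaction_id`, `amount`), the `log_amount` feature, the 0.5 threshold and the `stub_model` logistic function are all hypothetical. `stub_model` stands in for the trained LightGBM model.

```python
import json
import math
from dataclasses import dataclass
from typing import Iterable

# Hypothetical decision threshold on the model's fraud probability.
FRAUD_THRESHOLD = 0.5


@dataclass
class ScoredTransaction:
    transaction_id: str
    amount: float
    fraud_probability: float
    is_fraud: bool


def enrich(txn: dict) -> dict:
    """Add derived features; this feature set is a hypothetical example."""
    enriched = dict(txn)
    enriched["log_amount"] = math.log1p(max(txn["amount"], 0.0))
    return enriched


def stub_model(features: dict) -> float:
    """Hypothetical stand-in for the trained LightGBM model.

    Maps features to a probability in [0, 1] with a fixed logistic function
    so the sketch runs without LightGBM installed.
    """
    z = 1.5 * features["log_amount"] - 10.0
    return 1.0 / (1.0 + math.exp(-z))


def process_batch(raw_messages: Iterable[bytes]) -> list[ScoredTransaction]:
    """Parse JSON message values, enrich, score, and label each transaction."""
    results = []
    for raw in raw_messages:
        txn = json.loads(raw)
        features = enrich(txn)
        p = stub_model(features)
        results.append(
            ScoredTransaction(
                transaction_id=txn["transaction_id"],
                amount=txn["amount"],
                fraud_probability=p,
                is_fraud=p >= FRAUD_THRESHOLD,
            )
        )
    return results


if __name__ == "__main__":
    batch = [
        json.dumps({"transaction_id": "t1", "amount": 10.0}).encode(),
        json.dumps({"transaction_id": "t2", "amount": 100000.0}).encode(),
    ]
    for row in process_batch(batch):
        print(row)
```

In Spark Structured Streaming, logic like `process_batch` would usually run once per micro-batch, for example inside a `foreachBatch` callback. The labeled rows would then be written to Cassandra and MinIO. The exact wiring in this repository may differ.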

Architecture

[Architecture diagram]

How to Run

  1. Start the entire system
make all
  2. Start/stop the producer only
make run
make stop
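`make run` starts the transaction producer that feeds Kafka. The sketch below shows the kind of JSON payloads such a producer might emit. The field names, account ID format and amount distribution are hypothetical. The commented-out publishing step uses kafka-python's `KafkaProducer`, and its broker address and topic name are also assumptions.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def make_transaction(rng: random.Random) -> dict:
    """Build one synthetic transaction; every field name here is hypothetical."""
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "account_id": f"acct-{rng.randint(1, 1000):04d}",
        # Log-normal amounts: mostly small, with an occasional large one.
        "amount": round(rng.lognormvariate(3.0, 1.2), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def encode(txn: dict) -> bytes:
    """Serialize a transaction as UTF-8 JSON, the form a Kafka value would take."""
    return json.dumps(txn).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(encode(make_transaction(rng)).decode())
    # To actually publish (requires kafka-python; broker and topic are assumed):
    #   from kafka import KafkaProducer
    #   producer = KafkaProducer(bootstrap_servers="localhost:9092")
    #   txn = make_transaction(rng)
    #   producer.send("transactions", key=txn["account_id"].encode(), value=encode(txn))
    #   producer.flush()
```

Keying each message by account ID sends all transactions for an account to the same Kafka partition, which preserves their order for the consumer. Whether this project's producer keys its messages that way is an assumption.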

Accessing Components

Component       URL
Superset UI     http://localhost:8088
MinIO Console   http://localhost:9001
Spark UI        http://localhost:4040
Trino UI        http://localhost:8085
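A quick way to see which components are reachable after `make all` is to probe the URLs listed above. This is a minimal stdlib sketch, not part of the project. It treats any HTTP response, including error statuses such as a login redirect or 401, as "up", and treats refused or timed-out connections as "down".

```python
import http.client
import urllib.error
import urllib.request

# URLs listed under Accessing Components.
COMPONENTS = {
    "Superset UI": "http://localhost:8088",
    "MinIO Console": "http://localhost:9001",
    "Spark UI": "http://localhost:4040",
    "Trino UI": "http://localhost:8085",
}


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx response still means the service is listening.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a non-HTTP reply.
        return False


def check_all(timeout: float = 2.0) -> dict[str, bool]:
    """Probe every component and map its name to reachability."""
    return {name: probe(url, timeout) for name, url in COMPONENTS.items()}


if __name__ == "__main__":
    for name, up in check_all().items():
        print(f"{name:15} {'up' if up else 'DOWN'}")
```

The Spark UI on port 4040 is served by the driver of a running Spark application. It will therefore show as down until the streaming job has started.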

Superset login

username: admin

password: admin
