The Fraud Detection MLOps Pipeline is an end-to-end system designed to identify potentially fraudulent financial transactions with high accuracy and scalability. This project integrates Machine Learning (ML) with MLOps principles to ensure robust experimentation, deployment, and real-time monitoring of fraud detection models.
DEMO LINK - LINK
- Project Overview
- Tech Stack
- Architecture Diagrams
- Features
- Directory Structure
- Setup Instructions
- Running the Streamlit App
- Running the FastAPI Service
- Experiment Tracking with MLflow
- Monitoring with Prometheus & Grafana
- Model Details
- Results & Metrics
- Screenshots
- Future Work
- Author / Contact
This project implements a complete MLOps pipeline for fraud detection using transactional data. It covers the entire ML lifecycle:
- Build a modular FraudPipeline capable of feature engineering, preprocessing, resampling (SMOTE), and threshold tuning.
- Track experiments using MLflow for reproducibility and comparative analysis.
- Deploy the model using FastAPI for REST API services and Streamlit for an interactive UI.
- Containerize and orchestrate services using Docker and Kubernetes (Minikube).
- Monitor system health and metrics using Prometheus and Grafana dashboards.
Goal: Detect fraudulent transactions in real time with high recall while minimizing false positives.
- Python 3.12+
- Scikit-learn: Model building, preprocessing, metrics.
- Imbalanced-learn: SMOTE for class imbalance handling.
- Pandas / NumPy: Data manipulation and numerical operations.
- MLflow: Experiment tracking, logging metrics, model registry.
- FastAPI: Serving the fraud detection model via REST API.
- Streamlit: Interactive web UI for predictions and model insights.
- Docker: Containerization of the FastAPI and Streamlit apps.
- Kubernetes (Minikube): Local orchestration and scaling of microservices.
- Prometheus: Metrics scraping for FastAPI endpoints.
- Grafana: Visualization dashboards for system and API monitoring.
The complete pipeline involves:
- Data Ingestion & Preprocessing
- Model Training & Threshold Optimization
- Experiment Tracking with MLflow
- Model Deployment via FastAPI & Streamlit
- Containerization with Docker
- Orchestration using Kubernetes (Minikube)
- Monitoring using Prometheus + Grafana
- Feature Engineering: Interaction, ratio, binning, time-of-day categorization.
- Preprocessing: Imputation, encoding, log transform, scaling.
- Resampling: SMOTE to address class imbalance.
- Model Training: Logistic Regression (configurable to RandomForest/XGBoost).
- Threshold Tuning: Optimize precision-recall trade-off for fraud detection.
- Real-Time Fraud Prediction:
  - Streamlit UI for quick predictions.
  - FastAPI endpoint for programmatic integration.
- Experiment Tracking:
  - MLflow logs parameters, metrics, artifacts (confusion matrix, PR curve).
- Scalable Deployment:
  - Dockerized microservices deployed on Kubernetes (Minikube).
- Robust Monitoring:
  - Prometheus scrapes real-time metrics from FastAPI.
  - Grafana dashboards visualize system health and request patterns.
- Data Handling:
  - Automatic preprocessing (missing values, scaling, encoding).
  - SMOTE resampling for highly imbalanced fraud datasets.
- Threshold Optimization:
  - Dynamically finds the best threshold balancing recall and precision.
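As an illustration of the threshold optimization step, here is a minimal sketch using scikit-learn's precision_recall_curve; the helper name and the F1 selection criterion are assumptions, and the project's actual tuning logic lives in Src/model.py.

```python
# Minimal threshold-tuning sketch (illustrative; see Src/model.py for the real logic)
import numpy as np
from sklearn.metrics import precision_recall_curve

def find_best_threshold(y_true, y_scores):
    """Return the probability threshold that maximizes F1 on the PR curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision/recall have one more entry than thresholds, so drop the last point
    f1 = 2 * precision[:-1] * recall[:-1] / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
    best = int(np.argmax(f1))
    return thresholds[best], precision[best], recall[best]

# Usage (hypothetical):
# scores = pipeline.predict_proba(X_val)[:, 1]
# threshold, p, r = find_best_threshold(y_val, scores)
```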
The project follows a modular structure separating API, model, monitoring, and visualization components:
FRAUD_MLOPS_PROJECT/
│
├── API/ # FastAPI microservice
│ ├── main.py # API entry point
│ ├── schemas.py # Pydantic models for request/response
│ ├── services.py # Core service logic
│ └── mlruns/ # MLflow experiment tracking logs
│
├── Data/ # Datasets
│ ├── payment_fraud.csv
│ └── combined_holdout.csv
│
├── Images/ # Project diagrams & screenshots
│ ├── Docker/
│ ├── FastAPI/
│ ├── Grafana/
│ ├── MLFlow/
│ ├── MLOps_Architecture/
│ ├── Model_Architecture/
│ └── Prometheus/
│
├── K8s/ # Kubernetes manifests
│ ├── fraud-api-deployment.yaml
│ ├── fraud-api-service.yaml
│ ├── grafana-deployment.yaml
│ └── prometheus-deployment.yaml
│
├── Notebooks/ # Jupyter Notebooks
│ ├── EDA.ipynb
│ ├── training_model.ipynb
│ ├── test_files.ipynb
│ └── artifacts/ # Trained model artifacts
│ ├── confusion_matrix.png
│ ├── pr_curve.png
│ └── fraud_pipeline_deployed.pkl
│
├── Pages/ # Streamlit multi-page app
│ ├── home.py
│ ├── about_model.py
│ ├── metrics_page.py
│ └── about_me.py
│
├── Src/ # Core ML pipeline code
│ ├── model.py # FraudPipeline, FeatureEngineering, Preprocessing
│ ├── utils.py # Helper functions
│ ├── config.py # Configurations
│ └── artifacts/ # MLflow model logs
│
├── app.py # Streamlit entry point
├── Dockerfile # Docker setup for Streamlit/FastAPI
├── requirements.txt # Dependencies
├── .gitignore
└── README.md
- Python 3.10 or higher
- Docker Desktop
- Minikube (for Kubernetes)
- kubectl CLI
- Prometheus & Grafana (installed via Helm or K8s manifests)
- Clone the repository
git clone https://github.com/MohitGupta0123/Fraud_Detection_MLOps.git
cd Fraud_Detection_MLOps
- Create virtual environment & install dependencies
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
pip install -r requirements.txt
- Run Streamlit app locally
streamlit run app.py
- Run FastAPI service locally
cd API
uvicorn main:app --reload --host 0.0.0.0 --port 8000
- Build Docker images
docker build -t fraud-streamlit -f Dockerfile .
docker build -t fraud-fastapi -f Dockerfile ./API
- Run containers
docker run -p 8501:8501 fraud-streamlit
docker run -p 8000:8000 fraud-fastapi
- Start Minikube
minikube start --driver=docker
- Apply Kubernetes manifests
kubectl apply -f K8s/fraud-api-deployment.yaml
kubectl apply -f K8s/fraud-api-service.yaml
kubectl apply -f K8s/prometheus-deployment.yaml
kubectl apply -f K8s/grafana-deployment.yaml
- Access services
minikube service fraud-api-service
minikube service prometheus -n monitoring
minikube service grafana -n monitoring
The Streamlit app provides an interactive interface to test fraud detection predictions and visualize model metrics.
streamlit run app.py
- Access at:
http://localhost:8501
- Input transaction details (Category, Payment Method, Account Age, etc.)
- Auto-fill examples for Legitimate and Fraudulent transactions
- Real-time prediction with threshold-based confidence
- Navigation to About Model, Metrics, and About Me pages
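For orientation, here is a minimal sketch of what such a prediction form could look like in Streamlit; the widget labels, feature names, and example values are assumptions, and the real multi-page app lives in app.py and Pages/.

```python
# Minimal Streamlit prediction sketch (illustrative; the real UI is in app.py / Pages/)
import joblib
import pandas as pd
import streamlit as st

# Artifact path taken from the repo layout; a real call needs every feature
# the deployed pipeline expects, not just the three shown here.
pipeline = joblib.load("Notebooks/artifacts/fraud_pipeline_deployed.pkl")
THRESHOLD = 0.8370  # tuned threshold reported in Model Details

st.title("Fraud Detection Demo")
category = st.selectbox("Category", ["electronics", "shopping", "food"])       # example values
payment_method = st.selectbox("Payment Method", ["creditcard", "paypal"])      # example values
account_age = st.number_input("Account Age (days)", min_value=0.0, value=100.0)

if st.button("Predict"):
    row = pd.DataFrame([{"Category": category,
                         "PaymentMethod": payment_method,
                         "accountAgeDays": account_age}])
    proba = pipeline.predict_proba(row)[0, 1]
    st.write(f"Fraud probability: {proba:.3f} -> "
             f"{'FRAUD' if proba >= THRESHOLD else 'Legitimate'}")
```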
FastAPI serves the fraud prediction model as a REST API, useful for production-grade deployment and integration with external systems.
cd API
uvicorn main:app --reload --host 0.0.0.0 --port 8000
- Access API docs at:
http://localhost:8000/docs
- POST /predict – Accepts JSON payload and returns prediction
- GET /health – Health check endpoint
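A quick way to exercise the endpoints from Python is shown below; the payload fields are illustrative guesses, and the authoritative request schema is defined by the Pydantic models in API/schemas.py.

```python
# Illustrative client call; check API/schemas.py for the actual request schema
import requests

payload = {
    # hypothetical field names and values
    "Category": "electronics",
    "PaymentMethod": "creditcard",
    "accountAgeDays": 120,
    "paymentMethodAgeDays": 30,
    "localTime": 4.74,
}

resp = requests.post("http://localhost:8000/predict", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # prediction (and possibly a probability / threshold decision)

# Liveness check
print(requests.get("http://localhost:8000/health", timeout=10).json())
```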
docker build -t fraud-fastapi -f Dockerfile ./API
docker run -p 8000:8000 fraud-fastapi
MLflow is integrated to log experiments, parameters, metrics, and artifacts (PR curve, confusion matrix, models).
- Automatically tracks during training via FraudPipeline
- Logs include:
  - Parameters: Steps applied, resampling method, model type
  - Metrics: Accuracy, Precision, Recall, F1-score, PR-AUC
  - Artifacts: PR Curve, Confusion Matrix, Serialized Model
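The sketch below shows roughly how such a run could be logged with the MLflow API; the helper name and the exact parameters and metrics are assumptions, since the real logging is wired into FraudPipeline.

```python
# Illustrative MLflow logging sketch (FraudPipeline handles this internally)
import mlflow
import mlflow.sklearn

def log_training_run(model, params, metrics, artifact_paths):
    """Hypothetical helper: log one fraud-detection training run to MLflow."""
    mlflow.set_experiment("fraud-detection")
    with mlflow.start_run():
        mlflow.log_params(params)        # e.g. {"model": "LogisticRegression", "resampling": "SMOTE"}
        mlflow.log_metrics(metrics)      # e.g. {"precision": 0.955, "recall": 0.991, "pr_auc": ...}
        for path in artifact_paths:      # e.g. ["artifacts/pr_curve.png", "artifacts/confusion_matrix.png"]
            mlflow.log_artifact(path)
        mlflow.sklearn.log_model(model, "model")
```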
mlflow ui
- Opens at http://127.0.0.1:5000
- Explore experiment runs and compare metrics visually
The deployed FastAPI service exposes metrics for Prometheus, visualized via Grafana dashboards.
- Scrapes FastAPI metrics (request counts, response latency, error rates)
- Runs on port 9090 in the monitoring namespace
- Visualizes Prometheus data using pre-built dashboards
- Runs on port 3000 in the monitoring namespace
- Import your saved JSON dashboard via the Grafana UI
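On the FastAPI side, a common way to expose Prometheus-scrapable metrics is the prometheus-fastapi-instrumentator package, sketched below; this is an assumed setup, and the project's actual instrumentation lives in API/main.py.

```python
# Assumed instrumentation sketch; see API/main.py for the project's actual setup
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI(title="Fraud Detection API")

# Registers default request-count and latency metrics and exposes a /metrics
# endpoint for Prometheus to scrape.
Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health():
    return {"status": "ok"}
```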
minikube service prometheus -n monitoring
minikube service grafana -n monitoring
The fraud detection model is built using a custom pipeline with multiple stages:
- Feature Engineering
  - Interaction: Category x PaymentMethod
  - Ratio: paymentMethodAgeDays / accountAgeDays
  - Binning: accountAgeDays into new/medium/old
  - Time Feature: Categorize localTime into time-of-day bins
- Preprocessing
  - Imputation for missing values (median/mode)
  - One-hot encoding for categorical variables
  - Log transformation for skewed features
  - Scaling: StandardScaler (skewed) + MinMaxScaler (symmetric)
- Resampling
  - SMOTE to handle extreme class imbalance
- Model Training
  - Logistic Regression (default)
  - Supports other models like RandomForest, XGBoost
- Threshold Tuning
  - Optimal threshold found via precision-recall curve
  - Current best threshold: 0.8370 (Precision = 0.955, Recall = 0.991)
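As a rough sketch of how these stages can be wired together with scikit-learn and imbalanced-learn, see below; the column lists are hypothetical, and the real FraudPipeline in Src/model.py adds the custom feature engineering and threshold handling described above.

```python
# Rough stage-wiring sketch (illustrative; the real FraudPipeline lives in Src/model.py)
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline allows resamplers as steps
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column split -- the actual feature lists come from Src/config.py
categorical = ["Category", "PaymentMethod"]
numeric = ["accountAgeDays", "paymentMethodAgeDays", "localTime"]

preprocess = ColumnTransformer([
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("smote", SMOTE(random_state=42)),        # resampling is applied only during fit
    ("model", LogisticRegression(max_iter=1000)),
])

# pipeline.fit(X_train, y_train)
# scores = pipeline.predict_proba(X_holdout)[:, 1]
# preds = (scores >= 0.8370).astype(int)      # apply the tuned threshold
```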
- Hold-out A: Accuracy 97%, Recall 100%, Precision 25% (imbalanced case)
- Hold-out B: Accuracy 99%, Recall 100%, Precision 50% (imbalanced case)
- Hold-out C: Accuracy 98%, Recall 98%, Precision 98%
- Stored in Notebooks/artifacts/
- PR Curve demonstrates strong precision-recall balance
- Confusion Matrix confirms minimal false negatives (critical for fraud detection)
- Integrate CI/CD pipelines with GitHub Actions or Jenkins
- Add model registry using MLflow’s registry or Seldon Core
- Deploy cloud-native on AWS/GCP/Azure (EKS/GKE/AKS)
- Implement real-time streaming predictions with Kafka
- Add explainability (SHAP/LIME) for fraud predictions
Author: Mohit Gupta
Feel free to connect for feedback, contributions, or collaborations.