🛡️ Defraudo - AI-Powered Payment Fraud Detection System

An enterprise-grade fraud detection system that combines tree-based ensemble models with real-time transaction analysis to detect and prevent fraudulent payments, reaching 99.92% accuracy on the held-out test set.


📌 Executive Summary

Defraudo is a production-ready, full-stack fraud detection platform designed for financial institutions and payment processors. The system pairs an ensemble of tree-based models (XGBoost, Random Forest, LightGBM), served from a Python Flask API, with a MERN (MongoDB, Express.js, React, Node.js) application stack to score transactions in real time on a highly imbalanced dataset.

🎯 Key Performance Metrics

Metric	Value	Description
Accuracy	99.92%	Overall classification accuracy on test set
Precision	95.7%	Minimizes false positives (legitimate flagged as fraud)
Recall	82.4%	Maximizes fraud detection rate
F1-Score	88.5%	Harmonic mean of precision and recall
ROC-AUC	0.987	Area under ROC curve
PR-AUC	0.854	Precision-Recall AUC (critical for imbalanced data)
Latency	<50ms	Real-time prediction response time

🔬 Core Capabilities

Advanced ML Pipeline: Implements XGBoost, Random Forest, and LightGBM with Bayesian hyperparameter optimization
Class Imbalance Handling: SMOTE, ADASYN, and ensemble-based resampling techniques
Feature Engineering: PCA-derived features, temporal patterns, statistical aggregations
Real-time Processing: Sub-50ms prediction latency with concurrent request handling
Production Architecture: Microservices-based design with RESTful APIs and JWT authentication

📊 Mathematical Approach & Algorithms

🧮 Problem Formulation

Fraud detection is formulated as a binary classification problem with extreme class imbalance:

$$ \hat{y} = f(X) \in \{0, 1\} $$

where:

$X \in \mathbb{R}^{n \times d}$ is the feature matrix ($n$ transactions, $d$ features)
$\hat{y}$ is the predicted label (0: legitimate, 1: fraudulent)
Class distribution: $P(y=1) \approx 0.17\%$ (highly imbalanced)

🔬 Feature Engineering Pipeline

1. Principal Component Analysis (PCA) Features

The dataset contains 28 PCA-transformed features ($V_1$ to $V_{28}$) obtained through dimensionality reduction:

$$ V = XW $$

where $W \in \mathbb{R}^{d \times 28}$ are the principal components capturing maximum variance.

2. Temporal Feature Engineering

$$ \text{hour} = \left\lfloor \frac{\text{Time}}{3600} \right\rfloor \bmod 24 $$

$$ \text{is\_night} = \begin{cases} 1 & \text{if } 0 \leq \text{hour} < 6 \text{ or } 22 \leq \text{hour} < 24 \\ 0 & \text{otherwise} \end{cases} $$
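
The two temporal features can be sketched in a few lines of Python. This is a minimal illustration rather than the project's `feature_engineer.py`; the function names are hypothetical, and it assumes `Time` is an offset in seconds (in the Kaggle dataset it is measured from the first transaction, so the derived hour is relative, not wall-clock time).

```python
def hour_of_day(time_seconds: float) -> int:
    """Hour in [0, 23]: floor(Time / 3600) mod 24."""
    return int(time_seconds // 3600) % 24


def is_night(hour: int) -> int:
    """1 if the hour falls in [0, 6) or [22, 24), else 0."""
    return 1 if (0 <= hour < 6) or (22 <= hour < 24) else 0


# Hypothetical Time offsets in seconds
for t in (0, 7_200, 80_000, 172_000):
    h = hour_of_day(t)
    print(f"Time={t:>7}  hour={h:>2}  is_night={is_night(h)}")
```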

3. Amount Transformations

To handle skewed distributions:

$$ \text{Amount\_log} = \log(1 + \text{Amount}) $$

$$ \text{Amount\_zscore} = \frac{\text{Amount} - \mu_{\text{Amount}}}{\sigma_{\text{Amount}}} $$
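
As a rough sketch of both transformations using only the standard library (the helper names are hypothetical, and the population standard deviation is an assumption, since the formula does not say which estimator is used):

```python
import math
from statistics import mean, pstdev


def amount_log(amount: float) -> float:
    """log(1 + Amount); log1p stays accurate for very small amounts."""
    return math.log1p(amount)


def amount_zscores(amounts: list[float]) -> list[float]:
    """Standardize amounts with their own mean and (population) std dev."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        # All amounts identical: every z-score is zero by convention here
        return [0.0] * len(amounts)
    return [(a - mu) / sigma for a in amounts]


amounts = [2.69, 149.62, 378.66, 1.0]  # hypothetical transaction amounts
print([round(amount_log(a), 3) for a in amounts])
print([round(z, 3) for z in amount_zscores(amounts)])
```

In a training pipeline, $\mu_{\text{Amount}}$ and $\sigma_{\text{Amount}}$ would normally be estimated on the training split only and reused at inference time.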

⚖️ Class Imbalance Handling

SMOTE (Synthetic Minority Over-sampling Technique)

Generates synthetic samples using k-nearest neighbors:

$$ X_{\text{synthetic}} = X_i + \lambda \cdot (X_{\text{nn}} - X_i) $$

where:

$X_i$ is a minority class sample
$X_{\text{nn}}$ is one of its k-nearest neighbors
$\lambda \sim U(0,1)$ is a random interpolation factor

Sampling Strategy: the minority class is oversampled until $\frac{N_{\text{fraud}}}{N_{\text{legitimate}}} = 0.5$, up from the original ratio of about 0.0017.
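
The interpolation step can be illustrated with a small pure-Python sketch. It generates a single synthetic point and is not the project's resampling code; `smote_sample` and the toy data are hypothetical. In practice the `SMOTE` class from imbalanced-learn (listed in the Python dependencies) would typically be used instead.

```python
import math
import random


def smote_sample(x_i, minority, k=5, rng=random):
    """Return one synthetic point x_i + lam * (x_nn - x_i), lam ~ U(0, 1)."""
    # k nearest minority-class neighbours of x_i (Euclidean), excluding x_i itself
    neighbours = sorted(
        (p for p in minority if p is not x_i),
        key=lambda p: math.dist(x_i, p),
    )[:k]
    x_nn = rng.choice(neighbours)
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_nn)]


# Toy minority-class samples with two features each (hypothetical values)
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.5, 1.5]]
rng = random.Random(0)
print(smote_sample(minority[0], minority, k=2, rng=rng))
```

Every synthetic point lies on the line segment between a real minority sample and one of its neighbours, so no value falls outside the range spanned by that pair.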

🌲 Ensemble Learning Algorithms

1. XGBoost (eXtreme Gradient Boosting)

Objective function with regularization:

$$ \mathcal{L}^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) $$

$$ \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 $$

where:

$l$ is the loss function (binary cross-entropy)
$f_t$ is the $t$-th tree
$T$ is the number of leaves
$\gamma, \lambda$ are regularization parameters

Key Hyperparameters:

Learning rate: $\eta = 0.01$
Max depth: 6
Subsample ratio: 0.8
Min child weight: 1

2. Random Forest

Ensemble of decision trees with bootstrap aggregating:

$$ \hat{y} = \text{mode}\{h_1(x), h_2(x), \ldots, h_B(x)\} $$

where $h_b$ are individual decision trees trained on bootstrapped samples.

Gini Impurity for split criterion:

$$ \text{Gini}(p) = 1 - \sum_{k=1}^{K} p_k^2 $$
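
A short worked example of the Gini formula, with $p_k$ estimated from the labels that reach a node (the function name is illustrative):

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity 1 - sum_k p_k^2 for the class labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini([0, 0, 1, 1]))  # evenly mixed node
print(gini([0, 0, 0, 0]))  # pure node
```

A pure node has impurity 0, and a 50/50 binary node has the maximum binary impurity of 0.5, which is why splits that isolate fraud cases reduce the criterion.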

3. LightGBM (Light Gradient Boosting Machine)

Uses Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB):

$$ \tilde{G}_j = \frac{1}{n} \left( \sum_{i \in A_l} g_i + \frac{1-a}{b} \sum_{i \in A_s} g_i \right) $$

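where $g_i$ is the gradient of instance $i$, $A_l$ is the set of instances with the largest gradients (the top fraction $a$), $A_s$ is a random sample (fraction $b$) of the remaining small-gradient instances, and the factor $\frac{1-a}{b}$ reweights that sample so the gradient sum stays approximately unbiased.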

🎯 Loss Function & Optimization

Binary Cross-Entropy Loss:

$$ \mathcal{L}(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right] $$

With class weights to handle imbalance:

$$ w_{\text{fraud}} = \frac{N_{\text{total}}}{2 \cdot N_{\text{fraud}}} \approx 289 $$
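
The weight and the weighted loss can be checked numerically. The sketch below assumes 492 fraud cases among 284,807 transactions (the counts of the Kaggle dataset; only the total appears elsewhere in this README) and applies the same balancing formula to the legitimate class, which the text does not state explicitly.

```python
import math


def class_weight(n_total: int, n_class: int) -> float:
    """Balanced class weight N_total / (2 * N_class)."""
    return n_total / (2 * n_class)


def weighted_bce(y_true, y_prob, w_fraud, w_legit, eps=1e-12):
    """Mean binary cross-entropy with per-class weights."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= w_fraud * y * math.log(p) + w_legit * (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


N_TOTAL, N_FRAUD = 284_807, 492  # assumed dataset counts
w_fraud = class_weight(N_TOTAL, N_FRAUD)
w_legit = class_weight(N_TOTAL, N_TOTAL - N_FRAUD)
print(round(w_fraud, 2), round(w_legit, 3))
print(weighted_bce([1, 0, 0], [0.3, 0.1, 0.2], w_fraud, w_legit))
```

With these weights a missed fraud case costs several hundred times more than a misclassified legitimate transaction, which counteracts the 0.17% base rate.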

📈 Evaluation Metrics

1. Precision, Recall, F1-Score

$$ \text{Precision} = \frac{TP}{TP + FP} = 0.957 $$

$$ \text{Recall} = \frac{TP}{TP + FN} = 0.824 $$

$$ F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = 0.885 $$

2. ROC-AUC (Receiver Operating Characteristic)

$$ \text{AUC} = \int_0^1 \text{TPR}(t) \, d[\text{FPR}(t)] = 0.987 $$

where:

$\text{TPR} = \frac{TP}{TP + FN}$ (True Positive Rate)
$\text{FPR} = \frac{FP}{FP + TN}$ (False Positive Rate)
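
The area under the ROC curve equals the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one. The sketch below uses that pairwise view instead of integrating the curve; it is quadratic in the number of samples, so it is meant for illustration only, and the function name is hypothetical.

```python
def roc_auc(y_true, scores) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```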

3. Matthews Correlation Coefficient (MCC)

$$ \text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} $$

Ranges from -1 to 1; ideal for imbalanced datasets.
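
All four threshold-dependent metrics follow directly from the confusion-matrix counts. A minimal sketch, with hypothetical counts chosen for easy arithmetic rather than taken from the project's evaluation:

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts: 10 fraud cases, 90 legitimate transactions
print(classification_metrics(tp=8, fp=2, fn=2, tn=88))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so a model that labels everything as legitimate scores 0 rather than close to 1.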

🔍 Hyperparameter Optimization

Bayesian Optimization using Optuna framework:

$$ x^* = \arg\max_{x \in \mathcal{X}} f(x) $$

where $f(x)$ is the validation PR-AUC score.

Uses Tree-structured Parzen Estimator (TPE) to model:

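$$ p(x|y) = \begin{cases} l(x) & \text{if } y < y^* \\ g(x) & \text{if } y \geq y^* \end{cases} $$

Here $y^*$ is a quantile of the objective values observed so far, and $l(x)$ and $g(x)$ are densities fitted to the configurations on either side of it. The split is written for a minimized objective; when maximizing PR-AUC the roles of the two densities swap (equivalently, minimize $-f$).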

Search Space:

Learning rate: $\eta \in [0.001, 0.3]$ (log scale)
Max depth: $[3, 10]$
Number of estimators: $[100, 1000]$
Subsample: $[0.5, 1.0]$
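
To make the search space concrete, the sketch below draws configurations from it with plain random sampling. This is not TPE and not the project's tuning code: it only shows what "log scale" means for the learning rate. In Optuna the corresponding draws would come from `trial.suggest_float(..., log=True)` and `trial.suggest_int(...)` inside an objective function. The parameter names in the dictionary are assumptions.

```python
import math
import random


def sample_params(rng: random.Random) -> dict:
    """Draw one configuration from the search space listed above."""
    return {
        # Log-uniform: sample uniformly in log-space, then exponentiate
        "learning_rate": math.exp(rng.uniform(math.log(0.001), math.log(0.3))),
        "max_depth": rng.randint(3, 10),          # inclusive bounds
        "n_estimators": rng.randint(100, 1000),   # inclusive bounds
        "subsample": rng.uniform(0.5, 1.0),
    }


rng = random.Random(42)
for _ in range(3):
    print(sample_params(rng))
```

Sampling the learning rate on a log scale gives values near 0.001 the same chance as values near 0.1, which a uniform draw over $[0.001, 0.3]$ would not.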

🎲 Ensemble Voting Strategy

Soft Voting for final prediction:

$$ \hat{y} = \arg\max_{c} \sum_{i=1}^{M} w_i \cdot P_i(c|x) $$

where:

$M$ is the number of models (XGBoost, Random Forest, LightGBM)
$w_i$ are model weights based on validation performance
$P_i(c|x)$ is the predicted probability from model $i$
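
A minimal sketch of weighted soft voting for a single transaction. The probabilities and weights are hypothetical; the real ensemble would take them from the three fitted models and their validation scores.

```python
def soft_vote(probas, weights):
    """Weighted soft voting.

    probas:  one [P(c=0|x), P(c=1|x)] list per model
    weights: one non-negative weight per model
    Returns (predicted class, weighted-average class probabilities).
    """
    total = sum(weights)
    n_classes = len(probas[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined


# Hypothetical outputs for XGBoost, Random Forest and LightGBM
probas = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
weights = [0.5, 0.3, 0.2]
print(soft_vote(probas, weights))
```

Dividing by the weight total does not change the arg max, but it keeps the combined values interpretable as probabilities that can be compared against a threshold.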

📊 Threshold Optimization

Optimal classification threshold $\tau^*$ maximizes F1-score:

$$ \tau^* = \arg\max_{\tau} F_1(\tau) $$

$$ \hat{y} = \begin{cases} 1 & \text{if } P(y=1|x) \geq \tau^* \\ 0 & \text{otherwise} \end{cases} $$

The default threshold is $\tau = 0.5$; it can be replaced by $\tau^*$ or adjusted to business requirements (precision vs. recall trade-off).
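
One simple way to find $\tau^*$ is a grid sweep over candidate thresholds. The sketch below assumes a grid of 0.01 to 0.99 and uses toy labels and scores; in practice the sweep would run on a validation split, not on the test set.

```python
def f1_at(y_true, y_prob, tau: float) -> float:
    """F1-score when predicting fraud for P(y=1|x) >= tau."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= tau and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= tau and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < tau and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, grid=None) -> float:
    """Threshold on the grid with the highest F1 (first one wins on ties)."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda tau: f1_at(y_true, y_prob, tau))


# Hypothetical validation labels and predicted fraud probabilities
y_true = [0, 0, 0, 1, 1]
y_prob = [0.05, 0.20, 0.45, 0.40, 0.90]
tau_star = best_threshold(y_true, y_prob)
print(tau_star, f1_at(y_true, y_prob, tau_star), f1_at(y_true, y_prob, 0.5))
```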

✨ Features

Advanced ML Ensemble: XGBoost + Random Forest + LightGBM with soft voting
99.92% Accuracy: ROC-AUC 0.987 | PR-AUC 0.854
Imbalance Handling: SMOTE, ADASYN, class weight optimization
Geolocation Analytics: Location-based anomaly detection
Device Fingerprinting: Multi-device tracking and profiling
Real-time Processing: < 50ms prediction latency
JWT Authentication: Secure token-based auth with bcrypt
Modern UI/UX: TailwindCSS with dark/light mode
Comprehensive Dashboard: Transaction analytics & visualizations
RESTful API: Flask + Express.js microservices
Hyperparameter Tuning: Bayesian optimization with Optuna
MongoDB Integration: Scalable NoSQL data persistence

🛠️ Technology Stack

🎨 Frontend Layer

React 18.3 with hooks | TailwindCSS 3.4 for styling | Vite for blazing-fast builds | React Router v6 for navigation | Context API for state management

⚙️ Backend Layer

Node.js 20+ with Express.js | JWT authentication | Bcrypt password hashing | Mongoose ODM | CORS & security middleware

🗄️ Database Layer

MongoDB 7.0 for document storage | Mongoose 8.0 for schema validation | Indexed queries for performance | Transaction logging & audit trails

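🤖 Machine Learning & AI Layer

Python 3.11+ | Scikit-learn | XGBoost | LightGBM | Optuna for hyperparameter tuning | imbalanced-learn for resampling | Flask for model serving

🔧 Development & DevOps Tools

Git | ESLint | Postman for API testing | VS Code | Gunicorn or Waitress for production serving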

📊 Visualization & Monitoring

Matplotlib & Seaborn for ML visualizations | Chart.js for frontend dashboards | ROC curves, confusion matrices, feature importance plots

🔒 Security & Authentication

JWT (JSON Web Tokens) for stateless authentication
Bcrypt for password hashing (10 rounds)
CORS configuration for cross-origin requests
Helmet.js for HTTP header security
Rate limiting to prevent DoS attacks
Input validation with Joi/express-validator
HTTPS encryption in production

🏗️ Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     PRESENTATION LAYER                       │
│  React Frontend (client_new/) + TailwindCSS + Vite          │
│  • User authentication UI                                    │
│  • Transaction submission forms                              │
│  • Real-time fraud detection dashboard                       │
│  • Analytics & visualization                                 │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/REST API
                         │
┌────────────────────────▼────────────────────────────────────┐
│                    APPLICATION LAYER                         │
│  Node.js + Express.js Backend (server/)                     │
│  • JWT authentication middleware                             │
│  • Transaction API endpoints                                 │
│  • Request validation & error handling                       │
│  • Communication with ML service                             │
└────────────────────────┬────────────────────────────────────┘
                         │
        ┌────────────────┴────────────────┐
        │                                  │
┌───────▼──────────┐            ┌─────────▼─────────┐
│  DATA LAYER      │            │   ML SERVICE      │
│  MongoDB         │            │   Python Flask    │
│  • User data     │            │   (Model/api/)    │
│  • Transactions  │            │   • Preprocessing │
│  • Audit logs    │            │   • Feature eng.  │
└──────────────────┘            │   • Prediction    │
                                │   • Ensemble      │
                                └───────────────────┘

📂 Project Structure

Defraudo/
├── 📁 client_new/                    # React Frontend Application
│   ├── 📁 src/
│   │   ├── 📁 api/                   # API client functions
│   │   │   └── 📄 transactionApi.js  # Transaction service integration
│   │   ├── 📁 components/            # Reusable React components
│   │   │   ├── 📄 Navbar.jsx         # Navigation bar with theme toggle
│   │   │   ├── 📄 TransactionForm.jsx # Transaction submission form
│   │   │   ├── 📄 TransactionList.jsx # Transaction history display
│   │   │   └── 📄 Footer.jsx         # Application footer
│   │   ├── 📁 context/               # React Context providers
│   │   │   ├── 📄 ThemeContext.jsx   # Dark/Light theme management
│   │   │   └── 📄 AuthContext.jsx    # Authentication state
│   │   ├── 📁 pages/                 # Route page components
│   │   │   ├── 📄 Home.jsx           # Landing page
│   │   │   ├── 📄 Login.jsx          # User login
│   │   │   ├── 📄 Register.jsx       # User registration
│   │   │   └── 📄 TransactionPage.jsx # Transaction dashboard
│   │   ├── 📄 App.jsx                # Main application component
│   │   ├── 📄 main.jsx               # React entry point
│   │   └── 📄 index.css              # Global styles
│   ├── 📄 index.html                 # HTML template
│   ├── 📄 package.json               # Dependencies & scripts
│   ├── 📄 tailwind.config.js         # TailwindCSS configuration
│   ├── 📄 vite.config.js             # Vite build configuration
│   ├── 📄 postcss.config.js          # PostCSS configuration
│   └── 📄 eslint.config.js           # ESLint rules
│
├── 📁 server/                        # Node.js + Express Backend
│   ├── 📁 config/
│   │   └── 📄 db.js                  # MongoDB connection setup
│   ├── 📁 controllers/
│   │   ├── 📄 authController.js      # Authentication logic
│   │   └── 📄 transactionController.js # Transaction CRUD operations
│   ├── 📁 middlewares/
│   │   └── 📄 authMiddleware.js      # JWT verification middleware
│   ├── 📁 models/
│   │   ├── 📄 User.js                # User schema (Mongoose)
│   │   └── 📄 Transaction.js         # Transaction schema (Mongoose)
│   ├── 📁 routes/
│   │   ├── 📄 authRoutes.js          # Auth endpoints
│   │   └── 📄 transactionRoutes.js   # Transaction endpoints
│   ├── 📄 server.js                  # Express server entry point
│   └── 📄 package.json               # Backend dependencies
│
├── 📁 Model/                         # ML Fraud Detection Pipeline
│   ├── 📁 api/                       # Flask REST API
│   │   ├── 📄 __init__.py
│   │   └── 📄 app.py                 # API endpoints (predict, batch)
│   ├── 📁 configs/
│   │   └── 📄 config.yaml            # Model hyperparameters & settings
│   ├── 📁 data/
│   │   ├── 📁 raw/                   # Original dataset (creditcard.csv)
│   │   └── 📁 processed/             # Preprocessed & split data
│   ├── 📁 logs/                      # Training logs & visualizations
│   │   └── 📁 plots/                 # ROC curves, confusion matrices
│   ├── 📁 models/                    # Serialized trained models
│   │   ├── 📄 fraud_detector.joblib  # Primary model (XGBoost/ensemble)
│   │   ├── 📄 preprocessor.joblib    # StandardScaler & transformers
│   │   └── 📄 feature_engineer.joblib # Feature engineering pipeline
│   ├── 📁 notebooks/                 # Jupyter notebooks (EDA)
│   ├── 📁 src/                       # Core ML modules
│   │   ├── 📄 __init__.py
│   │   ├── 📄 config.py              # Configuration management
│   │   ├── 📄 data_loader.py         # Dataset loading & downloading
│   │   ├── 📄 preprocessor.py        # Data cleaning & scaling
│   │   ├── 📄 feature_engineer.py    # Feature transformations
│   │   ├── 📄 model_trainer.py       # Model training & tuning
│   │   ├── 📄 evaluator.py           # Performance evaluation
│   │   └── 📄 predictor.py           # Inference interface
│   ├── 📁 tests/
│   │   ├── 📄 __init__.py
│   │   └── 📄 test_pipeline.py       # Unit tests
│   ├── 📄 train.py                   # Main training script
│   ├── 📄 download_data.py           # Kaggle dataset downloader
│   ├── 📄 requirements.txt           # Python dependencies
│   └── 📄 README.md                  # ML pipeline documentation
│
├── 📁 fraud_detection_app/           # Flutter Mobile App (Optional)
│   ├── 📁 lib/
│   │   ├── 📄 main.dart              # Flutter app entry point
│   │   ├── 📁 config/                # App configuration
│   │   ├── 📁 models/                # Data models
│   │   ├── 📁 providers/             # State management
│   │   ├── 📁 screens/               # UI screens
│   │   └── 📁 services/              # API services
│   ├── 📄 pubspec.yaml               # Flutter dependencies
│   └── 📄 analysis_options.yaml      # Dart analyzer options
│
├── 📄 package.json                   # Root workspace configuration
└── 📄 README.md                      # This file (project documentation)

📝 Key Directories Explained

Directory	Purpose	Technologies
client_new/	Modern React frontend with TailwindCSS styling	React, TailwindCSS, Vite, Axios
server/	RESTful API backend with authentication	Node.js, Express, MongoDB, JWT
Model/	End-to-end ML pipeline from training to deployment	Python, Scikit-learn, XGBoost, Flask
fraud_detection_app/	Cross-platform mobile application	Flutter, Dart

📦 Installation & Setup

📋 Prerequisites

Ensure you have the following installed on your system:

Software	Version	Purpose
Node.js	20.x or higher	Backend & frontend runtime
npm	10.x or higher	Package manager
Python	3.11+	ML model training & API
pip	Latest	Python package installer
MongoDB	7.0+	Database (local or Atlas)
Git	Latest	Version control

Optional but recommended:

CUDA Toolkit (for GPU acceleration during training)
Postman (for API testing)
VS Code (recommended IDE)

🔹 Step 1: Clone the Repository

git clone https://github.com/manan-monani/Payment-Fraud-Detection-Model.git
cd Payment-Fraud-Detection-Model

🔹 Step 2: Backend Setup (Node.js + Express)

# Navigate to server directory
cd server

# Install dependencies
npm install

# Install additional security packages (if not in package.json)
npm install helmet express-rate-limit joi

Environment Configuration

Create a .env file in the server/ directory:

# Server Configuration
PORT=7000
NODE_ENV=production

# MongoDB Configuration
MONGO_URI=mongodb://localhost:27017/fraud_detection
# Or use MongoDB Atlas:
# MONGO_URI=mongodb+srv://username:password@cluster.mongodb.net/fraud_detection?retryWrites=true&w=majority

# JWT Configuration
JWT_SECRET=your_super_secure_jwt_secret_key_here_min_32_chars
JWT_EXPIRE=7d

# CORS Configuration
CLIENT_URL=http://localhost:5173

# ML Service URL
ML_API_URL=http://localhost:5000

Security Best Practices:

Generate a strong JWT secret: node -e "console.log(require('crypto').randomBytes(64).toString('hex'))"
Never commit .env to version control
Use environment-specific .env files (.env.development, .env.production)

Start Backend Server

# Development mode with auto-reload
npm run dev

# Production mode
npm start

Server will be running at http://localhost:7000

🔹 Step 3: Frontend Setup (React + Vite)

# Navigate to frontend directory
cd ../client_new

# Install dependencies
npm install

# Install additional dependencies (if needed)
npm install axios react-router-dom

Frontend Environment Configuration

Create a .env file in the client_new/ directory:

VITE_API_URL=http://localhost:7000/api
VITE_ML_API_URL=http://localhost:5000

Start Development Server

npm run dev

Frontend will be running at http://localhost:5173

🔹 Step 4: ML Model Setup (Python + Flask)

# Navigate to ML directory
cd ../Model

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# If requirements.txt is missing, install manually:
pip install flask flask-cors pandas numpy scikit-learn xgboost lightgbm \
            optuna imbalanced-learn matplotlib seaborn joblib pyyaml kaggle

Download Dataset

Option 1: Kaggle API (Recommended)

# Configure Kaggle API credentials
# Download kaggle.json from https://www.kaggle.com/settings/account
# Place in: 
#   Windows: C:\Users\<Username>\.kaggle\kaggle.json
#   Linux/Mac: ~/.kaggle/kaggle.json

# Download dataset
python download_data.py

# Or using Kaggle CLI directly:
kaggle datasets download -d mlg-ulb/creditcardfraud
unzip creditcardfraud.zip -d data/raw/

Option 2: Manual Download

Visit the Credit Card Fraud Detection dataset page on Kaggle: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
Download creditcard.csv
Place it in the Model/data/raw/ directory

Option 3: Synthetic Data (for testing)

python train.py --synthetic --samples 100000

Train the Model

# Quick training (no hyperparameter tuning) - ~5 minutes
python train.py --quick

# Full training with Optuna hyperparameter tuning - ~30-60 minutes
python train.py

# Train specific model
python train.py --model xgboost

# Train ensemble model (recommended for best performance)
python train.py --model ensemble

# Compare multiple models
python train.py --compare

Start ML API Server

# Development server
python -m api.app

# Or using Flask CLI
export FLASK_APP=api.app
flask run --host=0.0.0.0 --port=5000

# Production server with Gunicorn (Linux/Mac)
pip install gunicorn
gunicorn api.app:app -w 4 -b 0.0.0.0:5000 --timeout 120

# Production server with Waitress (Windows)
pip install waitress
waitress-serve --host=0.0.0.0 --port=5000 api.app:app

ML API will be running at http://localhost:5000

🔹 Step 5: MongoDB Setup

Option A: Local MongoDB

# Windows (with MongoDB installed)
net start MongoDB

# Linux
sudo systemctl start mongod

# macOS (with Homebrew)
brew services start mongodb-community

Option B: MongoDB Atlas (Cloud)

Create an account at MongoDB Atlas
Create a new cluster (free tier available)
Create a database user
Add your IP address to the access list (or allow access from anywhere for development only)
Copy the connection string and set MONGO_URI in server/.env

🚀 Step 6: Run Complete Application

Open 3 separate terminals:

# Terminal 1: Backend
cd server
npm start

# Terminal 2: Frontend
cd client_new
npm run dev

# Terminal 3: ML Service
cd Model
python -m api.app

Access the application:

Frontend: http://localhost:5173
Backend API: http://localhost:7000
ML API: http://localhost:5000
API Docs: http://localhost:5000/ (ML API info)

✅ Verify Installation

Test each service independently:

# Test Backend
curl http://localhost:7000/api/auth/health

# Test ML API
curl http://localhost:5000/health

# Test MongoDB connection
# From MongoDB shell:
mongosh
use fraud_detection
db.users.find()

🐛 Troubleshooting

Issue	Solution
Port already in use	Change port in `.env` or kill process: `npx kill-port 7000`
MongoDB connection failed	Check if MongoDB is running, verify `MONGO_URI`
Python module not found	Activate virtual environment, reinstall requirements
CORS errors	Check `CLIENT_URL` in backend `.env`, verify CORS configuration
Model not found	Run `python train.py` to train model first
Memory error during training	Reduce dataset size or use `--quick` flag

🖥️ Screenshots

🌙 Dark Mode | ☀️ Light Mode

Beautiful, responsive UI with seamless theme switching

📊 API Endpoints

🔐 Authentication API (Node.js Backend)

Base URL: http://localhost:7000/api/auth

Method	Endpoint	Description	Auth Required
`POST`	`/register`	Register a new user	❌
`POST`	`/login`	User login (returns JWT token)	❌
`GET`	`/profile`	Get current user profile	✅
`PUT`	`/profile`	Update user profile	✅

Register User

POST /api/auth/register
Content-Type: application/json

{
  "name": "John Doe",
  "email": "john@example.com",
  "password": "SecurePass123!"
}

# Response (201 Created)
{
  "success": true,
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "648f1a2c3d4e5f6g7h8i9j0k",
    "name": "John Doe",
    "email": "john@example.com"
  }
}

Login User

POST /api/auth/login
Content-Type: application/json

{
  "email": "john@example.com",
  "password": "SecurePass123!"
}

# Response (200 OK)
{
  "success": true,
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "648f1a2c3d4e5f6g7h8i9j0k",
    "name": "John Doe",
    "email": "john@example.com"
  }
}

💳 Transaction API (Node.js Backend)

Base URL: http://localhost:7000/api/transactions

Method	Endpoint	Description	Auth Required
`GET`	`/`	Get all user transactions	✅
`GET`	`/:id`	Get specific transaction	✅
`POST`	`/`	Create new transaction & check for fraud	✅
`DELETE`	`/:id`	Delete transaction	✅

Create Transaction

POST /api/transactions
Authorization: Bearer <JWT_TOKEN>
Content-Type: application/json

{
  "amount": 150.00,
  "merchant": "Amazon",
  "location": "New York, USA",
  "device_id": "device_12345",
  "description": "Online purchase"
}

# Response (201 Created)
{
  "success": true,
  "transaction": {
    "id": "648f2b3c4d5e6f7g8h9i0j1k",
    "user_id": "648f1a2c3d4e5f6g7h8i9j0k",
    "amount": 150.00,
    "merchant": "Amazon",
    "location": "New York, USA",
    "device_id": "device_12345",
    "fraud_prediction": {
      "is_fraud": false,
      "fraud_probability": 0.023,
      "risk_level": "VERY LOW",
      "confidence": 0.977
    },
    "timestamp": "2026-01-03T10:30:45.123Z"
  }
}

🤖 ML Prediction API (Flask)

Base URL: http://localhost:5000

Method	Endpoint	Description	Rate Limit
`GET`	`/`	API information	None
`GET`	`/health`	Health check	None
`GET`	`/model/info`	Model metadata & performance	None
`POST`	`/predict`	Single transaction prediction	100/min
`POST`	`/predict/batch`	Batch prediction (up to 1000)	10/min
`GET`	`/threshold`	Get current threshold	None
`POST`	`/threshold`	Update classification threshold	None

Single Prediction

POST /predict
Content-Type: application/json

{
  "Time": 0,
  "V1": -1.359807134,
  "V2": -0.072781173,
  "V3": 2.536346738,
  "V4": 1.378155224,
  "V5": -0.338320769,
  "V6": 0.462387778,
  "V7": 0.239598554,
  "V8": 0.098697901,
  "V9": 0.363786970,
  "V10": 0.090794172,
  "V11": -0.551599533,
  "V12": -0.617800856,
  "V13": -0.991389847,
  "V14": -0.311169354,
  "V15": 1.468176972,
  "V16": -0.470400525,
  "V17": 0.207971242,
  "V18": 0.025790626,
  "V19": 0.403992960,
  "V20": 0.251412098,
  "V21": -0.018306778,
  "V22": 0.277837576,
  "V23": -0.110473910,
  "V24": 0.066928075,
  "V25": 0.128539358,
  "V26": -0.189114844,
  "V27": 0.133558377,
  "V28": -0.021053053,
  "Amount": 149.62
}

# Response (200 OK)
{
  "success": true,
  "prediction": {
    "is_fraud": false,
    "label": "Legitimate",
    "fraud_probability": 0.0234,
    "confidence": 0.9766,
    "risk_level": "VERY LOW",
    "threshold": 0.5
  },
  "processing_time_ms": 12.45
}
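
A small standard-library client for this endpoint might look like the sketch below. The helper names are hypothetical, and the base URL is the local development address from the setup section; `predict` only works while the ML service is running.

```python
import json
import urllib.request

ML_API_URL = "http://localhost:5000"  # local ML service from the setup steps


def build_payload(time_s: float, v_features: list, amount: float) -> dict:
    """Assemble the 30-field request body: Time, V1..V28, Amount."""
    if len(v_features) != 28:
        raise ValueError("expected exactly 28 PCA features (V1..V28)")
    payload = {"Time": time_s}
    payload.update({f"V{i}": v for i, v in enumerate(v_features, start=1)})
    payload["Amount"] = amount
    return payload


def predict(payload: dict, base_url: str = ML_API_URL, timeout: float = 5.0) -> dict:
    """POST the payload to /predict and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_payload(0, [0.0] * 28, 149.62)  # placeholder feature values
print(sorted(payload)[:3], len(payload))
```

With the service up, `predict(payload)["prediction"]["fraud_probability"]` would return the score shown in the response above.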

Risk Level Classification

Probability Range	Risk Level	Action Recommendation
0.00 - 0.20	VERY LOW	✅ Approve automatically
0.20 - 0.40	LOW	✅ Approve with monitoring
0.40 - 0.60	MEDIUM	⚠️ Request additional verification
0.60 - 0.80	HIGH	⚠️ Hold for manual review
0.80 - 1.00	VERY HIGH	❌ Block and alert
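
The table maps directly onto a small lookup function. Because adjacent ranges share their endpoints, the sketch assumes each boundary value belongs to the higher band (for example, exactly 0.20 is LOW); the service may resolve boundaries differently.

```python
def risk_level(fraud_probability: float) -> str:
    """Map a fraud probability in [0, 1] to the risk bands in the table."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be within [0, 1]")
    if fraud_probability < 0.20:
        return "VERY LOW"
    if fraud_probability < 0.40:
        return "LOW"
    if fraud_probability < 0.60:
        return "MEDIUM"
    if fraud_probability < 0.80:
        return "HIGH"
    return "VERY HIGH"


for p in (0.0234, 0.35, 0.5, 0.79, 0.93):
    print(p, risk_level(p))
```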

Batch Prediction

POST /predict/batch
Content-Type: application/json

{
  "transactions": [
    {
      "Time": 0,
      "V1": -1.35, 
      "V2": -0.07,
      // ... V3-V28
      "Amount": 149.62
    },
    {
      "Time": 1,
      "V1": 1.19,
      "V2": 0.26,
      // ... V3-V28
      "Amount": 2.69
    }
  ]
}

# Response (200 OK)
{
  "success": true,
  "predictions": [
    {
      "is_fraud": false,
      "fraud_probability": 0.023,
      "risk_level": "VERY LOW"
    },
    {
      "is_fraud": false,
      "fraud_probability": 0.015,
      "risk_level": "VERY LOW"
    }
  ],
  "summary": {
    "total": 2,
    "fraudulent": 0,
    "legitimate": 2,
    "processing_time_ms": 25.67
  }
}

Model Information

GET /model/info

# Response (200 OK)
{
  "success": true,
  "model": {
    "name": "XGBoost Fraud Detector",
    "version": "2.0.0",
    "type": "ensemble",
    "algorithms": ["xgboost", "random_forest", "lightgbm"],
    "training_date": "2026-01-03",
    "dataset_size": 284807,
    "features": 30
  },
  "performance": {
    "accuracy": 0.9992,
    "precision": 0.957,
    "recall": 0.824,
    "f1_score": 0.885,
    "roc_auc": 0.987,
    "pr_auc": 0.854
  },
  "threshold": 0.5
}

📝 Error Responses

// 400 Bad Request
{
  "success": false,
  "error": "Validation Error",
  "message": "Missing required features: V1, V2, Amount"
}

// 401 Unauthorized
{
  "success": false,
  "error": "Authentication Failed",
  "message": "Invalid or expired token"
}

// 429 Too Many Requests
{
  "success": false,
  "error": "Rate Limit Exceeded",
  "message": "Too many requests. Please try again later.",
  "retry_after": 60
}

// 500 Internal Server Error
{
  "success": false,
  "error": "Internal Server Error",
  "message": "Model prediction failed"
}

🔐 Fraud Detection Patterns & Rules

The ML ensemble model detects fraud based on sophisticated pattern recognition and statistical anomaly detection:

📊 Detection Methodology

1. Amount Anomaly Detection

$$ z_{\text{amount}} = \frac{\text{Amount} - \mu_{\text{user}}}{\sigma_{\text{user}}} $$

Flags transactions with $|z_{\text{amount}}| > 3$ (> 3 standard deviations from user's historical mean)

Patterns:

💵 Significantly higher amounts than user's typical spending
💰 Micro-transactions (< $1) often used for card testing
📈 Progressive amount increase (velocity-based fraud)
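
The z-score rule can be sketched as follows. The per-user history, the sample standard deviation, and the handling of short or constant histories are all assumptions made for illustration.

```python
from statistics import mean, stdev


def amount_zscore(amount: float, history: list):
    """z-score of an amount against a user's past amounts.

    Returns None when the history is too short or has no variation.
    """
    if len(history) < 2:
        return None
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return None
    return (amount - mu) / sigma


def is_amount_anomaly(amount: float, history: list, z_max: float = 3.0) -> bool:
    """Flag the amount when |z| exceeds z_max (3 standard deviations)."""
    z = amount_zscore(amount, history)
    return z is not None and abs(z) > z_max


history = [20.0, 25.0, 30.0, 22.0, 28.0]  # hypothetical past amounts
print(is_amount_anomaly(150.0, history), is_amount_anomaly(27.0, history))
```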

2. Temporal Pattern Analysis

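$$ \text{fraud\_score}_{\text{temporal}} = w_1 \cdot \mathbb{1}_{\text{night}} + w_2 \cdot \Delta t^{-1} $$

where $\mathbb{1}_{\text{night}}$ indicates a night-time transaction and $\Delta t$ is the time elapsed since the user's previous transaction.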

⏰ Odd timing: Transactions at 2-5 AM (higher fraud probability)
⚡ Rapid consecutive transactions: Multiple transactions within seconds
📅 Day-of-week patterns: Unusual activity on weekends

3. Device & Location Verification

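$$ \text{distance}(\text{loc}_1, \text{loc}_2) = R \cdot \arccos(\sin\phi_1\sin\phi_2 + \cos\phi_1\cos\phi_2\cos(\lambda_2-\lambda_1)) $$

where $R$ is the Earth's mean radius (about 6371 km), $\phi_1, \phi_2$ are the latitudes and $\lambda_1, \lambda_2$ the longitudes of the two locations, in radians.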

📱 New Device ID: First-time devices trigger additional scrutiny
🌍 Impossible travel: Distance/time ratio exceeds physical limits (e.g., 1000 km in 1 hour)
📍 High-risk geolocation: Countries with elevated fraud rates

4. Behavioral Biometrics

🖱️ Transaction frequency: Deviation from established patterns
🛒 Merchant category: Unusual merchant types for user profile
💳 Purchase patterns: Inconsistent with historical behavior

5. Feature Correlation Analysis

The model analyzes interactions between PCA features:

$$ \text{anomaly\_score} = \sum_{i=1}^{28} w_i \cdot |V_i - \mu_{V_i}| + \sum_{i<j} w_{ij} \cdot V_i \cdot V_j $$

🚨 Fraud Indicators (Weighted Features)

Feature	Weight	Description
V14	0.18	Highest correlation with fraud
V12	0.15	Card usage patterns
V10	0.13	Transaction frequency indicators
V17	0.12	Geographic anomalies
V4	0.11	Amount-related patterns
Amount	0.09	Transaction amount
Time	0.07	Temporal patterns

✅ Legitimate Transaction Characteristics

Consistent with user's historical spending patterns
Recognizable device IDs
Geographically plausible locations
Normal transaction frequency
Typical merchant categories

📫 Contact

Manan Monani

Full-Stack Developer | ML Engineer | Payment Systems Specialist

🌐 Connect With Me

LinkedIn	GitHub	YouTube	Kaggle
LeetCode

📞 Contact Information

Email
mmmonani747@gmail.com

Phone
🇮🇳 +91 70168 53244

Location
📍 Jamnagar, Gujarat, India

💼 Portfolio

Portfolio Website: 🚧 Coming Soon (Deployment in progress)

📬 Let's Collaborate!

I'm open to:

💼 Full-time opportunities in ML/AI & Full-Stack Development
🤝 Collaboration on open-source projects
📚 Knowledge sharing & technical discussions
🎓 Mentorship in ML, Python, and MERN stack

Response Time: Usually within 24 hours

💡 If you found this project helpful, please give it a ⭐!

Made with ❤️ by Manan Monani and the Defraudo Team

👥 Team Members

This project was developed and presented at Indus University by Team Defraudo:

Manan Monani: Team Lead & ML Engineer
Nevil Dhinoja: Backend Developer
Krishil Agrawal: Frontend Developer
Parthiv Panchal: Full-Stack Developer
Astha Makwana: UI/UX Designer
Yashvi Bhadani: Data Analyst

🎓 Academic Project: Presented at Indus University as a capstone project demonstrating advanced ML techniques in financial fraud detection.

📌 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2026 Manan Monani

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

🙏 Acknowledgments

Kaggle for providing the Credit Card Fraud Detection Dataset
Indus University for academic support and resources
Open Source Community for the amazing libraries and tools
XGBoost, LightGBM & Scikit-learn teams for exceptional ML frameworks

📚 References & Citations

Dataset: Machine Learning Group - ULB. (2018). Credit Card Fraud Detection. Kaggle. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
XGBoost: Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD '16.
SMOTE: Chawla, N. V., et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. JAIR, 16, 321-357.
Imbalanced Learning: Lemaitre, G., et al. (2017). Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. JMLR, 18(17), 1-5.
LightGBM: Ke, G., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NIPS 2017.
Hyperparameter Optimization: Akiba, T., et al. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. KDD 2019.

⚡ Built with cutting-edge technology | 🛡️ Production-ready architecture | 📊 Industry-grade ML pipeline

Thank you for visiting! Don't forget to star ⭐ this repository!
