Production-Ready AML Detection for Banking Compliance
A machine learning-powered Anti-Money Laundering (AML) detection system with an interactive Streamlit dashboard. Trained and evaluated on IBM's Synthetic AML Dataset (31.9M transactions), the system achieves 73% precision and a ROC-AUC of 97.8%, a precision-focused operating point designed to be production-ready for banking compliance teams.
- Real-time Transaction Scoring: Instant risk assessment for individual transactions
- Calibrated Risk Scores: Reliable probability estimates (when the model reports 10% risk, roughly 10% of such transactions are laundering)
- Automatic Feature Engineering: Extracts 20+ risk indicators from raw transaction data
- Production-Optimized: 27x reduction in false positives vs baseline models
- Interactive Dashboard: User-friendly interface for compliance analysts
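To make the calibration claim concrete, here is a minimal sketch of a reliability check: group scores into bins and compare the average predicted probability with the observed laundering rate in each bin. The function name `reliability_table` and the toy data are illustrative assumptions and are not taken from the project code.

```python
from collections import defaultdict

def reliability_table(scores, labels, n_bins=10):
    """Compare mean predicted probability with observed positive rate per bin.

    Hypothetical helper: equal-width bins over [0, 1].
    """
    bins = defaultdict(list)
    for score, label in zip(scores, labels):
        # Clamp so that a score of exactly 1.0 falls in the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append((score, label))

    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(s for s, _ in pairs) / len(pairs)
        observed = sum(l for _, l in pairs) / len(pairs)
        table.append((idx, len(pairs), mean_pred, observed))
    return table

# Toy data (made up): ten transactions scored at 0.10, one of them laundering.
toy_scores = [0.10] * 10
toy_labels = [1] + [0] * 9

for idx, count, mean_pred, observed in reliability_table(toy_scores, toy_labels):
    print(f"bin {idx}: n={count}, predicted={mean_pred:.2f}, observed={observed:.2f}")
```

For a well-calibrated model the predicted and observed columns stay close in every bin; large gaps suggest the scores should not be read as probabilities.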
- Precision: 72.75% (about 73% of alerts are true laundering cases)
- Recall: 18.87% (catches roughly 1 in 5 laundering cases)
- ROC-AUC: 97.81% (excellent discrimination)
- False Positive Rate: 0.01% (only 752 false alarms out of 6.4M normal transactions)
- Daily Alert Volume: ~860 alerts (manageable workload)
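As a reminder of how these metrics relate to the confusion matrix, the sketch below computes precision, recall, and false positive rate from raw counts. The function name `alert_metrics` and the counts are hypothetical; they are not the project's actual confusion matrix.

```python
def alert_metrics(tp, fp, fn, tn):
    """Return precision, recall, and false positive rate from confusion counts.

    Assumes each denominator is non-zero.
    """
    return {
        "precision": tp / (tp + fp),            # share of alerts that are true cases
        "recall": tp / (tp + fn),               # share of true cases that are alerted
        "false_positive_rate": fp / (fp + tn),  # share of normal transactions alerted
    }

# Illustrative counts only (made up for this example).
metrics = alert_metrics(tp=30, fp=10, fn=70, tn=9890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the false positive rate is divided by the very large number of normal transactions, it can look tiny even when precision is well below 100%, which is why both are reported.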
├── app/ # Streamlit dashboard
│ ├── Home.py # Main page with KPIs and model overview
│ ├── pages/
│ │ ├── 01_Model_Validation.py # Performance metrics & confusion matrix
│ │ ├── 02_Investigator_Workbench.py # Real-time transaction scoring
│ │ └── 04_Data_Insight.py # EDA visualizations
├── models/ # Trained model artifacts
│ ├── calibrated_lightgbm_model.pkl
│ ├── scaler.pkl
│ ├── feature_names.json
│ └── model_config.json
├── note/ # Jupyter notebooks for model development
├── data/ # Dataset (not included; download from the IBM website)
└── requirements.txt # Python dependencies
- ML Framework: LightGBM with Isotonic Calibration
- Dashboard: Streamlit
- Data Processing: Pandas, NumPy
- Visualization: Plotly
- Deployment: AWS (coming soon)
- Source: IBM Synthetic AML Dataset
- Size: 31,898,238 transactions
- Period: September 1-28, 2022
- Laundering Cases: 35,230 (0.11%)
- Class Imbalance: 1:905 ratio
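The laundering rate and imbalance ratio follow directly from the two counts above. A short worked check, using only the figures listed in this section (the variable names are illustrative):

```python
total_transactions = 31_898_238
laundering_cases = 35_230

# Share of all transactions that are labeled as laundering.
laundering_rate = laundering_cases / total_transactions

# Roughly one laundering case per this many transactions.
transactions_per_case = total_transactions / laundering_cases

print(f"Laundering rate: {laundering_rate:.2%}")          # 0.11%
print(f"Class imbalance: 1:{transactions_per_case:.0f}")  # 1:905
```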
Catholic University of America - MDA Capstone Project Fall 2025
- Delphin Kaduli
- Tycho Janssen
- Solomon Pinto
- Python 3.8+
- pip
# Clone the repository
git clone https://github.com/DelphinKdl/CUA-MDA-Capstone-BaaS-Risk-Monitoring.git
cd CUA-MDA-Capstone-BaaS-Risk-Monitoring
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.\.venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the dashboard
streamlit run app/Home.py
The dashboard will open in your browser at http://localhost:8501
- Key performance indicators (KPIs)
- Champion model overview
- Business impact metrics
- Confusion matrix visualization
- Performance metrics
- Model configuration details
- Star Feature: Real-time transaction risk scoring
- User-friendly input form
- Automatic feature engineering
- Instant risk assessment with recommendations
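A rough sketch of what automatic feature engineering for a single transaction could look like. The input field names (`timestamp`, `amount`, `payment_format`), the output feature names, and the function `engineer_features` are assumptions made for illustration; the project's actual 20 features are listed in `models/feature_names.json` and may differ.

```python
from datetime import datetime

def engineer_features(txn):
    """Derive a few simple risk indicators from one raw transaction (hypothetical schema)."""
    ts = datetime.fromisoformat(txn["timestamp"])
    amount = float(txn["amount"])
    return {
        # Saturday and Sunday are weekday() 5 and 6.
        "is_weekend": int(ts.weekday() >= 5),
        "hour": ts.hour,
        "day_of_week": ts.weekday(),
        "is_ach": int(txn["payment_format"].strip().upper() == "ACH"),
        # Amounts just under 10,000 are a common structuring signal.
        "in_structuring_band": int(9_000 <= amount < 10_000),
    }

# Example input (made up): an ACH payment on Saturday, September 3, 2022.
example_txn = {
    "timestamp": "2022-09-03T14:30:00",
    "amount": 9500,
    "payment_format": "ACH",
}
print(engineer_features(example_txn))
```

In a dashboard form, a function like this would run on the analyst's input before the values are passed to the scaler and model in the order the model expects.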
- Dataset overview and statistics
- Class distribution analysis
- Payment format patterns
- Temporal patterns (day of week)
- Structuring detection analysis
- ACH Payment Format: 49x baseline risk
- Weekend Transactions: 3x baseline risk
- Structuring Pattern: $9K-$10K range (8.3x risk)
- High-Risk Institutions: Bank-level risk patterns
- Currency Risk Patterns: UK Pound structuring
- Temporal Patterns: Day of week and hour analysis
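Multipliers such as "49x baseline risk" can be read as a ratio of laundering rates. The sketch below assumes the baseline is the overall laundering rate across all transactions, which this README does not state explicitly; the function name `relative_risk`, the record fields, and the toy data are likewise illustrative.

```python
def relative_risk(records, predicate):
    """Laundering rate within a group divided by the overall laundering rate.

    Assumes both the group and the full record set are non-empty and that
    the overall rate is non-zero.
    """
    group = [r for r in records if predicate(r)]
    baseline_rate = sum(r["is_laundering"] for r in records) / len(records)
    group_rate = sum(r["is_laundering"] for r in group) / len(group)
    return group_rate / baseline_rate

# Toy data (made up): 10 ACH transactions with 2 laundering cases,
# 90 other transactions with 2 laundering cases.
toy_records = (
    [{"payment_format": "ACH", "is_laundering": 1}] * 2
    + [{"payment_format": "ACH", "is_laundering": 0}] * 8
    + [{"payment_format": "Wire", "is_laundering": 1}] * 2
    + [{"payment_format": "Wire", "is_laundering": 0}] * 88
)

ach_risk = relative_risk(toy_records, lambda r: r["payment_format"] == "ACH")
print(f"ACH relative risk: {ach_risk:.1f}x")
```

In the toy data the ACH group has a 20% laundering rate against a 4% overall rate, so the ratio is about 5x.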
- Algorithm: LightGBM with Probability Calibration
- Training Data: 25.5M transactions (80%)
- Test Data: 6.4M transactions (20%)
- Features: 20 engineered features
- Threshold: 0.10 (optimized for precision)
- Training Strategy: Full dataset (no undersampling)
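The threshold turns calibrated scores into alerts, and raising or lowering it trades alert volume against precision. A minimal sketch, assuming an alert is raised when the score is at or above the threshold (whether the project uses `>=` or `>` is not stated here); the function names and toy scores are hypothetical.

```python
ALERT_THRESHOLD = 0.10  # value taken from the model details above

def raise_alerts(scores, threshold=ALERT_THRESHOLD):
    """Return True for each score that meets or exceeds the threshold."""
    return [score >= threshold for score in scores]

def precision_at(scores, labels, threshold):
    """Precision among alerted transactions, or None if nothing is alerted."""
    alerted = [label for score, label in zip(scores, labels) if score >= threshold]
    return sum(alerted) / len(alerted) if alerted else None

# Toy scores and labels (made up for illustration).
toy_scores = [0.02, 0.08, 0.12, 0.40, 0.95]
toy_labels = [0, 0, 0, 1, 1]

print(raise_alerts(toy_scores))
for threshold in (0.05, 0.10, 0.50):
    print(threshold, precision_at(toy_scores, toy_labels, threshold))
```

On this toy data precision rises as the threshold increases, at the cost of fewer alerts, which mirrors the precision-focused choice described above.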
- 62% reduction in analyst workload vs alternative models
- 27x reduction in false positives
- 73% precision: roughly 3 in 4 alerts are true laundering cases
- Manageable alert volume: ~860 alerts per day
- Precision-focused strategy: Maximizes analyst efficiency
- IBM for providing the Synthetic AML Dataset
- Catholic University of America MDA Program
- Project advisor: Dr. Matthew Jacob
MIT License
Copyright (c) 2025 Delphin Kaduli, Tycho Janssen and Solomon Pinto.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights