FraudWatch Africa: Detecting Fraud in Mobile Money Transactions with Unsupervised Learning

Table of Contents

  1. Project Background
  2. Project Goal
  3. Key Features
  4. Methodology
  5. Results and Discussion
  6. Dashboard & Deployment
  7. Tools & Technologies
  8. Conclusion
  9. Future Work
  10. How to Run the Project
  11. Acknowledgments

Project Background

Mobile Money in Africa

Mobile money has transformed financial inclusion in Africa. Services like M-Pesa (Kenya), MTN Mobile Money (Uganda), and Airtel Money (West Africa) allow millions of people to send money, pay bills, and manage their finances without relying on traditional banks.

With over 300 million active users in Sub-Saharan Africa, mobile money platforms are now the backbone of everyday transactions.

However, this rapid growth also introduces security challenges:

  • Limited regulatory oversight
  • High transaction volumes
  • The anonymity of mobile wallets

Together, these factors make mobile money ecosystems a prime target for fraudsters. Common fraud tactics include:

  • SIM swaps
  • Account takeovers
  • Fraudulent transfers

The Fraud Detection Challenge

Fraudulent transactions are notoriously difficult to detect because they rarely follow predictable patterns. Traditional supervised machine learning approaches require labeled fraudulent data, which is often scarce or unavailable.

To address this challenge, this project leverages unsupervised learning, where the model learns to identify outliers that deviate from normal transaction behavior — a promising approach in fraud detection for data-scarce environments.

Project Goal

This project aims to design a scalable, real-time fraud detection system tailored to mobile money platforms in Africa.

Key objectives include:

  • Develop an unsupervised anomaly detection model (Isolation Forest) to flag unusual transaction patterns.
  • Provide a Streamlit dashboard for interactive visualization of anomalies.
  • Deploy the model using FastAPI to enable real-time fraud detection for mobile money services.

Key Features

  • Data Simulation: A synthetic dataset mimicking real-world mobile money transactions in African markets.
  • Unsupervised Model (Isolation Forest): Detect anomalies using transaction amount, frequency, location, and device type.
  • Interactive Dashboard (Streamlit): Visual monitoring of flagged transactions and fraud patterns.
  • Real-time API (FastAPI): Seamless deployment of the fraud detection engine for live monitoring and integration.

Dataset

The dataset used in this project simulates 10,000 mobile money transactions to reflect real-world activity in African markets. It includes various features such as:

  • User IDs & Device IDs – uniquely identify customers and their devices
  • Transaction Amounts – numerical values of money transfers
  • Transaction Types – send, receive, cash-in, cash-out, buy airtime, deposit, withdrawal
  • User Locations – geographic regions within Africa (e.g., Nairobi, Lagos, Kampala)
  • Transaction Channels – USSD, Mobile App, Web, Agent
  • SIM Swap Flags – indicator for possible SIM swap fraud
  • Agent IDs – identify transactions carried out through agents

Purpose of Simulation

Since real mobile money transaction datasets are rarely publicly available (due to privacy concerns), a synthetic dataset was generated to:

  • Represent typical user behaviors
  • Simulate fraudulent patterns (e.g., unusual amounts, odd times, suspicious device usage)
  • Provide enough diversity for training and validating the unsupervised model
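As a rough illustration of how such a dataset could be generated, the sketch below uses only the Python standard library. It is not the project's actual generator: the field names, the log-normal amount distribution, the 2% SIM-swap rate, and the ID formats are all assumptions made for the example; only the category values come from the dataset description above.

```python
import random
from datetime import datetime, timedelta

# Category values follow the dataset description; everything else
# (field names, rates, ID formats) is an illustrative assumption.
TRANSACTION_TYPES = ["send", "receive", "cash-in", "cash-out",
                     "buy airtime", "deposit", "withdrawal"]
CHANNELS = ["USSD", "Mobile App", "Web", "Agent"]
LOCATIONS = ["Nairobi", "Lagos", "Kampala"]


def simulate_transactions(n, seed=42):
    """Return n synthetic transactions as a list of dicts."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    rows = []
    for i in range(n):
        channel = rng.choice(CHANNELS)
        rows.append({
            "transaction_id": i,
            "user_id": f"U{rng.randrange(1000):04d}",
            "device_id": f"D{rng.randrange(1500):04d}",
            # Log-normal draws give a right-skewed amount distribution.
            "amount": round(rng.lognormvariate(7.5, 1.0), 2),
            "transaction_type": rng.choice(TRANSACTION_TYPES),
            "location": rng.choice(LOCATIONS),
            "channel": channel,
            "sim_swap_flag": int(rng.random() < 0.02),
            # Only agent-channel transactions carry an agent ID.
            "agent_id": f"A{rng.randrange(200):03d}" if channel == "Agent" else None,
            "timestamp": start + timedelta(minutes=rng.randrange(60 * 24 * 90)),
        })
    return rows


transactions = simulate_transactions(10_000)
print(transactions[0])
```

Seeding the generator keeps the dataset reproducible, so model results can be compared across runs.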

Methodology

This project was executed using Python, with analysis performed in Jupyter Notebook and deployment via Streamlit and FastAPI.

Exploratory Data Analysis (EDA)

  • Examined the dataset to understand distributions, patterns, and potential anomalies.

  • Investigated transaction amounts, timing (hour of day, day of week), user locations, devices, and transaction types.

  • Identified preliminary trends such as skewed transaction amounts and temporal patterns, which informed feature engineering and model expectations.

Data Preprocessing

  • Handling Missing Values:
    • Numerical columns filled with median values.
    • Categorical columns filled with mode values.
  • Feature Engineering:
    • Created log_amount to reduce skewness in transaction amounts.
    • Extracted hour and day_of_week from transaction timestamps.
    • Added time_of_day (morning, afternoon, evening, night) for better interpretability.
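  • Feature Scaling: Applied scaling to numerical features so they share a comparable range. Isolation Forest itself is insensitive to linear rescaling of individual features, so this step mainly benefits the distance-based t-SNE and UMAP visualizations.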
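The preprocessing steps above can be sketched with pandas as follows. This is a minimal example under stated assumptions, not the project's notebook code: the input column names `amount` and `timestamp`, the toy rows, and the morning/afternoon/evening bin edges are hypothetical (only the 10 PM to 4 AM night window is taken from the results section). Scaling is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd


def time_of_day(hour):
    """Map an hour (0-23) to a coarse label; bin edges are assumed."""
    if hour >= 22 or hour < 4:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 17:
        return "afternoon"
    return "evening"


def preprocess(df):
    """Fill missing values and add the engineered amount and time features."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())        # numeric: median
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])  # categorical: mode
    ts = pd.to_datetime(df["timestamp"])
    df["log_amount"] = np.log1p(df["amount"])   # log(1 + x) tolerates zero amounts
    df["hour"] = ts.dt.hour
    df["day_of_week"] = ts.dt.dayofweek         # Monday = 0
    df["time_of_day"] = df["hour"].map(time_of_day)
    return df


# Toy rows for illustration only, not project data.
toy = pd.DataFrame({
    "amount": [120.0, None, 5000.0, 80.0],
    "channel": ["USSD", None, "USSD", "Agent"],
    "timestamp": ["2024-01-06 23:15", "2024-01-07 09:00",
                  "2024-01-08 13:30", "2024-01-09 19:45"],
})
clean = preprocess(toy)
print(clean[["amount", "log_amount", "hour", "day_of_week", "time_of_day"]])
```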

Modeling

  • Algorithm Used: Isolation Forest, an unsupervised model for anomaly detection.
  • Training Details:
    • Model trained on the full dataset without labeled fraud data.
    • Contamination rate set to 5% to correspond with expected anomaly proportion.
  • Anomaly Identification: Model learned “normal” patterns and flagged deviations as anomalies.
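A minimal sketch of this training setup with scikit-learn's `IsolationForest` might look like the following. The feature matrix is random placeholder data standing in for the engineered features, and `n_estimators` and `random_state` are assumptions; only the 5% contamination rate comes from the description above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Placeholder feature matrix (random values, not project data).
X = np.column_stack([
    rng.normal(loc=7.5, scale=1.0, size=2000),   # log_amount-like column
    rng.integers(0, 24, size=2000),              # hour-like column
])

model = IsolationForest(
    n_estimators=100,
    contamination=0.05,   # expected share of anomalies
    random_state=42,
)
labels = model.fit_predict(X)          # 1 = normal, -1 = anomaly
scores = model.decision_function(X)    # lower = more anomalous

flagged = labels == -1
print(f"Flagged {flagged.sum()} of {len(X)} transactions ({flagged.mean():.1%})")
```

Because the contamination value sets the decision threshold, roughly 5% of the training rows are flagged regardless of how much fraud the data actually contains.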

Evaluation

  • Inspected flagged transactions to assess model effectiveness.
  • Applied dimensionality reduction techniques (t-SNE, UMAP) to visualize separation between normal and anomalous transactions.
  • Tuned hyperparameters such as contamination rate to optimize anomaly detection.
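One simple way to explore the contamination setting is to refit the model at a few candidate values and compare how many transactions each flags. The sketch below does this on random placeholder data; the candidate values and the feature matrix are assumptions, not the project's tuning procedure.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))   # placeholder features, not project data

flag_counts = {}
for contamination in (0.01, 0.05, 0.10):
    model = IsolationForest(contamination=contamination, random_state=42)
    labels = model.fit_predict(X)
    flag_counts[contamination] = int((labels == -1).sum())
    print(f"contamination={contamination:.2f} -> {flag_counts[contamination]} flagged")
```

Without fraud labels there is no accuracy metric to optimize here, so the flagged set at each setting still needs manual review.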

Deployment

  • Streamlit Dashboard: Provides an interface to explore and monitor flagged transactions interactively.
  • FastAPI Endpoint: Enables real-time fraud detection by sending new transaction data to the model and receiving predictions.

Results and Discussion

Transaction Amount Analysis

  • The amount column is right-skewed with a mean of 3,496.41, standard deviation of 3,507.29, minimum of 0.03, and maximum of 30,221.30.
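  • Applying a log transformation (log_amount) produced a more symmetric, closer-to-normal distribution. This keeps a handful of very large amounts from dominating the feature's range and can help the model separate anomalies.
  • Most transactions fall within a “normal” range, while a small fraction (~5%) were flagged as outliers by the Isolation Forest model. This share follows directly from the 5% contamination setting rather than being an independent estimate of the fraud rate.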
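The effect of the log transformation on skewness can be checked numerically. The sketch below uses log-normal draws as a stand-in for right-skewed amounts; the distribution and its parameters are illustrative assumptions, not the project's data.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


rng = random.Random(0)
# Log-normal draws stand in for right-skewed transaction amounts.
amounts = [rng.lognormvariate(7.5, 1.0) for _ in range(10_000)]
log_amounts = [math.log1p(a) for a in amounts]

skew_raw = skewness(amounts)
skew_log = skewness(log_amounts)
print(f"skewness before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")
```

A skewness near zero after the transform indicates a roughly symmetric distribution.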

Amount Distribution

Log-Scaled Amount Distribution

Transaction Timing Patterns

  • Time of Day: Most transactions occur at night (10 PM – 4 AM), suggesting fraudsters may exploit low-monitoring periods. Morning transactions follow closely, while afternoon and evening remain low.
  • Day of Week: Saturdays have the highest volume (~1,750 transactions), with weekdays relatively stable (~1,300–1,450).

Time of Day Trends

Day of Week Transactions

Insight: Monitoring should be more vigilant during high-activity periods, especially nights and weekends.

Location-based Patterns

  • Most locations show average transaction amounts between 3,300 and 3,700.
  • Eldoret, Mombasa, and Kisumu exhibit slightly higher transaction amounts, indicating potential risk areas.

Transaction Amount by Location

User and Device Insights

  • The majority of anomalies originate from agents rather than individual users (386 out of 500 anomalies).
  • By device type: iOS leads in flagged transactions, followed by Android and Feature Phones.
  • Network distribution of anomalies is fairly even: Safaricom (173), Telkom Kenya (171), Airtel (156).
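Breakdowns like these amount to counting flagged transactions per category. A small sketch with `collections.Counter` is shown below; the records and field names (`initiator`, `device_type`, `network`) are toy placeholders and do not reproduce the counts reported above.

```python
from collections import Counter

# Toy flagged records; field names and values are placeholders, not project data.
flagged = [
    {"initiator": "agent", "device_type": "iOS", "network": "Safaricom"},
    {"initiator": "agent", "device_type": "Android", "network": "Airtel"},
    {"initiator": "user", "device_type": "iOS", "network": "Telkom Kenya"},
    {"initiator": "agent", "device_type": "Feature Phone", "network": "Safaricom"},
]


def breakdown(records, field):
    """Return (value, count, share) tuples for one field, most common first."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


for value, n, share in breakdown(flagged, "initiator"):
    print(f"{value}: {n} ({share:.0%})")
```

Comparing each share against the same category's share of all transactions would show whether a group is over-represented among anomalies or simply more common overall.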

Transactions by user

Transactions by Device type

Transaction Type Patterns

  • Send Money, Buy Airtime, and Deposit Cash are the transaction types most frequently flagged as anomalies.
  • This suggests that fraud monitoring can prioritize these transaction types for enhanced scrutiny.

Transactions by types

Transactions by network providers

  • The distribution of anomalies across network providers is relatively even, with Safaricom having the most anomalies (173), followed closely by Telkom Kenya (171), and then Airtel (156).

This indicates that no single network provider is disproportionately affected by the types of anomalies detected by this model.

Transactions by network providers

Dimensionality Reduction and Model Validation

  • t-SNE Visualization: Projects transactions into 2D; blue points = normal, red points = anomalies. Normal transactions form tight clusters, while anomalies appear isolated or on cluster edges.
  • UMAP Visualization: Preserves local and global structure; confirms separation between normal and anomalous transactions.
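A projection of this kind can be sketched with scikit-learn's `TSNE`, as below. The data is a random placeholder, and the perplexity, initialization, and seeds are assumptions rather than the project's settings. UMAP is omitted because it requires the separate `umap-learn` package.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))          # placeholder features, not project data
X_scaled = StandardScaler().fit_transform(X)

labels = IsolationForest(contamination=0.05, random_state=42).fit_predict(X_scaled)

# Project to 2D; perplexity must stay below the number of samples.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=42).fit_transform(X_scaled)

# Colours follow the convention above: blue = normal, red = anomaly.
colors = np.where(labels == -1, "red", "blue")
print(embedding.shape, int((colors == "red").sum()))
```

The two embedding columns and the `colors` array can then be passed to a scatter plot in any plotting library.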

t-SNE Plot

UMAP Plot

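Key Insight: Both t-SNE and UMAP show flagged transactions sitting apart from the dense clusters of normal activity. This is visual support, not proof, that the transactions flagged by the Isolation Forest deviate from typical behavior; without fraud labels, the plots cannot confirm that flagged transactions are actually fraudulent.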

Summary of Findings

  • ~5% of transactions are flagged as anomalies, consistent with the contamination parameter.
  • Anomalies are concentrated in nighttime hours, weekends, specific locations, transaction types, and device types.
  • The model’s predictions align with observed behavioral patterns, indicating the unsupervised approach is effective for fraud detection without labeled data.

Dashboard & Deployment

The FraudWatch Africa dashboard provides an interactive interface for exploring, monitoring, and predicting fraudulent transactions in real-time. It is built using Streamlit for the frontend and FastAPI for backend predictions.

Live Demo

  • Streamlit App
  • FastAPI Endpoint

Dashboard Features

  • Home Page – Project introduction and banner.
  • Dashboard Page – KPIs, flagged anomalies, filters, and anomaly visualizations.
  • Predict Transaction Page – Enter transaction details for single prediction.
  • Batch Prediction Page – Upload CSV for batch fraud predictions.

Deployment Setup

  • Streamlit serves as the interactive dashboard frontend.
  • FastAPI powers the backend with REST API endpoints.
  • Communication is seamless: the dashboard sends requests to FastAPI for anomaly predictions in real time.
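The request flow between dashboard and backend can be sketched as a small client using only the standard library. The `/predict` path and the payload fields below are hypothetical; the real route and schema should be read from the API's `/docs` page. Only the local host and port match the run instructions in this README.

```python
import json
from urllib import request

# Hypothetical endpoint path and payload fields; check the API's /docs page
# for the real schema before relying on any of these names.
API_URL = "http://127.0.0.1:8000/predict"


def build_request(transaction, url=API_URL):
    """Wrap one transaction dict in a JSON POST request."""
    body = json.dumps(transaction).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(transaction, url=API_URL, timeout=5):
    """Send the transaction and return the decoded JSON response."""
    with request.urlopen(build_request(transaction, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


example = {"amount": 2500.0, "transaction_type": "send", "channel": "USSD"}
req = build_request(example)
print(req.get_method(), req.full_url)
```

Calling `predict(example)` only works while the FastAPI backend is running locally.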

Screenshot

FraudWatch Africa Dashboard

[Screenshots: dashboard screens 1, 2, and 3]

Tools & Technologies

Here’s an overview of the tools and technologies used in this project:

[Image: overview of tools and technologies]

Conclusion

This project demonstrated how unsupervised learning can be applied to the challenge of fraud detection in mobile money platforms, especially in environments where labeled fraud data is scarce.

By leveraging Isolation Forest, we successfully identified anomalous transactions that may represent fraudulent activity. The results highlighted:

  • Strong potential for detecting unusual transaction behaviors in real time.
  • Practical use of dashboards (Streamlit) for monitoring and decision support.
  • Seamless integration with FastAPI for deployment, ensuring accessibility and scalability.

The solution emphasizes how data science can drive financial security in African markets, protecting millions of users and strengthening trust in mobile money systems.

Future Work

While the current system provides a strong foundation, there are opportunities to make it more powerful and robust:

  • Enhanced Models: Experiment with advanced techniques such as Autoencoders, One-Class SVM, and Graph Neural Networks for improved anomaly detection.
  • Feature Engineering: Incorporate additional features like transaction velocity, device fingerprinting, and geospatial tracking to capture more complex fraud patterns.
  • Scalability: Deploy the system on cloud platforms with distributed data pipelines (e.g., Apache Kafka, Spark) to handle millions of transactions in real time.
  • User Feedback Loop: Integrate mechanisms for human investigators to label flagged transactions, creating feedback that strengthens the model over time.
  • Cross-Border Expansion: Extend beyond Kenya to support fraud detection across multiple African mobile money markets.
  • Explainability: Add interpretable AI components so stakeholders can understand why a transaction is flagged as suspicious.

This roadmap ensures the solution continues evolving into a production-grade fraud detection system that adapts to emerging threats.

How to Run the Project

If you’d like to explore the project locally, follow these steps:

1️⃣ Clone the Repository

git clone https://github.com/mauree155/FraudWatch-Africa.git my-repo
cd my-repo

2️⃣ Create a Virtual Environment

python -m venv venv
source venv/bin/activate   # On Mac/Linux
venv\Scripts\activate     # On Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the FastAPI Backend

uvicorn app.main:app --reload

API available at: http://127.0.0.1:8000/docs

5️⃣ Run the Streamlit Dashboard

streamlit run o_streamlit_app.py

Dashboard available at: http://localhost:8501

Acknowledgments

This project was carried out as part of the Dataverse Africa Internship Program.
Special thanks to our mentors and teammates for their guidance and collaboration.

👥 Team Members

  • Maureen Akunna Okoro – Team Lead | Data Analyst / Data Scientist
  • Masheida Dzimaba – Data Scientist
  • Nasiru Ibrahim – Data Analyst

About

🚨 FraudWatch Africa – A machine learning-powered app using FastAPI + Streamlit to detect and analyze fraudulent financial transactions in Kenya
