Skip to content

Latest commit

 

History

History
88 lines (67 loc) · 2.85 KB

File metadata and controls

88 lines (67 loc) · 2.85 KB

Anomaly Detection (Transactions & Sales)

Anomaly detection in synthetic transaction and sales data with Python. Generates realistic data, injects unusual events, and applies Isolation Forest, Local Outlier Factor, and Z-score methods to detect outliers. Produces anomaly reports and visualizations for portfolio-ready demonstration of data science skills.

Detect anomalies in synthetic transaction data using Isolation Forest, Local Outlier Factor (LOF), and a Z-score baseline. The project generates data, injects anomalies, runs detectors, and exports flagged rows and charts for audit.


Features

  • Synthetic transaction generator with anomalies (bursts, extreme purchases, negative/zero entries)
  • Multi-model detection: Isolation Forest, LOF, Z-score
  • Unified anomaly report with model votes and severity score
  • Clean visualizations: time-series spikes and amount distribution
  • Reproducible scripts with deterministic seeding

Project Structure

anomaly-detection/
├─ README.md
├─ LICENSE
├─ requirements.txt
├─ data/
│  └─ generate_transactions.py
├─ src/
│  ├─ detect_anomalies.py
│  └─ utils.py
└─ outputs/
   └─ figures & reports

Setup

python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -r requirements.txt

Generate Synthetic Data

python data/generate_transactions.py --start 2023-01-01 --end 2024-12-31 --seed 42 --n-customers 500 --out data/transactions.csv

Run Anomaly Detection

python src/detect_anomalies.py --input data/transactions.csv --outdir outputs --contamination 0.02

Outputs

  • outputs/anomalies.csv – flagged rows with anomaly scores & model votes
  • outputs/fig_amount_time.png – transaction amounts over time with spikes
  • outputs/fig_amount_hist.png – amount distribution histogram

Sample Results

Transaction Amount Distribution

Shows most transactions are small (0–300 units). A few very large amounts (thousands) appear as outliers.

fig_amount_hist

Transaction Amounts Over Time

Transactions are generally stable, but occasional spikes (extreme purchases or errors) appear and are flagged as anomalies.

fig_amount_time

Data Schema

column description
tx_id unique transaction ID
date timestamp (daily resolution)
customer_id customer identifier
category product category
amount transaction amount (float)