ARUNAGIRINATHAN-K/LeakGuard

title: LeakGuard
emoji: 🛡️
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false

LeakGuard

A web app that analyzes a CSV dataset BEFORE model training and detects silent data leakage risks that commonly cause models to fail in production.


What It Detects

| Type | Detection Method | Risk Indicators |
|------|------------------|-----------------|
| Target Leakage | Mutual information, Pearson & Spearman correlation | Features containing direct/indirect target information |
| Time Leakage | Correlation drift, rolling window analysis | Future information leaking into past samples |
| Duplicate Leakage | Row hashing, entity ID overlap | Same samples appearing across splits |
| Proxy Leakage | Feature importance instability | Hidden proxies acting as target substitutes |
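To make the Duplicate Leakage row concrete, here is a minimal sketch of row hashing plus an entity ID overlap check between two splits. It uses only the standard library; the helper names (`row_hash`, `duplicate_leakage`), the `user_id` column, and the sample rows are hypothetical and are not taken from LeakGuard's code.

```python
import hashlib

def row_hash(row):
    """Fingerprint a row by hashing its values in a fixed column order."""
    joined = "\x1f".join(str(row[key]) for key in sorted(row))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def duplicate_leakage(train_rows, test_rows, entity_col=None):
    """Return (row hashes, entity IDs) that appear in both splits."""
    train_hashes = {row_hash(r) for r in train_rows}
    test_hashes = {row_hash(r) for r in test_rows}
    shared_rows = train_hashes & test_hashes

    shared_entities = set()
    if entity_col is not None:
        train_ids = {r[entity_col] for r in train_rows}
        test_ids = {r[entity_col] for r in test_rows}
        shared_entities = train_ids & test_ids
    return shared_rows, shared_entities

# Hypothetical splits for illustration only.
train = [
    {"user_id": "u1", "amount": 10, "label": 0},
    {"user_id": "u2", "amount": 25, "label": 1},
]
test = [
    {"user_id": "u2", "amount": 25, "label": 1},  # exact copy of a train row
    {"user_id": "u1", "amount": 99, "label": 0},  # same entity, different row
    {"user_id": "u3", "amount": 5, "label": 0},   # unseen entity
]

shared_rows, shared_entities = duplicate_leakage(train, test, entity_col="user_id")
print(len(shared_rows), sorted(shared_entities))
```

Exact-row hashing only catches identical rows, so the entity check is the one that flags `u1`, whose rows differ between the two splits.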

Quick Start

  1. Upload your CSV dataset
  2. Select target column (required)
  3. Select time & entity ID columns (optional)
  4. Click Analyze to get instant results
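The optional time column from step 3 is what makes a time-ordered check possible. The sketch below shows one plausible reading of "correlation drift" with a rolling window: sort by time, compute a rolling Pearson correlation between a feature and the target, and report how far it moves. The function name `correlation_drift`, the window size, and the synthetic data are assumptions made for illustration, not LeakGuard's actual implementation.

```python
import numpy as np
import pandas as pd

def correlation_drift(df, feature, target, time_col, window=50):
    """Range (max - min) of the rolling feature/target correlation in time order.

    A large range means the relationship changes over time, which is worth
    inspecting for future information leaking into earlier samples.
    """
    ordered = df.sort_values(time_col)
    rolling = ordered[feature].rolling(window).corr(ordered[target]).dropna()
    return float(rolling.max() - rolling.min())

# Synthetic data: one feature tracks the target throughout, the other only
# starts tracking it in the second half of the time range.
rng = np.random.default_rng(0)
n = 400
target = rng.normal(size=n)
stable = target + 0.5 * rng.normal(size=n)
drifting = rng.normal(size=n)
drifting[n // 2:] = target[n // 2:] + 0.1 * rng.normal(size=n // 2)

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="D"),
    "stable": stable,
    "drifting": drifting,
    "target": target,
}).sample(frac=1, random_state=0)  # shuffle to show that sorting by time matters

stable_range = correlation_drift(df, "stable", "target", "timestamp")
drifting_range = correlation_drift(df, "drifting", "target", "timestamp")
print(f"stable: {stable_range:.2f}  drifting: {drifting_range:.2f}")
```

Any cutoff for "too much drift" would be dataset dependent, so the sketch returns the raw range rather than a verdict.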

What You Get

  • Feature Risk Table - Per-feature risk assessment with mutual information (MI), Pearson, and Spearman scores
  • Visual Analytics - 5 interactive charts showing leakage patterns
  • Risk Summary - Overall leakage risk across all categories
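As a rough illustration of what a feature risk table with MI, Pearson, and Spearman scores could look like, the sketch below computes the three scores per feature for a binary target. Several things here are assumptions: MI comes from scikit-learn's `mutual_info_classif`, the function name `feature_risk_table` and the column names are hypothetical, and the 0.9 cutoff for a "high" label is an arbitrary placeholder rather than LeakGuard's threshold.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_classif

def feature_risk_table(df, target, high_risk=0.9):
    """Score every feature against the target and sort by mutual information."""
    features = df.drop(columns=[target])
    y = df[target]
    mi_scores = mutual_info_classif(features, y, random_state=0)

    rows = []
    for name, mi in zip(features.columns, mi_scores):
        pearson, _ = pearsonr(features[name], y)
        spearman, _ = spearmanr(features[name], y)
        strongest = max(abs(pearson), abs(spearman))
        rows.append({
            "feature": name,
            "mi": float(mi),
            "pearson": abs(float(pearson)),
            "spearman": abs(float(spearman)),
            # Placeholder rule: flag near-perfect correlation with the target.
            "risk": "high" if strongest > high_risk else "low",
        })
    table = pd.DataFrame(rows).sort_values("mi", ascending=False)
    return table.reset_index(drop=True)

# Synthetic data: "leaky_score" is derived directly from the target.
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "leaky_score": y + rng.normal(scale=0.01, size=n),
    "weak_signal": 0.5 * y + rng.normal(size=n),
    "noise": rng.normal(size=n),
    "label": y,
})

table = feature_risk_table(df, target="label")
print(table)
```

Reporting all three scores side by side is useful because they disagree in informative ways: with a binary target, a feature that separates the classes perfectly can still have a Spearman score well below 1 because of ties in the target's ranks.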

Tech Stack

  • Frontend: Gradio
  • Data Processing: Pandas, NumPy
  • ML Detection: Scikit-learn (Random Forest)
  • Statistics: SciPy (Spearman, MI)
  • Visualization: Matplotlib
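Since the stack lists a scikit-learn Random Forest for ML detection, here is a hedged sketch of how feature importance instability might be measured: refit the forest on bootstrap resamples and summarize the mean and standard deviation of each feature's importance. The function name `importance_stability`, the number of runs, the forest size, and the synthetic features are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def importance_stability(X, y, n_runs=5, seed=0):
    """Mean and std of Random Forest importances across bootstrap refits."""
    rng = np.random.default_rng(seed)
    runs = []
    for run in range(n_runs):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap row indices
        model = RandomForestClassifier(n_estimators=25, random_state=run)
        model.fit(X.iloc[idx], y.iloc[idx])
        runs.append(model.feature_importances_)
    runs = np.array(runs)
    return pd.DataFrame(
        {"mean": runs.mean(axis=0), "std": runs.std(axis=0)},
        index=X.columns,
    )

# Synthetic data: "proxy" is a near copy of "signal", "noise" is unrelated.
rng = np.random.default_rng(0)
n = 300
y = pd.Series(rng.integers(0, 2, size=n))
signal = y.to_numpy() + rng.normal(scale=0.5, size=n)
X = pd.DataFrame({
    "signal": signal,
    "proxy": signal + rng.normal(scale=0.05, size=n),
    "noise": rng.normal(size=n),
})

summary = importance_stability(X, y)
print(summary)
```

When two features carry nearly the same information, the forest can split importance between them differently on each refit, so a high standard deviation relative to the mean is a hint that a feature may be standing in for something else.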

Features

✅ CPU-only (no GPU required)
✅ Explainable results with statistical basis
✅ Fast analysis (seconds for typical datasets)
✅ Production-ready architecture

📝 License

Apache 2.0


Built for Kaggle & Hugging Face Spaces | © 2026
