| title | emoji | colorFrom | colorTo | sdk | sdk_version | app_file | pinned |
|---|---|---|---|---|---|---|---|
| LeakGuard | 🛡️ | purple | blue | gradio | 6.5.0 | app.py | false |
A web app that analyzes a CSV dataset BEFORE model training and detects silent data leakage risks that commonly cause models to fail in production.
| Type | Detection Method | Risk Indicators |
|---|---|---|
| Target Leakage | Mutual Information, Pearson & Spearman correlation | Features containing direct/indirect target information |
| Time Leakage | Correlation drift, rolling window analysis | Future information leaking into past samples |
| Duplicate Leakage | Row hashing, entity ID overlap | Same samples appearing across splits |
| Proxy Leakage | Feature importance instability | Hidden proxies acting as target substitutes |
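
As an illustration of the duplicate-leakage row in the table, here is a minimal sketch of row hashing plus entity ID overlap between two splits. It is not taken from LeakGuard's source: the function name `find_duplicate_leakage`, the `train`/`test` frames, and the `user_id` column are hypothetical, and both frames are assumed to share the same columns, order, and dtypes.

```python
import pandas as pd


def find_duplicate_leakage(train, test, entity_col=None):
    """Report exact-row and entity-ID overlap between two splits (hypothetical helper)."""
    # Hash row values only (index excluded), so identical rows get identical hashes.
    train_hashes = pd.util.hash_pandas_object(train, index=False)
    test_hashes = pd.util.hash_pandas_object(test, index=False)
    duplicated_rows = int(test_hashes.isin(train_hashes).sum())

    # Entities present in both splits can leak even when the rows differ.
    shared_entities = set()
    if entity_col is not None:
        shared_entities = set(train[entity_col]) & set(test[entity_col])

    return {"duplicated_rows": duplicated_rows, "shared_entities": shared_entities}


# Toy data (assumed): user 3 appears in both splits with an identical row.
train = pd.DataFrame({"user_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
test = pd.DataFrame({"user_id": [3, 4], "amount": [30.0, 40.0]})
report = find_duplicate_leakage(train, test, entity_col="user_id")
print(report)
```

Hashing whole rows only catches exact duplicates; near-duplicates (for example, rounded values) would need a separate normalization step before hashing.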
- Upload your CSV dataset
- Select target column (required)
- Select time & entity ID columns (optional)
- Click Analyze to get instant results
- Feature Risk Table - Detailed risk assessment with MI, Pearson, Spearman scores
- Visual Analytics - 5 interactive charts showing leakage patterns
- Risk Summary - Overall leakage risk across all categories
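
For a sense of how the MI, Pearson, and Spearman columns of a feature risk table could be computed, the sketch below scores each numeric feature against a classification target. It is an assumption about the approach, not LeakGuard's actual code: `feature_risk_table`, the column names, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_classif


def feature_risk_table(df, target):
    """Score numeric features against a classification target (hypothetical helper)."""
    features = df.drop(columns=[target]).select_dtypes(include="number")
    y = df[target]
    # Mutual information captures non-linear dependence that correlation can miss.
    mi = mutual_info_classif(features, y, random_state=0)
    rows = []
    for name, mi_score in zip(features.columns, mi):
        rows.append({
            "feature": name,
            "mi": mi_score,
            "pearson": abs(pearsonr(features[name], y)[0]),
            "spearman": abs(spearmanr(features[name], y)[0]),
        })
    # Highest mutual information first: these are the features to inspect.
    return pd.DataFrame(rows).sort_values("mi", ascending=False, ignore_index=True)


# Synthetic data (assumed): "leaky" is almost a copy of the target.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
df = pd.DataFrame({
    "noise": rng.normal(size=200),
    "leaky": y + rng.normal(scale=0.01, size=200),
    "target": y,
})
table = feature_risk_table(df, "target")
print(table)
```

A feature that scores near the top on all three measures, like `leaky` here, is a candidate for target leakage and worth checking against how the data was collected.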
- Frontend: Gradio
- Data Processing: Pandas, NumPy
- ML Detection: Scikit-learn (Random Forest)
- Statistics: SciPy (Spearman, MI)
- Visualization: Matplotlib
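
The README does not spell out how Random Forest importances feed into proxy detection, so the following is only one plausible reading of "feature importance instability": refit a forest on bootstrap resamples and compare the mean and spread of each feature's importance. The helper name `importance_instability`, its parameters, and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def importance_instability(X, y, n_runs=5, seed=0):
    """Mean and spread of Random Forest importances across bootstrap resamples."""
    rng = np.random.default_rng(seed)
    runs = []
    for i in range(n_runs):
        # Bootstrap: sample row positions with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        model = RandomForestClassifier(n_estimators=50, random_state=i)
        model.fit(X.iloc[idx], y.iloc[idx])
        runs.append(model.feature_importances_)
    runs = np.asarray(runs)
    return pd.DataFrame(
        {"mean_importance": runs.mean(axis=0), "std_importance": runs.std(axis=0)},
        index=X.columns,
    )


# Synthetic data (assumed): "proxy" is a noisy stand-in for the target.
rng = np.random.default_rng(1)
y = pd.Series(rng.integers(0, 2, size=300))
X = pd.DataFrame({
    "proxy": y + rng.normal(scale=0.05, size=300),
    "noise_a": rng.normal(size=300),
    "noise_b": rng.normal(size=300),
})
stats = importance_instability(X, y)
print(stats)
```

Under this reading, a feature that dominates the importances, or whose importance swings widely between resamples, would be flagged for manual review as a possible proxy.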
- ✅ CPU-only (no GPU required)
- ✅ Explainable results with statistical basis
- ✅ Fast analysis (seconds for typical datasets)
- ✅ Production-ready architecture
- Live Demo: Hugging Face Space
- GitHub: Source Code
Apache 2.0
Built for Kaggle & Hugging Face Spaces | © 2026