ReguCheck is a compact, end‑to‑end risk data pipeline that mirrors how banks turn raw loan files into a reporting‑ready view: ingest → validate → reconcile → report. It uses public Lending Club data to demonstrate governance checks, control reconciliations, and business‑friendly reporting in one flow.
Live Demo: https://regucheck-risk-engine-v9qextfge3a8chvnywvqtg.streamlit.app/
- Automated data ingestion from Kaggle using `kagglehub`
- Governance & validation checks for completeness, accuracy, and validity
- Reconciliation control against a mock General Ledger (GL)
- Interactive dashboard built with Streamlit and Plotly
- `data_loader.py`: Downloads the dataset, locates the CSV, samples 10,000 rows for speed, standardizes column names, and outputs `staged_loan_data.csv`.
- `governance_engine.py`: Applies data quality rules and splits the data into `validated_loans.csv` (clean rows) and `dq_exceptions.csv` (failed rows, each tagged with an `Error_Reason`).
- `reconciliation.py`: Compares validated exposure totals by risk rating against a mock GL, calculates variance and status, and outputs `recon_report.csv`.
- `app.py`: Streamlit dashboard with three tabs:
  - Portfolio: KPIs + exposure by risk rating
  - Data Quality: exception records + failure rate
  - Reconciliation: variance report with conditional formatting
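The validate-and-split step can be sketched with pandas. The rule set and column names below (`loan_amnt`, `int_rate`, `risk_rating`) are illustrative assumptions, not the exact rules in `governance_engine.py`:

```python
import pandas as pd

def run_governance_checks(df: pd.DataFrame):
    """Split rows into validated loans and data-quality exceptions."""
    reasons = pd.Series("", index=df.index)

    # Completeness: key fields must be present.
    missing = df[["loan_amnt", "int_rate", "risk_rating"]].isna().any(axis=1)
    reasons[missing] += "Missing required field; "

    # Validity: amounts must be positive, ratings on the A-G scale.
    reasons[df["loan_amnt"] <= 0] += "Non-positive loan amount; "
    reasons[~df["risk_rating"].isin(list("ABCDEFG")) & ~missing] += "Unknown risk rating; "

    failed = reasons != ""
    exceptions = df[failed].assign(Error_Reason=reasons[failed].str.rstrip("; "))
    return df[~failed], exceptions

# Toy input: one clean row, three rows that each break a different rule.
sample = pd.DataFrame({
    "loan_amnt": [10_000, -500, 20_000, None],
    "int_rate": [0.08, 0.12, 0.10, 0.09],
    "risk_rating": ["A", "B", "Z", "C"],
})
validated, exceptions = run_governance_checks(sample)
print(len(validated), len(exceptions))  # 1 3
```

Collecting every failure reason into a single `Error_Reason` column is what lets the Data Quality tab show reviewers *why* each row was quarantined, not just that it was.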
To run the pipeline locally:

```shell
pip install -r requirements.txt
python data_loader.py
python governance_engine.py
python reconciliation.py
streamlit run app.py
```
- Portfolio tab: Shows total exposure and weighted average rate. The bar chart displays exposure by Risk Rating (A–G), where A = lowest risk and G = highest risk.
- Data Quality tab: Shows the rows that failed validation and a Data Failure Rate (the percentage of bad rows).
- Reconciliation tab: Compares pipeline totals against the mock GL. Any variance above 1% is flagged as Investigation Required.
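The reconciliation control amounts to a merge and a threshold check. The figures below are made up for illustration; in the real pipeline the totals come from `validated_loans.csv` and the mock GL:

```python
import pandas as pd

# Made-up totals; the real pipeline aggregates validated exposure by risk rating.
pipeline = pd.DataFrame({"risk_rating": ["A", "B", "C"],
                         "exposure": [1_000_000, 500_000, 250_000]})
gl = pd.DataFrame({"risk_rating": ["A", "B", "C"],
                   "gl_exposure": [1_000_000, 510_000, 250_000]})

# Join the two views, compute absolute variance vs. the GL, and flag breaches.
recon = pipeline.merge(gl, on="risk_rating")
recon["variance_pct"] = ((recon["exposure"] - recon["gl_exposure"]).abs()
                         / recon["gl_exposure"] * 100)
recon["status"] = recon["variance_pct"].apply(
    lambda v: "Investigation Required" if v > 1 else "Matched")
print(recon[["risk_rating", "variance_pct", "status"]])
```

Here rating B is off by about 1.96%, so it alone crosses the 1% threshold and gets flagged.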
GitHub Pages can't host this app directly: it only serves static sites, while a Streamlit app needs a running Python server. Options that do work:

- Streamlit Community Cloud (free for public GitHub repos)
- Render, Railway, Fly.io, or Heroku
To deploy on Streamlit Community Cloud:

- Push this repo to GitHub.
- Go to https://share.streamlit.io
- Select your repo and set `app.py` as the entry point.
Note: Kaggle downloads on first run may require you to set Kaggle credentials if the environment doesn’t already have them.
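One common way to supply those credentials is via environment variables, which `kagglehub` should pick up (the values below are placeholders):

```shell
# Placeholder values -- substitute your own Kaggle account credentials.
export KAGGLE_USERNAME="your_kaggle_username"
export KAGGLE_KEY="your_api_key"
```

Alternatively, a `kaggle.json` file downloaded from your Kaggle account settings and placed under `~/.kaggle/` should also be detected.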
Raw CSVs are ignored by .gitignore to keep the repo lightweight. The pipeline
will regenerate them locally.



