A "one-click" data diagnostic tool that ingests raw datasets, runs automated cleaning, computes KPIs, and uses an LLM to turn cold metrics into a concise business narrative.

    data_scout/
    ├── backend/                    # FastAPI application
    │   ├── main.py                 # API routes + static file serving
    │   └── services/
    │       ├── loader.py           # CSV / XLSX / Parquet ingestion
    │       ├── cleaner.py          # Deduplication, imputation, outlier detection
    │       ├── profiler.py         # Metadata extraction & column selection
    │       ├── stats.py            # Deep statistical analysis
    │       ├── kpis.py             # Volume, efficiency, trend & extra KPIs
    │       ├── charts.py           # Chart payloads (time series, Pareto, distribution, …)
    │       ├── analyzer.py         # Orchestrates the full analysis pipeline
    │       └── storytelling.py     # LLM-powered narrative generation (Groq)
    └── frontend/                   # Static HTML/CSS/JS client
        └── index.html

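As an illustration of the ingestion step, here is a minimal sketch of dispatching on file extension. It is not the actual contents of `loader.py`: the names `READERS`, `resolve_reader`, and `load_dataset` are hypothetical, and the use of pandas readers is an assumption (the README does not say which library backs the loader).

```python
from pathlib import Path

# Maps a supported file extension to the name of a pandas reader function.
# Assumption: pandas is the loading library; the README does not confirm this.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".parquet": "read_parquet",
}


def resolve_reader(filename: str) -> str:
    """Return the pandas reader name for a filename, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]


def load_dataset(path: str):
    """Load a dataset into a DataFrame (hypothetical helper)."""
    # Imported lazily so the dispatch logic above has no hard dependency.
    import pandas as pd

    return getattr(pd, resolve_reader(path))(path)
```

Keeping the extension lookup separate from the actual read makes it easy to reject unsupported uploads before any parsing starts.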
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Health check |
| POST | /api/analyze | Upload a file and get full analysis |
| GET | /app | Serves the frontend |
POST /api/analyze accepts multipart/form-data:
- `file` — dataset file (`.csv`, `.xlsx`, `.parquet`)
- `use_llm` — boolean, whether to call the LLM for storytelling (default: `true`)
Response includes: cleaning summary, cleaning log, metadata, KPIs, chart payloads, and a story block.
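A client call to this endpoint could be sketched as follows, using only the standard library to assemble the multipart body. The helper name `build_analyze_request` is hypothetical, and the base URL assumes the server is running locally on port 8000.

```python
import urllib.request
import uuid


def build_analyze_request(
    base_url: str, filename: str, content: bytes, use_llm: bool = True
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /api/analyze (illustrative helper)."""
    boundary = uuid.uuid4().hex
    flag = "true" if use_llm else "false"
    # Text parts: the use_llm form field, then the headers of the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="use_llm"\r\n\r\n'
        f"{flag}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/analyze",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends the upload. The response is presumably JSON, but its exact key names are not documented here, so inspect the payload rather than relying on guessed field names.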
Requirements: Python ≥ 3.12, uv

Install dependencies:

    uv sync

Set environment variables (optional, only needed for LLM storytelling):

    export GROQ_API_KEY=your_key_here
    export GROQ_MODEL=llama-3.3-70b-versatile  # optional override

Start the server:

    uv run uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

- Frontend: http://localhost:8000/app
- API docs: http://localhost:8000/docs

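For orientation, one way a service could read the two Groq variables and decide whether LLM storytelling is available is sketched below. `LLMSettings` and `read_llm_settings` are hypothetical names, not part of the project, and the sketch deliberately leaves the model unset when `GROQ_MODEL` is absent because the built-in default is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    """Groq-related settings read from the environment (illustrative)."""

    api_key: Optional[str]
    model: Optional[str]

    @property
    def enabled(self) -> bool:
        # Storytelling via the LLM is only possible when a key is present.
        return bool(self.api_key)


def read_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read GROQ_API_KEY and GROQ_MODEL, treating blank values as unset."""
    api_key = env.get("GROQ_API_KEY", "").strip() or None
    model = env.get("GROQ_MODEL", "").strip() or None
    return LLMSettings(api_key=api_key, model=model)
```

Accepting the environment as a mapping keeps the function easy to test without touching the real process environment.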
- Multi-format ingestion — CSV, Excel (`.xlsx`), and Parquet
- Automated cleaning — duplicate removal, missing value imputation, outlier detection
- KPI triangulation — Volume, Efficiency, Trend + extra context KPIs
- Six chart types — time series, Pareto, distribution, missing values heatmap, boxplot, correlation heatmap
- LLM storytelling — headline, observations, and recommendations via Groq (falls back to a template when no API key is set)
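To make the automated cleaning step concrete, here is a small standard-library sketch of the three operations named above. The specific strategies are assumptions: this README does not say which imputation or outlier rule the tool uses, so median imputation and the 1.5 × IQR rule stand in as common defaults. The function names are hypothetical.

```python
from statistics import median, quantiles


def drop_duplicates(rows):
    """Remove exact duplicate rows while preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def clean_column(values):
    """Fill missing (None) entries with the median and flag IQR outliers.

    Returns the imputed list and the indices of values that fall outside
    [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. Both rules are assumed, not confirmed.
    """
    present = [v for v in values if v is not None]
    fill = median(present)
    imputed = [fill if v is None else v for v in values]
    # Quartiles are computed on observed values only, before imputation.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [i for i, v in enumerate(imputed) if v < low or v > high]
    return imputed, outliers
```

Flagging outliers by index rather than dropping them keeps the decision about what to do with them in the caller's hands, which fits a tool that reports a cleaning log.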