This project implements a full Insight Extraction + Visualization Pipeline designed for the Bayer Challenge during SinceAI Hackathon in Turku. Given:
- a user prompt, and
- a dataset retrieved by a RAG system,
the system automatically produces:
- a semantic intent JSON
- enriched categories and semantic expansions
- embedding-based categorical assignment
- SQL analytics based on the user's prompt
- visualization recommendations
- a fully dynamic Streamlit dashboard
Everything runs end-to-end with a single pipeline.
The architecture is built around three core modules, all cooperating to turn natural language into structured analytics and visual insights.
Location: insight_extraction/
This module interprets the user question and structurally organizes the dataset for downstream analysis.
Transforms the natural-language prompt into a structured JSON containing:
- requested metrics
- grouping dimensions
- filters
- semantic categories
- logical operations
This provides a clean, machine-readable blueprint for analytics generation.
For each category belonging to a dimension, the system expands it using an LLM:
- textual description
- synonyms
- example sentences
These enrichments make semantic matching far more robust.
We embed:
- dataset rows
- expanded category descriptions
Using cosine similarity, each row is assigned to the closest semantic category.
The result is a new categorical dataset ready for analytics.
Using the enriched dataset, the module generates SQL queries in a consistent, parser-friendly format
Key guarantees:
- multiple, coherent insight queries
- deterministic structure (
-- HEADERlabels) - full compatibility with our SQL executor
- ready for visualization
Location: visualization_recommender/
Given:
- SQL-generated analytics DataFrames
- the original user prompt
the module selects the best possible charts to communicate insights:
- trends → line charts
- category comparisons → bar charts
- distributions → histograms
- proportions → pie charts
- anomalies → scatter/line hybrid
It outputs a file:
recs.txt
containing high-level visualization instructions.
Location: from_text_to_streamlit_app/
A fully dynamic Streamlit frontend turns insights + recommendations into a live dashboard.
The app automatically:
- loads all analytic DataFrames (DF_1, DF_2, …)
- reads
recs.txt - renders each recommended chart
- supports multiple sections and layouts
- produces a polished analytical UI
Run it with:
pip install -r requirements.txt
streamlit run app.pyThis project uses the OpenAI GPT API.
Before running the app, make sure that the environment variable OPENAI_API_KEY is properly set on your system.
Developed by:
