🚀 Insight Extraction & Auto-Visualization RAG Tool

This project implements a complete Insight Extraction + Visualization Pipeline, built for the Bayer Challenge at the SinceAI Hackathon in Turku. Given:

a user prompt, and
a dataset retrieved by a RAG system,

the system automatically produces:

a semantic intent JSON
enriched categories and semantic expansions
embedding-based category assignment
SQL analytics based on the user's prompt
visualization recommendations
a fully dynamic Streamlit dashboard

Everything runs end-to-end with a single pipeline.

🧩 System Pipeline Overview

The architecture is built around three core modules that together turn natural language into structured analytics and visual insights.

1️⃣ Insight Extraction Module

Location: insight_extraction/

This module interprets the user's question and organizes the dataset into a structured form for downstream analysis.

✔ Semantic Intent

Transforms the natural-language prompt into a structured JSON containing:

requested metrics
grouping dimensions
filters
semantic categories
logical operations

This provides a clean, machine-readable blueprint for analytics generation.
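For illustration, the intent for a made-up prompt such as "How many slips and chemical exposures per site and month in 2023?" could look like the JSON below. This is a minimal sketch: the key names (`metrics`, `group_by`, `filters`, `categories`, `operations`) and the `SemanticIntent` / `parse_intent` helpers are hypothetical and simply mirror the list above, not the project's actual schema. Validating the LLM reply up front makes a malformed intent fail before any SQL is generated.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema: key names mirror the README list, not the project's real format.
REQUIRED_KEYS = ("metrics", "group_by", "filters", "categories", "operations")


@dataclass
class SemanticIntent:
    metrics: list[str]                 # requested metrics
    group_by: list[str]                # grouping dimensions
    filters: dict[str, str]            # column -> required value
    categories: dict[str, list[str]]   # dimension -> semantic categories
    operations: list[str] = field(default_factory=list)  # logical operations


def parse_intent(raw: str) -> SemanticIntent:
    """Parse the LLM's JSON reply and fail early if a required key is missing."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"intent JSON is missing keys: {missing}")
    return SemanticIntent(**{key: data[key] for key in REQUIRED_KEYS})


# Illustrative reply, written by hand rather than produced by a model.
example = """{
  "metrics": ["incident_count"],
  "group_by": ["site", "month"],
  "filters": {"year": "2023"},
  "categories": {"incident_type": ["slip", "chemical exposure"]},
  "operations": ["count"]
}"""
intent = parse_intent(example)
```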

✔ Semantic Category Expansion

For each category of a dimension, the system uses an LLM to generate:

textual description
synonyms
example sentences

These enrichments give the embedding step much more text to match against than the bare category name, which makes semantic matching more robust.
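A minimal sketch of how one expansion could be requested and then flattened into the text that gets embedded for a category. The prompt template, the JSON keys (`description`, `synonyms`, `examples`) and both helper names are assumptions for illustration; the project's actual prompt may be worded differently, and the model call itself is left out.

```python
import json

# Hypothetical prompt template; the project's actual prompt wording may differ.
EXPANSION_PROMPT = (
    "Dimension: {dimension}\n"
    "Category: {category}\n"
    "Reply with JSON containing 'description' (one sentence), "
    "'synonyms' (list of strings) and 'examples' (list of sentences)."
)


def build_expansion_prompt(dimension: str, category: str) -> str:
    """Fill the template for one category of one dimension."""
    return EXPANSION_PROMPT.format(dimension=dimension, category=category)


def expansion_to_text(category: str, reply: str) -> str:
    """Flatten the LLM's JSON reply into one string to embed for this category."""
    exp = json.loads(reply)
    parts = [category, exp["description"], *exp["synonyms"], *exp["examples"]]
    # Drop empty items and trailing periods so the joined text reads cleanly.
    return ". ".join(p.strip().rstrip(".") for p in parts if p.strip())


# Illustrative reply, written by hand rather than produced by a model.
reply = (
    '{"description": "Injuries caused by losing footing.",'
    ' "synonyms": ["trip", "fall"],'
    ' "examples": ["A worker slipped on a wet floor."]}'
)
prompt = build_expansion_prompt("incident_type", "slip")
text = expansion_to_text("slip", reply)
```

Embedding this combined string, rather than the single word "slip", is what lets rows that only mention "tripped" or "wet floor" still land near the right category.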

✔ Embedding-Based Category Assignment

We embed:

dataset rows
expanded category descriptions

Using cosine similarity, each row is assigned to the closest semantic category.

The result is a new categorical dataset ready for analytics.
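The assignment step can be sketched in plain Python as below, assuming the row and category embeddings have already been computed by some embedding model. The toy 3-dimensional vectors and the `cosine` / `assign_categories` names are illustrative only. Each row receives the label of the category vector with the highest cosine similarity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_categories(row_vecs: list[list[float]],
                      category_vecs: dict[str, list[float]]) -> list[str]:
    """Return, for each row embedding, the name of the most similar category."""
    labels = []
    for vec in row_vecs:
        best = max(category_vecs, key=lambda name: cosine(vec, category_vecs[name]))
        labels.append(best)
    return labels


# Toy embeddings for illustration; real ones come from an embedding model.
categories = {"slip": [1.0, 0.1, 0.0], "chemical": [0.0, 0.2, 1.0]}
rows = [[0.9, 0.0, 0.1], [0.1, 0.3, 0.8]]
labels = assign_categories(rows, categories)
```

With real embeddings (hundreds of dimensions, many rows) the same argmax is usually computed as one normalized matrix product, for example in NumPy, but the logic is identical.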

✔ SQL Generator (Labelled SQL Blocks)

Using the enriched dataset, the module generates SQL queries in a consistent, parser-friendly format.

Key guarantees:

multiple, coherent insight queries
deterministic structure (-- HEADER labels)
full compatibility with our SQL executor
ready for visualization
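A sketch of how labelled blocks could be split and executed, assuming each query is preceded by a comment line of the form `-- HEADER: <title>` (the exact label syntax is an assumption). It uses the standard-library `sqlite3` module on a toy in-memory `incidents` table made up for the example; `split_labelled_sql` and `run_blocks` are hypothetical names, not the project's executor.

```python
import re
import sqlite3

# Assumed label syntax: a comment line "-- HEADER: <title>" before each query.
HEADER_RE = re.compile(r"^--[ \t]*HEADER:?[ \t]*(.+)$", re.MULTILINE)


def split_labelled_sql(text: str) -> dict[str, str]:
    """Map each HEADER title to the SQL text that follows it."""
    matches = list(HEADER_RE.finditer(text))
    blocks = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        blocks[m.group(1).strip()] = text[m.end():end].strip()
    return blocks


def run_blocks(conn: sqlite3.Connection, blocks: dict[str, str]) -> dict[str, list[tuple]]:
    """Execute each block (one statement per block) and collect its rows by title."""
    return {title: conn.execute(sql).fetchall() for title, sql in blocks.items()}


# Toy data, for illustration only.
sql_text = """
-- HEADER: Incidents per category
SELECT category, COUNT(*) AS n FROM incidents GROUP BY category ORDER BY n DESC;
-- HEADER: Total incidents
SELECT COUNT(*) FROM incidents;
"""
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER, category TEXT)")
conn.executemany("INSERT INTO incidents VALUES (?, ?)",
                 [(1, "slip"), (2, "slip"), (3, "chemical")])
results = run_blocks(conn, split_labelled_sql(sql_text))
```

To get DataFrames instead of tuples, `pandas.read_sql_query(sql, conn)` can replace `conn.execute(sql).fetchall()`.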

2️⃣ Generate Chart Recommendations

Location: viz_recommender/

Given:

SQL-generated analytics DataFrames
the original user prompt

the module selects the chart types best suited to communicating each insight:

trends → line charts
category comparisons → bar charts
distributions → histograms
proportions → pie charts
anomalies → scatter/line hybrid

It outputs a file:

recs.txt

containing high-level visualization instructions.
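The mapping above could be approximated by a simple rule-based recommender, sketched below. This is not necessarily how the module decides (it also takes the user prompt into account); the heuristic, the `recommend` / `write_recs` helpers and the `DF_n | chart | kind` line format for recs.txt are all assumptions made for illustration.

```python
from pathlib import Path

# Chart choices copied from the README list; the "kind" names are illustrative labels.
CHART_FOR_KIND = {
    "trend": "line",
    "comparison": "bar",
    "distribution": "histogram",
    "proportion": "pie",
    "anomaly": "scatter+line",
}

TIME_HINTS = ("date", "month", "year", "week")
SHARE_HINTS = ("share", "pct", "percent")


def guess_kind(columns: list[str], n_rows: int) -> str:
    """Rough, hypothetical heuristic based only on column names and row count."""
    names = [c.lower() for c in columns]
    if any(hint in name for name in names for hint in TIME_HINTS):
        return "trend"
    if any(hint in name for name in names for hint in SHARE_HINTS):
        return "proportion"
    if len(names) == 1 and n_rows > 1:
        return "distribution"
    # Spotting anomalies would need the values themselves, not just the table shape.
    return "comparison"


def recommend(df_name: str, columns: list[str], n_rows: int) -> str:
    """Produce one line in a made-up 'DF_n | chart | kind' format."""
    kind = guess_kind(columns, n_rows)
    return f"{df_name} | {CHART_FOR_KIND[kind]} | {kind}"


def write_recs(lines: list[str], path: str = "recs.txt") -> None:
    """Write one recommendation per line."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, a result with columns `month` and `incident_count` would be recommended as a line chart, and a single `severity_score` column as a histogram.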

3️⃣ Streamlit Auto-Dashboard

Location: from_text_to_streamlit_app/

A fully dynamic Streamlit frontend turns the insights and recommendations into a live dashboard.

The app automatically:

loads all analytic DataFrames (DF_1, DF_2, …)
reads recs.txt
renders each recommended chart
supports multiple sections and layouts
produces a polished analytical UI

Run it with:

pip install -r requirements.txt
streamlit run App.py
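A minimal sketch of the app's rendering loop, assuming recs.txt holds one `DF_n | chart type | note` line per chart (a hypothetical format) and that the DataFrames are already loaded into a dict keyed by name (DF_1, DF_2, …). Only `st.subheader`, `st.line_chart`, `st.bar_chart` and `st.dataframe` are Streamlit calls; `parse_recs`, `df_order` and `render` are illustrative names.

```python
import re


def parse_recs(text: str) -> list[tuple[str, str]]:
    """Return (DataFrame name, chart type) pairs; lines in other formats are skipped."""
    specs = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 2 and re.fullmatch(r"DF_\d+", parts[0]) and parts[1]:
            specs.append((parts[0], parts[1].lower()))
    return specs


def df_order(name: str) -> int:
    """Sort DF_2 before DF_10 (numeric rather than lexicographic order)."""
    return int(name.split("_")[1])


def render(frames: dict, recs_text: str) -> None:
    """Draw one chart per recommendation inside a running Streamlit app."""
    import streamlit as st  # imported lazily so the parsing helpers work without Streamlit

    charts = {"line": st.line_chart, "bar": st.bar_chart}
    for name, chart in sorted(parse_recs(recs_text), key=lambda s: df_order(s[0])):
        if name not in frames:
            continue  # recommendation points at a DataFrame that was not produced
        st.subheader(f"{name}: {chart} chart")
        draw = charts.get(chart, st.dataframe)  # fall back to a table for other chart types
        draw(frames[name])
```

Histograms and pie charts have no one-line Streamlit helper, so a fuller version would draw them with a plotting library (for example via `st.plotly_chart`) instead of falling back to a table.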

🔑 API Key Required

This project uses the OpenAI GPT API.
Before running the app, make sure that the environment variable OPENAI_API_KEY is properly set on your system.

⚠️ The application will not work without a valid API key. Do not share your key or commit it to version control.
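A small startup check can make a missing key fail immediately with a clear message. The sketch below assumes it is called at the top of main.py or App.py; `require_openai_key` is a hypothetical helper, not part of the project. The official `openai` Python package (v1 and later) reads `OPENAI_API_KEY` from the environment when `OpenAI()` is created without an `api_key` argument, so the check only moves the failure earlier.

```python
import os


def require_openai_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or stop with a clear error."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the pipeline.")
    return key
```

On macOS or Linux, `export OPENAI_API_KEY=...` in the shell sets the variable for the current session; keeping the key out of source files also keeps it out of version control.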

👥 Contributors

Developed by:

martiFabia/On-demand-visualization-of-HSE-data

Folders and files

Latest commit

History

Repository files navigation

🚀 Insight Extraction & Auto-Visualization RAG Tool

🧩 System Pipeline Overview

1️⃣ Insight Extraction Module

✔ Semantic Intent

✔ Semantic Category Expansion

✔ Embedding-Based Category Assignment

✔ SQL Generator (Labelled SQL Blocks)

2️⃣ Generate Chart Recommendations

3️⃣ Streamlit Auto-Dashboard

🔑 API Key Required

👥 Contributors

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages