An AI-powered platform for discovering, loading, and analysing scientific datasets from the PANGAEA data repository. PangaeaGPT combines a modern React frontend with a FastAPI backend that orchestrates a multi-agent LLM pipeline — including specialised oceanography, ecology, visualisation, and data-analysis agents — to answer natural-language questions about Earth system data.
| Capability | Description |
|---|---|
| Semantic Dataset Search | Natural-language queries are translated into PANGAEA API calls and ranked by an LLM search agent |
| One-Click Loading | Select datasets from search results and load them into a per-session workspace in seconds |
| Multi-Agent Analysis | A supervisor agent routes queries to specialised sub-agents: Oceanographer, Ecologist, Visualisation, DataFrame, and Writer |
| Interactive Chat | Real-time WebSocket chat with streaming status updates, markdown rendering, and inline plot display |
| Dark & Light Themes | Clean, minimalistic scientific UI with a one-click toggle — persisted to localStorage |
| ERA5 / Copernicus Integration | The Oceanographer agent can retrieve ERA5 reanalysis and Copernicus Marine data via Arraylake |
| Sandboxed Code Execution | All agent-generated Python code runs in an isolated REPL sandbox per session |
┌─────────────────────────────────────────────────────┐
│ Frontend (React + Vite + Tailwind) │
│ localhost:5174 │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ DatasetExplorer│ │ DataAgent (WS) │ │
│ └──────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────┘
│ REST / WebSocket │
▼ ▼
┌─────────────────────────────────────────────────────┐
│ Backend (FastAPI + Uvicorn) │
│ localhost:8000 │
│ ┌────────────┐ ┌──────────────────────┐ │
│ │ SessionMgr │ │ SupervisorAgent │ │
│ └────────────┘ │ ├─ OceanographerAgt │ │
│ │ ├─ EcologistAgent │ │
│ │ ├─ VisualisationAgt │ │
│ │ ├─ DataFrameAgent │ │
│ │ └─ WriterAgent │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────┘
- Python 3.10+
- Node.js 18+ and npm
- An OpenAI API key (set as
OPENAI_API_KEY)
git clone https://github.com/CliDyn/pangaeaGPT.git
cd pangaeaGPT
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python setup.py # downloads shapefiles + bathymetry dataCreate a .env file in the project root (already in .gitignore):
OPENAI_API_KEY=sk-...
# Optional:
LANGCHAIN_API_KEY=ls-...
ARRAYLAKE_API_KEY=...cd frontend
npm install
npm run dev # → http://localhost:5174# In a separate terminal, from the project root
source .venv/bin/activate
uvicorn api.main:app # → http://localhost:8000Open http://localhost:5174 — the app will auto-connect to the backend.
The interface has two main views, accessible from the left sidebar:
Search PANGAEA, browse result cards with metadata, preview tabular data inline, and select datasets to load into your workspace.
Chat with the multi-agent pipeline. Ask questions like:
- "What columns are in this dataset?"
- "Plot a map of the sampling stations"
- "Calculate the mean temperature by depth"
- "Create a species distribution model"
The agent will plan, delegate to specialist sub-agents, execute Python code in a sandbox, and stream the results back — including inline matplotlib/cartopy plots.
Click the ☀️ / 🌙 icon in the sidebar to switch between dark and light themes.
├── api/ # FastAPI routes and server entry point
│ ├── main.py # App factory, static mounts, CORS
│ └── routes/ # REST + WebSocket endpoints
│ ├── sessions.py # Session lifecycle
│ ├── search.py # Dataset search, select, preview
│ └── agent.py # WebSocket agent chat
├── frontend/ # React (Vite + TypeScript + Tailwind)
│ └── src/
│ ├── App.tsx # Shell, sidebar, theme toggle
│ ├── index.css # CSS custom properties (dark/light)
│ ├── api/client.ts # REST client + WebSocket class
│ └── components/
│ ├── DatasetExplorer.tsx
│ ├── DataAgent.tsx
│ └── WorkspacePanel.tsx
├── src/ # Core Python package
│ ├── agents/ # Supervisor + specialist agents
│ ├── tools/ # LangChain tools (REPL, ERA5, planning, …)
│ ├── search/ # PANGAEA search + dataset download utils
│ ├── config.py # Env-var config loader (no hardcoded keys)
│ └── llm_factory.py # Model instantiation helper
├── main.py # Session init, dataset loading, agent wiring
├── setup.py # One-time data download script
└── requirements.txt
The original Streamlit frontend (app.py) is still available for reference:
streamlit run app.pyThe new React + FastAPI architecture decouples frontend and backend, supports WebSocket streaming, session isolation, and theme customisation.
Contributions welcome! Please open an issue or PR on GitHub.
