NOMAD RAG Assistant

This project is a prototype chatbot built with Retrieval-Augmented Generation (RAG) to assist researchers and developers working with the NOMAD platform. It answers questions about documentation, features, and best practices by retrieving relevant information from a knowledge base and generating concise, accurate answers.

The project features a standalone FastAPI backend for the RAG pipeline, a Gradio web interface for user interaction, and a complete evaluation suite.


Project Structure 📂

This project follows a standard src layout to separate source code from project configuration and data. The core logic is located within the src/nomad_ragbot package.

nomad-bot-rag-docs-discord/
├── .env                  # Local environment variables (ignored by Git)
├── .env.example          # Template for environment variables
├── pyproject.toml        # Project metadata and dependencies
├── uv.lock               # Pinned versions for reproducible builds
├── data/                 # Holds the input data for the knowledge base
│   ├── chunks/           # Processed document chunks (JSONL)
│   ├── evaluation/       # Gold standard Q&A datasets
│   └── fetched/          # Raw documentation from repositories
├── chroma_store/         # Local vector database storage (ignored by Git)
├── dev-notes/            # Development documentation and workflow notes
├── utils/                # Utility scripts for data processing
│   └── gold/             # Gold dataset generation tools
└── src/
    └── nomad_ragbot/
        ├── api/          # FastAPI Backend
        │   ├── main.py
        │   ├── config.py
        │   └── ...
        │
        ├── query/        # Core RAG logic and query engine
        │   └── query.py
        │
        ├── gradio_app.py # Standalone Gradio Web UI
        ├── llm_client.py # Client for interacting with the LLM
        └── eval/         # Evaluation scripts and dashboard logic

  • src/nomad_ragbot/api/: A self-contained FastAPI application that serves the RAG pipeline. It indexes the data into a ChromaDB vector store and exposes an /ask endpoint.
  • src/nomad_ragbot/query/: The heart of the RAG system. It contains the RAGQueryEngine, which manages retrieving context, reranking results, and generating answers.
  • src/nomad_ragbot/gradio_app.py: A standalone Gradio web interface for easy interaction with the chatbot. It calls the RAG logic directly.
  • data/: Your source documents (e.g., docs.dynamic.jsonl) that will be indexed into the vector database.
  • chroma_store/: The directory where the Chroma vector database is persisted locally. This is automatically generated.
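
As a minimal sketch of how the chunk files under data/chunks/ might be consumed, the following reads one JSON record per line. The field names ("text", "source") are illustrative assumptions, not the project's actual schema; inspect data/chunks/docs.dynamic.jsonl for the real fields.

```python
# Minimal sketch: load chunk records from a JSONL file, one JSON
# object per line. Field names such as "text" and "source" are
# illustrative assumptions, not the project's confirmed schema.
import json

def load_chunks(path: str) -> list[dict]:
    chunks = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines between records
                chunks.append(json.loads(line))
    return chunks
```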

Setup and Installation ⚙️

Follow these steps to set up your local environment. This project uses uv for fast package and environment management.

1. Clone and Set Up the Environment

First, clone the repository to your local machine.

git clone https://github.com/FAIRmat-NFDI/nomad-bot-rag-docs-discord.git
cd nomad-bot-rag-docs-discord

Next, create a virtual environment and install all necessary dependencies using uv.

# Create a virtual environment named .venv
uv venv

# Activate the environment (on macOS/Linux)
source .venv/bin/activate

# Install all packages from pyproject.toml
uv sync

2. Configure Environment Variables (optional)

Copy the template to a local .env file (cp .env.example .env), then configure the paths and model endpoints if necessary.

JSONL_PATH="data/chunks/docs.dynamic.jsonl"
CHROMA_DIR="chroma_store"

# If you run Ollama locally:
# EMBED_BASE_URL="http://127.0.0.1:11434"
# GENERATOR_BASE_URL="http://127.0.0.1:11434/v1"

# If you want to use the defaults from config.py, do not set these two variables at all.
# (The defaults currently point to a non-local host behind HU VPN.)

# --- Models ---
EMBED_MODEL_NAME="nomic-embed-text"
GENERATOR_MODEL="gpt-oss:20b"

Note:

  • src/nomad_ragbot/api/config.py provides defaults.
  • Any variables you set in .env override those defaults.
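
The precedence described above can be sketched as follows. The variable names come from the .env example; the setting() helper itself is hypothetical and not part of config.py.

```python
# Sketch of the defaults-plus-overrides precedence: a value set in
# the environment (e.g. loaded from .env) wins over the in-code
# default. The setting() helper is hypothetical, not part of config.py.
import os

DEFAULTS = {
    "JSONL_PATH": "data/chunks/docs.dynamic.jsonl",
    "CHROMA_DIR": "chroma_store",
}

def setting(name: str) -> str:
    # Environment variables override the hard-coded defaults.
    return os.environ.get(name, DEFAULTS[name])
```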

Running the Application ▶️

The API server and the Gradio UI are two separate applications. You must run them in two separate terminals.

Terminal 1: Start the API Server

This server handles the RAG logic and indexing. The first time you run it, it will build the ChromaDB vector store, which may take a few minutes.

uvicorn src.nomad_ragbot.api.main:app --reload

The API will be available at http://127.0.0.1:8000.
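
Once the server is up, the /ask endpoint can also be queried programmatically. This sketch assumes a JSON request body with a "question" field and a response containing an "answer" field; neither is confirmed here, so verify both against the FastAPI route definitions in src/nomad_ragbot/api/.

```python
# Hypothetical client for the /ask endpoint. The "question" request
# field and "answer" response field are assumptions; check them
# against the FastAPI schema before relying on this.
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/ask"

def build_payload(question: str) -> bytes:
    """Serialize the request body for the /ask endpoint."""
    return json.dumps({"question": question}).encode("utf-8")

def ask(question: str, url: str = API_URL) -> str:
    """POST a question to the running API server and return the answer text."""
    req = urllib.request.Request(
        url,
        data=build_payload(question),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("answer", "")
```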

Terminal 2: Start the Gradio Web UI

This command launches the user-friendly web interface for asking questions.

nomad-ragbot-ui

You can now open your browser and navigate to http://127.0.0.1:7860 to interact with the chatbot!


Evaluation Dashboard 📊

The project includes a suite for evaluating the performance of the RAG pipeline.

1. Install Evaluation Dependencies

Install the project in editable mode with the optional [eval] dependencies.

pip install -e ".[eval]"

2. Run Evaluation

Execute the evaluation script against a "golden dataset" of questions and answers.

ragbot-eval --data_path data/evaluation/gold_all.jsonl --out_dir runs/your-run-name --use_llm_judge

3. View the Dashboard

Launch the evaluation dashboard to visualize the results from your run.

ragbot-eval-dash --results_path runs/your-run-name/eval_results.parquet
