Merged

Changes from all commits (28 commits)
ced9bf1
Archive old Kotaemon code in folder old
fraboniface Jan 7, 2026
5a116a5
init SvelteKit frontend
fraboniface Jan 7, 2026
c57fd5e
Implement dummy full-stack chatbot app
fraboniface Jan 7, 2026
3f7a9c9
feat: basic RAG pipeline
fraboniface Jan 9, 2026
054b19d
Move hardcoded config to settings
fraboniface Jan 12, 2026
7f92236
Handle chat history
fraboniface Jan 12, 2026
9e9e6bc
add llm-based reranking
fraboniface Jan 12, 2026
95d3f5f
Display retrieved documents and repair markdown
fraboniface Jan 13, 2026
5a91994
fix citations and new lines
fraboniface Jan 13, 2026
bbc667b
feat : fetch publications from OA
fraboniface Jan 14, 2026
fd01885
make chat stick to bottom
fraboniface Jan 14, 2026
0d32007
format .svelte files
fraboniface Jan 14, 2026
794f010
feat: copy response to clipboard
fraboniface Jan 14, 2026
0d5a846
add loader
fraboniface Jan 14, 2026
b2c5630
fix click on message to see sources
fraboniface Jan 14, 2026
88a6bcb
feat: app bar and new chat button
fraboniface Jan 14, 2026
a759838
add server step to protect backend URL
fraboniface Jan 15, 2026
8e9debc
add start script
fraboniface Jan 15, 2026
201f409
Add requirements.txt for clevercloud
fraboniface Jan 15, 2026
48cd34f
Make sure pytorch is CPU-only
fraboniface Jan 15, 2026
7743611
log backend url for debugging
fraboniface Jan 15, 2026
38e4ee8
small fixes
fraboniface Jan 16, 2026
7b51e41
fix latency-induced bug with sending sources
fraboniface Jan 16, 2026
46f1dc3
feat: save chats to database
fraboniface Jan 19, 2026
5ab4540
set correct favicon
fraboniface Jan 19, 2026
43402ec
feat: feedback button
fraboniface Jan 19, 2026
8febba2
move chat backend source files to app folder
fraboniface Jan 20, 2026
3c6f568
update doc
fraboniface Jan 20, 2026
1 change: 0 additions & 1 deletion .gitignore
@@ -20,7 +20,6 @@ dist/
downloads/
eggs/
.eggs/
lib/
lib64/
lib64
parts/
9 changes: 9 additions & 0 deletions .prettierignore
@@ -0,0 +1,9 @@
# Package Managers
package-lock.json
pnpm-lock.yaml
yarn.lock
bun.lock
bun.lockb

# Miscellaneous
/static/
16 changes: 16 additions & 0 deletions .prettierrc
@@ -0,0 +1,16 @@
{
	"useTabs": true,
	"singleQuote": true,
	"trailingComma": "none",
	"printWidth": 100,
	"plugins": ["prettier-plugin-svelte", "prettier-plugin-tailwindcss"],
	"overrides": [
		{
			"files": "*.svelte",
			"options": {
				"parser": "svelte"
			}
		}
	],
	"tailwindStylesheet": "rag_system/frontend/src/routes/layout.css"
}
Binary file added assets/chat_archi.png
253 changes: 43 additions & 210 deletions rag_system/README.md
@@ -1,230 +1,63 @@
# ChatSufficiency
# Chat Sufficiency

This README has two parts:
1. A technical quick start
2. An explanation of the work done so far and the roadmap.

The `policy_analysis` subfolder is planned to move to a sibling folder, as it is a separate line of work, largely independent of the RAG.

# Quick start

The RAG System is a collection of tools for RAG-based document QA.

## RAG System Structure

The RAG System is structured as follows:

```bash
rag_system
├── docker-compose.yml
├── Dockerfile
├── flowsettings.py
├── kotaemon
├── kotaemon_install_guide
├── kotaemon_pipeline_scripts
├── policy_analysis
├── README.md
└── taxonomy
Folder org:
```

There are 2 ingestion pipeline projects here that share the same taxonomy.

The first one (with Kotaemon) uses these folders:

```bash
├── docker-compose.yml
├── Dockerfile
├── kotaemon
├── kotaemon_install_guide
├── kotaemon_pipeline_scripts
└── taxonomy
- old: previous work based on Kotaemon
- backend: python backend using FastAPI and implementing RAG
- frontend: typescript frontend using SvelteKit
```

The second one (currently without Kotaemon?) uses these folders:

```bash
├── policy_analysis
├── README.md
└── taxonomy
## Quick start (dev setup)
```
# start backend
cd backend
cp .env.example .env
# fill in correct values in .env
uv run uvicorn app.main:app --reload

[README policy analysis](policy_analysis/README.md)


## KOTAEMON Pipeline Scripts Instructions

This framework is built on top of Kotaemon to provide a new custom-built 'fast' ingestion script (multi-threaded ingestion of hundreds of documents in one batch), side by side with the standard 'drag-and-drop' Kotaemon ingestion from the UI.


### DEV set-up deployment

You have two config files to check:


#### The official Kotaemon file 'flowsettings.py'

This file is at the root of 'rag_system'. (It will overwrite the official 'flowsettings.py' during the Docker build.)

It declares, among other things, the main components:

- ```KH_OLLAMA_URL``` : the URL used to connect to the Ollama inference service (LLM model inference)
- ```KH_APP_DATA_DIR``` : the app data root directory where Kotaemon stores all its internal data
- ```KH_DOCSTORE``` : the Kotaemon docstore used and its path. Local LanceDB by default, but you could choose a remote LanceDB database
- ```KH_VECTORSTORE``` : the Kotaemon vector store used and its URL. Qdrant by default for the dev team
- ```KH_DATABASE``` : the Kotaemon internal SQL database. Can be SQLite (default) or any SQL backend
- ```KH_FILESTORAGE_PATH``` : the Kotaemon storage path for all the raw documents (PDF images for each page, etc.)
- ...

You should not touch any of these settings for now (during your dev setup).
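For orientation only, here is a minimal sketch of the kind of declarations the file contains. The values below are hypothetical placeholders, not the project's actual configuration:

```python
# Hypothetical sketch only -- placeholder values, not the real flowsettings.py.
import os

KH_OLLAMA_URL = "http://localhost:11434/v1/"                   # Ollama inference endpoint
KH_APP_DATA_DIR = os.path.abspath("./ktem_app_data")           # root directory for Kotaemon's internal data
KH_FILESTORAGE_PATH = os.path.join(KH_APP_DATA_DIR, "files")   # raw document storage (PDF page images, etc.)
KH_DATABASE = f"sqlite:///{KH_APP_DATA_DIR}/sql.db"            # internal SQL database (SQLite by default)
# KH_DOCSTORE and KH_VECTORSTORE select the docstore (LanceDB by default) and the
# vector store (Qdrant for the dev team) together with their paths/URLs.
```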


#### An additional .env to set inside the 'kotaemon_pipeline_scripts' folder

This file lives inside 'kotaemon_pipeline_scripts'.

First, generate your own .env from the .env.example template:

```bash
cd kotaemon_pipeline_scripts
cp .env.example .env
# start frontend (dev)
cd frontend
npm install
cp .env.example .env
npm run dev
```

Now check all the .env values (a minimal reading sketch follows the list):

- ```PG_DATABASE_URL``` = the URL of the Data4Good database that maintains the OpenAlex article metadata (ask the team)
- ```LLM_INFERENCE_URL``` = the URL of the LLM inference stack (your Ollama service for local dev)
- ```LLM_INFERENCE_MODEL``` = the model used for chunk inference on metadata
- ```LLM_INFERENCE_API_KEY``` = the API key for the LLM inference stack
- ```EMBEDDING_MODEL_URL``` = the URL of the embedding model stack (Ollama for local dev)
- ```EMBEDDING_MODEL``` = the model used for embedding
- ```EMBEDDING_MODEL_API_KEY``` = the API key for the embedding model
- ```COLLECTION_ID``` = the ID of the collection within the Kotaemon app (BE CAREFUL TO CHOOSE THE RIGHT ID)
- ```USER_ID``` = the user ID taken from the Kotaemon app (BE CAREFUL TO CHOOSE THE RIGHT ID)
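As a minimal sketch (assuming python-dotenv; the real scripts may load these values differently), the pipeline scripts end up reading something like:

```python
# Illustrative only: how the ingestion scripts might read the .env values above.
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # loads kotaemon_pipeline_scripts/.env into the environment

PG_DATABASE_URL = os.environ["PG_DATABASE_URL"]        # Data4Good metadata database
LLM_INFERENCE_URL = os.environ["LLM_INFERENCE_URL"]    # e.g. the local Ollama endpoint
LLM_INFERENCE_MODEL = os.environ["LLM_INFERENCE_MODEL"]
EMBEDDING_MODEL_URL = os.environ["EMBEDDING_MODEL_URL"]
EMBEDDING_MODEL = os.environ["EMBEDDING_MODEL"]
COLLECTION_ID = os.environ["COLLECTION_ID"]            # must match the right Kotaemon collection
USER_ID = os.environ["USER_ID"]                        # must match the Kotaemon user (see below)
```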
## Pipeline and architecture
Here are the pipeline and the main architectural elements (a condensed code sketch follows the list):
1. The **SvelteKit frontend** sends a query to the **FastAPI backend** via a POST request.
2. The backend calls a **generative AI API (Scaleway)** to determine whether the query is on-topic. If yes, it rewrites it for retrieval. If not, it answers directly.
3. The rewritten query is embedded on the server's CPU with a small model using **sentence-transformers**. It would be more cost-efficient to use an API and a smaller instance, but the embedding model we started with, **Qwen3-Embedding-0.6B**, isn't available on any commercial API.
4. The server sends the query to **Qdrant** (vector DB), which returns the top $k_{vector}$ matches (configurable).
5. It then reranks the results locally using **flashrank**. Again, a more mature version might use e.g. Cohere's API.
6. The top $k_{rerank}$ chunks are then used to build the context. If the FETCH_PUBS env var is true (the default), we use the OpenAlex ID of the chunks to fetch the corresponding publications from **OpenAlex's API**. We build the context using the title, abstract, and the retrieved chunks.
7. The context is passed along with the original query to the generative API, and the backend streams the response back to the frontend.
8. The backend saves the messages and intermediary results to **Postgres**.
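
A condensed sketch of steps 3 to 7 (the function, payload field names, and prompt below are illustrative assumptions, not the backend's actual code):

```python
# Illustrative sketch of steps 3-7; names, payload fields and the prompt are assumptions.
import os

import httpx
from flashrank import Ranker, RerankRequest
from openai import OpenAI
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")  # step 3: embedding on CPU
qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])
ranker = Ranker(max_length=1024)                              # step 5: local flashrank reranker
# The Scaleway secret key is used as the API key here (assumption).
llm = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])


def answer(query: str, rewritten_query: str, k_vector: int = 20, k_rerank: int = 5):
    # Step 4: vector search in Qdrant.
    query_vector = embedder.encode(rewritten_query).tolist()
    hits = qdrant.search(collection_name="library-v1", query_vector=query_vector, limit=k_vector)

    # Step 5: rerank the retrieved chunks locally and keep the best k_rerank.
    passages = [{"id": h.id, "text": h.payload["text"], "meta": h.payload} for h in hits]
    reranked = ranker.rerank(RerankRequest(query=rewritten_query, passages=passages))[:k_rerank]

    # Step 6: optionally enrich each chunk with its publication title from OpenAlex.
    context_parts = []
    for passage in reranked:
        openalex_id = passage["meta"].get("openalex_id")  # payload field name is an assumption
        if openalex_id:
            work = httpx.get(f"https://api.openalex.org/works/{openalex_id}").json()
            context_parts.append(f"Title: {work.get('display_name', '')}")
        context_parts.append(passage["text"])
    context = "\n\n".join(context_parts)

    # Step 7: generate the answer from the context and stream it back to the caller.
    return llm.chat.completions.create(
        model="mistral-small-3.2-24b-instruct-2506",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
        stream=True,
    )
```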

For now, do not touch the 'USER_ID' before launching the Kotaemon app for the first time (see below).
SvelteKit is a full-stack framework for Svelte, similar to Next.js for React or Nuxt for Vue. We could almost have used it as a static site generator, but we use server-side functions to hide the backend's URL from the user.

The diagram below illustrates the aforementioned pipeline. Note that the policy analysis retrieval isn't implemented yet.

### Running the RAG System
![Chat sufficiency architecture schema](../assets/chat_archi.png)

1) The 'dev' deployment is used to launch, work on, and debug the Python package in editable mode.
Moreover, the whole 'kotaemon_pipeline_scripts' folder is mapped (as a volume) inside the container, so you can work on it during this dev stage.
## CleverCloud deployment
Both applications (front and back) are deployed to CleverCloud under the World Sufficiency Lab organization. CleverCloud handles continuous deployment on each push to the configured branch.

First, launch the different services with the Docker Compose file provided in this folder.
The frontend requires a larger instance to build than to run, so we configured one. CleverCloud also doesn't include a build step by default, so it needs to be configured via env vars.

Nothing to do — everything’s already set up: the Docker Compose file was created to save you the hassle.

You only need to pay attention, if necessary, to the volume mappings.

If you don't have a GPU on your local device and haven't set up CUDA with Docker, remove these lines from the Ollama service:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
Due to the local computations, the backend currently requires 4 GB of RAM. CleverCloud also doesn't handle uv well, so we need to go through pip. Use:
```

```bash
docker compose up
uv pip compile pyproject.toml --output-file requirements.txt
```

Additionally, the command that normally launches the Kotaemon app (./launch.sh) has been deliberately disabled so you can develop on the app — coding the different libraries (kotaemon, ktem, and our custom ones) — without having to stop/restart the Kotaemon container.

Indeed, to run the Kotaemon app for testing, you need to enter the container:

From the rag_system folder where the Docker Compose file is located:

```bash
docker compose exec -it kotaemon bash
to create the `requirements.txt` file from `pyproject.toml`. To avoid installing unnecessary GPU-related libraries, `pyproject.toml` is configured to install torch+cpu. For this to work, don't forget to add this line at the start of `requirements.txt` after generating it:
```

Then launch the Kotaemon app:
```bash
./launch.sh
```

IMPORTANT: After launching the Kotaemon app, open any page and check the logs to retrieve the USER ID.
Then, shut down the Kotaemon app from inside the container (or stop the container if you prefer).
Update your .env file with the correct USER ID.
Finally, restart the Kotaemon app — your fast ingestion pipeline scripts should now be consistent with the correct user collection.


2) You also need to pull the different models with the Ollama service.
Read and follow point 2 of the README inside 'kotaemon_install_guide' (FR) on this topic.

3) Now, for your first steps with the Kotaemon app, read and follow point 3 of the README inside 'kotaemon_install_guide' (FR).


### Running the 'Fast' ingestion pipeline scripts

The 'fast_ingestion_good_version.py' script calls shortcut_indexing_pipeline.py, which describes all the ingestion steps built through the Kotaemon API.
This script launches an ingestion of the documents that are not yet ingested according to the Data4Good metadata database.
To force a re-index, you can add a '-fr' argument or increment the version.

This pipeline also uses the 'pipelineblocks' modules (inside the kotaemon folder), a 'plugin' package built side by side with kotaemon.

To run the pipeline for a new ingestion, launch the script inside the container:
```bash
python3 pipeline_scripts/fast_ingestion_pipeline_good_version.py
--extra-index-url https://download.pytorch.org/whl/cpu
```
You can update the ingestion version (for example: 2) in two ways:
- by changing the ```INGESTION_VERSION=version_2``` env variable in your .env
- by providing an '--ingestion_version=version_2' argument directly to the fast ingestion Python script.

The second option overrides the environment variable (a small precedence sketch follows).

If you provide a new version for the first time, all the documents, without exception, will be ingested again from scratch.
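
A small sketch of that precedence rule (hypothetical helper, not the actual script):

```python
# Hypothetical helper showing the precedence: CLI argument > INGESTION_VERSION env var > default.
import argparse
import os


def resolve_ingestion_version(default: str = "version_1") -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--ingestion_version", default=None)
    parser.add_argument("-fr", "--force_reindex", action="store_true")  # force a re-index
    args, _ = parser.parse_known_args()
    # The CLI value wins; otherwise fall back to the .env value, then to the default.
    return args.ingestion_version or os.environ.get("INGESTION_VERSION", default)


if __name__ == "__main__":
    print(resolve_ingestion_version())
```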


## Kotaemon Subtree Setup

The Kotaemon folder is a shared Data4Good subtree, synchronized with the common project:

🔗 https://github.com/dataforgoodfr/kotaemon

For setup, synchronization, and contribution instructions, please see the detailed guide here:
[📄 setup-kotaemon.md](../docs/development/setup-kotaemon.md)


# Project status and roadmap

## Choosing Kotaemon, then moving away from it
The option initially chosen for the chatbot was [Kotaemon](https://github.com/Cinnamon/kotaemon), an open-source project implementing a local RAG chatbot interface, supposedly easy to customize.
It notably provided the following features already implemented:
- a graphical interface displaying the PDF sources;
- many RAG algorithms available;
- a document ingestion pipeline.

It was therefore "just" a matter of connecting Kotaemon to the library to get a v1 of the chatbot.

But Kotaemon imposed strong constraints on document ingestion: it is designed to add documents locally, not to use a separate, already-processed database with its own ingestion pipeline, which is what we wanted for the library. Making it work therefore required going through Kotaemon's code for some steps of the library's creation. This coupled the two sub-projects in a way that complicated development and coordination, and forced us to multiply databases. In short, it added enormous complexity that ultimately hurt the project.

The other factors arguing for moving away from Kotaemon are the following:
- the project is no longer maintained;
- its interface is that of a personal or internal tool, not of a public-facing website;
- it includes user management that is unnecessary for the project but imposes constraints that were never clarified;
- the possibilities for future improvements and customization are limited.

## Work done
As a consequence of the choice to use Kotaemon, the work focused on integrating Kotaemon with the rest of the project via:
- customization of Kotaemon's internal code (`rag_system/kotaemon/libs/pipelineblocks`);
- numerous config files (`rag_system/kotaemon_pipeline_scripts/fast_ingestion/`);
- a document ingestion pipeline adapted to the project (`rag_system/kotaemon_pipeline_scripts/fast_ingestion/`), notably including taxonomy extraction.

Unfortunately, a large part of this work is specific to Kotaemon and cannot be reused if we move away from it. Only the last point can be reused to enrich the library's metadata.

On the other hand, evaluation work was done on the `retrieval_evaluation` branch, which we will be able to reuse to optimize retrieval and generation.

## Roadmap
To replace Kotaemon, the proposed solution is simply to re-implement the features we need. Alternative projects such as OpenWebUI were considered but present the same pitfalls as Kotaemon.

- [ ] Retrieval = search engine over the library (abstracts then full text, keyword search then semantic-similarity search)
- [ ] Web interface for this search engine (FastAPI API, SvelteKit app)
- [ ] Ingestion of the library into a vector database: chunking and embedding (required for semantic search)
- [ ] Generation with source citations
- [ ] Web interface for the chatbot = extension of the search engine's interface, using [Svelte AI Elements](https://svelte-ai-elements.vercel.app/)
- [ ] Progressively richer features (displaying PDFs, charts, ...)
## TODO
- suggestions (of questions)
- prompt: answer in same language as query
- prompt: don't forget social floor
- hybrid search
- policy analysis
- optimize cost (use APIs for query embedding and reranking to use a smaller server instance)
12 changes: 12 additions & 0 deletions rag_system/backend/.env.example
@@ -0,0 +1,12 @@
QDRANT_URL=https://***.eu-central-1-0.aws.cloud.qdrant.io
QDRANT_API_KEY=YOUR_QDRANT_API_KEY_HERE
QDRANT_COLLECTION_NAME=library-v1
EMBEDDING_DIM=256

GENERATION_API_URL=https://api.scaleway.ai/v1
GENERATION_MODEL_NAME=mistral-small-3.2-24b-instruct-2506

SCW_ACCESS_KEY=YOUR_SCALEWAY_ACCESS_KEY_HERE
SCW_SECRET_KEY=YOUR_SCALEWAY_SECRET_KEY_HERE

POSTGRES_URI=DIRECT_POSTGRESQL_URI_HERE
45 changes: 45 additions & 0 deletions rag_system/backend/app/config.py
@@ -0,0 +1,45 @@
from typing import Literal
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    fetch_pubs: bool = True

    # retrieval
    qdrant_url: str
    qdrant_api_key: str
    qdrant_collection_name: str = "library-test"
    qdrant_timeout: int = 10
    embedding_dim: int = 128
    embedding_model: str = "Qwen/Qwen3-Embedding-0.6B"
    max_length_reranker: int = 1024
    k_vector_search: int = 20

    query_rewrite_temperature: float = 0.05
    query_rewrite_top_p: float = 0.1
    query_rewrite_max_tokens: int = 128
    query_rewrite_timeout: int = 15

    k_rerank: int = 5
    rerank_method: Literal["flashrank", "llm"] = "flashrank"
    llm_rerank_model: str = "mistral-small-3.2-24b-instruct-2506"

    # generation
    generation_api_url: str
    generation_model_name: str
    scw_access_key: str
    scw_secret_key: str

    answer_temperature: float = 0.01
    answer_top_p: float = 0.1
    answer_max_tokens: int = 1024
    answer_timeout: int = 30

    # database
    postgres_uri: str
    log_usage: bool = False

    model_config = SettingsConfigDict(env_file=".env")


settings = Settings()
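
For instance, other backend modules can then import the shared `settings` instance (the `app.config` import path assumes the backend's `app` package layout; pydantic-settings fills the fields from `.env`, with real environment variables taking precedence over the file):

```python
# Minimal usage sketch, assuming this module lives at app/config.py.
from app.config import settings
from qdrant_client import QdrantClient

# Values come from .env (or the environment) via pydantic-settings.
client = QdrantClient(
    url=settings.qdrant_url,
    api_key=settings.qdrant_api_key,
    timeout=settings.qdrant_timeout,
)
print(settings.qdrant_collection_name, settings.k_vector_search, settings.rerank_method)
```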