Merged

Changes from all commits (28 commits)
ced9bf1
Archive old Kotaemon code in folder old
fraboniface Jan 7, 2026
5a116a5
init SvelteKit frontend
fraboniface Jan 7, 2026
c57fd5e
Implement dummy full-stack chatbot app
fraboniface Jan 7, 2026
3f7a9c9
feat: basic RAG pipeline
fraboniface Jan 9, 2026
054b19d
Move hardcoded config to settings
fraboniface Jan 12, 2026
7f92236
Handle chat history
fraboniface Jan 12, 2026
9e9e6bc
add llm-based reranking
fraboniface Jan 12, 2026
95d3f5f
Display retrieved documents and repair markdown
fraboniface Jan 13, 2026
5a91994
fix citations and new lines
fraboniface Jan 13, 2026
bbc667b
feat : fetch publications from OA
fraboniface Jan 14, 2026
fd01885
make chat stick to bottom
fraboniface Jan 14, 2026
0d32007
format .svelte files
fraboniface Jan 14, 2026
794f010
feat: copy response to clipboard
fraboniface Jan 14, 2026
0d5a846
add loader
fraboniface Jan 14, 2026
b2c5630
fix click on message to see sources
fraboniface Jan 14, 2026
88a6bcb
feat: app bar and new chat button
fraboniface Jan 14, 2026
a759838
add server step to protect backend URL
fraboniface Jan 15, 2026
8e9debc
add start script
fraboniface Jan 15, 2026
201f409
Add requirements.txt for clevercloud
fraboniface Jan 15, 2026
48cd34f
Make sure pytorch is CPU-only
fraboniface Jan 15, 2026
7743611
log backend url for debugging
fraboniface Jan 15, 2026
38e4ee8
small fixes
fraboniface Jan 16, 2026
7b51e41
fix latency-induced bug with sending sources
fraboniface Jan 16, 2026
46f1dc3
feat: save chats to database
fraboniface Jan 19, 2026
5ab4540
set correct favicon
fraboniface Jan 19, 2026
43402ec
feat: feedback button
fraboniface Jan 19, 2026
8febba2
move chat backend source files to app folder
fraboniface Jan 20, 2026
3c6f568
update doc
fraboniface Jan 20, 2026
1 change: 0 additions & 1 deletion .gitignore
@@ -20,7 +20,6 @@ dist/
downloads/
eggs/
.eggs/
lib/
lib64/
lib64
parts/
9 changes: 9 additions & 0 deletions .prettierignore
@@ -0,0 +1,9 @@
# Package Managers
package-lock.json
pnpm-lock.yaml
yarn.lock
bun.lock
bun.lockb

# Miscellaneous
/static/
16 changes: 16 additions & 0 deletions .prettierrc
@@ -0,0 +1,16 @@
{
	"useTabs": true,
	"singleQuote": true,
	"trailingComma": "none",
	"printWidth": 100,
	"plugins": ["prettier-plugin-svelte", "prettier-plugin-tailwindcss"],
	"overrides": [
		{
			"files": "*.svelte",
			"options": {
				"parser": "svelte"
			}
		}
	],
	"tailwindStylesheet": "rag_system/frontend/src/routes/layout.css"
}
Binary file added assets/chat_archi.png
253 changes: 43 additions & 210 deletions rag_system/README.md
@@ -1,230 +1,63 @@
# ChatSufficiency
# Chat Sufficiency

This README has two parts:
1. A technical quick start
2. An explanation of the work done so far and the roadmap.

The `policy_analysis` subfolder is planned to move to a sibling folder, as it is a separate line of work, largely independent of the RAG.

# Quick start

The RAG System is a collection of tools for RAG-based document QA.

## RAG System Structure

The RAG System is structured as follows:

```bash
rag_system
├── docker-compose.yml
├── Dockerfile
├── flowsettings.py
├── kotaemon
├── kotaemon_install_guide
├── kotaemon_pipeline_scripts
├── policy_analysis
├── README.md
└── taxonomy
Folder org:
```

There are 2 ingestion pipeline projects here that share the same taxonomy.

The first one (with Kotaemon) uses these folders:

```bash
├── docker-compose.yml
├── Dockerfile
├── kotaemon
├── kotaemon_install_guide
├── kotaemon_pipeline_scripts
└── taxonomy
- old: previous work based on Kotaemon
- backend: python backend using FastAPI and implementing RAG
- frontend: typescript frontend using SvelteKit
```

The second one (currently without Kotaemon?) uses these folders:

```bash
├── policy_analysis
├── README.md
└── taxonomy
## Quick start (dev setup)
```
# start backend
cd backend
cp .env.example .env
# fill in correct values in .env
uv run uvicorn app.main:app --reload

[README policy analysis](policy_analysis/README.md)


## KOTAEMON Pipeline Scripts Instructions

This framework is built on top of Kotaemon to provide a new custom-built 'fast' ingestion script (multi-threaded ingestion of hundreds of documents in one batch), side by side with the standard 'drag-and-drop' Kotaemon ingestion from the UI.


### DEV set-up deployment

You have two config files to check:


#### The official Kotaemon file 'flowsettings.py'

This file is at the root of 'rag_system'. (It will overwrite the official 'flowsettings.py' during the Docker build.)

It declares, among other things, the main components:

- ```KH_OLLAMA_URL``` : the URL used to connect to the Ollama inference service (LLM model inference)
- ```KH_APP_DATA_DIR``` : the app data root directory where Kotaemon stores all its internal data
- ```KH_DOCSTORE``` : the Kotaemon docstore used and its path. Local LanceDB by default, but you could choose a remote LanceDB database
- ```KH_VECTORSTORE``` : the Kotaemon vector store used and its URL. Qdrant by default for the dev team
- ```KH_DATABASE``` : the Kotaemon internal SQL database. Can be SQLite (default) or any SQL backend
- ```KH_FILESTORAGE_PATH``` : the Kotaemon storage path for all the raw documents (PDF images for each page, etc.)
- ...

You should not touch any of these settings for now (during your dev setup).
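For orientation only, here is a minimal sketch of the kind of declarations the file contains. The values below are hypothetical placeholders, not the project's actual configuration:

```python
# Hypothetical sketch only -- placeholder values, not the real flowsettings.py.
import os

KH_OLLAMA_URL = "http://localhost:11434/v1/"                   # Ollama inference endpoint
KH_APP_DATA_DIR = os.path.abspath("./ktem_app_data")           # root directory for Kotaemon's internal data
KH_FILESTORAGE_PATH = os.path.join(KH_APP_DATA_DIR, "files")   # raw document storage (PDF page images, etc.)
KH_DATABASE = f"sqlite:///{KH_APP_DATA_DIR}/sql.db"            # internal SQL database (SQLite by default)
# KH_DOCSTORE and KH_VECTORSTORE select the docstore (LanceDB by default) and the
# vector store (Qdrant for the dev team) together with their paths/URLs.
```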


#### An additional .env to set inside the 'kotaemon_pipeline_scripts' folder

This file lives inside 'kotaemon_pipeline_scripts'.

First, generate your own .env from the .env.example template:

```bash
cd kotaemon_pipeline_scripts
cp .env.example .env
# start frontend (dev)
cd frontend
npm install
cp .env.example .env
npm run dev
```

Now check all the .env values (a minimal reading sketch follows the list):

- ```PG_DATABASE_URL``` = the URL of the Data4Good database that maintains the OpenAlex article metadata (ask the team)
- ```LLM_INFERENCE_URL``` = the URL of the LLM inference stack (your Ollama service for local dev)
- ```LLM_INFERENCE_MODEL``` = the model used for chunk inference on metadata
- ```LLM_INFERENCE_API_KEY``` = the API key for the LLM inference stack
- ```EMBEDDING_MODEL_URL``` = the URL of the embedding model stack (Ollama for local dev)
- ```EMBEDDING_MODEL``` = the model used for embedding
- ```EMBEDDING_MODEL_API_KEY``` = the API key for the embedding model
- ```COLLECTION_ID``` = the ID of the collection within the Kotaemon app (BE CAREFUL TO CHOOSE THE RIGHT ID)
- ```USER_ID``` = the user ID taken from the Kotaemon app (BE CAREFUL TO CHOOSE THE RIGHT ID)
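As a minimal sketch (assuming python-dotenv; the real scripts may load these values differently), the pipeline scripts end up reading something like:

```python
# Illustrative only: how the ingestion scripts might read the .env values above.
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # loads kotaemon_pipeline_scripts/.env into the environment

PG_DATABASE_URL = os.environ["PG_DATABASE_URL"]        # Data4Good metadata database
LLM_INFERENCE_URL = os.environ["LLM_INFERENCE_URL"]    # e.g. the local Ollama endpoint
LLM_INFERENCE_MODEL = os.environ["LLM_INFERENCE_MODEL"]
EMBEDDING_MODEL_URL = os.environ["EMBEDDING_MODEL_URL"]
EMBEDDING_MODEL = os.environ["EMBEDDING_MODEL"]
COLLECTION_ID = os.environ["COLLECTION_ID"]            # must match the right Kotaemon collection
USER_ID = os.environ["USER_ID"]                        # must match the Kotaemon user (see below)
```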
## Pipeline and architecture
Here are the pipeline and the main architectural elements (a condensed code sketch follows the list):
1. The **SvelteKit frontend** sends a query to the **FastAPI backend** via a POST request.
2. The backend calls a **generative AI API (Scaleway)** to determine whether the query is on-topic. If yes, it rewrites it for retrieval. If not, it answers directly.
3. The rewritten query is embedded on the server's CPU with a small model using **sentence-transformers**. It would be more cost-efficient to use an API and a smaller instance, but the embedding model we started with, **Qwen3-Embedding-0.6B**, isn't available on any commercial API.
4. The server sends the query to **Qdrant** (vector DB), which returns the top $k_{vector}$ matches (configurable).
5. It then reranks the results locally using **flashrank**. Again, a more mature version might use e.g. Cohere's API.
6. The top $k_{rerank}$ chunks are then used to build the context. If the FETCH_PUBS env var is true (the default), we use the OpenAlex ID of the chunks to fetch the corresponding publications from **OpenAlex's API**. We build the context using the title, abstract, and the retrieved chunks.
7. The context is passed along with the original query to the generative API, and the backend streams the response back to the frontend.
8. The backend saves the messages and intermediary results to **Postgres**.
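
A condensed sketch of steps 3 to 7 (the function, payload field names, and prompt below are illustrative assumptions, not the backend's actual code):

```python
# Illustrative sketch of steps 3-7; names, payload fields and the prompt are assumptions.
import os

import httpx
from flashrank import Ranker, RerankRequest
from openai import OpenAI
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")  # step 3: embedding on CPU
qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])
ranker = Ranker(max_length=1024)                              # step 5: local flashrank reranker
# The Scaleway secret key is used as the API key here (assumption).
llm = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])


def answer(query: str, rewritten_query: str, k_vector: int = 20, k_rerank: int = 5):
    # Step 4: vector search in Qdrant.
    query_vector = embedder.encode(rewritten_query).tolist()
    hits = qdrant.search(collection_name="library-v1", query_vector=query_vector, limit=k_vector)

    # Step 5: rerank the retrieved chunks locally and keep the best k_rerank.
    passages = [{"id": h.id, "text": h.payload["text"], "meta": h.payload} for h in hits]
    reranked = ranker.rerank(RerankRequest(query=rewritten_query, passages=passages))[:k_rerank]

    # Step 6: optionally enrich each chunk with its publication title from OpenAlex.
    context_parts = []
    for passage in reranked:
        openalex_id = passage["meta"].get("openalex_id")  # payload field name is an assumption
        if openalex_id:
            work = httpx.get(f"https://api.openalex.org/works/{openalex_id}").json()
            context_parts.append(f"Title: {work.get('display_name', '')}")
        context_parts.append(passage["text"])
    context = "\n\n".join(context_parts)

    # Step 7: generate the answer from the context and stream it back to the caller.
    return llm.chat.completions.create(
        model="mistral-small-3.2-24b-instruct-2506",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
        stream=True,
    )
```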

For now, do not touch the 'USER_ID' before launching the Kotaemon app for the first time (see below).
SvelteKit is a full-stack framework for Svelte, similar to Next.js for React or Nuxt for Vue. We could almost have used it as a static site generator, but we use server-side functions to hide the backend's URL from the user.

The diagram below illustrates the aforementioned pipeline. Note that the policy analysis retrieval isn't implemented yet.

### Running the RAG System
![Chat sufficiency architecture schema](../assets/chat_archi.png)

1) The 'dev' deployment is used to launch, work on, and debug the Python package in editable mode.
Moreover, the whole 'kotaemon_pipeline_scripts' folder is mapped (as a volume) inside the container, so you can work on it during this dev stage.
## CleverCloud deployment
Both applications (front and back) are deployed to CleverCloud under the World Sufficiency Lab organization. CleverCloud handles continuous deployment on each push to the configured branch.

First, launch the different services with the Docker Compose file provided in this folder.
The frontend requires a larger instance to build than to run, so we configured one. CleverCloud also doesn't include a build step by default, so it needs to be configured via env vars.

Nothing to do — everything’s already set up: the Docker Compose file was created to save you the hassle.

You only need to pay attention, if necessary, to the volume mappings.

If you don't have a GPU on your local device and haven't set up CUDA with Docker, remove these lines from the Ollama service:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
Due to the local computations, the backend currently requires 4 GB of RAM. CleverCloud also doesn't handle uv well, so we need to go through pip. Use:
```

```bash
docker compose up
uv pip compile pyproject.toml --output-file requirements.txt
```

Additionally, the command that normally launches the Kotaemon app (./launch.sh) has been deliberately disabled so you can develop on the app — coding the different libraries (kotaemon, ktem, and our custom ones) — without having to stop/restart the Kotaemon container.

Indeed, to run the Kotaemon app for testing, you need to enter the container:

From the rag_system folder where the Docker Compose file is located:

```bash
docker compose exec -it kotaemon bash
to create the `requirements.txt` file from `pyproject.toml`. To avoid installing unnecessary GPU-related libraries, `pyproject.toml` is configured to install torch+cpu. For this to work, don't forget to add this line at the start of `requirements.txt` after generating it:
```

Then launch the Kotaemon app:
```bash
./launch.sh
```

IMPORTANT: After launching the Kotaemon app, open any page and check the logs to retrieve the USER ID.
Then, shut down the Kotaemon app from inside the container (or stop the container if you prefer).
Update your .env file with the correct USER ID.
Finally, restart the Kotaemon app — your fast ingestion pipeline scripts should now be consistent with the correct user collection.


2) You also need to pull the different models with the Ollama service.
Read and follow point 2 of the README inside 'kotaemon_install_guide' (FR) on this topic.

3) Now, for your first steps with the Kotaemon app, read and follow point 3 of the README inside 'kotaemon_install_guide' (FR).


### Running the 'Fast' ingestion pipeline scripts

The 'fast_ingestion_good_version.py' script calls shortcut_indexing_pipeline.py, which describes all the ingestion steps built through the Kotaemon API.
This script launches an ingestion of the documents that are not yet ingested according to the Data4Good metadata database.
To force a re-index, you can add a '-fr' argument or increment the version.

This pipeline also uses the 'pipelineblocks' modules (inside the kotaemon folder), a 'plugin' package built side by side with kotaemon.

To run the pipeline for a new ingestion, launch the script inside the container:
```bash
python3 pipeline_scripts/fast_ingestion_pipeline_good_version.py
--extra-index-url https://download.pytorch.org/whl/cpu
```
You can update the ingestion version (for example: 2) in two ways:
- by changing the ```INGESTION_VERSION=version_2``` env variable in your .env
- by providing an '--ingestion_version=version_2' argument directly to the fast ingestion Python script.

The second option overrides the environment variable (a small precedence sketch follows).

If you provide a new version for the first time, all the documents, without exception, will be ingested again from scratch.
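
A small sketch of that precedence rule (hypothetical helper, not the actual script):

```python
# Hypothetical helper showing the precedence: CLI argument > INGESTION_VERSION env var > default.
import argparse
import os


def resolve_ingestion_version(default: str = "version_1") -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--ingestion_version", default=None)
    parser.add_argument("-fr", "--force_reindex", action="store_true")  # force a re-index
    args, _ = parser.parse_known_args()
    # The CLI value wins; otherwise fall back to the .env value, then to the default.
    return args.ingestion_version or os.environ.get("INGESTION_VERSION", default)


if __name__ == "__main__":
    print(resolve_ingestion_version())
```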


## Kotaemon Subtree Setup

The Kotaemon folder is a shared Data4Good subtree, synchronized with the common project:

🔗 https://github.com/dataforgoodfr/kotaemon

For setup, synchronization, and contribution instructions, please see the detailed guide here:
[📄 setup-kotaemon.md](../docs/development/setup-kotaemon.md)


# Project status and roadmap

## Choosing Kotaemon, then moving away from it
The option initially chosen for the chatbot was [Kotaemon](https://github.com/Cinnamon/kotaemon), an open-source project implementing a local RAG chatbot interface, supposedly easy to customize.
It notably provided the following features already implemented:
- a graphical interface displaying the PDF sources;
- many RAG algorithms available;
- a document ingestion pipeline.

It was therefore "just" a matter of connecting Kotaemon to the library to get a v1 of the chatbot.

But Kotaemon imposed strong constraints on document ingestion: it is designed to add documents locally, not to use a separate, already-processed database with its own ingestion pipeline, which is what we wanted for the library. Making it work therefore required going through Kotaemon's code for some steps of the library's creation. This coupled the two sub-projects in a way that complicated development and coordination, and forced us to multiply databases. In short, it added enormous complexity that ultimately hurt the project.

The other factors arguing for moving away from Kotaemon are the following:
- the project is no longer maintained;
- its interface is that of a personal or internal tool, not of a public-facing website;
- it includes user management that is unnecessary for the project but imposes constraints that were never clarified;
- the possibilities for future improvements and customization are limited.

## Work done
As a consequence of the choice to use Kotaemon, the work focused on integrating Kotaemon with the rest of the project via:
- customization of Kotaemon's internal code (`rag_system/kotaemon/libs/pipelineblocks`);
- numerous config files (`rag_system/kotaemon_pipeline_scripts/fast_ingestion/`);
- a document ingestion pipeline adapted to the project (`rag_system/kotaemon_pipeline_scripts/fast_ingestion/`), notably including taxonomy extraction.

Unfortunately, a large part of this work is specific to Kotaemon and cannot be reused if we move away from it. Only the last point can be reused to enrich the library's metadata.

On the other hand, evaluation work was done on the `retrieval_evaluation` branch, which we will be able to reuse to optimize retrieval and generation.

## Roadmap
To replace Kotaemon, the proposed solution is simply to re-implement the features we need. Alternative projects such as OpenWebUI were considered but present the same pitfalls as Kotaemon.

- [ ] Retrieval = search engine over the library (abstracts then full text, keyword search then semantic-similarity search)
- [ ] Web interface for this search engine (FastAPI API, SvelteKit app)
- [ ] Ingestion of the library into a vector database: chunking and embedding (required for semantic search)
- [ ] Generation with source citations
- [ ] Web interface for the chatbot = extension of the search engine's interface, using [Svelte AI Elements](https://svelte-ai-elements.vercel.app/)
- [ ] Progressively richer features (displaying PDFs, charts, ...)
## TODO
- suggestions (of questions)
- prompt: answer in same language as query
- prompt: don't forget social floor
- hybrid search
- policy analysis
- optimize cost (use APIs for query embedding and reranking to use a smaller server instance)
12 changes: 12 additions & 0 deletions rag_system/backend/.env.example
@@ -0,0 +1,12 @@
QDRANT_URL=https://***.eu-central-1-0.aws.cloud.qdrant.io
QDRANT_API_KEY=YOUR_QDRANT_API_KEY_HERE
QDRANT_COLLECTION_NAME=library-v1
EMBEDDING_DIM=256

GENERATION_API_URL=https://api.scaleway.ai/v1
GENERATION_MODEL_NAME=mistral-small-3.2-24b-instruct-2506

SCW_ACCESS_KEY=YOUR_SCALEWAY_ACCESS_KEY_HERE
SCW_SECRET_KEY=YOUR_SCALEWAY_SECRET_KEY_HERE

POSTGRES_URI=DIRECT_POSTGRESQL_URI_HERE
45 changes: 45 additions & 0 deletions rag_system/backend/app/config.py
@@ -0,0 +1,45 @@
from typing import Literal
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    fetch_pubs: bool = True

    # retrieval
    qdrant_url: str
    qdrant_api_key: str
    qdrant_collection_name: str = "library-test"
    qdrant_timeout: int = 10
    embedding_dim: int = 128
    embedding_model: str = "Qwen/Qwen3-Embedding-0.6B"
    max_length_reranker: int = 1024
    k_vector_search: int = 20

    query_rewrite_temperature: float = 0.05
    query_rewrite_top_p: float = 0.1
    query_rewrite_max_tokens: int = 128
    query_rewrite_timeout: int = 15

    k_rerank: int = 5
    rerank_method: Literal["flashrank", "llm"] = "flashrank"
    llm_rerank_model: str = "mistral-small-3.2-24b-instruct-2506"

    # generation
    generation_api_url: str
    generation_model_name: str
    scw_access_key: str
    scw_secret_key: str

    answer_temperature: float = 0.01
    answer_top_p: float = 0.1
    answer_max_tokens: int = 1024
    answer_timeout: int = 30

    # database
    postgres_uri: str
    log_usage: bool = False

    model_config = SettingsConfigDict(env_file=".env")


settings = Settings()
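
For instance, other backend modules can then import the shared `settings` instance (the `app.config` import path assumes the backend's `app` package layout; pydantic-settings fills the fields from `.env`, with real environment variables taking precedence over the file):

```python
# Minimal usage sketch, assuming this module lives at app/config.py.
from app.config import settings
from qdrant_client import QdrantClient

# Values come from .env (or the environment) via pydantic-settings.
client = QdrantClient(
    url=settings.qdrant_url,
    api_key=settings.qdrant_api_key,
    timeout=settings.qdrant_timeout,
)
print(settings.qdrant_collection_name, settings.k_vector_search, settings.rerank_method)
```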