Status: In Use / AS-IS • Internal reference project, open-sourced for documentation purposes. Not actively maintained.
Authors: Philipp Mattern & Marcel Telaar • Developed within a collaboration between INP Greifswald and Xototec.
Hybrid_Data_RAGfinery is a pragmatic, containerized hybrid RAG reference that combines a graph metadata store (ArangoDB) with a vector database (Qdrant), plus a lightweight Docling conversion service, a Python backend (hare_rag), n8n automation, and openwebui as a chat UI.
This repository documents how we wired these parts together for an internal deployment. It is not an actively developed product. You are welcome to read and reuse ideas under the Apache-2.0 license.
- AS-IS: The code and docs reflect a system we have running internally. We do not plan feature work, support, or a roadmap.
- Compose-first: We publish a Docker Compose setup to explain the architecture. No Helm charts/K8s.
- Critical prerequisite: The system assumes an upstream export from an HCL Notes database into compact JSON files. Those JSONs are the canonical inputs for ingestion, retrieval, and prompting. Without this step, the system will not function as described.
```
┌───────────────────┐          ┌───────────────────┐
│  Upstream System  │          │    (This repo)    │
│   HCL Notes DB    │          │                   │
│  → JSON exporter  ├─────────►│  Ingestion & RAG  │
└───────────────────┘          └─────────┬─────────┘
                                         │
                                         ▼
    ┌────────────┐     ┌──────────────┐     ┌─────────────┐
    │  ArangoDB  │     │    Qdrant    │     │ Docling API │
    │  (graph)   │     │  (vectors)   │     │ (chunking)  │
    └─────┬──────┘     └──────┬───────┘     └──────┬──────┘
          │                   │                    │
          ▼                   ▼                    │
    ┌────────────┐     ┌──────────────┐     ┌──────▼──────┐
    │  hare_rag  │────►│ LLM provider │     │     n8n     │
    │ (backend)  │     │ (embeddings/ │     │ (triggers)  │
    │            │     │  completion) │     └──────┬──────┘
    └─────┬──────┘     └──────────────┘            │
          │                                        │
          ▼                                        ▼
    ┌─────────┐                             ┌───────────┐
    │   API   │                             │ openwebui │
    └─────────┘                             └───────────┘
```
Key idea: dual retrieval. We keep structured context (documents, categories, relationships, attachments) in ArangoDB, while semantic similarity over text chunks lives in Qdrant.
- ArangoDB – graph & metadata (documents, categories, keywords, attachments, chunk nodes, edges for relations)
- Qdrant – vector search over content/attachment chunks
- Docling (FastAPI) – file conversion (e.g., PDF/DOCX → Markdown) + chunking
- hare_rag (Python) – folder crawler, DB upserts, embedding calls, retrieval orchestration, prompt building
- n8n – watch-folder automation & simple routing (e.g., smalltalk vs. RAG)
- openwebui – chat frontend that sends queries via n8n to hare_rag
Note: The JSON export from HCL Notes feeds the crawler. Attachments can be processed via Docling and linked into the graph.
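To make the dual-retrieval idea concrete, here is a minimal sketch (not hare_rag's actual code) that combines a Qdrant similarity search with an ArangoDB graph lookup. Collection, vertex, and edge names follow the schema documented further down; the database name, credentials, payload fields, ports, and the embedding model are illustrative assumptions.

```python
# pip install qdrant-client python-arango openai
from arango import ArangoClient
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                              # reads OPENAI_API_KEY
qdrant = QdrantClient(url="http://localhost:6333")    # Qdrant default port, mapping may differ
arango = ArangoClient(hosts="http://localhost:8529")  # ArangoDB default port
db = arango.db("hare_rag", username="root", password="change-me")  # DB name assumed

def dual_retrieve(question: str, top_k: int = 5):
    """Return top chunks plus graph context for each hit (illustrative only)."""
    # Semantic side: embed the question and search the Qdrant collection.
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small",      # embedding model is an assumption
        input=[question],
    ).data[0].embedding
    hits = qdrant.search(
        collection_name="content_vectors", query_vector=vector,
        limit=top_k, with_payload=True,
    )

    # Structured side: enrich each hit with document metadata from the graph.
    results = []
    for hit in hits:
        payload = hit.payload or {}
        cursor = db.aql.execute(
            """
            LET doc = DOCUMENT("documents", @key)
            LET cats = (FOR c IN 1..1 OUTBOUND doc document_category RETURN c.name)
            RETURN { title: doc.title, categories: cats }
            """,
            bind_vars={"key": payload.get("document_key")},  # payload field assumed
        )
        rows = list(cursor)
        results.append({"chunk": payload.get("text"),
                        "graph_context": rows[0] if rows else None})
    return results
```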
Quick start

1. Prepare inputs
   - Run your HCL Notes → JSON exporter.
   - Place the resulting folders into the configured `MANUAL_UPLOAD_FOLDER` or `WATCH_FOLDER` (see `.env`). Each folder is expected to include:
     - `content_AI.md`
     - `metadata_AI.json`
     - optional attachments (PDF/DOCX/PPTX/CSV/MD, etc.)
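   A prepared folder might look like this (the subfolder and attachment names are illustrative):

   ```
   manual_upload_folder/
   └── example/
       ├── content_AI.md       # document body as Markdown
       ├── metadata_AI.json    # compact metadata from the Notes export
       └── attachment_01.pdf   # optional attachments (PDF/DOCX/PPTX/CSV/MD, ...)
   ```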
2. Environment: create `.env` next to `docker-compose.yml`:

   ```
   ARANGO_ROOT_PASSWORD=change-me
   OPENAI_API_KEY=sk-...
   MANUAL_UPLOAD_FOLDER=./manual_upload_folder
   WATCH_FOLDER=./watch_folder
   ```
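   Docker Compose picks this `.env` up automatically from the same directory. As a rough illustration of how the variables reach the services (service names, images, and mount paths here are assumptions; check `docker-compose.yml` for the real wiring):

   ```yaml
   services:
     arangodb:
       image: arangodb
       environment:
         - ARANGO_ROOT_PASSWORD=${ARANGO_ROOT_PASSWORD}
     hare_rag:
       image: hare_rag
       environment:
         - OPENAI_API_KEY=${OPENAI_API_KEY}
       volumes:
         - ${MANUAL_UPLOAD_FOLDER}:/data/manual_upload_folder
         - ${WATCH_FOLDER}:/data/watch_folder
   ```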
3. Build images

   ```
   cd docling_app_container && docker build -t docling-py-custom .
   cd ../rag_application && docker build -t hare_rag .
   cd ..
   ```
4. Run stack

   ```
   docker compose up -d
   ```
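   To check that the containers came up (container names as used by the log commands further down):

   ```
   docker compose ps
   docker logs hare_rag --tail 50
   ```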
5. Ingest data
   - Automatic: drop a new folder into `./watch_folder` (n8n will trigger `/upload_folder`).
   - Manual:

     ```
     curl -X POST http://localhost:7000/upload_folder \
       -H 'Content-Type: application/json' \
       -d '{"folder_path":"/data/manual_upload_folder/example"}'
     ```
6. Query
   - Use openwebui (default `:3000`) with a function/pipe that POSTs to n8n → hare_rag `/query`; a sketch of such a pipe follows below.
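   A minimal Open WebUI pipe function along these lines could forward the chat message to an n8n webhook. The webhook path, response shape, and the exact function interface of your Open WebUI version are assumptions to adapt:

   ```python
   import requests

   class Pipe:
       def __init__(self):
           # Hypothetical webhook path; point this at your n8n trigger node.
           self.webhook_url = "http://n8n:5678/webhook/rag"

       def pipe(self, body: dict):
           # Open WebUI hands over an OpenAI-style body; take the latest user message.
           question = body["messages"][-1]["content"]
           resp = requests.post(self.webhook_url, json={"query": question}, timeout=120)
           resp.raise_for_status()
           # Assume the n8n workflow returns JSON with an "answer" field.
           return resp.json().get("answer", resp.text)
   ```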
API endpoints

- `POST /process_file` (Docling): upload a single attachment, get Markdown + chunks
- `POST /upload_folder` (hare_rag): crawl a prepared folder (`content_AI.md`, `metadata_AI.json`, attachments)
- `POST /query` (hare_rag): embed the query, retrieve via Qdrant/ArangoDB, call the LLM, return answer + sources
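For a quick test directly against hare_rag (bypassing openwebui/n8n), a call like the following should work, assuming the same port as `/upload_folder` and a JSON body with a query field; check hare_rag's API for the exact schema:

```
curl -X POST http://localhost:7000/query \
  -H 'Content-Type: application/json' \
  -d '{"query":"Summarize the latest exported document"}'
```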
ArangoDB

- Vertices: `documents`, `forms`, `responsibles`, `keywords`, `categories`, `chunks`, `attachments`
- Edges: `document_forms`, `document_responsibles`, `document_keywords`, `document_category`, `document_attachments`, `chunks_attachment`, `category_hierarchy`
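As an illustration of how this graph can be queried (the document key and the `title`/`name`/`filename` attributes are hypothetical), an AQL traversal over the edge collections might look like:

```
FOR doc IN documents
  FILTER doc._key == "example-doc"
  LET categories  = (FOR c IN 1..1 OUTBOUND doc document_category RETURN c.name)
  LET attachments = (FOR a IN 1..1 OUTBOUND doc document_attachments RETURN a.filename)
  RETURN { title: doc.title, categories: categories, attachments: attachments }
```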
Qdrant

- Collections: `content_vectors`, `attachment_chunks`
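To confirm the collections exist (assuming Qdrant's default port 6333 is mapped to the host):

```
curl http://localhost:6333/collections
```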
- Logs: `docker logs hare_rag`, `docker logs n8n`
- openwebui default: `http://localhost:3000` • n8n default: `http://localhost:5678`
- This stack assumes outbound access to the chosen LLM/embedding provider (default: OpenAI). Swap in local models at your own discretion.
- Upstream dependency: Requires the HCL Notes → JSON export step; this repo does not include it.
- Not hardened: No production SSO, RBAC, or multi-tenant controls provided here.
- No SLAs & support: This is a snapshot of what worked for us. Expect to adapt it for your context.
- No active maintenance: Issues/PRs may go unanswered.
If you are evaluating RAG systems in 2025, you might also look at (in alphabetical order):
- Dify
- Haystack
- LangChain (incl. agentic patterns)
- LlamaIndex
- Microsoft GraphRAG (research & patterns)
- OpenSearch (hybrid/neural search options)
- RAGFlow
- Vector DBs with hybrid features (e.g., Milvus, Weaviate)
We do not maintain a comparison matrix. Our system is good enough for our internal use case, and this repo serves as documentation of that design.
Further reading
- "HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction" (https://arxiv.org/html/2408.04948v1)
Developed by Philipp Mattern and Marcel Telaar as part of a collaboration between INP Greifswald and Xototec.
This software is provided "AS IS", without warranties or conditions of any kind, either express or implied. Use at your own risk.
This repository is released under the Apache License, Version 2.0. See LICENSE.
We also recommend including a short NOTICE file and, if you redistribute third-party code from this repo, a THIRD-PARTY-NOTICES file.
We are not accepting feature requests or regular contributions. Security issues may not receive a response. This repository is primarily for documentation and reference.
