[Proposal] A unified RAG solution that combines structured and unstructured data #196

subbaksh · 2025-08-14T13:48:51Z

subbaksh
Aug 14, 2025
Maintainer

Current architecture

Currently we have two distinct approaches to RAG based on the type of data.

For unstructured data (usually text, HTML etc.), we generate embeddings and store them in a vector DB (Milvus). When querying, we convert the query to an embedding and do a similarity search in the vector space to get the most relevant chunks.
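To make the unstructured path concrete, here is a minimal sketch of embed-then-rank retrieval. It is not the project's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for Milvus, and all names (`embed`, `cosine`, `top_k`, the sample chunks) are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list, k: int = 2) -> list:
    """Rank stored chunks by similarity to the query and keep the best k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical document chunks (assumed data, not from the project).
chunks = [
    "how to rotate credentials for the payments account",
    "team onboarding checklist for new engineers",
    "kubernetes cluster upgrade runbook",
]

best = top_k("rotate account credentials", chunks, k=1)
```

In a real deployment the ranking step would be delegated to the vector DB's own similarity search rather than a Python sort.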

For structured data (usually JSON, YAML etc.), we store the records in a labeled property graph (Neo4j), then analyse the data with an LLM to generate the relations between them. When querying, an agent has access to the database to run queries and traverse the graph.
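As a rough illustration of the structured path, the sketch below models a labeled property graph in plain Python and traverses it. It is an assumption-laden toy: the `PropertyGraph` class, the labels (`Team`, `Account`, `Cluster`) and the relation names are invented for this example, and a real setup would issue queries against Neo4j instead.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory labeled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}                 # node id -> (label, properties)
        self.edges = defaultdict(list)  # node id -> [(relation, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbours(self, node_id, relation=None):
        """Return targets of outgoing edges, optionally filtered by relation."""
        return [
            dst
            for rel, dst in self.edges.get(node_id, [])
            if relation is None or rel == relation
        ]


# Hypothetical entities, as they might be loaded from JSON/YAML records.
g = PropertyGraph()
g.add_node("team-payments", "Team", name="Payments")
g.add_node("acc-123", "Account", provider="aws")
g.add_node("cluster-prod", "Cluster", region="eu-west-1")

# Relations of the kind the LLM analysis step is meant to generate.
g.add_edge("team-payments", "OWNS", "acc-123")
g.add_edge("acc-123", "HOSTS", "cluster-prod")

owned = g.neighbours("team-payments", relation="OWNS")
```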

Problems

Both approaches have worked decently in isolation, for the data they support and the use cases targeted by AI platform engineer.

However, when the two RAG systems are combined under a multi-agent system (MAS), the agent has trouble figuring out which RAG system to query.
The two disparate systems fail especially where unstructured data relates to structured data (as is the case in a lot of platform documentation, where an account ID or team name is referenced).
In addition, the architecture is complicated: multiple systems need to be maintained, and the burden is on the user to give the right data to the right system.

Proposed architecture

In this discussion, we propose creating a unified RAG system that's specialised for platform engineering.

The system will have the following for ingestion/indexing:

Support for manual upload
Support for URL, Markdown docs, Docusaurus etc.
Support for "connectors" - these are programs that can be deployed in a company's infrastructure to collect structured data from APIs (e.g. K8s, AWS, Backstage etc.)
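One possible shape for a connector is sketched below. Everything here is hypothetical: the `Connector` protocol, the `Entity` record and `StaticNamespaceConnector` are invented names, and the connector reads a pre-fetched payload instead of calling a real K8s, AWS or Backstage API. The point is only that each connector normalises source-specific data into a common entity record before ingestion.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass
class Entity:
    """Common record that every connector emits (assumed schema)."""
    entity_type: str
    entity_id: str
    properties: dict = field(default_factory=dict)


class Connector(Protocol):
    """Hypothetical interface: a connector yields normalised entities."""

    def collect(self) -> Iterable[Entity]: ...


class StaticNamespaceConnector:
    """Stand-in connector that reads an already-fetched API payload."""

    def __init__(self, payload):
        self.payload = payload

    def collect(self):
        for item in self.payload:
            yield Entity(
                entity_type="K8sNamespace",
                entity_id=item["name"],
                properties={"team": item.get("team", "unknown")},
            )


def ingest(connectors) -> list:
    """Run every connector and gather the entities it produces."""
    return [entity for c in connectors for entity in c.collect()]


# Assumed sample payload; a deployed connector would fetch this itself.
payload = [{"name": "payments", "team": "team-payments"}, {"name": "sandbox"}]
entities = ingest([StaticNamespaceConnector(payload)])
```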

The system will have the following for querying:

Use the vector DB to run a semantic search for the query
Use the chunks returned by the semantic search to run a "graph search" that finds the entities they reference and their nearest connections
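The two query steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the PR: the bag-of-words similarity replaces a real embedding model and vector DB, the adjacency dict replaces the graph database, entity references are detected by exact token match, and all data and names (`CHUNKS`, `GRAPH`, `semantic_search`, `graph_search`) are hypothetical.

```python
import math
from collections import Counter

# Assumed sample data: text chunks that mention graph entity ids.
CHUNKS = [
    "billing outages are handled by team-payments using account acc-123",
    "style guide for writing runbooks",
]

# Assumed graph as an adjacency list: entity id -> directly connected ids.
GRAPH = {
    "team-payments": ["acc-123", "svc-checkout"],
    "acc-123": ["cluster-prod"],
}
KNOWN_ENTITIES = set(GRAPH) | {n for targets in GRAPH.values() for n in targets}


def embed(text):
    """Toy stand-in for an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query, k=1):
    """Step 1: rank chunks by similarity to the query."""
    q = embed(query)
    return sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def graph_search(chunk):
    """Step 2: find entities named in the chunk and their direct neighbours."""
    seeds = [tok for tok in chunk.split() if tok in KNOWN_ENTITIES]
    return {seed: GRAPH.get(seed, []) for seed in seeds}


def unified_query(query):
    """Return each retrieved chunk together with its graph context."""
    return [(chunk, graph_search(chunk)) for chunk in semantic_search(query)]


results = unified_query("who handles billing outages")
```

Here the retrieved chunk carries both the prose answer and, through the graph context, the structured entities it refers to, so the agent no longer has to choose between two systems.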

The following diagram shows how this can be done:

subbaksh · 2025-10-01T15:17:18Z

subbaksh
Oct 1, 2025
Maintainer Author

PR proposed here: #329


subbaksh · 2025-10-13T09:41:19Z

subbaksh
Oct 13, 2025
Maintainer Author

This has been implemented in the PR mentioned above (#329).

