Description
The traditional RAG approach has difficulty extracting complex relationships or overarching themes from the source material, because the material is chunked and only some of those chunks are later retrieved. This limits the usefulness of RAG for more complex and in-depth topics that cannot be answered from isolated chunks of the source material. In addition, real-world data sources may contain conflicting and unreliable information, which can confuse an LLM trying to generate an answer without awareness of the broader context.
Knowledge graphs can help solve this issue by incrementally building a graph structure from the source data, where each edge in the graph represents a contextual relationship between separate sets of facts or topics. This methodology allows retrieving not only the relevant chunk of source material but also the relevant context for that chunk.
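To make the idea concrete, here is a minimal sketch of that insert/retrieve flow. It uses networkx and a placeholder where the LLM call would go; all names and the structure are my own assumptions for illustration, not the GraphRAG implementation:

```python
# Minimal sketch: chunks become graph nodes, an LLM links them with typed
# edges at insert time, and retrieval returns a chunk plus its neighbourhood.
# All names below are assumptions for illustration, not the GraphRAG API.
import networkx as nx

graph = nx.MultiDiGraph()

def extract_relations(chunk_id, text, known_nodes):
    """Placeholder for the LLM step: given a new chunk and the nodes already
    in the graph, return (source, relation, target) triples linking them."""
    return []  # a real implementation would prompt an LLM here

def insert_chunk(chunk_id, text):
    """Incrementally index one chunk: add it as a node and connect it to the
    existing graph via LLM-extracted contextual relationships."""
    graph.add_node(chunk_id, text=text)
    for source, relation, target in extract_relations(chunk_id, text, list(graph.nodes)):
        graph.add_edge(source, target, relation=relation)

def retrieve_with_context(chunk_id, hops=1):
    """Return the matched chunk plus everything within `hops` edges of it,
    i.e. the surrounding context that chunk-only retrieval would miss."""
    neighbourhood = nx.ego_graph(graph.to_undirected(), chunk_id, radius=hops)
    return [graph.nodes[n].get("text", n) for n in neighbourhood.nodes]
```

A query would still locate the best-matching nodes first (e.g. via embeddings, as in plain RAG), but would then pass `retrieve_with_context(...)` for each hit to the LLM, so related or conflicting statements from other documents arrive together with the chunk itself.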
Potential use cases
- An expert RAG agent on a constrained topic with complex factual relationships
- A domain expert on scientific or technical topics, with the relevant literature as source material
  - This is an area where I've personally found traditional RAG to be almost useless, since the relationships between different source documents are actually the interesting bits of information
- A support bot based on 'dirty' datasets, such as old/existing support logs where some or most of the information may be unreliable or dated
- A "project-based" expert RAG
  - Suppose a team works on a project (or separate projects), adding material to the RAG data source over time. Some of this material may be conflicting or ambiguous, or may change as the project progresses. This is the type of real-world data in which graph RAG would excel at finding overarching relationships and themes. Given suitable source material (besides files, these could be todo lists, calendar entries, etc.), you could ask the context chat questions like: What is the current state of the project? What is currently the most important blocking issue for the completion of the project?
 
 
Difficulties/Limitations
- Inserting data into a knowledge graph requires LLM processing
  - Is the back-end app currently able to access LLM providers?
  - Considerably more processing is required to build a knowledge graph than to build a vector DB, which limits using knowledge graphs for all users' data (a rough sketch of the per-chunk work follows this list).
- Seamlessly updating existing information in knowledge graphs may not be possible currently (Incremental indexing (adding new content) microsoft/graphrag#741), although knowledge graphs can apparently deal with conflicting information quite well even when it is added incrementally.
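As a rough illustration of the processing gap mentioned above, here is a sketch of the per-chunk indexing work for both approaches. The prompt wording and function names are assumptions for illustration, not taken from GraphRAG or the back-end app:

```python
# Rough sketch of per-chunk indexing cost. Vector indexing needs one
# (cheap, batchable) embedding request; graph indexing needs a full LLM
# completion whose prompt also carries the already-known entities.
# Prompt wording and function names are illustrative assumptions.

EXTRACTION_PROMPT = (
    "Extract (entity, relation, entity) triples from the text below.\n"
    "Reuse these known entities where possible: {known_entities}\n\n"
    "Text:\n{chunk}\n"
)

def index_chunk_vector(chunk: str, embed) -> None:
    # Vector DB path: a single embedding call per chunk.
    embed(chunk)

def index_chunk_graph(chunk: str, known_entities: list[str], complete) -> str:
    # Knowledge-graph path: a full completion call per chunk, with a prompt
    # that grows with the entity list, hence the much higher cost of
    # indexing every user's data this way.
    prompt = EXTRACTION_PROMPT.format(
        known_entities=", ".join(known_entities), chunk=chunk
    )
    return complete(prompt)
```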
 
P.S. I'm working at 50% currently, so I'm available for discussions or just catching up!
EDIT: A good basic explanation of the concepts involved: https://www.youtube.com/watch?v=6vG_amAshTk