Commit f9b7268

committed
MSFT QFS
1 parent 3b049c5 commit f9b7268

File tree

1 file changed

+40
-3
lines changed

src/content/docs/reference/graphrag/global-community-summary-retriever.md

Lines changed: 40 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,27 +8,64 @@ tags: ["Advanced"]
88

99
- Microsoft GraphRAG
1010
- Global Retriever
11+
- Query Focused Summarization
1112

1213
## Required Graph Shape
1314

15+
![](../../../../assets/images/knowledge-graph-lexical-graph-extracted-entities-community-summaries.svg)
1416
[Lexical Graph with Extracted Entities and Community Summaries](/reference/knowledge-graph/lexical-graph-extracted-entities-community-summaries)
1517

18+
## Context
19+
20+
Some questions about a dataset do not target facts found in individual chunks; they ask for an overarching message that spans the whole dataset.
21+
The previously mentioned patterns aren’t suited to answer these kinds of “global” questions.
22+
1623
## Description
1724

1825
Given the user question and a chosen Community level, the Community Summaries at that level are retrieved and passed to the LLM.
1926

2027
## Usage
2128

22-
This pattern is useful for questions that have a global character. Examples would be summarizing the content of the whole database or looking for topic structures across the whole data. The effort of setting up the required Graph Pattern is quite high since there are a lot of steps to be taken: entity & relationship extraction, Community detection and Community summarizations. It needs to be considered which of these tasks shall be executed by LLMs and which tasks can be handled differently to keep the pre-processing cost acceptable.
29+
This pattern is useful for questions that have a global character.
30+
Examples would be summarizing the content of the whole database or looking for topic structures across the whole data.
31+
Setting up the required Graph Pattern takes considerable effort because it involves several steps: entity and relationship extraction, Community detection, and Community summarization.
32+
Consider which of these tasks should be executed by LLMs and which can be handled differently to keep the pre-processing cost acceptable.
33+
34+
## Required pre-processing
35+
36+
In addition to extracting entities and their relationships, we need to form hierarchical communities within the Domain Graph.
37+
This can be done by using the Louvain or Leiden clustering algorithm. For every community, an LLM summarizes the entity and relationship information into Community Summaries.
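
The summarization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys (`id`, `community`, `description`, `source`, `target`, `type`) and the `summarize` callable are hypothetical placeholders, community assignments are assumed to come from an earlier Louvain or Leiden run, and only a single community level is handled.

```python
from collections import defaultdict
from typing import Callable


def build_community_inputs(entities: list[dict], relationships: list[dict]) -> dict:
    """Group entity and relationship descriptions by community id.

    The record layout is an assumption made for this sketch, not a fixed schema.
    """
    community_of = {e["id"]: e["community"] for e in entities}
    lines_by_community = defaultdict(list)

    for e in entities:
        lines_by_community[e["community"]].append(
            f"Entity {e['id']}: {e['description']}"
        )

    for r in relationships:
        source_community = community_of.get(r["source"])
        # Keep only relationships whose two endpoints share a community.
        if source_community is not None and source_community == community_of.get(r["target"]):
            lines_by_community[source_community].append(
                f"Relationship {r['source']} -[{r['type']}]-> {r['target']}"
            )

    return {c: "\n".join(lines) for c, lines in lines_by_community.items()}


def summarize_communities(
    entities: list[dict],
    relationships: list[dict],
    summarize: Callable[[str], str],
) -> dict:
    """Call the (hypothetical) LLM wrapper `summarize` once per community."""
    inputs = build_community_inputs(entities, relationships)
    return {c: summarize(text) for c, text in inputs.items()}
```

Passing the LLM call in as a plain callable keeps the grouping logic testable without a model and makes it easy to swap in a cheaper summarizer for large communities.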
38+
39+
## Retrieval query
40+
41+
The default implementation of this approach does not use vector search to fetch communities similar to the question.
42+
It simply fetches all Community Summaries of a given level and passes all of them (possibly in batches) to the LLM.
43+
44+
```
45+
MATCH (c:__Community__)
46+
WHERE c.level = $level
47+
RETURN c.full_content AS output
48+
```
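
Passing the summaries to the LLM "possibly in batches" could look like the following Python sketch. It assumes the summaries have already been fetched with the query above; `llm` is a hypothetical callable that takes a prompt and returns text, the character budget is an arbitrary stand-in for a token limit, and the final combine step is one possible way to merge per-batch answers, not necessarily what any given implementation does.

```python
from typing import Callable


def batch_summaries(summaries: list[str], max_chars: int = 8000) -> list[list[str]]:
    """Greedily pack summaries into batches that stay within max_chars.

    A single summary longer than max_chars still gets its own batch.
    """
    batches, current, size = [], [], 0
    for summary in summaries:
        if current and size + len(summary) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(summary)
        size += len(summary)
    if current:
        batches.append(current)
    return batches


def answer_global_question(
    question: str,
    summaries: list[str],
    llm: Callable[[str], str],
    max_chars: int = 8000,
) -> str:
    """Answer per batch, then combine the partial answers in one more call."""
    partials = []
    for batch in batch_summaries(summaries, max_chars):
        context = "\n\n".join(batch)
        partials.append(llm(f"Question: {question}\n\nCommunity summaries:\n{context}"))

    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]

    combined = "\n\n".join(partials)
    return llm(f"Question: {question}\n\nCombine these partial answers:\n{combined}")
```

With a small `max_chars` the number of LLM calls grows with the number of communities at the chosen level, which is worth keeping in mind when picking `$level`.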
49+
50+
## Variants
51+
52+
There are several variations in which you could use the Lexical Graph with extracted entities, communities, and community summaries:
53+
54+
* A [Local Retriever](../graphrag/local-retriever) could start by executing a vector search on the entity embeddings and traversing to related entities, chunks, or communities (e.g., see Integrating Microsoft GraphRAG into Neo4j).
55+
56+
* Depending on the question, we could also execute a vector similarity search on embeddings of the Community Summaries first to identify which subgraph is relevant for the question, then traverse from those communities to their entities and chunks to retrieve additional information.
57+
58+
* [DRIFT](https://www.microsoft.com/en-us/research/blog/introducing-drift-search-combining-global-and-local-search-methods-to-improve-quality-and-efficiency/) is a multi-stage approach that first executes a generic or vector-based community search and then generates additional questions for local search from those results. All results are then re-ranked and used together to generate the final answer.
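
The second variant, ranking Community Summaries by similarity to the question, can be illustrated with plain cosine similarity. This sketch assumes each community is a dictionary with hypothetical `id` and `embedding` keys and that the question has already been embedded with the same model; in a real deployment a database vector index would normally replace the in-memory loop.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_communities(question_embedding: list[float], communities: list[dict], k: int = 3) -> list:
    """Return the ids of the k communities whose summary embedding is closest."""
    scored = [
        (cosine(question_embedding, c["embedding"]), c["id"]) for c in communities
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [community_id for _, community_id in scored[:k]]
```

The returned ids would then serve as starting points for traversing to the entities and chunks of the selected communities.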
2359

2460
## Further reading
2561

2662
- [From Local to Global: A Graph RAG Approach to Query-Focused Summarization](https://arxiv.org/pdf/2404.16130) (April 2024)
2763
- [Implementing 'From Local to Global' GraphRAG with Neo4j and LangChain: Constructing the Graph](https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/) (July 2024)
2864
- [Integrating Microsoft GraphRAG into Neo4j](https://towardsdatascience.com/integrating-microsoft-graphrag-into-neo4j-e0d4fa00714c) (July 2024)
65+
- [Introducing DRIFT Search: Combining global and local search methods to improve quality and efficiency - Microsoft Research](https://www.microsoft.com/en-us/research/blog/introducing-drift-search-combining-global-and-local-search-methods-to-improve-quality-and-efficiency/)
2966

3067
## Existing Implementations
3168

32-
## Example Implementations
33-
3469
- [Microsoft GraphRAG](https://github.com/microsoft/graphrag)
70+
<!-- not supported yet [Neo4j GraphRAG]() -->
71+
- [LLM Knowledge Graph Builder with GDS](https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/)
