Commit 5323c74

Graph-Enhanced Vector Search
1 parent 52998a1 commit 5323c74

1 file changed: +42 -2 lines changed

src/content/docs/reference/graphrag/graph-enhanced-vector-search.md

Lines changed: 42 additions & 2 deletions
@@ -11,15 +11,55 @@ tags: ["Advanced"]
1111

1212
## Required Graph Shape
1313

14+
![](../../../../assets/images/knowledge-graph-lexical-graph-extracted-entities.svg)
1415
[Lexical Graph with Extracted Entities](/reference/knowledge-graph/lexical-graph-extracted-entities)
1516

17+
## Context
18+
19+
The biggest problem with basic GraphRAG patterns is finding all relevant context necessary to answer a question.
20+
The context can be spread across many chunks, some of which the search does not find.
21+
Relating the real-world entities mentioned in the chunks to each other, and retrieving these relationships alongside the vector search results, provides additional context about the entities that the chunks refer to.
22+
These relationships can also be used to relate chunks to each other through the entity network.
23+
1624
## Description
1725

18-
The user question is embedded using the same embedder that has been used before to create embeddings. A vector similarity search is executed on the Chunk embeddings to find k (number previously configured by developer / user) most similar Chunks. A traversal starting at the found chunks is executed to retrieve more context.
26+
The user question is embedded using the same embedder that was used to create the Chunk embeddings.
27+
A vector similarity search is executed on the Chunk embeddings to find the k most similar Chunks (k is configured in advance by the developer or user).
28+
A traversal starting at the found chunks is executed to retrieve more context.
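
The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the pattern's reference implementation: the in-memory `CHUNKS`, `HAS_ENTITY`, and `ENTITY_EDGES` structures and all function names are hypothetical stand-ins for the graph database, and the question is assumed to be already embedded by the same embedder as the chunks. The default of two hops mirrors the `{0,2}` quantifier in the retrieval query.

```python
import math
from collections import deque

# Hypothetical toy data standing in for the graph (not from the original docs).
CHUNKS = {  # chunk id -> embedding
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.8, 0.6, 0.0],
    "c3": [0.0, 0.0, 1.0],
}
HAS_ENTITY = {"c1": ["Alice"], "c2": ["Acme"], "c3": ["Berlin"]}
ENTITY_EDGES = {  # undirected entity-to-entity relationships
    "Alice": ["Acme"],
    "Acme": ["Alice", "Berlin"],
    "Berlin": ["Acme"],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(question_embedding, k):
    """Return the ids of the k chunks most similar to the question."""
    ranked = sorted(
        CHUNKS,
        key=lambda cid: cosine(question_embedding, CHUNKS[cid]),
        reverse=True,
    )
    return ranked[:k]


def expand_entities(chunk_ids, max_hops=2):
    """Breadth-first traversal from the chunks' entities.

    Returns a dict mapping each reached entity to its hop distance
    from the nearest starting entity.
    """
    depth = {}
    queue = deque()
    for cid in chunk_ids:
        for entity in HAS_ENTITY.get(cid, []):
            if entity not in depth:
                depth[entity] = 0
                queue.append(entity)
    while queue:
        entity = queue.popleft()
        if depth[entity] == max_hops:
            continue  # do not traverse beyond the hop limit
        for neighbour in ENTITY_EDGES.get(entity, []):
            if neighbour not in depth:
                depth[neighbour] = depth[entity] + 1
                queue.append(neighbour)
    return depth


def retrieve(question_embedding, k=2, max_hops=2):
    """Vector search for k chunks, then traverse the entity network."""
    chunks = top_k_chunks(question_embedding, k)
    return chunks, expand_entities(chunks, max_hops)
```

Calling `retrieve([1.0, 0.1, 0.0], k=1)` would select chunk `c1` and then reach `Acme` and `Berlin` through the entity network, even though neither is mentioned in the chunk that the vector search returned.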
1929

2030
## Usage
2131

22-
This pattern is useful for retrieving more enriched context than the results of executing only a vector search as in e.g. [Basic Retrievers](/reference/graphrag/basic-retriever) or [Parent-Child Retrievers](/reference/graphrag/parent-child-retriever). The additional traversal retrieves the interaction of entities within the provided data which reveals much richer information than the retrieval of specific text chunks. Naturally, the preprocessing for this GraphRAG pattern is much more tedious and expensive. Furthermore, the amount of context that is returned by the Graph Traversal might be too much to handle for an LLM.
32+
This pattern is useful for retrieving richer context than a vector search alone provides, as in, e.g., [Basic Retrievers](/reference/graphrag/basic-retriever) or [Parent-Child Retrievers](/reference/graphrag/parent-child-retriever).
33+
The additional traversal retrieves the interactions between entities in the provided data, which reveals much richer information than retrieving specific text chunks alone.
34+
Naturally, the preprocessing for this GraphRAG pattern requires more effort.
35+
Furthermore, the graph traversal can return a much larger amount of context, which the LLM needs to be able to process.
36+
37+
## Required pre-processing
38+
39+
Use an LLM to execute entity and relationship extraction on the chunks. Import the extracted triples into the graph.
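
As a rough sketch of the import step, assume the LLM extraction has already produced `(subject, predicate, object)` triples per chunk. The function name `import_triples` and the in-memory sets below are hypothetical; a real pipeline would write nodes and relationships to the graph database instead.

```python
from collections import defaultdict


def import_triples(extractions):
    """Build chunk-to-entity links and entity relationships from triples.

    extractions: mapping of chunk id -> list of (subject, predicate, object)
    triples, assumed to come from an LLM extraction step (not shown here).
    """
    has_entity = defaultdict(set)  # chunk id -> entities mentioned in it
    relationships = set()          # deduplicated (subject, predicate, object)
    for chunk_id, triples in extractions.items():
        for subject, predicate, obj in triples:
            # Both ends of a triple are entities found in this chunk.
            has_entity[chunk_id].update((subject, obj))
            relationships.add((subject, predicate, obj))
    return has_entity, relationships
```

Storing relationships in a set means a triple extracted from several chunks is imported only once, while each chunk still keeps its own link to the entities it mentions.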
40+
41+
## Retrieval Query
42+
43+
```cypher
44+
MATCH (node)-[:PART_OF]->(d:Document)
45+
CALL { WITH node
46+
MATCH (node)-[:HAS_ENTITY]->(e)
47+
MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,2}(:!Chunk&!Document)
48+
49+
RETURN …}
50+
RETURN …
51+
```
52+
53+
## Variants
54+
55+
There are some variations of this retriever:
56+
57+
* **Entity disambiguation** — A naive Entity Extraction Pipeline will pull out any entities from texts. However, the same real-world entity might be referred to in different ways in the text, so it ends up as several separate entities in the graph. To keep the graph clean, an entity disambiguation step can be executed, where these entities are merged. Possible ways of doing this are described in [Implementing ‘From Local to Global’ GraphRAG with Neo4j and LangChain: Constructing the Graph](https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/) and [Entity Linking and Relationship Extraction With Relik in LlamaIndex](https://neo4j.com/developer-blog/entity-linking-relationship-extraction-relik-llamaindex/).
58+
59+
* **Question-guided/Schema-defined extraction** — Instead of letting the LLM extract any kinds of entities and relationships, provide a set of questions or a fixed schema to guide the LLM to extract only the domain knowledge that is relevant for the application. This approach will narrow down the scope and the volume of the extraction (e.g., [Introducing WhyHow.AI Open-Source Knowledge Graph Schema Library — Start Experimenting Faster](https://medium.com/enterprise-rag/introducing-whyhow-ai-open-source-knowledge-graph-schema-library-start-experimenting-faster-0d836b76efe6)).
60+
61+
* **Entity embeddings** — When extracting the entities and the relationships using an LLM, we can instruct the LLM to also create/extract entity and relationship descriptions. These can be embedded and subsequently be used for the initial vector search and other guidance during traversal.
62+
* **Ontology-driven traversal** — Instead of hard-coding a traversal into your application code, you can provide an ontology for the traversal. This approach is explained in [Going meta — Ep 24: KG+LLMs: Ontology driven RAG patterns](https://www.youtube.com/watch?v=5_WXr0GtVas&list=PL9Hl4pk2FsvX-5QPvwChB-ni_mFF97rCE&index=5).
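
To make the entity disambiguation variant more concrete, here is a deliberately simple sketch that merges entities whose names match after normalization or through a hand-written alias table. This is an assumption chosen for illustration, not the approach taken in the linked articles; `normalize`, `disambiguate`, and the `aliases` parameter are hypothetical names.

```python
import re


def normalize(name):
    """Lowercase a name and drop punctuation so near-identical names match."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def disambiguate(triples, aliases=None):
    """Merge entities that refer to the same real-world entity.

    triples: iterable of (subject, predicate, object).
    aliases: optional mapping of normalized alias -> normalized canonical key.
    The first spelling seen for a key is kept as the canonical entity name.
    """
    aliases = aliases or {}
    canonical = {}  # normalized key -> canonical entity name

    def resolve(name):
        key = normalize(name)
        key = aliases.get(key, key)
        return canonical.setdefault(key, name)

    return {(resolve(s), p, resolve(o)) for s, p, o in triples}
```

With `aliases={"acme": "acme inc"}`, the mentions `Acme Inc.`, `acme inc`, and `ACME` would all resolve to the first spelling encountered, so their relationships attach to a single entity.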
2363

2464
## Further reading
2565
