Commit 25ba696

Update parent child retriever
1 parent 228fe97 commit 25ba696

1 file changed

+43
-2
lines changed

src/content/docs/reference/graphrag/parent-child-retriever.md

Lines changed: 43 additions & 2 deletions
@@ -10,16 +10,43 @@ tags: ["Basic"]
1010

1111
## Required Graph Shape
1212

13+
![Parent-Child Lexical Graph](../../../../assets/images/knowledge-graph-lexical-graph-parent-child.svg)
1314
[Parent-Child Lexical Graph](/reference/knowledge-graph/lexical-graph-parent-child)
1415

16+
## Context
17+
18+
Text embeddings represent a text’s semantic meaning.
19+
A narrower piece of text yields a more meaningful vector representation because there is less noise from multiple topics.
20+
However, if the LLM only receives a small piece of information for answer generation, the information might be missing context.
21+
Retrieving the broader text that surrounds the matched passage solves this problem.
22+
1523
## Description
1624

17-
The user question is embedded using the same embedder that has been used before to create the Chunk embeddings. A vector similarity search is executed on the Child Chunk embeddings to find k (number previously configured by developer / user) most similar Chunks. The Parent Chunks of the found Child Chunks are retrieved.
25+
The user question is embedded with the same embedder that was used to create the Chunk embeddings.
26+
A vector similarity search is executed on the Child Chunk embeddings to find the k most similar Chunks (k is configured beforehand by the developer or user).
27+
The Parents of the found Child Chunks are retrieved, and additional metadata from each Parent is returned.
28+
Optionally, Chunks that belong to the same Parent are aggregated, and their scores are either averaged or reduced to the maximum.
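
The optional aggregation step can be illustrated outside the database. The following is a minimal sketch, assuming the similarity search has already produced `(parent_id, child_text, score)` tuples; the function name `aggregate_by_parent` and the `mode` parameter are hypothetical and not part of any library.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_parent(hits, mode="max"):
    """Group child hits by parent and combine their scores.

    hits: iterable of (parent_id, child_text, score) tuples (assumed shape).
    mode: "max" keeps the best child score, "avg" averages all child scores.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")

    grouped = defaultdict(list)
    for parent_id, child_text, score in hits:
        grouped[parent_id].append((child_text, score))

    results = []
    for parent_id, children in grouped.items():
        scores = [score for _, score in children]
        combined = max(scores) if mode == "max" else mean(scores)
        results.append(
            {
                "parent": parent_id,
                "chunks": [text for text, _ in children],
                "score": combined,
            }
        )

    # Highest combined score first, so each parent appears exactly once.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results
```

Picking the maximum favours a parent with one very strong child match, while averaging favours parents whose children match consistently; which one works better likely depends on the data.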
1829

1930
## Usage
2031

21-
This pattern is a useful evolution of the [Basic Retriever](/reference/graphrag/basic-retriever). It is especially useful when several topics are covered in a chunk which subsequently influence the embedding negatively while smaller chunks will have more meaningful vector representations which can then lead to better similarity search results. With limited additional effort, better results can be obtained.
32+
This pattern is a useful evolution of the [Basic Retriever](/reference/graphrag/basic-retriever).
33+
It is especially useful when a chunk covers several topics, which degrades its embedding; smaller chunks have more meaningful vector representations, which can lead to better similarity search results.
34+
With limited additional effort, better results can be obtained.
35+
36+
## Required pre-processing
2237

38+
Split documents into larger chunks (parent chunks), then split these further into smaller chunks (child chunks).
39+
Use an embedding model to embed the text content of the child chunks.
40+
Note that it isn’t necessary to embed the parent chunks since they are only used for the answer generation and not for the similarity search.
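
The pre-processing can be sketched as follows. This is only an illustration under simplifying assumptions: it splits on fixed character counts (real pipelines often split on sentences or tokens), and `split_text`, `build_parent_child_chunks`, `embed_children`, and the `embed` callable are hypothetical names standing in for whatever splitter and embedding model is actually used.

```python
def split_text(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_parent_child_chunks(document, parent_size=1000, child_size=200):
    """Split a document into parent chunks, then each parent into child chunks."""
    parents = []
    for parent_id, parent_text in enumerate(split_text(document, parent_size)):
        parents.append(
            {
                "id": parent_id,
                "text": parent_text,
                "children": split_text(parent_text, child_size),
            }
        )
    return parents


def embed_children(parents, embed):
    """Embed only the child chunks; parents are kept as plain text.

    `embed` is any callable mapping a string to a vector (an assumption here).
    """
    return [
        {"parent_id": parent["id"], "text": child, "embedding": embed(child)}
        for parent in parents
        for child in parent["children"]
    ]
```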
41+
42+
## Retrieval Query
43+
44+
```cypher
45+
MATCH (node)<-[:HAS_CHILD]-(parent)
46+
WITH parent, collect(node.text) as chunks, max(score) AS score // deduplicate parents
47+
RETURN parent.title + reduce(r="", c in chunks | r + "\n\n" + c) AS text,
48+
score, {source:parent.url} AS metadata
49+
```
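
The retrieval query expects `node` and `score` to be bound already, typically by a preceding vector index search. A minimal sketch of how the two parts might be combined into one statement is shown below. It assumes a Neo4j vector index over the child chunk embeddings queried through the `db.index.vector.queryNodes` procedure; the parameter names `$index_name`, `$k`, and `$embedding` and the helper functions are illustrative choices. Because `chunks` holds plain strings collected from `node.text`, the sketch concatenates `c` directly.

```python
# Vector search over the child chunk index; binds `node` and `score`.
VECTOR_SEARCH = (
    "CALL db.index.vector.queryNodes($index_name, $k, $embedding)\n"
    "YIELD node, score\n"
)

# Raw string so that Cypher, not Python, interprets the "\n\n" escape.
RETRIEVAL_QUERY = r"""MATCH (node)<-[:HAS_CHILD]-(parent)
WITH parent, collect(node.text) AS chunks, max(score) AS score // deduplicate parents
RETURN parent.title + reduce(r = "", c IN chunks | r + "\n\n" + c) AS text,
       score, {source: parent.url} AS metadata"""


def build_query():
    """Return the full Cypher statement: vector search followed by parent lookup."""
    return VECTOR_SEARCH + RETRIEVAL_QUERY


def build_params(index_name, k, embedding):
    """Return the parameter map expected by the statement from build_query()."""
    return {"index_name": index_name, "k": k, "embedding": list(embedding)}
```

The resulting string and parameter map could then be passed to whichever Neo4j client is in use.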
2350
## Further reading
2451

2552
- [Advanced Retriever Techniques to Improve Your RAGs](https://towardsdatascience.com/advanced-retriever-techniques-to-improve-your-rags-1fac2b86dd61) (Damian Gil, April 2024)
@@ -32,3 +59,17 @@ This pattern is a useful evolution of the [Basic Retriever](/reference/graphrag/
3259
## Example Implementations
3360

3461
- [Langchain Templates: Neo4j Advanced RAG](https://github.com/langchain-ai/langchain/blob/master/templates/neo4j-advanced-rag/neo4j_advanced_rag/retrievers.py)
62+
63+
## Similar Patterns
64+
65+
Similar patterns can be implemented on [Lexical Graphs With a Sibling Structure](../knowledge-graph/lexical-graph-sibling-structure) or [Lexical Graphs With a Hierarchical Structure](../knowledge-graph/lexical-graph-hierarchical-structure). There, the additional context comes not just from the parent document but from sibling documents or from structures up to a previously set depth.
66+
The Lexical Graph With Sibling Structure is, for example, currently implemented in [Neo4j’s LLM Knowledge Graph Builder](https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/).
67+
68+
Note that there are two kinds of retrievers possible on a Lexical Graph With a Hierarchical Structure:
69+
70+
* **Bottom-up**: Execute retrieval on leaf nodes and retrieve other chunks higher up in the tree (see [Going Meta — Ep 24: KG+LLMs: Ontology driven RAG patterns](https://www.youtube.com/watch?v=5_WXr0GtVas&list=PL9Hl4pk2FsvX-5QPvwChB-ni_mFF97rCE&index=5)).
71+
* **Top-down**: Use the top-level nodes to determine which subtree(s) to consider for retrieval. Iterate this methodology until the set of nodes for the similarity search is reasonably narrowed down (see [RAG Strategies — Hierarchical Index Retrieval](https://pixion.co/blog/rag-strategies-hierarchical-index-retrieval)).
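
The top-down variant can be sketched in plain Python. This is an illustrative sketch only, assuming every node in the hierarchy carries its own embedding and that cosine similarity is the ranking measure; the `Node` class, the `top_down` function, and the `beam` parameter are hypothetical and not taken from the linked articles.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    embedding: list
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_down(roots, query, beam=1):
    """Descend the tree level by level, keeping the `beam` best nodes each time."""

    def rank(nodes):
        return sorted(nodes, key=lambda n: cosine(n.embedding, query), reverse=True)[:beam]

    frontier = list(roots)
    while any(node.children for node in frontier):
        best = rank(frontier)
        # Expand the selected nodes; leaves that were selected stay candidates.
        frontier = [c for node in best for c in (node.children or [node])]
    return rank(frontier)
```

A larger `beam` explores more subtrees per level at the cost of more similarity computations.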
72+
73+
![Lexical Graph With Sibling Structure](../../../../assets/images/knowledge-graph-lexical-graph-sibling-structure.svg)
74+
75+
![Lexical Graph With Hierarchical Structure](../../../../assets/images/knowledge-graph-lexical-graph-hierarchical-structure.svg)
