## Required Graph Shape
[Lexical Graph with Hypothetical Questions](/reference/knowledge-graph/lexical-graph-hypothetical-questions)
## Context
The vector similarity between a question’s embedding and the text embedding of an appropriate answer or text source might be quite low.
If we have question-chunk pairs available, we can execute the vector similarity search on the question embeddings instead, which will probably deliver much better results than a vector similarity search on the original text chunks.
## Description
The user question is embedded with the same embedder that was previously used to create the question embeddings.

A vector similarity search is executed against the previously generated questions. The `k` most similar questions are found (`k` being a number configured by the developer or user), and their related chunks are retrieved.
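A minimal sketch of this flow, using the plain Neo4j Python driver and an OpenAI embedder, might look as follows. It assumes the pre-processing described below has already created `Question` nodes with an `embedding` property, `HAS_QUESTION` relationships from chunks to their questions, and a vector index named `question_embeddings` (all of these names are illustrative, not prescribed by this pattern):

```python
from neo4j import GraphDatabase
from openai import OpenAI

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
client = OpenAI()  # expects OPENAI_API_KEY in the environment


def retrieve_chunks(user_question: str, k: int = 5) -> list[tuple[str, float]]:
    # Embed the user question with the same embedder used for the hypothetical questions.
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input=user_question
    ).data[0].embedding

    # Find the k most similar hypothetical questions, then follow HAS_QUESTION to their chunks.
    records, _, _ = driver.execute_query(
        """
        CALL db.index.vector.queryNodes('question_embeddings', $k, $embedding)
        YIELD node, score
        MATCH (node)<-[:HAS_QUESTION]-(chunk)
        WITH chunk, max(score) AS score  // keep each chunk once, with its best score
        RETURN chunk.text AS text, score
        ORDER BY score DESC
        """,
        k=k, embedding=embedding,
    )
    return [(record["text"], record["score"]) for record in records]


print(retrieve_chunks("How does the Hypothetical Question Retriever work?"))
```

The Cypher here is essentially the retrieval query shown further down, prefixed with the vector index call that supplies `node` and `score`.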
## Usage
This pattern can yield better results in the vector similarity search than a question-to-chunk similarity search, as used for example in [Basic Retrievers](/reference/graphrag/basic-retriever) or [Parent-Child Retrievers](/reference/graphrag/parent-child-retriever).

However, it also requires more pre-processing effort and the additional cost of the LLM calls used for question generation.
## Required pre-processing
1. Use an LLM to generate hypothetical questions that are answered within each chunk.
2. Embed the questions with the same embedding model that will later embed user questions.
3. Record the relationship between each question and the chunk that contains its answer.
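A minimal sketch of these three steps, assuming chunks already exist as `Chunk` nodes with a `text` property, an OpenAI model for both question generation and embedding, and a `(:Chunk)-[:HAS_QUESTION]->(:Question)` shape (model names, labels, and the index name are illustrative):

```python
from neo4j import GraphDatabase
from openai import OpenAI

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
client = OpenAI()  # expects OPENAI_API_KEY in the environment


def generate_questions(chunk_text: str, n: int = 3) -> list[str]:
    # Step 1: ask an LLM for n hypothetical questions answered by this chunk.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write {n} short questions that are answered by the "
                       f"following text, one question per line:\n\n{chunk_text}",
        }],
    )
    return [q.strip() for q in response.choices[0].message.content.splitlines() if q.strip()]


def embed(text: str) -> list[float]:
    # Step 2: embed a question with the same model that will embed user questions.
    return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding


# Step 3: store each question and link it to the chunk that answers it.
chunks, _, _ = driver.execute_query("MATCH (c:Chunk) RETURN elementId(c) AS id, c.text AS text")
for chunk in chunks:
    for question in generate_questions(chunk["text"]):
        driver.execute_query(
            """
            MATCH (c:Chunk) WHERE elementId(c) = $id
            CREATE (c)-[:HAS_QUESTION]->(:Question {text: $question, embedding: $embedding})
            """,
            id=chunk["id"], question=question, embedding=embed(question),
        )

# Vector index over the question embeddings (dimension must match the embedding model).
driver.execute_query(
    """
    CREATE VECTOR INDEX question_embeddings IF NOT EXISTS
    FOR (q:Question) ON (q.embedding)
    OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}}
    """
)
```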
## Retrieval Query
```cypher
MATCH (node)<-[:HAS_QUESTION]-(chunk)
WITH chunk, max(score) AS score // deduplicate chunks
RETURN chunk.text AS text, score, {} AS metadata
```
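In this query, `node` and `score` are expected to come from the preceding vector similarity search over the question embeddings: `node` is a matched hypothetical question, `score` its similarity, and the `MATCH` hops to the chunk that owns the question. As a sketch only, assuming the neo4j-graphrag Python package and its `VectorCypherRetriever` (treat the exact import paths, parameters, and names such as `question_embeddings` as assumptions), the query above could be plugged in as the `retrieval_query`:

```python
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorCypherRetriever

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

retriever = VectorCypherRetriever(
    driver,
    index_name="question_embeddings",  # vector index over Question.embedding
    retrieval_query="""
        MATCH (node)<-[:HAS_QUESTION]-(chunk)
        WITH chunk, max(score) AS score // deduplicate chunks
        RETURN chunk.text AS text, score, {} AS metadata
    """,
    embedder=OpenAIEmbeddings(model="text-embedding-3-small"),
)

print(retriever.search(query_text="How do hypothetical questions improve retrieval?", top_k=5))
```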
## Resources
- [Implementing advanced RAG strategies with Neo4j](https://blog.langchain.dev/implementing-advanced-retrieval-rag-strategies-with-neo4j/) (November 2023)
The *Hypothetical Question Retriever* is in a way quite similar to the Hypothetical Document Embeddings (HyDE) Retriever (see [RAG using LangChain: Part 5 — Hypothetical Document Embeddings](https://jayant017.medium.com/rag-using-langchain-part-5-hypothetical-document-embeddings-hyde-050f57dfc252)).
The main idea behind both is to increase the similarity between the user question and the available text by moving them into a similar region of the vector space.
In the Hypothetical Question Retriever, we generate hypothetical questions that the user question is matched against.

In the HyDE retriever, by contrast, the LLM generates a hypothetical answer to the user question (without consulting the grounding database), and this hypothetical answer is then matched against the actual chunks in the database to find the best fit.
We don't cover the HyDE retriever in more detail here since it belongs to the pre-processing phase of RAG rather than the retrieval phase, and it also does not require a specific kind of underlying graph pattern.