docs/config/yaml.md
Lines changed: 0 additions & 23 deletions
@@ -287,29 +287,6 @@ These are the settings used for Leiden hierarchical clustering of the graph to c
 - `max_length` **int** - The maximum number of output tokens per report.
 - `max_input_length` **int** - The maximum number of input tokens to use when generating reports.
 
-### embed_graph
-
-We use node2vec to embed the graph. This is primarily used for visualization, so it is not turned on by default.
-
-#### Fields
-
-- `enabled` **bool** - Whether to enable graph embeddings.
-- `dimensions` **int** - Number of vector dimensions to produce.
-- `num_walks` **int** - The node2vec number of walks.
-- `walk_length` **int** - The node2vec walk length.
-- `window_size` **int** - The node2vec window size.
-- `iterations` **int** - The node2vec number of iterations.
-- `random_seed` **int** - The node2vec random seed.
-- `strategy` **dict** - Fully override the embed graph strategy.
-
-### umap
-
-Indicates whether we should run UMAP dimensionality reduction. This is used to provide an x/y coordinate to each graph node, suitable for visualization. If this is not enabled, nodes will receive a 0/0 x/y coordinate. If this is enabled, you *must* enable graph embedding as well.
-
-#### Fields
-
-- `enabled` **bool** - Whether to enable UMAP layouts.
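In the standard node2vec formulation, the removed `embed_graph` fields split across two stages: `num_walks`, `walk_length`, and `random_seed` control random-walk sampling, while `window_size`, `iterations`, and `dimensions` configure the skip-gram model trained on those walks. The sketch below illustrates only the sampling stage and is not GraphRAG's implementation. It assumes an unweighted, undirected graph given as a plain adjacency dict and uses node2vec's p = q = 1 setting, where the biased walk reduces to a uniform (DeepWalk-style) walk. The function name `random_walks` is hypothetical.

```python
import random


def random_walks(adjacency, num_walks, walk_length, random_seed=None):
    """Generate uniform random walks over an unweighted, undirected graph.

    Equivalent to node2vec sampling with return parameter p = 1 and
    in-out parameter q = 1 (illustrative sketch, not GraphRAG's code).
    """
    rng = random.Random(random_seed)  # seeded so walks are reproducible
    nodes = sorted(adjacency)
    walks = []
    for _ in range(num_walks):
        order = nodes[:]
        rng.shuffle(order)  # visit start nodes in a fresh random order each pass
        for start in order:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy graph: a triangle a-b-c plus an isolated node d.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
walks = random_walks(graph, num_walks=2, walk_length=5, random_seed=42)
```

Each walk would then be treated as a "sentence" of node IDs, which is what lets a word-embedding model produce one vector per node.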
docs/index/default_dataflow.md
Lines changed: 2 additions & 22 deletions
@@ -46,8 +46,7 @@ flowchart TB
 end
 subgraph phase6[Phase 6: Network Visualization]
 graph_outputs --> graph_embed[Graph Embedding]
-graph_embed --> umap_entities[Umap Entities]
-umap_entities --> combine_nodes[Final Entities]
+graph_embed --> combine_nodes[Final Entities]
 end
 subgraph phase7[Phase 7: Text Embeddings]
 textUnits --> text_embed[Text Embedding]
@@ -176,27 +175,8 @@ In this step, we link each document to the text-units that were created in the f
 
 At this point, we can export the **Documents** table into the knowledge Model.
 
-## Phase 6: Network Visualization (optional)
 
-In this phase of the workflow, we perform some steps to support network visualization of our high-dimensional vector spaces within our existing graphs. At this point there are two logical graphs at play: the _Entity-Relationship_ graph and the _Document_ graph.
-In this step, we generate a vector representation of our graph using the Node2Vec algorithm. This will allow us to understand the implicit structure of our graph and provide an additional vector-space in which to search for related concepts during our query phase.
-
-### Dimensionality Reduction
-
-For each of the logical graphs, we perform a UMAP dimensionality reduction to generate a 2D representation of the graph. This will allow us to visualize the graph in a 2D space and understand the relationships between the nodes in the graph. The UMAP embeddings are reduced to two dimensions as x/y coordinates.
-
-## Phase 7: Text Embedding
+## Phase 6: Text Embedding
 
 For all artifacts that require downstream vector search, we generate text embeddings as a final step. These embeddings are written directly to a configured vector store. By default we embed entity descriptions, text unit text, and community report text.
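The dimensionality-reduction step deleted in this hunk turned node embeddings into x/y coordinates with UMAP, and the removed config text notes that nodes received a 0/0 coordinate when it was disabled. UMAP is nonlinear and too involved for a short example, so the sketch below swaps in plain PCA (computed by power iteration) purely to show the shape of that step: vectors in, one (x, y) pair per node out, with the (0, 0) fallback. The function `project_2d` and its `enabled` flag are hypothetical and are not GraphRAG's API.

```python
import math
import random


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def _top_eigenvector(cov, iters=200, seed=0):
    """Power iteration: repeatedly apply cov to converge on its top eigenvector."""
    rng = random.Random(seed)
    v = _normalize([rng.random() + 0.1 for _ in range(len(cov))])
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        if all(abs(x) < 1e-12 for x in w):
            return v  # no variance left; any unit direction will do
        v = _normalize(w)
    return v


def project_2d(vectors, enabled=True):
    """Map each vector to an (x, y) pair using PCA as a stand-in for UMAP."""
    if not enabled:
        return [(0.0, 0.0) for _ in vectors]  # mirrors the 0/0 fallback
    if not vectors:
        return []
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    axes = []
    for _ in range(2):
        e = _top_eigenvector(cov)
        lam = sum(e[i] * sum(cov[i][j] * e[j] for j in range(d)) for i in range(d))
        axes.append(e)
        # Deflate: subtract the found component so the next pass finds the next axis.
        cov = [[cov[i][j] - lam * e[i] * e[j] for j in range(d)] for i in range(d)]
    return [tuple(sum(r[k] * a[k] for k in range(d)) for a in axes) for r in centered]
```

A linear projection like this preserves global variance but not local neighborhoods, which is the main reason UMAP was the original choice for layouts.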
docs/index/methods.md
Lines changed: 1 addition & 1 deletion
@@ -41,4 +41,4 @@ You can install it manually by running `python -m spacy download <model_name>`,
 
 ## Choosing a Method
 
-Standard GraphRAG provides a rich description of real-world entities and relationships, but is more expensive that FastGraphRAG. We estimate graph extraction to constitute roughly 75% of indexing cost. FastGraphRAG is therefore much cheaper, but the tradeoff is that the extracted graph is less directly relevant for use outside of GraphRAG, and the graph tends to be quite a bit noisier. If high fidelity entities and graph exploration are important to your use case, we recommend staying with traditional GraphRAG. If your use case is primarily aimed at summary questions using global search, FastGraphRAG provides high quality summarization at much less LLM cost.
+Standard GraphRAG provides a rich description of real-world entities and relationships, but is more expensive than FastGraphRAG. We estimate graph extraction to constitute roughly 75% of indexing cost. FastGraphRAG is therefore much cheaper, but the tradeoff is that the extracted graph is less directly relevant for use outside of GraphRAG, and the graph tends to be quite a bit noisier. If high fidelity entities and graph exploration are important to your use case, we recommend staying with traditional GraphRAG. If your use case is primarily aimed at summary questions using global search, FastGraphRAG provides high quality summarization at much less LLM cost.
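The 75% estimate also bounds the possible savings. Under the simplifying assumption that FastGraphRAG changes only the graph-extraction step and leaves every other indexing step's cost unchanged, its cost relative to standard GraphRAG is (1 - 0.75) + 0.75 * f, where f is the cheaper extraction's cost as a fraction of the LLM-based extraction's cost. Even at f = 0 the ratio cannot fall below 0.25, so under this assumption savings top out at about 4x. The helper below is a hypothetical illustration of that arithmetic, not a GraphRAG function.

```python
def relative_index_cost(extraction_share=0.75, fast_extraction_fraction=0.0):
    """FastGraphRAG indexing cost as a fraction of standard GraphRAG cost.

    extraction_share: share of standard indexing cost spent on graph
        extraction (the docs estimate roughly 0.75).
    fast_extraction_fraction: assumed cost of the cheaper extraction,
        as a fraction of the standard extraction cost.
    Assumes all non-extraction steps cost the same under both methods.
    """
    if not 0.0 <= extraction_share <= 1.0:
        raise ValueError("extraction_share must be between 0 and 1")
    if fast_extraction_fraction < 0.0:
        raise ValueError("fast_extraction_fraction must be non-negative")
    unchanged = 1.0 - extraction_share  # steps that both methods still pay for
    return unchanged + extraction_share * fast_extraction_fraction


best_case = relative_index_cost()  # extraction treated as free
speedup = 1 / best_case
```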