docs/config/yaml.md
5 additions & 6 deletions
Original file line number
Diff line number
Diff line change
@@ -66,7 +66,7 @@ models:
66
66
- `parallelization_num_threads` **int** - The maximum number of work threads.
67
67
- `async_mode` **asyncio|threaded** - The async mode to use. Either `asyncio` or `threaded`.
68
68
69
-
### embeddings
69
+
### embed_text
70
70
71
71
By default, the GraphRAG indexer will only export embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be customized by setting the `target` and `names` fields.
72
72
@@ -143,7 +143,7 @@ This section controls the cache mechanism used by the pipeline. This is used to
143
143
- `base_dir` **str** - The base directory to write cache to, relative to the root.
144
144
- `storage_account_blob_url` **str** - The storage account blob URL to use.
145
145
146
-
### storage
146
+
### output
147
147
148
148
This section controls the storage mechanism the pipeline uses to export output tables.
149
149
@@ -179,7 +179,7 @@ This section controls the reporting mechanism used by the pipeline, for common e
179
179
- `base_dir` **str** - The base directory to write reports to, relative to the root.
180
180
- `storage_account_blob_url` **str** - The storage account blob URL to use.
- noun_phrase_tags **list[str]** - List of noun phrase tags to ignore.
216
216
- noun_phrase_grammars **dict[str, str]** - Noun phrase grammars for the model (cfg-only).
217
217
218
-
### claim_extraction
218
+
### extract_claims
219
219
220
220
#### Fields
221
221
@@ -286,7 +286,6 @@ Indicates whether we should run UMAP dimensionality reduction. This is used to p
286
286
287
287
- `embeddings` **bool** - Export embeddings snapshots to parquet.
288
288
- `graphml` **bool** - Export graph snapshots to GraphML.
289
-
- `transient` **bool** - Export transient workflow tables snapshots to parquet.
290
289
291
290
## Query
292
291
@@ -376,4 +375,4 @@ Indicates whether we should run UMAP dimensionality reduction. This is used to p
376
375
377
376
### workflows
378
377
379
-
**str** - This is a list of workflow names to run, in order. GraphRAG has built-in pipelines to configure this, but you can run exactly and only what you want by specifying the list here. Useful if you have done part of the processing yourself.
378
+
**list[str]** - This is a list of workflow names to run, in order. GraphRAG has built-in pipelines to configure this, but you can run exactly and only what you want by specifying the list here. Useful if you have done part of the processing yourself.
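The three heading renames in this file (`embeddings` to `embed_text`, `storage` to `output`, `claim_extraction` to `extract_claims`) suggest that existing settings may need the same renames. The sketch below assumes the documentation headings mirror top-level settings keys, which this diff does not confirm. `RENAMED_SECTIONS` and `rename_sections` are hypothetical names for illustration, not part of GraphRAG.

```python
# Hypothetical helper, not part of GraphRAG: maps the section names removed
# in this diff to the names that replace them.
RENAMED_SECTIONS = {
    "embeddings": "embed_text",
    "storage": "output",
    "claim_extraction": "extract_claims",
}


def rename_sections(settings: dict) -> dict:
    """Return a copy of `settings` with old top-level section names replaced.

    Only top-level keys are renamed, so a nested key such as
    `snapshots.embeddings` is left untouched.
    """
    migrated = {}
    for key, value in settings.items():
        new_key = RENAMED_SECTIONS.get(key, key)
        if new_key in migrated:
            # Both the old and the new name were supplied; refuse to guess.
            raise ValueError(f"duplicate section after renaming: {new_key!r}")
        migrated[new_key] = value
    return migrated
```

For example, a parsed settings dict with a `storage` section comes back with that section under `output`, while `cache` and `snapshots` pass through unchanged.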
docs/index/outputs.md
10 additions & 21 deletions
Original file line number
Diff line number
Diff line change
@@ -10,7 +10,7 @@ All tables have two identifier fields:
10
10
| id | str | Generated UUID, assuring global uniqueness |
11
11
| human_readable_id | int | This is an incremented short ID created per-run. For example, we use this short ID with generated summaries that print citations so they are easy to cross-reference visually. |
12
12
13
-
## create_final_communities
13
+
## communities
14
14
This is a list of the final communities generated by Leiden. Communities are strictly hierarchical, subdividing into children as the cluster affinity is narrowed.
15
15
16
16
| name | type | description |
@@ -25,7 +25,7 @@ This is a list of the final communities generated by Leiden. Communities are str
25
25
| period | str | Date of ingest, used for incremental update merges. ISO8601 |
26
26
| size | int | Size of the community (entity count), used for incremental update merges. |
27
27
28
-
## create_final_community_reports
28
+
## community_reports
29
29
This is the list of summarized reports for each community.
30
30
31
31
| name | type | description |
@@ -43,7 +43,7 @@ This is the list of summarized reports for each community.
43
43
| period | str | Date of ingest, used for incremental update merges. ISO8601 |
44
44
| size | int | Size of the community (entity count), used for incremental update merges. |
45
45
46
-
## create_final_covariates
46
+
## covariates
47
47
(Optional) If claim extraction is turned on, this is a list of the extracted covariates. Note that claims are typically oriented around identifying malicious behavior such as fraud, so they are not useful for all datasets.
48
48
49
49
| name | type | description |
@@ -59,7 +59,7 @@ This is the list of summarized reports for each community.
59
59
| source_text | str | Short string of text containing the claimed behavior. |
60
60
| text_unit_id | str | ID of the text unit the claim text was extracted from. |
61
61
62
-
## create_final_documents
62
+
## documents
63
63
List of document content after import.
64
64
65
65
| name | type | description |
@@ -69,7 +69,7 @@ List of document content after import.
69
69
| text_unit_ids | str[]| List of text units (chunks) that were parsed from the document. |
70
70
| metadata | dict | (optional) If specified during CSV import, this is a dict of metadata for the document. |
71
71
72
-
## create_final_entities
72
+
## entities
73
73
List of all entities found in the data by the LM.
74
74
75
75
| name | type | description |
@@ -78,22 +78,11 @@ List of all entities found in the data by the LM.
78
78
| type | str | Type of the entity. By default this will be "organization", "person", "geo", or "event" unless configured differently or auto-tuning is used. |
79
79
| description | str | Textual description of the entity. Entities may be found in many text units, so this is an LM-derived summary of all descriptions. |
80
80
| text_unit_ids | str[]| List of the text units containing the entity. |
81
+
| degree | int | Node degree (connectedness) in the graph. |
82
+
| x | float | X position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0. |
83
+
| y | float | Y position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0. |
81
84
82
-
## create_final_nodes
83
-
This is graph-related information for the entities. It contains only information relevant to the graph such as community. There is an entry for each entity at every community level it is found within, so you may see "duplicate" entities.
84
-
85
-
Note that the ID fields match those in create_final_entities and can be used for joining if additional information about a node is required.
86
-
87
-
| name | type | description |
88
-
| --------- | ----- | ----------- |
89
-
| title | str | Name of the referenced entity. Duplicated from create_final_entities for convenient cross-referencing. |
90
-
| community | int | Leiden community the node is found within. Entities are not always assigned a community (they may not be close enough to any), so they may have a ID of -1. |
91
-
| level | int | Level of the community the entity is in. |
92
-
| degree | int | Node degree (connectedness) in the graph. |
93
-
| x | float | X position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0. |
94
-
| y | float | Y position of the node for visual layouts. If graph embeddings and UMAP are not turned on, this will be 0. |
95
-
96
-
## create_final_relationships
85
+
## relationships
97
86
List of all entity-to-entity relationships found in the data by the LM. This is also the _edge list_ for the graph.
98
87
99
88
| name | type | description |
@@ -105,7 +94,7 @@ List of all entity-to-entity relationships found in the data by the LM. This is
105
94
| combined_degree | int | Sum of source and target node degrees. |
106
95
| text_unit_ids | str[]| List of text units the relationship was found within. |
107
96
108
-
## create_final_text_units
97
+
## text_units
109
98
List of all text chunks parsed from the input documents.
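Every heading rename in this file drops the `create_final_` prefix, and the `create_final_nodes` section is removed, with `degree`, `x`, and `y` now listed under `entities`. Code that refers to tables by their old names could translate them along these lines. This is a sketch under the assumption that the headings correspond to the table names; `new_table_name` and the two constants are hypothetical and not part of GraphRAG.

```python
from typing import Optional

# Hypothetical constants, taken from the headings changed in this diff.
OLD_PREFIX = "create_final_"
REMOVED_TABLES = {"create_final_nodes"}  # section deleted, no direct successor


def new_table_name(old_name: str) -> Optional[str]:
    """Translate an old table name to its new name.

    Returns None for a table that this diff removes, and returns names
    without the old prefix unchanged.
    """
    if old_name in REMOVED_TABLES:
        return None
    if old_name.startswith(OLD_PREFIX):
        return old_name[len(OLD_PREFIX):]
    return old_name
```

Returning `None` for `create_final_nodes` forces callers to handle the removed table explicitly instead of silently looking for a `nodes` table that the updated document does not describe.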