
Commit a5c970a

Merge branch 'main' into feat/optimize-community-reports
2 parents 22ce24c + dad2176 commit a5c970a


104 files changed, +1238 -523 lines changed

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+{
+"type": "patch",
+"description": "Fix question gen."
+}
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+{
+"type": "patch",
+"description": "miscellaneous code cleanup and minor changes for better alignment of style across the codebase."
+}
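Both change entries above are small JSON records with a `type` and a `description`. As a minimal sketch of how such a record could be checked before release, the snippet below parses one and validates both fields. The set of allowed types (`patch`, `minor`, `major`) and the helper name `parse_change_entry` are assumptions for illustration; this commit only shows `patch`.

```python
import json

# Assumption: the release tooling accepts these three bump types.
ALLOWED_TYPES = {"patch", "minor", "major"}


def parse_change_entry(text: str) -> dict:
    """Parse a JSON change record and check its two fields (hypothetical helper)."""
    entry = json.loads(text)
    if entry.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"unknown change type: {entry.get('type')!r}")
    description = entry.get("description")
    if not isinstance(description, str) or not description.strip():
        raise ValueError("description must be a non-empty string")
    return entry


entry = parse_change_entry('{"type": "patch", "description": "Fix question gen."}')
print(entry["type"], "-", entry["description"])  # patch - Fix question gen.
```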

dictionary.txt

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,7 @@ getcwd
 fillna
 noqa
 dtypes
+ints
 
 # Azure
 abfs
@@ -167,6 +168,7 @@ FIRUZABAD
 Krohaara
 KROHAARA
 POKRALLY
+René
 Tazbah
 TIRUZIA
 Tiruzia

docs/blog_posts.md

Lines changed: 6 additions & 0 deletions
@@ -38,4 +38,10 @@
 
 By Bryan Li, Research Intern; [Ha Trinh](https://www.microsoft.com/en-us/research/people/trinhha/), Senior Data Scientist; [Darren Edge](https://www.microsoft.com/en-us/research/people/daedge/), Senior Director; [Jonathan Larson](https://www.microsoft.com/en-us/research/people/jolarso/), Senior Principal Data Architect</h6>
 
+- [:octicons-arrow-right-24: __LazyGraphRAG: Setting a new standard for quality and cost__](https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/)
+
+---
+<h6>Published November 25, 2024
+
+By [Darren Edge](https://www.microsoft.com/en-us/research/people/daedge/), Senior Director; [Ha Trinh](https://www.microsoft.com/en-us/research/people/trinhha/), Senior Data Scientist; [Jonathan Larson](https://www.microsoft.com/en-us/research/people/jolarso/), Senior Principal Data Architect</h6>
 </div>

docs/config/env_vars.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 
 ## Text-Embeddings Customization
 
-By default, the GraphRAG indexer will only emit embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be generated by setting the `GRAPHRAG_EMBEDDING_TARGET` environment variable to `all`.
+By default, the GraphRAG indexer will only export embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be generated by setting the `GRAPHRAG_EMBEDDING_TARGET` environment variable to `all`.
 
 If the embedding target is `all`, and you want to only embed a subset of these fields, you may specify which embeddings to skip using the `GRAPHRAG_EMBEDDING_SKIP` argument described below.
 
@@ -152,7 +152,7 @@ These settings control the data input used by the pipeline. Any settings with a
 
 ## Storage
 
-This section controls the storage mechanism used by the pipeline used for emitting output tables.
+This section controls the storage mechanism used by the pipeline used for exporting output tables.
 
 | Parameter | Description | Type | Required or Optional | Default |
 | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | -------------------- | ------- |
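
The embedding selection described in the `env_vars.md` hunk (required embeddings by default, every plaintext field when the target is `all`, minus any fields listed to skip) can be sketched as a small resolver. This is not GraphRAG's implementation: the field names in `ALL_FIELDS` and `REQUIRED_FIELDS` are placeholders, `resolve_embeddings` is a hypothetical helper, and reading an unset target as `required` and `GRAPHRAG_EMBEDDING_SKIP` as a comma-separated list are assumptions.

```python
import os

# Placeholder field names for illustration only; not the documented embedding list.
ALL_FIELDS = {"entity.description", "community.full_content", "text_unit.text", "document.text"}
REQUIRED_FIELDS = {"entity.description", "community.full_content", "text_unit.text"}


def resolve_embeddings(target: str, skip: list[str]) -> set[str]:
    """Return the set of fields to embed for a given target and skip list."""
    if target == "none":
        return set()
    if target == "required":
        # The skip list only matters when target is "all", so it is ignored here.
        return set(REQUIRED_FIELDS)
    if target == "all":
        return ALL_FIELDS - set(skip)
    raise ValueError(f"unknown embedding target: {target!r}")


# Assumptions: an unset target behaves like "required"; the skip list is comma-separated.
target = os.environ.get("GRAPHRAG_EMBEDDING_TARGET", "required")
skip = [s.strip() for s in os.environ.get("GRAPHRAG_EMBEDDING_SKIP", "").split(",") if s.strip()]
print(sorted(resolve_embeddings(target, skip)))
```

Keeping `skip` inert for `required` mirrors the `yaml.md` note that `skip` is only useful when `target=all`.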

docs/config/yaml.md

Lines changed: 7 additions & 7 deletions
@@ -67,7 +67,7 @@ This is the base LLM configuration section. Other steps may override this config
 - `async_mode` (see Async Mode top-level config)
 - `batch_size` **int** - The maximum batch size to use.
 - `batch_max_tokens` **int** - The maximum batch # of tokens.
-- `target` **required|all|none** - Determines which set of embeddings to emit.
+- `target` **required|all|none** - Determines which set of embeddings to export.
 - `skip` **list[str]** - Which embeddings to skip. Only useful if target=all to customize the list.
 - `vector_store` **dict** - The vector store to use. Configured for lancedb by default.
 - `type` **str** - `lancedb` or `azure_ai_search`. Default=`lancedb`
@@ -203,7 +203,7 @@ This is the base LLM configuration section. Other steps may override this config
 
 #### Fields
 
-- `max_cluster_size` **int** - The maximum cluster size to emit.
+- `max_cluster_size` **int** - The maximum cluster size to export.
 - `strategy` **dict** - Fully override the cluster_graph strategy.
 
 ### embed_graph
@@ -228,11 +228,11 @@ This is the base LLM configuration section. Other steps may override this config
 
 #### Fields
 
-- `embeddings` **bool** - Emit embeddings snapshots to parquet.
-- `graphml` **bool** - Emit graph snapshots to GraphML.
-- `raw_entities` **bool** - Emit raw entity snapshots to JSON.
-- `top_level_nodes` **bool** - Emit top-level-node snapshots to JSON.
-- `transient` **bool** - Emit transient workflow tables snapshots to parquet.
+- `embeddings` **bool** - Export embeddings snapshots to parquet.
+- `graphml` **bool** - Export graph snapshots to GraphML.
+- `raw_entities` **bool** - Export raw entity snapshots to JSON.
+- `top_level_nodes` **bool** - Export top-level-node snapshots to JSON.
+- `transient` **bool** - Export transient workflow tables snapshots to parquet.
 
 ### encoding_model
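
The snapshot fields in the third `yaml.md` hunk are independent booleans, each enabling one export format. A minimal sketch of that mapping follows; `SnapshotFlags` and `enabled_exports` are hypothetical names rather than GraphRAG's config classes, and the `False` defaults are an assumption.

```python
from dataclasses import dataclass, fields


# Hypothetical mirror of the boolean snapshot fields; False defaults are assumed.
@dataclass
class SnapshotFlags:
    embeddings: bool = False
    graphml: bool = False
    raw_entities: bool = False
    top_level_nodes: bool = False
    transient: bool = False


# Output format per flag, as stated in each field's description.
EXPORT_FORMAT = {
    "embeddings": "parquet",
    "graphml": "GraphML",
    "raw_entities": "JSON",
    "top_level_nodes": "JSON",
    "transient": "parquet",
}


def enabled_exports(flags: SnapshotFlags) -> dict[str, str]:
    """Map each enabled snapshot flag to the format it would export."""
    return {f.name: EXPORT_FORMAT[f.name] for f in fields(flags) if getattr(flags, f.name)}


print(enabled_exports(SnapshotFlags(graphml=True, transient=True)))
# {'graphml': 'GraphML', 'transient': 'parquet'}
```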

docs/examples_notebooks/global_search.ipynb

Lines changed: 16 additions & 14 deletions
@@ -75,9 +75,9 @@
 "source": [
 "### Load community reports as context for global search\n",
 "\n",
-"- Load all community reports in the `create_final_community_reports` table from the ire-indexing engine, to be used as context data for global search.\n",
-"- Load entities from the `create_final_nodes` and `create_final_entities` tables from the ire-indexing engine, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and only use the rank attribute in the community reports table for context ranking)\n",
-"- Load all communities in the `create_final_communites` table from the ire-indexing engine, to be used to reconstruct the community graph hierarchy for dynamic community selection."
+"- Load all community reports in the `create_final_community_reports` table from the GraphRAG, to be used as context data for global search.\n",
+"- Load entities from the `create_final_nodes` and `create_final_entities` tables from the GraphRAG, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and only use the rank attribute in the community reports table for context ranking)\n",
+"- Load all communities in the `create_final_communites` table from the GraphRAG, to be used to reconstruct the community graph hierarchy for dynamic community selection."
 ]
 },
 {
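
The notebook text in this hunk says entities are optional: with them, community weights feed context ranking; without them, only the reports' `rank` attribute is used. A sketch of that fallback is below. The weighting itself (entity count per community, normalized by the largest count) is an illustrative assumption, not GraphRAG's confirmed formula, and `rank_reports` plus the dict shapes are hypothetical.

```python
from collections import Counter
from typing import Optional


def rank_reports(reports: list[dict], entities: Optional[list[dict]] = None) -> list[dict]:
    """Order community reports for global-search context (illustrative sketch).

    Assumed shapes (hypothetical): each report has "community" and "rank";
    each entity has a "communities" list of the community ids it belongs to.
    """
    if not entities:
        # No entities supplied: rank by the reports' own rank attribute only.
        return sorted(reports, key=lambda r: r["rank"], reverse=True)

    # Assumed weighting: entity count per community, normalized by the maximum.
    counts = Counter(c for e in entities for c in e["communities"])
    max_count = max(counts.values(), default=1)
    weighted = [{**r, "weight": counts.get(r["community"], 0) / max_count} for r in reports]
    # Higher weight first; rank breaks ties.
    return sorted(weighted, key=lambda r: (r["weight"], r["rank"]), reverse=True)


reports = [{"community": "1", "rank": 8.0}, {"community": "2", "rank": 6.5}]
entities = [{"communities": ["2"]}, {"communities": ["2", "1"]}]
print([r["community"] for r in rank_reports(reports)])            # ['1', '2']
print([r["community"] for r in rank_reports(reports, entities)])  # ['2', '1']
```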
@@ -379,21 +379,23 @@
 "text": [
 "### Overview of Cosmic Vocalization\n",
 "\n",
-"Cosmic Vocalization is a phenomenon that has garnered significant attention from various individuals and groups. It is perceived as a cosmic event with potential implications for security and interstellar communication. The Paranormal Military Squad is actively engaged with Cosmic Vocalization, indicating its strategic importance in security measures [Data: Reports (6)].\n",
+"Cosmic Vocalization is a phenomenon that has garnered significant attention within the community, involving various individuals and groups. It is perceived as an interstellar event with potential implications for both communication and security.\n",
 "\n",
-"### Key Perspectives and Concerns\n",
+"### Key Perspectives\n",
 "\n",
-"1. **Strategic Engagement**: The Paranormal Military Squad's involvement suggests that Cosmic Vocalization is not only a subject of interest but also a matter of strategic importance. This engagement highlights the potential security implications of these cosmic phenomena [Data: Reports (6)].\n",
+"**Alex Mercer's Viewpoint** \n",
+"Alex Mercer perceives Cosmic Vocalization as part of an interstellar duet, suggesting that it may be a responsive or communicative event. This perspective highlights the potential for Cosmic Vocalization to be part of a larger cosmic interaction or dialogue [Data: Reports (6)].\n",
 "\n",
-"2. **Community Interest**: Within the community, Cosmic Vocalization is a focal point of interest. Alex Mercer, for instance, perceives it as part of an interstellar duet, which suggests a responsive and perhaps communicative approach to these cosmic events [Data: Reports (6)].\n",
+"**Taylor Cruz's Concerns** \n",
+"Taylor Cruz raises concerns about the nature of Cosmic Vocalization, fearing it might be a homing tune. This adds a layer of urgency and potential threat, as it suggests that the vocalization could be attracting attention from unknown entities or forces [Data: Reports (6)].\n",
 "\n",
-"3. **Potential Threats**: Concerns have been raised by individuals like Taylor Cruz, who fears that Cosmic Vocalization might be a homing tune. This perspective adds a layer of urgency and suggests that there may be potential threats associated with these cosmic sounds [Data: Reports (6)].\n",
+"### Involvement of the Paranormal Military Squad\n",
 "\n",
-"### Metaphorical Interpretation\n",
+"The Paranormal Military Squad is actively engaged with Cosmic Vocalization, indicating its significance in security measures. Their involvement suggests that the phenomenon is not only of scientific interest but also of strategic importance, potentially impacting national or global security [Data: Reports (6)].\n",
 "\n",
-"The Universe is metaphorically treated as a concert hall by the Paranormal Military Squad, which suggests a broader perspective on how cosmic events are interpreted and responded to by human entities. This metaphorical view may influence how strategies and responses are formulated in relation to Cosmic Vocalization [Data: Reports (6)].\n",
+"### Conclusion\n",
 "\n",
-"In summary, Cosmic Vocalization is a complex phenomenon involving strategic, communicative, and potentially threatening elements. The involvement of the Paranormal Military Squad and the concerns raised by community members underscore its significance and the need for careful consideration of its implications.\n"
+"Cosmic Vocalization is a complex and multifaceted phenomenon that involves various stakeholders, each with their own perspectives and concerns. The involvement of both individuals like Alex Mercer and Taylor Cruz, as well as organized groups like the Paranormal Military Squad, underscores its importance and the need for further investigation and understanding.\n"
 ]
 }
 ],
@@ -638,7 +640,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"LLM calls: 2. Prompt tokens: 11292. Output tokens: 606.\n"
+"LLM calls: 2. Prompt tokens: 11237. Output tokens: 483.\n"
 ]
 }
 ],
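
The re-run output in this hunk changes the reported token counts. For a quick sense of scale, the deltas between the old and new stdout lines can be computed as below; the numbers are copied from the diff, and `token_delta` is only an illustrative helper.

```python
def token_delta(old: int, new: int) -> tuple[int, float]:
    """Return the absolute change and the percentage change from old to new."""
    diff = new - old
    return diff, 100.0 * diff / old


# Values from the removed (old) and added (new) stdout lines.
prompt = token_delta(11292, 11237)
output = token_delta(606, 483)
print(f"prompt tokens: {prompt[0]:+d} ({prompt[1]:+.1f}%)")  # prompt tokens: -55 (-0.5%)
print(f"output tokens: {output[0]:+d} ({output[1]:+.1f}%)")  # output tokens: -123 (-20.3%)
```

Both counts come from a single run each, and the response text in the previous hunk was regenerated, so these deltas describe this one re-execution rather than a measured trend.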
@@ -652,7 +654,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "graphrag",
+"display_name": ".venv",
 "language": "python",
 "name": "python3"
 },
@@ -666,7 +668,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.12.5"
+"version": "3.11.9"
 }
 },
 "nbformat": 4,
