Commit a004fa8

Merged with main.
2 parents fa0d911 + 1da1380 commit a004fa8

166 files changed: +2869 −2475 lines changed

.semversioner/2.4.0.json

Lines changed: 26 additions & 0 deletions (new file)

{
  "changes": [
    {
      "description": "Allow injection of custom pipelines.",
      "type": "minor"
    },
    {
      "description": "Refactored StorageFactory to use a registration-based approach",
      "type": "minor"
    },
    {
      "description": "Fix default values for tpm and rpm limiters on embeddings",
      "type": "patch"
    },
    {
      "description": "Update typer.",
      "type": "patch"
    },
    {
      "description": "cleaned up logging to follow python standards.",
      "type": "patch"
    }
  ],
  "created_at": "2025-07-15T00:04:15+00:00",
  "version": "2.4.0"
}
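The release file above aggregates the individual next-release change files into one versioned record. As a rough illustration of how the resulting version number follows from the change types (a sketch in the spirit of semversioner, not the tool's actual implementation):

```python
# Illustrative sketch: derive the next semantic version from a list of
# change types ("major" / "minor" / "patch"). Not semversioner's real code.

def next_version(current: str, change_types: list[str]) -> str:
    """Bump a MAJOR.MINOR.PATCH version by the most significant change type."""
    major, minor, patch = (int(part) for part in current.split("."))
    if "major" in change_types:
        return f"{major + 1}.0.0"
    if "minor" in change_types:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# The release above holds two "minor" and three "patch" changes, so the
# prior 2.3.0 bumps its minor component:
print(next_version("2.3.0", ["minor", "minor", "patch", "patch", "patch"]))  # 2.4.0
```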

.semversioner/next-release/patch-20250530204951787463.json

Lines changed: 0 additions & 4 deletions
This file was deleted.

CHANGELOG.md

Lines changed: 8 additions & 0 deletions

@@ -1,6 +1,14 @@
 # Changelog
 Note: version releases in the 0.x.y range may introduce breaking changes.
 
+## 2.4.0
+
+- minor: Allow injection of custom pipelines.
+- minor: Refactored StorageFactory to use a registration-based approach
+- patch: Fix default values for tpm and rpm limiters on embeddings
+- patch: Update typer.
+- patch: cleaned up logging to follow python standards.
+
 ## 2.3.0
 
 - minor: Remove Dynamic Max Retries support. Refactor typer typing in cli interface
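The changelog entry about refactoring StorageFactory to a registration-based approach describes a common pattern: instead of the factory hard-coding an if/elif chain over known backends, constructors are registered by name. A generic sketch of that pattern (class and method names here are hypothetical, not GraphRAG's actual API):

```python
# Generic sketch of a registration-based factory, as described in the
# 2.4.0 changelog; names are illustrative, not GraphRAG's real API.
from typing import Callable

class StorageFactory:
    """Create storage backends by name, from a registry of constructors."""

    _registry: dict[str, Callable[..., object]] = {}

    @classmethod
    def register(cls, name: str, constructor: Callable[..., object]) -> None:
        # Third parties can register custom backends without editing the factory.
        cls._registry[name] = constructor

    @classmethod
    def create(cls, name: str, **kwargs) -> object:
        if name not in cls._registry:
            raise ValueError(f"Unknown storage type: {name}")
        return cls._registry[name](**kwargs)

class FileStorage:
    def __init__(self, base_dir: str = "."):
        self.base_dir = base_dir

# Registration replaces a hard-coded dispatch on storage type:
StorageFactory.register("file", FileStorage)
storage = StorageFactory.create("file", base_dir="./output")
```

The benefit is extensibility: adding a new backend is a one-line registration rather than a change to the factory itself.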

README.md

Lines changed: 1 addition & 2 deletions

@@ -1,6 +1,5 @@
 # GraphRAG
 
-👉 [Use the GraphRAG Accelerator solution](https://github.com/Azure-Samples/graphrag-accelerator) <br/>
 👉 [Microsoft Research Blog Post](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/)<br/>
 👉 [Read the docs](https://microsoft.github.io/graphrag)<br/>
 👉 [GraphRAG Arxiv](https://arxiv.org/pdf/2404.16130)
@@ -28,7 +27,7 @@ To learn more about GraphRAG and how it can be used to enhance your LLM's abilit
 
 ## Quickstart
 
-To get started with the GraphRAG system we recommend trying the [Solution Accelerator](https://github.com/Azure-Samples/graphrag-accelerator) package. This provides a user-friendly end-to-end experience with Azure resources.
+To get started with the GraphRAG system we recommend trying the [command line quickstart](https://microsoft.github.io/graphrag/get_started/).
 
 ## Repository Guidance
 

breaking-changes.md

Lines changed: 7 additions & 1 deletion

@@ -12,6 +12,12 @@ There are five surface areas that may be impacted on any given release. They are
 
 > TL;DR: Always run `graphrag init --path [path] --force` between minor version bumps to ensure you have the latest config format. Run the provided migration notebook between major version bumps if you want to avoid re-indexing prior datasets. Note that this will overwrite your configuration and prompts, so backup if necessary.
 
+# v2
+
+Run the [migration notebook](./docs/examples_notebooks/index_migration_to_v2.ipynb) to convert older tables to the v2 format.
+
+The v2 release renamed all of our index tables to simply name the items each table contains. The previous naming was a leftover requirement of our use of DataShaper, which is no longer necessary.
+
 # v1
 
 Run the [migration notebook](./docs/examples_notebooks/index_migration_to_v1.ipynb) to convert older tables to the v1 format.
@@ -27,7 +33,7 @@ All of the breaking changes listed below are accounted for in the four steps abo
 
 - Alignment of fields from `create_final_entities` (such as name -> title) with `create_final_nodes`, and removal of redundant content across these tables
 - Rename of `document.raw_content` to `document.text`
 - Rename of `entity.name` to `entity.title`
-- Rename `rank` to `combined_degree` in `create_final_relationships` and removal of `source_degree` and `target_degree`fields
+- Rename `rank` to `combined_degree` in `create_final_relationships` and removal of `source_degree` and `target_degree` fields
 - Fixed community tables to use a proper UUID for the `id` field, and retain `community` and `human_readable_id` for the short IDs
 - Removal of all embeddings columns from parquet files in favor of direct vector store writes
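The v2 table rename described above drops the DataShaper-era pipeline-verb prefixes from table names. A hedged sketch of that mapping (the prefix-stripping rule is an assumption for illustration; the migration notebook is the authoritative converter):

```python
# Hedged sketch: approximate v2-style table names by stripping the
# DataShaper-era "create_final_" prefix mentioned above. The real mapping
# lives in the v2 migration notebook; this helper is only illustrative.

def v2_table_name(v1_name: str) -> str:
    """Map a v1 index table name to its assumed v2 equivalent."""
    prefix = "create_final_"
    return v1_name[len(prefix):] if v1_name.startswith(prefix) else v1_name

print(v2_table_name("create_final_entities"))       # entities
print(v2_table_name("create_final_relationships"))  # relationships
```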

dictionary.txt

Lines changed: 2 additions & 0 deletions

@@ -102,6 +102,7 @@ itertuples
 isin
 nocache
 nbconvert
+levelno
 
 # HTML
 nbsp
@@ -186,6 +187,7 @@ Verdantis's
 # English
 skippable
 upvote
+unconfigured
 
 # Misc
 Arxiv

docs/config/env_vars.md

Lines changed: 3 additions & 3 deletions

@@ -4,7 +4,7 @@ As of version 1.3, GraphRAG no longer supports a full complement of pre-built en
 
 The only standard environment variable we expect, and include in the default settings.yml, is `GRAPHRAG_API_KEY`. If you are already using a number of the previous GRAPHRAG_* environment variables, you can insert them with template syntax into settings.yml and they will be adopted.
 
-> **The environment variables below are documented as an aid for migration, but they WILL NOT be read unless you use template syntax in your settings.yml.**
+> **The environment variables below are documented as an aid for migration, but they WILL NOT be read unless you use template syntax in your settings.yml. We also WILL NOT be updating this page as the main config object changes.**
 
 ---
 
@@ -178,11 +178,11 @@ This section controls the cache mechanism used by the pipeline. This is used to
 
 ### Reporting
 
-This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to the console or to an Azure Blob Storage container.
+This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to an Azure Blob Storage container.
 
 | Parameter | Description | Type | Required or Optional | Default |
 | --------- | ----------- | ---- | -------------------- | ------- |
-| `GRAPHRAG_REPORTING_TYPE` | The type of reporter to use. Options are `file`, `console`, or `blob` | `str` | optional | `file` |
+| `GRAPHRAG_REPORTING_TYPE` | The type of reporter to use. Options are `file` or `blob` | `str` | optional | `file` |
 | `GRAPHRAG_REPORTING_STORAGE_ACCOUNT_BLOB_URL` | The Azure Storage blob endpoint to use when in `blob` mode and using managed identity. Will have the format `https://<storage_account_name>.blob.core.windows.net` | `str` | optional | None |
 | `GRAPHRAG_REPORTING_CONNECTION_STRING` | The Azure Storage connection string to use when in `blob` mode. | `str` | optional | None |
 | `GRAPHRAG_REPORTING_CONTAINER_NAME` | The Azure Storage container name to use when in `blob` mode. | `str` | optional | None |
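The template syntax that the env_vars page refers to is environment-variable interpolation inside settings.yml. An illustrative fragment (structure based on the default settings file; only the `api_key` line is the documented standard usage):

```yaml
# Illustrative settings.yml fragment: ${...} template syntax pulls values
# from environment variables when the config is loaded.
models:
  default_chat_model:
    api_key: ${GRAPHRAG_API_KEY}
```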

docs/config/yaml.md

Lines changed: 11 additions & 9 deletions

@@ -40,7 +40,7 @@ models:
 #### Fields
 
 - `api_key` **str** - The OpenAI API key to use.
-- `auth_type` **api_key|managed_identity** - Indicate how you want to authenticate requests.
+- `auth_type` **api_key|azure_managed_identity** - Indicate how you want to authenticate requests.
 - `type` **openai_chat|azure_openai_chat|openai_embedding|azure_openai_embedding|mock_chat|mock_embeddings** - The type of LLM to use.
 - `model` **str** - The model name.
 - `encoding_model` **str** - The text encoding model to use. Default is to use the encoding model aligned with the language model (i.e., it is retrieved from tiktoken if unset).
@@ -73,16 +73,18 @@ models:
 
 ### input
 
-Our pipeline can ingest .csv, .txt, or .json data from an input folder. See the [inputs page](../index/inputs.md) for more details and examples.
+Our pipeline can ingest .csv, .txt, or .json data from an input location. See the [inputs page](../index/inputs.md) for more details and examples.
 
 #### Fields
 
-- `type` **file|blob** - The input type to use. Default=`file`
+- `storage` **StorageConfig**
+  - `type` **file|blob|cosmosdb** - The storage type to use. Default=`file`
+  - `base_dir` **str** - The base directory to write output artifacts to, relative to the root.
+  - `connection_string` **str** - (blob/cosmosdb only) The Azure Storage connection string.
+  - `container_name` **str** - (blob/cosmosdb only) The Azure Storage container name.
+  - `storage_account_blob_url` **str** - (blob only) The storage account blob URL to use.
+  - `cosmosdb_account_blob_url` **str** - (cosmosdb only) The CosmosDB account blob URL to use.
 - `file_type` **text|csv|json** - The type of input data to load. Default is `text`
-- `base_dir` **str** - The base directory to read input from, relative to the root.
-- `connection_string` **str** - (blob only) The Azure Storage connection string.
-- `storage_account_blob_url` **str** - The storage account blob URL to use.
-- `container_name` **str** - (blob only) The Azure Storage container name.
 - `encoding` **str** - The encoding of the input file. Default is `utf-8`
 - `file_pattern` **str** - A regex to match input files. Default is `.*\.csv$`, `.*\.txt$`, or `.*\.json$` depending on the specified `file_type`, but you can customize it if needed.
 - `file_filter` **dict** - Key/value pairs to filter. Default is None.
@@ -147,11 +149,11 @@ This section controls the cache mechanism used by the pipeline. This is used to
 
 ### reporting
 
-This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to the console or to an Azure Blob Storage container.
+This section controls the reporting mechanism used by the pipeline, for common events and error messages. The default is to write reports to a file in the output directory. However, you can also choose to write reports to an Azure Blob Storage container.
 
 #### Fields
 
-- `type` **file|console|blob** - The reporting type to use. Default=`file`
+- `type` **file|blob** - The reporting type to use. Default=`file`
 - `base_dir` **str** - The base directory to write reports to, relative to the root.
 - `connection_string` **str** - (blob only) The Azure Storage connection string.
 - `container_name` **str** - (blob only) The Azure Storage container name.
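The nested `storage` block that this change adds under `input` could be written as, for example (an illustrative fragment assembled from the fields listed in the diff, not an excerpt from the docs):

```yaml
# Illustrative fragment: input location settings now nest under a
# storage block, here configured for Azure Blob Storage.
input:
  storage:
    type: blob
    base_dir: input
    container_name: my-container
    storage_account_blob_url: https://<storage_account_name>.blob.core.windows.net
  file_type: text
  encoding: utf-8
```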

docs/get_started.md

Lines changed: 0 additions & 1 deletion

@@ -6,7 +6,6 @@
 
 To get started with the GraphRAG system, you have a few options:
 
-👉 [Use the GraphRAG Accelerator solution](https://github.com/Azure-Samples/graphrag-accelerator) <br/>
 👉 [Install from pypi](https://pypi.org/project/graphrag/). <br/>
 👉 [Use it from source](developing.md)<br/>
 

docs/index.md

Lines changed: 0 additions & 5 deletions

@@ -1,7 +1,6 @@
 # Welcome to GraphRAG
 
 👉 [Microsoft Research Blog Post](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/) <br/>
-👉 [GraphRAG Accelerator](https://github.com/Azure-Samples/graphrag-accelerator) <br/>
 👉 [GraphRAG Arxiv](https://arxiv.org/pdf/2404.16130)
 
 <p align="center">
@@ -16,10 +15,6 @@ approaches using plain text snippets. The GraphRAG process involves extracting a
 
 To learn more about GraphRAG and how it can be used to enhance your language model's ability to reason about your private data, please visit the [Microsoft Research Blog Post](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/).
 
-## Solution Accelerator 🚀
-
-To quickstart the GraphRAG system we recommend trying the [Solution Accelerator](https://github.com/Azure-Samples/graphrag-accelerator) package. This provides a user-friendly end-to-end experience with Azure resources.
-
 ## Get Started with GraphRAG 🚀
 
 To start using GraphRAG, check out the [_Get Started_](get_started.md) guide.
