| 1 | +# Indexing, Querying, and Prompt Tuning in GraphRAG for .NET |
| 2 | + |
| 3 | +GraphRAG for .NET keeps feature parity with the Python reference project described in the [Microsoft Research blog](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/) and the [GraphRAG paper](https://arxiv.org/pdf/2404.16130). This document explains how the .NET workflows map to the concepts documented on [microsoft.github.io/graphrag](https://microsoft.github.io/graphrag/), highlights the supported query modes, and shows how to customise prompts via manual or auto tuning outputs. |
| 4 | + |
| 5 | +## Indexing Architecture |
| 6 | + |
| 7 | +- **Workflow parity.** Each indexing stage matches the Python pipeline and the [default data flow](https://microsoft.github.io/graphrag/index/default_dataflow/): |
| 8 | + - `load_input_documents` → `create_base_text_units` → `summarize_descriptions` |
| 9 | + - `extract_graph` persists `entities` and `relationships` |
| 10 | + - `create_communities` produces `communities` |
| 11 | + - `community_summaries` writes `community_reports` |
| 12 | + - `extract_covariates` stores `covariates` |
| 13 | +- **Storage schema.** Tables share the column layout described under [index outputs](https://microsoft.github.io/graphrag/index/outputs/). The new strongly-typed records (`CommunityRecord`, `CovariateRecord`, etc.) mirror the JSON representation used by the Python implementation. |
| 14 | +- **Cluster configuration.** `GraphRagConfig.ClusterGraph` exposes the same knobs as the Python `cluster_graph` settings, enabling largest-component filtering and deterministic seeding. |
| 15 | + |
| 16 | +## Language Model Registration |
| 17 | + |
| 18 | +Workflows resolve language models from the DI container via [Microsoft.Extensions.AI](https://learn.microsoft.com/dotnet/ai/overview). Register keyed services for every `ModelId` you plan to reference: |
| 19 | + |
| 20 | +```csharp |
| 21 | +using Azure; |
| 22 | +using Azure.AI.OpenAI; |
| 23 | +using GraphRag.Config; |
| 24 | +using Microsoft.Extensions.AI; |
| 25 | + |
| 26 | +var openAi = new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(key)); |
| 27 | +const string chatModelId = "chat_model"; |
| 28 | +const string embeddingModelId = "embedding_model"; |
| 29 | +// Keyed factories take (provider, key); the As* adapters ship in the Microsoft.Extensions.AI.OpenAI package. |
| 30 | +services.AddKeyedSingleton<IChatClient>(chatModelId, (_, _) => openAi.GetChatClient(chatDeployment).AsIChatClient()); |
| 31 | +services.AddKeyedSingleton<IEmbeddingGenerator<string, Embedding<float>>>(embeddingModelId, (_, _) => openAi.GetEmbeddingClient(embeddingDeployment).AsIEmbeddingGenerator()); |
| 32 | +``` |
| 33 | + |
| 34 | +Configure retries, rate limits, and logging when you construct the concrete clients. `GraphRagConfig.Models` simply records the set of registered keys so configuration overrides can validate references. |
| 35 | + |
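Because workflows look models up by key, resolving the same key yourself is a quick way to check a registration. The following is a minimal sketch of keyed registration and lookup, assuming .NET 8 or later with the `Microsoft.Extensions.DependencyInjection` package. `IModelClient` and `StubModelClient` are hypothetical stand-ins for `IChatClient` and a concrete client, used only so the example runs without network access.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Keyed factories receive the service provider and the service key.
services.AddKeyedSingleton<IModelClient>("chat_model", (_, key) => new StubModelClient((string)key!));
services.AddKeyedSingleton<IModelClient>("embedding_model", (_, key) => new StubModelClient((string)key!));

using var provider = services.BuildServiceProvider();

// Resolve with the same key that a ModelId would reference.
var chat = provider.GetRequiredKeyedService<IModelClient>("chat_model");
Console.WriteLine(chat.Name); // chat_model

// GetKeyedService returns null for an unregistered key,
// whereas GetRequiredKeyedService would throw.
var missing = provider.GetKeyedService<IModelClient>("unknown_model");
Console.WriteLine(missing is null); // True

// Hypothetical stand-ins; not part of GraphRag or Microsoft.Extensions.AI.
public interface IModelClient { string Name { get; } }
public sealed record StubModelClient(string Name) : IModelClient;
```

A missing key only surfaces when a workflow first asks for the model, so a startup check along these lines can catch a typo in a model key early.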
| 36 | +## Pipeline Cache |
| 37 | + |
| 38 | +`IPipelineCache` is intentionally infrastructure-neutral. To mirror ASP.NET Core's in-memory behaviour, register the built-in cache services alongside the provided adapter: |
| 39 | + |
| 40 | +```csharp |
| 41 | +services.AddMemoryCache(); |
| 42 | +services.AddSingleton<IPipelineCache, MemoryPipelineCache>(); |
| 43 | +``` |
| 44 | + |
| 45 | +Need Redis or something else? Implement `IPipelineCache` yourself and register it through DI; the pipeline will automatically consume your custom cache. |
| 46 | + |
| 47 | +## Query Capabilities |
| 48 | + |
| 49 | +The query layer ports the orchestrators documented in the [GraphRAG query overview](https://microsoft.github.io/graphrag/query/overview/): |
| 50 | + |
| 51 | +- **Global search** ([docs](https://microsoft.github.io/graphrag/query/global_search/)) traverses community summaries and graph context to craft answers spanning the corpus. |
| 52 | +- **Local search** ([docs](https://microsoft.github.io/graphrag/query/local_search/)) anchors on the entities most relevant to the query and their graph neighbourhood when you need focused context. |
| 53 | +- **DRIFT search** ([docs](https://microsoft.github.io/graphrag/query/drift_search/)) primes the query with community reports, then refines the answer through local follow-up queries. |
| 54 | +- **Question generation** ([docs](https://microsoft.github.io/graphrag/query/question_generation/)) produces follow-up questions to extend an investigation. |
| 55 | + |
| 56 | +Every orchestrator consumes the same indexed tables as the Python project, so the .NET stack interoperates with the bring-your-own-graph (BYOG) scenarios described in the [index architecture guide](https://microsoft.github.io/graphrag/index/architecture/). |
| 57 | + |
| 58 | +## Prompt Tuning |
| 59 | + |
| 60 | +Prompts can be customised in three ways; manual and auto tuning need no code changes, while inline overrides are set from code: |
| 61 | + |
| 62 | +1. **Manual overrides** follow the rules from [manual prompt tuning](https://microsoft.github.io/graphrag/prompt_tuning/manual_prompt_tuning/). |
| 63 | + - Place custom templates under a directory referenced by `GraphRagConfig.PromptTuning.Manual.Directory` and set `Enabled = true`. |
| 64 | + - Filenames follow the stage key pattern `section/workflow/kind.txt` (see table below). |
| 65 | +2. **Auto tuning** integrates the outputs documented in [auto prompt tuning](https://microsoft.github.io/graphrag/prompt_tuning/auto_prompt_tuning/). |
| 66 | + - Point `GraphRagConfig.PromptTuning.Auto.Directory` at the folder containing the generated prompts and set `Enabled = true`. |
| 67 | + - The runtime prefers explicit paths from workflow configs, then manual overrides, then auto-tuned files, and finally the built-in defaults in `prompts/`. |
| 68 | +3. **Inline overrides** can be injected directly from code: set `ExtractGraphConfig.SystemPrompt`, `ExtractGraphConfig.Prompt`, or the equivalent properties to either a multi-line string or a value prefixed with `inline:`. Inline values bypass template file lookups and are used as-is. |
| 69 | + |
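The lookup order described above can be read as a chain of fallbacks. The sketch below is an illustration only, not the library's loader. `PromptResolver`, its parameters, and the temporary directories are hypothetical, and the built-in default is passed in as a plain string rather than read from the library. It also assumes that an explicit path which does not exist falls through to the next source.

```csharp
using System;
using System.IO;

// Demo setup: only a manual override exists for this stage key.
var root = Path.Combine(Path.GetTempPath(), "graphrag-prompt-demo-" + Guid.NewGuid().ToString("N"));
var manualDir = Path.Combine(root, "manual");
var autoDir = Path.Combine(root, "auto");
var stageKey = "index/extract_graph/system.txt";

var manualFile = Path.Combine(manualDir, stageKey);
Directory.CreateDirectory(Path.GetDirectoryName(manualFile)!);
File.WriteAllText(manualFile, "manual system prompt");

Console.WriteLine(PromptResolver.Resolve(null, manualDir, autoDir, stageKey, "built-in"));                    // manual system prompt
Console.WriteLine(PromptResolver.Resolve("inline:You are terse.", manualDir, autoDir, stageKey, "built-in")); // You are terse.
Console.WriteLine(PromptResolver.Resolve(null, null, autoDir, stageKey, "built-in"));                         // built-in

Directory.Delete(root, recursive: true);

// Hypothetical helper that mirrors the documented precedence.
public static class PromptResolver
{
    private const string InlinePrefix = "inline:";

    public static string Resolve(string? configured, string? manualDir, string? autoDir, string stageKey, string builtIn)
    {
        if (!string.IsNullOrWhiteSpace(configured))
        {
            // Inline values skip file lookups entirely.
            if (configured.StartsWith(InlinePrefix, StringComparison.Ordinal))
                return configured[InlinePrefix.Length..];
            if (configured.Contains('\n'))
                return configured;

            // Otherwise treat the value as an explicit template path.
            if (File.Exists(configured))
                return File.ReadAllText(configured);
        }

        // Manual overrides beat auto-tuned files; null means "not enabled".
        foreach (var dir in new[] { manualDir, autoDir })
        {
            if (dir is null) continue;
            var candidate = Path.Combine(dir, stageKey);
            if (File.Exists(candidate))
                return File.ReadAllText(candidate);
        }

        return builtIn;
    }
}
```

Keeping the order in one function makes it straightforward to log which source supplied a prompt when tuned output looks wrong.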
| 70 | +### Stage Keys and Placeholders |
| 71 | + |
| 72 | +| Workflow | Stage key | Purpose | Supported placeholders | |
| 73 | +|----------|-----------|---------|------------------------| |
| 74 | +| `extract_graph` (system) | `index/extract_graph/system.txt` | System prompt that instructs the extractor. | _N/A_ | |
| 75 | +| `extract_graph` (user) | `index/extract_graph/user.txt` | User prompt template for individual text units. | `{{max_entities}}`, `{{text}}` | |
| 76 | +| `community_summaries` (system) | `index/community_reports/system.txt` | System guidance for cluster summarisation. | _N/A_ | |
| 77 | +| `community_summaries` (user) | `index/community_reports/user.txt` | User prompt template for entity lists. | `{{max_length}}`, `{{entities}}` | |
| 78 | + |
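As an illustration of how the stage keys in the table map to files, the layout below assumes the manual directory is set to a hypothetical `tuning/manual` folder. Only the relative paths come from the table; the folder name is made up for the example.

```text
tuning/manual/                 # hypothetical value of PromptTuning.Manual.Directory
└── index/
    ├── extract_graph/
    │   ├── system.txt
    │   └── user.txt
    └── community_reports/
        ├── system.txt
        └── user.txt
```

An auto-tuning output directory would presumably use the same relative paths, since both sources are addressed by stage key.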
| 79 | +Placeholders are replaced at runtime with values drawn from workflow configuration: |
| 80 | + |
| 81 | +- `{{max_entities}}` → `ExtractGraphConfig.EntityTypes.Count + 5` (minimum 1) |
| 82 | +- `{{text}}` → the original text unit content |
| 83 | +- `{{max_length}}` → `CommunityReportsConfig.MaxLength` |
| 84 | +- `{{entities}}` → bullet list of entity titles and descriptions |
| 85 | + |
| 86 | +If a template is omitted, the runtime falls back to the built-in prompts defined in `GraphRagPromptLibrary`. |
| 87 | + |
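To make the substitution rules concrete, here is a minimal sketch of rendering a user template for `extract_graph`. It is not the library's renderer: `PromptTemplate.Render`, the sample template, and the sample values are hypothetical. A single regular-expression pass is used so that placeholder-like text inside a substituted value is not expanded a second time.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

var entityTypes = new List<string> { "person", "organization", "location" };
var template = "Extract up to {{max_entities}} entities from the text below.\n\n{{text}}";

var prompt = PromptTemplate.Render(template, new Dictionary<string, string>
{
    // Mirrors the documented rule: entity type count plus five, never below one.
    ["max_entities"] = Math.Max(1, entityTypes.Count + 5).ToString(CultureInfo.InvariantCulture),
    ["text"] = "Alice joined Contoso in Seattle.",
});

Console.WriteLine(prompt);
// Extract up to 8 entities from the text below.
//
// Alice joined Contoso in Seattle.

// Hypothetical helper for illustration.
public static class PromptTemplate
{
    // Replaces each {{name}} token in one pass; unknown tokens are left untouched.
    public static string Render(string template, IReadOnlyDictionary<string, string> values) =>
        Regex.Replace(
            template,
            @"\{\{(\w+)\}\}",
            match => values.TryGetValue(match.Groups[1].Value, out var value) ? value : match.Value);
}
```

Leaving unknown tokens in place makes a misspelled placeholder visible in the rendered prompt instead of silently dropping it.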
| 88 | +## Integration Tests |
| 89 | + |
| 90 | +`tests/ManagedCode.GraphRag.Tests/Integration/CommunitySummariesIntegrationTests.cs` exercises the new prompt loader end-to-end using the file-backed pipeline storage. Combined with the existing Aspire-powered suites, the tests demonstrate how indexing, community detection, and summarisation behave with tuned prompts while remaining faithful to the [GraphRAG BYOG guidance](https://microsoft.github.io/graphrag/index/byog/). |
| 91 | + |
| 92 | +## Further Reading |
| 93 | + |
| 94 | +- [GraphRAG prompt tuning overview](https://microsoft.github.io/graphrag/prompt_tuning/overview/) |
| 95 | +- [GraphRAG index methods](https://microsoft.github.io/graphrag/index/methods/) |
| 96 | +- [GraphRAG query overview](https://microsoft.github.io/graphrag/query/overview/) |
| 97 | +- [GraphRAG default dataflow](https://microsoft.github.io/graphrag/index/default_dataflow/) |
| 98 | + |
| 99 | +These resources underpin the .NET implementation and provide broader context for customising or extending the library. |