Commit 6284cdd

Remove fnllm (#2095)
1 parent eb0dfe3 commit 6284cdd

File tree

25 files changed: +122 -1163 lines changed


docs/config/env_vars.md

Lines changed: 0 additions & 219 deletions
This file was deleted.

docs/config/models.md

Lines changed: 5 additions & 6 deletions
@@ -6,9 +6,7 @@ This page contains information on selecting a model to use and options to supply
 
 GraphRAG was built and tested using OpenAI models, so this is the default model set we support. This is not intended to be a limiter or statement of quality or fitness for your use case, only that it's the set we are most familiar with for prompting, tuning, and debugging.
 
-GraphRAG also utilizes a language model wrapper library used by several projects within our team, called fnllm. fnllm provides two important functions for GraphRAG: rate limiting configuration to help us maximize throughput for large indexing jobs, and robust caching of API calls to minimize consumption on repeated indexes for testing, experimentation, or incremental ingest. fnllm uses the OpenAI Python SDK under the covers, so OpenAI-compliant endpoints are a base requirement out-of-the-box.
-
-Starting with version 2.6.0, GraphRAG supports using [LiteLLM](https://docs.litellm.ai/) instead of fnllm for calling language models. LiteLLM provides support for 100+ models though it is important to note that when choosing a model it must support returning [structured outputs](https://openai.com/index/introducing-structured-outputs-in-the-api/) adhering to a [JSON schema](https://docs.litellm.ai/docs/completion/json_mode).
+Starting with version 2.6.0, GraphRAG supports using [LiteLLM](https://docs.litellm.ai/) for calling language models. LiteLLM provides support for 100+ models though it is important to note that when choosing a model it must support returning [structured outputs](https://openai.com/index/introducing-structured-outputs-in-the-api/) adhering to a [JSON schema](https://docs.litellm.ai/docs/completion/json_mode).
 
 Example using LiteLLM as the language model tool for GraphRAG:
 

@@ -54,13 +52,15 @@ Example config with asymmetric model use:
 models:
   extraction_chat_model:
     api_key: ${GRAPHRAG_API_KEY}
-    type: openai_chat
+    type: chat
+    model_provider: openai
     auth_type: api_key
     model: gpt-4o
     model_supports_json: true
   query_chat_model:
     api_key: ${GRAPHRAG_API_KEY}
-    type: openai_chat
+    type: chat
+    model_provider: openai
     auth_type: api_key
     model: o1
     model_supports_json: true
@@ -98,7 +98,6 @@ Many users have used platforms such as [ollama](https://ollama.com/) and [LiteLL
 As of GraphRAG 2.0.0, we support model injection through the use of a standard chat and embedding Protocol and an accompanying ModelFactory that you can use to register your model implementation. This is not supported with the CLI, so you'll need to use GraphRAG as a library.
 
 - Our Protocol is [defined here](https://github.com/microsoft/graphrag/blob/main/graphrag/language_model/protocol/base.py)
-- Our base implementation, which wraps fnllm, [is here](https://github.com/microsoft/graphrag/blob/main/graphrag/language_model/providers/fnllm/models.py)
 - We have a simple mock implementation in our tests that you can [reference here](https://github.com/microsoft/graphrag/blob/main/tests/mock_provider.py)
 
 Once you have a model implementation, you need to register it with our ModelFactory:
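
The registration snippet that follows in the source doc sits outside this hunk's context. As a minimal sketch of the step, assuming `ModelFactory` exposes a `register_chat` hook keyed by the name you later use as `type` in settings (hook name and import path are assumptions, not shown in this diff):

```python
from graphrag.language_model.factory import ModelFactory  # import path assumed


class MyCustomChat:
    """Implements the chat Protocol linked above (methods elided)."""


# Register under a custom name; that name then becomes a valid `type`
# value in settings.yaml. `register_chat` is an assumed hook name.
ModelFactory.register_chat("my_custom_chat", MyCustomChat)
```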

docs/config/yaml.md

Lines changed: 5 additions & 3 deletions
@@ -28,20 +28,22 @@ For example:
 models:
   default_chat_model:
     api_key: ${GRAPHRAG_API_KEY}
-    type: openai_chat
+    type: chat
+    model_provider: openai
     model: gpt-4.1
     model_supports_json: true
   default_embedding_model:
     api_key: ${GRAPHRAG_API_KEY}
-    type: openai_embedding
+    type: embedding
+    model_provider: openai
     model: text-embedding-3-large
 ```
 
 #### Fields
 
 - `api_key` **str** - The OpenAI API key to use.
 - `auth_type` **api_key|azure_managed_identity** - Indicate how you want to authenticate requests.
-- `type` **chat|embedding|openai_chat|azure_openai_chat|openai_embedding|azure_openai_embedding|mock_chat|mock_embeddings** - The type of LLM to use.
+- `type` **chat|embedding|mock_chat|mock_embeddings** - The type of LLM to use.
 - `model_provider` **str|None** - The model provider to use, e.g., openai, azure, anthropic, etc. Required when `type == chat|embedding`. When `type == chat|embedding`, [LiteLLM](https://docs.litellm.ai/) is used under the hood which has support for calling 100+ models. [View LiteLLM basic usage](https://docs.litellm.ai/docs/#basic-usage) for details on how models are called (the `model_provider` is the portion prior to the `/` while the `model` is the portion following the `/`). [View Language Model Selection](models.md) for more details and examples on using LiteLLM.
 - `model` **str** - The model name.
 - `encoding_model` **str** - The text encoding model to use. Default is to use the encoding model aligned with the language model (i.e., it is retrieved from tiktoken if unset).
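
To make the `model_provider`/`model` split described above concrete, a small self-contained sketch (model string illustrative):

```python
# LiteLLM-style model strings are "<provider>/<model>"; GraphRAG's config
# carries the same information as two separate fields.
litellm_model = "openai/gpt-4.1"
model_provider, _, model = litellm_model.partition("/")
assert (model_provider, model) == ("openai", "gpt-4.1")
```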

docs/examples_notebooks/custom_vector_store.ipynb

Lines changed: 3 additions & 20 deletions
@@ -318,27 +318,19 @@
 "sample_documents = [\n",
 "    VectorStoreDocument(\n",
 "        id=\"doc_1\",\n",
-"        text=\"GraphRAG is a powerful knowledge graph extraction and reasoning framework.\",\n",
 "        vector=create_mock_embedding(),\n",
-"        attributes={\"category\": \"technology\", \"source\": \"documentation\"},\n",
 "    ),\n",
 "    VectorStoreDocument(\n",
 "        id=\"doc_2\",\n",
-"        text=\"Vector stores enable efficient similarity search over high-dimensional data.\",\n",
 "        vector=create_mock_embedding(),\n",
-"        attributes={\"category\": \"technology\", \"source\": \"research\"},\n",
 "    ),\n",
 "    VectorStoreDocument(\n",
 "        id=\"doc_3\",\n",
-"        text=\"Machine learning models can process and understand natural language text.\",\n",
 "        vector=create_mock_embedding(),\n",
-"        attributes={\"category\": \"AI\", \"source\": \"article\"},\n",
 "    ),\n",
 "    VectorStoreDocument(\n",
 "        id=\"doc_4\",\n",
-"        text=\"Custom implementations allow for specialized behavior and integration.\",\n",
 "        vector=create_mock_embedding(),\n",
-"        attributes={\"category\": \"development\", \"source\": \"tutorial\"},\n",
 "    ),\n",
 "]\n",
 "\n",
@@ -395,9 +387,7 @@
 "for i, result in enumerate(search_results, 1):\n",
 "    doc = result.document\n",
 "    print(f\"{i}. ID: {doc.id}\")\n",
-"    print(f\"   Text: {doc.text[:60]}...\")\n",
 "    print(f\"   Similarity Score: {result.score:.4f}\")\n",
-"    print(f\"   Category: {doc.attributes.get('category', 'N/A')}\")\n",
 "    print()"
 ]
 },
@@ -412,14 +402,8 @@
 "    found_doc = vector_store.search_by_id(\"doc_2\")\n",
 "    print(\"✅ Found document by ID:\")\n",
 "    print(f\"   ID: {found_doc.id}\")\n",
-"    print(f\"   Text: {found_doc.text}\")\n",
-"    print(f\"   Attributes: {found_doc.attributes}\")\n",
 "except KeyError as e:\n",
-"    print(f\"❌ Error: {e}\")\n",
-"\n",
-"# Test filter by ID\n",
-"id_filter = vector_store.filter_by_id([\"doc_1\", \"doc_3\"])\n",
-"print(f\"\\n🔧 ID filter result: {id_filter}\")"
+"    print(f\"❌ Error: {e}\")"
 ]
 },
 {
@@ -450,7 +434,8 @@
 "    # Other GraphRAG configuration...\n",
 "    \"models\": {\n",
 "        \"default_embedding_model\": {\n",
-"            \"type\": \"openai_embedding\",\n",
+"            \"type\": \"embedding\",\n",
+"            \"model_provider\": \"openai\",\n",
 "            \"model\": \"text-embedding-3-small\",\n",
 "        }\n",
 "    },\n",
@@ -500,9 +485,7 @@
 "    entity_documents = [\n",
 "        VectorStoreDocument(\n",
 "            id=f\"entity_{i}\",\n",
-"            text=f\"Entity {i} description: Important concept in the knowledge graph\",\n",
 "            vector=create_mock_embedding(),\n",
-"            attributes={\"type\": \"entity\", \"importance\": i % 3 + 1},\n",
 "        )\n",
 "        for i in range(10)\n",
 "    ]\n",

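The edits above strip `text` and `attributes` from every `VectorStoreDocument` construction, leaving only `id` and `vector`. A minimal sketch of the trimmed call, assuming the import path shown and that the remaining keyword arguments are unchanged (the class definition itself is not part of this diff):

```python
from graphrag.vector_stores.base import VectorStoreDocument  # import path assumed

# Post-change construction keeps only the fields the diff leaves in place.
doc = VectorStoreDocument(id="doc_1", vector=[0.0] * 1536)  # vector length illustrative
```
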
docs/get_started.md

Lines changed: 4 additions & 2 deletions
@@ -60,12 +60,14 @@ If running in OpenAI mode, you only need to update the value of `GRAPHRAG_API_KE
 In addition to setting your API key, Azure OpenAI users should set the variables below in the settings.yaml file. To find the appropriate sections, just search for the `models:` root configuration; you should see two sections, one for the default chat endpoint and one for the default embeddings endpoint. Here is an example of what to add to the chat model config:
 
 ```yaml
-type: azure_openai_chat # Or azure_openai_embedding for embeddings
+type: chat
+model_provider: azure
 api_base: https://<instance>.openai.azure.com
 api_version: 2024-02-15-preview # You can customize this for other versions
-deployment_name: <azure_model_deployment_name>
 ```
 
+Most people tend to name their deployments the same as their model - if yours are different, add the `deployment_name` as well.
+
 #### Using Managed Auth on Azure
 To use managed auth, edit the auth_type in your model config and *remove* the api_key line:
 
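
Putting the two edits together, a complete Azure chat entry under the new scheme might look like the sketch below, written as a Python dict in the style the custom vector store notebook uses (all values are placeholders):

```python
# Hypothetical settings fragment mirroring the diff above: `type: chat` plus
# `model_provider: azure` replaces the removed `azure_openai_chat` type.
azure_chat_model = {
    "type": "chat",
    "model_provider": "azure",
    "model": "gpt-4o",  # placeholder; also serves as the implied deployment name
    "api_base": "https://<instance>.openai.azure.com",
    "api_version": "2024-02-15-preview",
    # "deployment_name": set only if it differs from the model name
}
```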

graphrag/config/enums.py

Lines changed: 0 additions & 4 deletions
@@ -84,13 +84,9 @@ class ModelType(str, Enum):
     """LLMType enum class definition."""
 
     # Embeddings
-    OpenAIEmbedding = "openai_embedding"
-    AzureOpenAIEmbedding = "azure_openai_embedding"
     Embedding = "embedding"
 
     # Chat Completion
-    OpenAIChat = "openai_chat"
-    AzureOpenAIChat = "azure_openai_chat"
     Chat = "chat"
 
     # Debug
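
With the four fnllm-era members gone, settings that used them must be translated. A hypothetical lookup (not part of this commit) capturing the mapping implied by the docs changes above:

```python
# Hypothetical migration table inferred from this commit's docs changes;
# not code that ships with GraphRAG.
LEGACY_MODEL_TYPES: dict[str, tuple[str, str]] = {
    "openai_chat": ("chat", "openai"),
    "azure_openai_chat": ("chat", "azure"),
    "openai_embedding": ("embedding", "openai"),
    "azure_openai_embedding": ("embedding", "azure"),
}

new_type, model_provider = LEGACY_MODEL_TYPES["azure_openai_chat"]  # ("chat", "azure")
```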

graphrag/config/models/language_model_config.py

Lines changed: 14 additions & 75 deletions
@@ -4,9 +4,7 @@
 """Language model configuration."""
 
 import logging
-from typing import Literal
 
-import tiktoken
 from pydantic import BaseModel, Field, model_validator
 
 from graphrag.config.defaults import language_model_defaults
@@ -77,9 +75,7 @@ def _validate_auth_type(self) -> None:
         """
         if (
             self.auth_type == AuthType.AzureManagedIdentity
-            and self.type != ModelType.AzureOpenAIChat
-            and self.type != ModelType.AzureOpenAIEmbedding
-            and self.model_provider != "azure"  # indicates Litellm + AOI
+            and self.model_provider != "azure"
         ):
             msg = f"auth_type of azure_managed_identity is not supported for model type {self.type}. Please rerun `graphrag init` and set the auth_type to api_key."
             raise ConflictingSettingsError(msg)
@@ -98,14 +94,6 @@ def _validate_type(self) -> None:
         if not ModelFactory.is_supported_model(self.type):
             msg = f"Model type {self.type} is not recognized, must be one of {ModelFactory.get_chat_models() + ModelFactory.get_embedding_models()}."
             raise KeyError(msg)
-        if self.type in [
-            "openai_chat",
-            "openai_embedding",
-            "azure_openai_chat",
-            "azure_openai_embedding",
-        ]:
-            msg = f"Model config based on fnllm is deprecated and will be removed in GraphRAG v3, please use {ModelType.Chat} or {ModelType.Embedding} instead to switch to LiteLLM config."
-            logger.warning(msg)
 
     model_provider: str | None = Field(
         description="The model provider to use.",
@@ -134,32 +122,6 @@ def _validate_model_provider(self) -> None:
         default=language_model_defaults.encoding_model,
     )
 
-    def _validate_encoding_model(self) -> None:
-        """Validate the encoding model.
-
-        The default behavior is to use an encoding model that matches the LLM model.
-        LiteLLM supports 100+ models and their tokenization. There is no need to
-        set the encoding model when using the new LiteLLM provider as was done with fnllm provider.
-
-        Users can still manually specify a tiktoken based encoding model to use even with the LiteLLM provider
-        in which case the specified encoding model will be used regardless of the LLM model being used, even if
-        it is not an openai based model.
-
-        If not using LiteLLM provider, set the encoding model based on the LLM model name.
-        This is for backward compatibility with existing fnllm provider until fnllm is removed.
-
-        Raises
-        ------
-        KeyError
-            If the model name is not recognized.
-        """
-        if (
-            self.type != ModelType.Chat
-            and self.type != ModelType.Embedding
-            and self.encoding_model.strip() == ""
-        ):
-            self.encoding_model = tiktoken.encoding_name_for_model(self.model)
-
     api_base: str | None = Field(
         description="The base URL for the LLM API.",
         default=language_model_defaults.api_base,
@@ -175,11 +137,9 @@ def _validate_api_base(self) -> None:
         AzureApiBaseMissingError
             If the API base is missing and is required.
         """
-        if (
-            self.type == ModelType.AzureOpenAIChat
-            or self.type == ModelType.AzureOpenAIEmbedding
-            or self.model_provider == "azure"  # indicates Litellm + AOI
-        ) and (self.api_base is None or self.api_base.strip() == ""):
+        if (self.model_provider == "azure") and (
+            self.api_base is None or self.api_base.strip() == ""
+        ):
             raise AzureApiBaseMissingError(self.type)
 
     api_version: str | None = Field(
@@ -197,11 +157,9 @@ def _validate_api_version(self) -> None:
         AzureApiBaseMissingError
             If the API base is missing and is required.
         """
-        if (
-            self.type == ModelType.AzureOpenAIChat
-            or self.type == ModelType.AzureOpenAIEmbedding
-            or self.model_provider == "azure"  # indicates Litellm + AOI
-        ) and (self.api_version is None or self.api_version.strip() == ""):
+        if (self.model_provider == "azure") and (
+            self.api_version is None or self.api_version.strip() == ""
+        ):
            raise AzureApiVersionMissingError(self.type)
 
     deployment_name: str | None = Field(
@@ -219,11 +177,9 @@ def _validate_deployment_name(self) -> None:
         AzureDeploymentNameMissingError
             If the deployment name is missing and is required.
         """
-        if (
-            self.type == ModelType.AzureOpenAIChat
-            or self.type == ModelType.AzureOpenAIEmbedding
-            or self.model_provider == "azure"  # indicates Litellm + AOI
-        ) and (self.deployment_name is None or self.deployment_name.strip() == ""):
+        if (self.model_provider == "azure") and (
+            self.deployment_name is None or self.deployment_name.strip() == ""
+        ):
             msg = f"deployment_name is not set for Azure-hosted model. This will default to your model name ({self.model}). If different, this should be set."
             logger.debug(msg)
 
@@ -247,7 +203,7 @@
         description="The request timeout to use.",
         default=language_model_defaults.request_timeout,
     )
-    tokens_per_minute: int | Literal["auto"] | None = Field(
+    tokens_per_minute: int | None = Field(
         description="The number of tokens per minute to use for the LLM service.",
         default=language_model_defaults.tokens_per_minute,
     )
@@ -262,18 +218,10 @@ def _validate_tokens_per_minute(self) -> None:
         """
         # If the value is a number, check if it is less than 1
         if isinstance(self.tokens_per_minute, int) and self.tokens_per_minute < 1:
-            msg = f"Tokens per minute must be a non zero positive number, 'auto' or null. Suggested value: {language_model_defaults.tokens_per_minute}."
+            msg = f"Tokens per minute must be a non zero positive number or null. Suggested value: {language_model_defaults.tokens_per_minute}."
             raise ValueError(msg)
 
-        if (
-            (self.type == ModelType.Chat or self.type == ModelType.Embedding)
-            and self.rate_limit_strategy is not None
-            and self.tokens_per_minute == "auto"
-        ):
-            msg = f"tokens_per_minute cannot be set to 'auto' when using type '{self.type}'. Please set it to a positive integer or null to disable."
-            raise ValueError(msg)
-
-    requests_per_minute: int | Literal["auto"] | None = Field(
+    requests_per_minute: int | None = Field(
         description="The number of requests per minute to use for the LLM service.",
         default=language_model_defaults.requests_per_minute,
    )
@@ -288,15 +236,7 @@ def _validate_requests_per_minute(self) -> None:
         """
         # If the value is a number, check if it is less than 1
         if isinstance(self.requests_per_minute, int) and self.requests_per_minute < 1:
-            msg = f"Requests per minute must be a non zero positive number, 'auto' or null. Suggested value: {language_model_defaults.requests_per_minute}."
-            raise ValueError(msg)
-
-        if (
-            (self.type == ModelType.Chat or self.type == ModelType.Embedding)
-            and self.rate_limit_strategy is not None
-            and self.requests_per_minute == "auto"
-        ):
-            msg = f"requests_per_minute cannot be set to 'auto' when using type '{self.type}'. Please set it to a positive integer or null to disable."
+            msg = f"Requests per minute must be a non zero positive number or null. Suggested value: {language_model_defaults.requests_per_minute}."
             raise ValueError(msg)
 
     rate_limit_strategy: str | None = Field(
@@ -399,5 +339,4 @@ def _validate_model(self):
         self._validate_requests_per_minute()
         self._validate_max_retries()
         self._validate_azure_settings()
-        self._validate_encoding_model()
         return self
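
The net effect of the validator rewrites above is that `model_provider == "azure"` is now the single trigger for Azure-specific requirements. A standalone sketch of that predicate (simplified to `ValueError`; GraphRAG raises the `Azure*MissingError` types shown in the diff):

```python
# Standalone sketch, not GraphRAG source: Azure checks now key off
# model_provider alone rather than the removed azure_openai_* model types.
def needs_azure_settings(model_provider: str | None) -> bool:
    return model_provider == "azure"


def check_azure(model_provider: str | None, api_base: str | None, api_version: str | None) -> None:
    if needs_azure_settings(model_provider):
        if not (api_base or "").strip():
            raise ValueError("api_base is required for Azure-hosted models")
        if not (api_version or "").strip():
            raise ValueError("api_version is required for Azure-hosted models")


check_azure("openai", None, None)  # fine: no Azure requirements apply
```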

graphrag/index/operations/chunk_text/strategies.py

Lines changed: 4 additions & 21 deletions
@@ -6,7 +6,6 @@
 from collections.abc import Iterable
 
 import nltk
-import tiktoken
 
 from graphrag.config.models.chunking_config import ChunkingConfig
 from graphrag.index.operations.chunk_text.typing import TextChunk
@@ -15,21 +14,7 @@
     split_multiple_texts_on_tokens,
 )
 from graphrag.logger.progress import ProgressTicker
-
-
-def get_encoding_fn(encoding_name):
-    """Get the encoding model."""
-    enc = tiktoken.get_encoding(encoding_name)
-
-    def encode(text: str) -> list[int]:
-        if not isinstance(text, str):
-            text = f"{text}"
-        return enc.encode(text)
-
-    def decode(tokens: list[int]) -> str:
-        return enc.decode(tokens)
-
-    return encode, decode
+from graphrag.tokenizer.get_tokenizer import get_tokenizer
 
 
 
@@ -40,16 +25,14 @@ def run_tokens(
     """Chunks text into chunks based on encoding tokens."""
     tokens_per_chunk = config.size
     chunk_overlap = config.overlap
-    encoding_name = config.encoding_model
-
-    encode, decode = get_encoding_fn(encoding_name)
+    tokenizer = get_tokenizer(encoding_model=config.encoding_model)
     return split_multiple_texts_on_tokens(
         input,
         TokenChunkerOptions(
             chunk_overlap=chunk_overlap,
             tokens_per_chunk=tokens_per_chunk,
-            encode=encode,
-            decode=decode,
+            encode=tokenizer.encode,
+            decode=tokenizer.decode,
         ),
         tick,
     )
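
For reference, the deleted `get_encoding_fn` was a thin wrapper over tiktoken; the same encode/decode round trip in miniature (encoding name illustrative; the real value comes from `config.encoding_model`):

```python
import tiktoken

# What the removed helper produced: an encode/decode pair over a named encoding.
enc = tiktoken.get_encoding("cl100k_base")  # illustrative encoding name
tokens = enc.encode("GraphRAG chunks text by token count.")
assert enc.decode(tokens) == "GraphRAG chunks text by token count."
```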
