diff --git a/site/content/ai-suite/graphrag/web-interface.md b/site/content/ai-suite/graphrag/web-interface.md
index 7438127d6f..6caa04b177 100644
--- a/site/content/ai-suite/graphrag/web-interface.md
+++ b/site/content/ai-suite/graphrag/web-interface.md
@@ -159,10 +159,10 @@ See also the [GraphRAG Retriever](../reference/retriever.md) documentation.
 ## Chat with your Knowledge Graph
 
 The Retriever service provides two search methods:
-- [Local search](../reference/retriever.md#local-search): Local queries let you
-  explore specific nodes and their direct connections.
-- [Global search](../reference/retriever.md#global-search): Global queries uncover
-  broader patters and relationships across the entire Knowledge Graph.
+- [Instant search](../reference/retriever.md#instant-search): Instant queries
+  focus on specific entities and their connections to return fast responses.
+- [Deep search](../reference/retriever.md#deep-search): Deep queries uncover broader
+  patterns across the entire Knowledge Graph; slower, but more comprehensive.
 
 ![Chat with your Knowledge Graph](../../images/graphrag-ui-chat.png)
 
diff --git a/site/content/ai-suite/reference/gen-ai.md b/site/content/ai-suite/reference/gen-ai.md
index f545a7e255..078b3037db 100644
--- a/site/content/ai-suite/reference/gen-ai.md
+++ b/site/content/ai-suite/reference/gen-ai.md
@@ -33,22 +33,15 @@ in the platform.
 All services support the `profiles` field, which you can use to define the
 profile to use for the service. For example, you can define a
 GPU profile that enables the service to run an LLM on GPU resources.
 
-## LLM Host Service Creation Request Body
+## Service Creation Request Body
 
-```json
-{
-  "env": {
-    "model_name": ""
-  }
-}
-```
-
-## Using Labels in Creation Request Body
+The following example shows a complete request body with all available options:
 
 ```json
 {
   "env": {
-    "model_name": ""
+    "model_name": "",
+    "profiles": "gpu,internal"
   },
   "labels": {
     "key1": "value1",
@@ -57,32 +50,116 @@ GPU profile that enables the service to run an LLM on GPU resources.
   }
 }
 ```
 
-{{< info >}}
-Labels are optional. Labels can be used to filter and identify services in
-the Platform. If you want to use labels, define them as a key-value pair in `labels`
-within the `env` field.
-{{< /info >}}
+**Optional fields:**
+
+- **labels**: Key-value pairs used to filter and identify services in the platform.
+- **profiles**: A comma-separated string defining which profiles to use for the
+  service (e.g., `"gpu,internal"`). If not set, the service is created with the
+  default profile. Profiles must be present and created in the platform before
+  they can be used.
+
+The parameters required for the deployment of each service are defined in the
+corresponding service documentation. See [Importer](importer.md)
+and [Retriever](retriever.md).
+
+## Projects
+
+Projects help you organize your GraphRAG work by grouping related services and
+keeping your data separate. When the Importer service creates ArangoDB collections
+(such as documents, chunks, entities, relationships, and communities), it uses
+your project name as a prefix. For example, a project named `docs` will have
+collections like `docs_Documents`, `docs_Chunks`, and so on.
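+
+For example, after an import in a project named `docs`, you can verify the
+prefixed collections with ArangoDB's standard collection listing API (a quick
+sketch; the `documentation` database name matches the creation example below):
+
+```bash
+# List the non-system collections of the project database.
+# Collections created by the Importer carry the project prefix,
+# e.g. docs_Documents and docs_Chunks.
+curl -X GET "https://:8529/_db/documentation/_api/collection?excludeSystem=true" \
+  -H "Authorization: Bearer "
+```
+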
-## Using Profiles in Creation Request Body +### Creating a project + +To create a new GraphRAG project, send a POST request to the project endpoint: + +```bash +curl -X POST "https://:8529/gen-ai/v1/project" \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "project_name": "docs", + "project_type": "graphrag", + "project_db_name": "documentation", + "project_description": "A documentation project for GraphRAG." + }' +``` + +Where: +- **project_name** (required): Unique identifier for your project. Must be 1-63 + characters and contain only letters, numbers, underscores (`_`), and hyphens (`-`). +- **project_type** (required): Type of project (e.g., `"graphrag"`). +- **project_db_name** (required): The ArangoDB database name where the project + will be created. +- **project_description** (optional): A description of your project. + +Once created, you can reference your project in service deployments using the +`genai_project_name` field: ```json { - "env": { - "model_name": "", - "profiles": "gpu,internal" - } + "env": { + "genai_project_name": "docs" + } } ``` -{{< info >}} -The `profiles` field is optional. If it is not set, the service is created with -the default profile. Profiles must be present and created in the Platform before -they can be used. If you want to use profiles, define them as a comma-separated -string in `profiles` within the `env` field. -{{< /info >}} +### Listing projects -The parameters required for the deployment of each service are defined in the -corresponding service documentation. +**List all project names in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_project_names/" \ + -H "Authorization: Bearer " +``` + +This returns only the project names for quick reference. + +**List all projects with full metadata in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_projects/" \ + -H "Authorization: Bearer " +``` + +This returns complete project objects including metadata, associated services, +and knowledge graph information. + +### Getting project details + +Retrieve comprehensive metadata for a specific project: + +```bash +curl -X GET "https://:8529/gen-ai/v1/project_by_name//" \ + -H "Authorization: Bearer " +``` + +The response includes: +- Project configuration +- Associated Importer and Retriever services +- Knowledge graph metadata +- Service status information +- Last modification timestamp + +### Deleting a project + +Remove a project's metadata from the GenAI service: + +```bash +curl -X DELETE "https://:8529/gen-ai/v1/project//" \ + -H "Authorization: Bearer " +``` + +{{< warning >}} +Deleting a project only removes the project metadata from the GenAI service. +It does **not** delete: +- Services associated with the project (must be deleted separately) +- ArangoDB collections and data +- Knowledge graphs + +You must manually delete services and collections if needed. +{{< /warning >}} ## Obtaining a Bearer Token @@ -101,7 +178,7 @@ documentation. ## Complete Service lifecycle example -The example below shows how to install, monitor, and uninstall the Importer service. +The example below shows how to install, monitor, and uninstall the [Importer](importer.md) service. 
### Step 1: Installing the service @@ -111,11 +188,10 @@ curl -X POST https://:8529/ai/v1/graphragimporter \ -H "Content-Type: application/json" \ -d '{ "env": { - "username": "", "db_name": "", - "api_provider": "", - "triton_url": "", - "triton_model": "" + "chat_api_provider": "", + "chat_api_key": "", + "chat_model": "" } }' ``` @@ -176,16 +252,6 @@ curl -X DELETE https://:8529/ai/v1/service/arangodb-graphrag-i - **Authentication**: All requests use the same Bearer token in the `Authorization` header {{< /info >}} -### Customizing the example - -Replace the following values with your actual configuration: -- `` - Your database username. -- `` - Target database name. -- `` - Your API provider (e.g., `triton`) -- `` - Your LLM host service URL. -- `` - Your Triton model name (e.g., `mistral-nemo-instruct`). -- `` - Your authentication token. - ## Service configuration The AI orchestrator service is **started by default**. diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index e4cce5d200..5f66ecbe3e 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -28,40 +28,17 @@ different concepts in your document with the Retriever service. You can also use the GraphRAG Importer service via the [Data Platform web interface](../graphrag/web-interface.md). {{< /tip >}} -## Creating a new project - -To create a new GraphRAG project, use the `CreateProject` method by sending a -`POST` request to the `/ai/v1/project` endpoint. You must provide a unique -`project_name` and a `project_type` in the request body. Optionally, you can -provide a `project_description`. - -```curl -curl -X POST "https://:8529/ai/v1/project" \ --H "Content-Type: application/json" \ --d '{ - "project_name": "docs", - "project_type": "graphrag", - "project_description": "A documentation project for GraphRAG." -}' -``` -All the relevant ArangoDB collections (such as documents, chunks, entities, -relationships, and communities) created during the import process will -have the project name as a prefix. For example, the Documents collection will -become `_Documents`. The Knowledge Graph will also use the project -name as a prefix. If no project name is specified, then all collections -are prefixed with `default_project`, e.g., `default_project_Documents`. - -### Project metadata +## Prerequisites -Additional project metadata is accessible via the following endpoint, replacing -`` with the actual name of your project: +Before importing data, you need to create a GraphRAG project. Projects help you +organize your work and keep your data separate from other projects. -``` -GET /ai/v1/project_by_name/ -``` +For detailed instructions on creating and managing projects, see the +[Projects](gen-ai.md#projects) section in the GenAI Orchestration Service +documentation. -The endpoint provides comprehensive metadata about your project's components, -including its importer and retriever services and their status. +Once you have created a project, you can reference it when deploying the Importer +service using the `genai_project_name` field in the service configuration. ## Deployment options @@ -98,54 +75,34 @@ To start the service, use the AI service endpoint `/v1/graphragimporter`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) - -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. 
The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. - -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: - -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` - -Where: -- `username`: ArangoDB database user with permissions to create and modify collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. - -### Using OpenAI (Public LLM) +### Using OpenAI for chat and embedding ```json { "env": { - "openai_api_key": "your_openai_api_key", - "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key +- `chat_api_provider`: API provider for language model services +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: API provider for embedding model services +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} By default, for OpenAI API, the service is using @@ -153,7 +110,7 @@ By default, for OpenAI API, the service is using embedding model respectively. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) +### Using OpenRouter for chat and OpenAI for embedding OpenRouter makes it possible to connect to a huge array of LLM API providers, including non-OpenAI LLMs like Gemini Flash, Anthropic Claude @@ -166,28 +123,68 @@ while OpenAI is used for the embedding model. 
 {
   "env": {
     "db_name": "your_database_name",
-    "username": "your_username",
-    "api_provider": "openrouter",
-    "openai_api_key": "your_openai_api_key",
-    "openrouter_api_key": "your_openrouter_api_key",
-    "openrouter_model": "mistralai/mistral-nemo" // Specify a model here
+    "chat_api_provider": "openai",
+    "embedding_api_provider": "openai",
+    "chat_api_url": "https://openrouter.ai/api/v1",
+    "embedding_api_url": "https://api.openai.com/v1",
+    "chat_model": "mistral-nemo",
+    "embedding_model": "text-embedding-3-small",
+    "chat_api_key": "your_openrouter_api_key",
+    "embedding_api_key": "your_openai_api_key"
   },
 }
 ```
 
 Where:
-- `username`: ArangoDB database user with permissions to access collections
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
-- `api_provider`: Specifies which LLM provider to use
-- `openai_api_key`: Your OpenAI API key (for the embedding model)
-- `openrouter_api_key`: Your OpenRouter API key (for the LLM)
-- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`)
+- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
+- `chat_api_provider`: API provider for language model services
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_provider`: API provider for embedding model services
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+- `chat_api_key`: API key for authenticating with the chat/language model service
+- `embedding_api_key`: API key for authenticating with the embedding model service
 
 {{< info >}}
 When using OpenRouter, the service defaults to `mistral-nemo` for generation
 (via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI).
 {{< /info >}}
 
+### Using Triton Inference Server for chat and embedding
+
+The first step is to install the LLM Host service with the LLM and
+embedding models of your choice. The setup will then use the
+Triton Inference Server and MLflow as the backend.
+For more details, please refer to the [Triton Inference Server](triton-inference-server.md)
+and [MLflow](mlflow.md) documentation.
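+
+Installing the `llmhost` service itself uses the service creation request body
+described in the [AI service](gen-ai.md) documentation. A minimal sketch (the
+model name and `gpu` profile are illustrative; the profile must already exist
+in your platform):
+
+```json
+{
+  "env": {
+    "model_name": "mistral-nemo-instruct",
+    "profiles": "gpu"
+  }
+}
+```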
+
+Once the `llmhost` service is up and running, you can start the Importer
+service with the following configuration:
+
+```json
+{
+  "env": {
+    "db_name": "your_database_name",
+    "chat_api_provider": "triton",
+    "embedding_api_provider": "triton",
+    "chat_api_url": "your-arangodb-llm-host-url",
+    "embedding_api_url": "your-arangodb-llm-host-url",
+    "chat_model": "mistral-nemo-instruct",
+    "embedding_model": "nomic-embed-text-v1"
+  }
+}
+```
+
+Where:
+- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored
+- `chat_api_provider`: API provider for language model services (e.g., "triton")
+- `embedding_api_provider`: API provider for embedding model services (e.g., "triton")
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+
 ## Building Knowledge Graphs
 
 Once the service is installed successfully, you can follow these steps
diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md
index 5949d8a369..8eb9d64711 100644
--- a/site/content/ai-suite/reference/retriever.md
+++ b/site/content/ai-suite/reference/retriever.md
@@ -15,10 +15,10 @@ the Arango team.
 ## Overview
 
 The Retriever service offers two distinct search methods:
-- **Global search**: Analyzes entire document to identify themes and patterns,
-  perfect for high-level insights and comprehensive summaries.
-- **Local search**: Focuses on specific entities and their relationships, ideal
-  for detailed queries about particular concepts.
+- **Instant search**: Focuses on specific entities and their relationships, ideal
+  for fast queries about particular concepts.
+- **Deep search**: Analyzes the knowledge graph structure to identify themes and patterns,
+  perfect for comprehensive insights and detailed summaries.
 
 The service supports both private (Triton Inference Server) and public (OpenAI)
 LLM deployments, making it flexible for various security and infrastructure
@@ -33,24 +33,38 @@ graph and get contextually relevant responses.
 - Configurable community hierarchy levels
 
 {{< tip >}}
-You can also use the GraphRAG Retriever service via the ArangoDB [web interface](../graphrag/web-interface.md).
+You can also use the GraphRAG Retriever service via the [web interface](../graphrag/web-interface.md).
 {{< /tip >}}
 
+## Prerequisites
+
+Before using the Retriever service, you need to:
+
+1. **Create a GraphRAG project**: For detailed instructions on creating and
+   managing projects, see the [Projects](gen-ai.md#projects) section in the
+   GenAI Orchestration Service documentation.
+
+2. **Import data**: Use the [Importer](importer.md) service to transform your
+   text documents into a knowledge graph stored in ArangoDB.
+
 ## Search methods
 
 The Retriever service enables intelligent search and retrieval of information
-from your knowledge graph. It provides two powerful search methods, global Search
-and local Search, that leverage the structured knowledge graph created by the Importer
+from your knowledge graph. It provides two powerful search methods, instant search
+and deep search, that leverage the structured knowledge graph created by the Importer
 to deliver accurate and contextually relevant responses to your natural language
queries.

-### Global search
+### Deep Search
 
-Global search is designed for queries that require understanding and aggregation
-of information across your entire document. It's particularly effective for questions
-about overall themes, patterns, or high-level insights in your data.
+Deep Search is designed for highly detailed, accurate responses. It first determines
+what kind of information is available in different parts of the knowledge graph,
+then retrieves it sequentially in an LLM-guided research process. Use it whenever
+detail and accuracy matter more than latency (e.g., aggregating highly technical
+details), for example when caching responses for frequently asked questions, or
+in agent-based and research use cases.
 
 - **Community-Based Analysis**: Uses pre-generated community reports from your
-  knowledge graph to understand the overall structure and themes of your data,
+  knowledge graph to understand the overall structure and themes of your data.
 - **Map-Reduce Processing**:
   - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points.
   - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response.
@@ -60,11 +74,12 @@ about overall themes, patterns, or high-level insights in your data.
   - "Summarize the key findings across all documents"
   - "What are the most important concepts discussed?"
 
-### Local search
+### Instant Search
 
-Local search focuses on specific entities and their relationships within your
-knowledge graph. It is ideal for detailed queries about particular concepts,
-entities, or relationships.
+Instant Search is designed for responses with very short latency. It triggers
+fast unified retrieval over relevant parts of the knowledge graph via hybrid
+(semantic and lexical) search and graph expansion algorithms, producing a
+streamed natural-language response with clickable references to the relevant documents.
 
 - **Entity Identification**: Identifies relevant entities from the knowledge graph based on the query.
 - **Context Gathering**: Collects:
@@ -88,54 +103,35 @@ To start the service, use the AI service endpoint `/v1/graphragretriever`.
 Please refer to the documentation of [AI service](gen-ai.md) for more
 information on how to use it.
 
-### Using Triton Inference Server (Private LLM)
+### Using OpenAI for chat and embedding
 
-The first step is to install the LLM Host service with the LLM and
-embedding models of your choice. The setup will the use the
-Triton Inference Server and MLflow at the backend.
-For more details, please refer to the [Triton Inference Server](triton-inference-server.md)
-and [Mlflow](mlflow.md) documentation.
-
-Once the `llmhost` service is up-and-running, then you can start the Importer
-service using the below configuration:
+
 ```json
 {
   "env": {
-    "username": "your_username",
     "db_name": "your_database_name",
-    "api_provider": "triton",
-    "triton_url": "your-arangodb-llm-host-url",
-    "triton_model": "mistral-nemo-instruct"
+    "chat_api_provider": "openai",
+    "chat_api_url": "https://api.openai.com/v1",
+    "embedding_api_provider": "openai",
+    "embedding_api_url": "https://api.openai.com/v1",
+    "chat_model": "gpt-4o",
+    "embedding_model": "text-embedding-3-small",
+    "chat_api_key": "your_openai_api_key",
+    "embedding_api_key": "your_openai_api_key"
  },
 }
 ```
 
 Where:
-- `username`: ArangoDB database user with permissions to access collections.
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored.
-- `api_provider`: Specifies which LLM provider to use.
-- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running.
-- `triton_model`: Name of the LLM model to use for text processing.
-
-### Using OpenAI (Public LLM)
-
-```json
-{
-  "env": {
-    "openai_api_key": "your_openai_api_key",
-    "username": "your_username",
-    "db_name": "your_database_name",
-    "api_provider": "openai"
-  },
-}
-```
-
-Where:
-- `username`: ArangoDB database user with permissions to access collections.
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored.
-- `api_provider`: Specifies which LLM provider to use.
-- `openai_api_key`: Your OpenAI API key.
+- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
+- `chat_api_provider`: API provider for language model services
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_provider`: API provider for embedding model services
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+- `chat_api_key`: API key for authenticating with the chat/language model service
+- `embedding_api_key`: API key for authenticating with the embedding model service
 
 {{< info >}}
 By default, for OpenAI API, the service is using
@@ -143,7 +139,7 @@ By default, for OpenAI API, the service is using
 embedding model respectively.
 {{< /info >}}
 
-### Using OpenRouter (Gemini, Anthropic, etc.)
+### Using OpenRouter for chat and OpenAI for embedding
 
 OpenRouter makes it possible to connect to a huge array of LLM API
 providers, including non-OpenAI LLMs like Gemini Flash, Anthropic Claude and publicly hosted
@@ -156,28 +152,67 @@ OpenAI is used for the embedding model.
 {
   "env": {
     "db_name": "your_database_name",
-    "username": "your_username",
-    "api_provider": "openrouter",
-    "openai_api_key": "your_openai_api_key",
-    "openrouter_api_key": "your_openrouter_api_key",
-    "openrouter_model": "mistralai/mistral-nemo" // Specify a model here
+    "chat_api_provider": "openai",
+    "embedding_api_provider": "openai",
+    "chat_api_url": "https://openrouter.ai/api/v1",
+    "embedding_api_url": "https://api.openai.com/v1",
+    "chat_model": "mistral-nemo",
+    "embedding_model": "text-embedding-3-small",
+    "chat_api_key": "your_openrouter_api_key",
+    "embedding_api_key": "your_openai_api_key"
   },
 }
 ```
 
 Where:
-- `username`: ArangoDB database user with permissions to access collections.
-- `db_name`: Name of the ArangoDB database where the knowledge graph is stored.
-- `api_provider`: Specifies which LLM provider to use.
-- `openai_api_key`: Your OpenAI API key (for the embedding model).
-- `openrouter_api_key`: Your OpenRouter API key (for the LLM).
-- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`).
+- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
+- `chat_api_provider`: API provider for language model services
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_provider`: API provider for embedding model services
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+- `chat_api_key`: API key for authenticating with the chat/language model service
+- `embedding_api_key`: API key for authenticating with the embedding model service
 
 {{< info >}}
 When using OpenRouter, the service defaults to `mistral-nemo` for generation
 (via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI).
 {{< /info >}}
 
+### Using Triton Inference Server for chat and embedding
+
+The first step is to install the LLM Host service with the LLM and
+embedding models of your choice. The setup will then use the
+Triton Inference Server and MLflow as the backend.
+For more details, please refer to the [Triton Inference Server](triton-inference-server.md)
+and [MLflow](mlflow.md) documentation.
+
+Once the `llmhost` service is up and running, you can start the Retriever
+service with the following configuration:
+
+```json
+{
+  "env": {
+    "db_name": "your_database_name",
+    "chat_api_provider": "triton",
+    "embedding_api_provider": "triton",
+    "chat_api_url": "your-arangodb-llm-host-url",
+    "embedding_api_url": "your-arangodb-llm-host-url",
+    "chat_model": "mistral-nemo-instruct",
+    "embedding_model": "nomic-embed-text-v1"
+  }
+}
+```
+
+Where:
+- `db_name`: Name of the ArangoDB database where the knowledge graph is stored
+- `chat_api_provider`: API provider for language model services (e.g., "triton")
+- `embedding_api_provider`: API provider for embedding model services (e.g., "triton")
+- `chat_api_url`: API endpoint URL for the chat/language model service
+- `embedding_api_url`: API endpoint URL for the embedding model service
+- `chat_model`: Specific language model to use for text generation and analysis
+- `embedding_model`: Specific model to use for generating text embeddings
+
 ## Executing queries
 
 After the Retriever service is installed successfully, you can interact with
@@ -185,28 +220,32 @@ it using the following HTTP endpoints, based on the selected search method.
 
 {{< tabs "executing-queries" >}}
 
-{{< tab "Local search" >}}
+{{< tab "Instant search" >}}
 ```bash
-curl -X POST /v1/graphrag-query \
+curl -X POST /v1/graphrag-query-stream \
   -H "Content-Type: application/json" \
   -d '{
    "query": "What is the AR3 Drone?",
-    "query_type": 2,
-    "provider": 0
+    "query_type": "UNIFIED",
+    "provider": 0,
+    "include_metadata": true,
+    "use_llm_planner": false
  }'
 ```
 {{< /tab >}}
 
-{{< tab "Global search" >}}
+{{< tab "Deep search" >}}
 ```bash
 curl -X POST /v1/graphrag-query \
   -H "Content-Type: application/json" \
   -d '{
-    "query": "What is the AR3 Drone?",
+    "query": "What are the main themes and topics discussed in the documents?",
     "level": 1,
-    "query_type": 1,
-    "provider": 0
+    "query_type": "LOCAL",
+    "provider": 0,
+    "include_metadata": true,
+    "use_llm_planner": true
   }'
 ```
 {{< /tab >}}
 
@@ -215,13 +254,15 @@ curl -X POST /v1/graphrag-query \
 
 The request parameters are the following:
 - `query`: Your search query text.
-- `level`: The community hierarchy level to use for the search (`1` for top-level communities).
+- `level`: The community hierarchy level to use for the search (`1` for top-level communities).
+  Defaults to `2` if not provided.
 - `query_type`: The type of search to perform.
-  - `1`: Global search.
-  - `2`: Local search.
-- `provider`: The LLM provider to use
+  - `UNIFIED`: Instant search.
+  - `LOCAL`: Deep search.
+- `provider`: The LLM provider to use:
   - `0`: OpenAI (or OpenRouter)
   - `1`: Triton
+- `include_metadata`: Whether to include metadata in the response. If not specified, defaults to `true`.
+- `use_llm_planner`: Whether to use the LLM planner for intelligent query processing. If not specified, defaults to `true`.
 
 ## Health check
 
@@ -249,17 +290,6 @@ properties:
 }
 ```
 
-## Best Practices
-
-- **Choose the right search method**:
-  - Use global search for broad, thematic queries.
-  - Use local search for specific entity or relationship queries.
-
-
-- **Performance considerations**:
-  - Global search may take longer due to its map-reduce process.
-  - Local search is typically faster for concrete queries.
-
 ## API Reference
 
 For detailed API documentation, see the
diff --git a/site/content/ai-suite/reference/triton-inference-server.md b/site/content/ai-suite/reference/triton-inference-server.md
index 458226743e..1e1b982932 100644
--- a/site/content/ai-suite/reference/triton-inference-server.md
+++ b/site/content/ai-suite/reference/triton-inference-server.md
@@ -26,8 +26,8 @@ following steps:
 
 1. Install the Triton LLM Host service.
 2. Register your LLM model to MLflow by uploading the required files.
-3. Configure the [Importer](importer.md#using-triton-inference-server-private-llm) service to use your LLM model.
-4. Configure the [Retriever](retriever.md#using-triton-inference-server-private-llm) service to use your LLM model.
+3. Configure the [Importer](importer.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model.
+4. Configure the [Retriever](retriever.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model.
 
 {{< tip >}}
 Check out the dedicated [ArangoDB MLflow](mlflow.md) documentation page to learn