couchbase-examples · shyam-cb · Dec 12, 2025 · Dec 11, 2025 · Dec 11, 2025 · Dec 11, 2025
diff --git a/capella-model-services/llamaindex/__frontmatter.__md b/capella-model-services/llamaindex/__frontmatter.__md
diff --git a/...odel-services/llamaindex/query_based/RAG_with_Capella_Model_Services_and_LlamaIndex.ipynb b/...odel-services/llamaindex/query_based/RAG_with_Capella_Model_Services_and_LlamaIndex.ipynb
diff --git a/capella-model-services/llamaindex/query_based/frontmatter.md b/capella-model-services/llamaindex/query_based/frontmatter.md
@@ -0,0 +1,22 @@
+---
+# frontmatter
+path: "/tutorial-capella-model-services-llamaindex-rag-with-hyperscale-and-composite-vector-index"
+title: "RAG with LlamaIndex, Capella Model Services and Couchbase Hyperscale & Composite Vector Indexes"
+short_title: "RAG with LlamaIndex, Capella Model Services and Hyperscale & Composite Vector Indexes"
+description:
+  - Learn how to build a semantic search engine using Couchbase Hyperscale and Composite Vector Indexes.
+  - This tutorial demonstrates how LlamaIndex integrates Couchbase vector search capabilities with embeddings generated by Capella Model Services.
+  - Perform Retrieval-Augmented Generation (RAG) using LlamaIndex with Couchbase and Capella Model Services.
+content_type: tutorial
+filter: sdk
+technology:
+  - vector search
+tags:
+  - Artificial Intelligence
+  - LlamaIndex
+  - Hyperscale Vector Index
+  - Composite Vector Index
+sdk_language:
+  - python
+length: 60 Mins
+---
diff --git a/...amaindex/RAG_with_Couchbase_Capella.ipynb → ...pella_Model_Services_and_LlamaIndex.ipynb b/...amaindex/RAG_with_Couchbase_Capella.ipynb → ...pella_Model_Services_and_LlamaIndex.ipynb
@@ -6,15 +6,15 @@
    "source": [
     "# Introduction\n",
     "\n",
-    "In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database, [Llama 3.1 8B Instruct](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/) model as the large language model provided by Couchbase Capella AI Services. We will use the [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) model for generating embeddings via the Capella AI Services.\n",
+    "In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application with LlamaIndex orchestrating Capella Model Services and Couchbase Capella. We will use the models hosted on Capella Model Services for response generation and generating embeddings.\n",
     "\n",
     "This notebook demonstrates how to build a RAG system using:\n",
     "- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles\n",
     "- Couchbase Capella as the vector store\n",
     "- LlamaIndex framework for the RAG pipeline\n",
     "- Capella AI Services for embeddings and text generation\n",
     "\n",
-    "Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using Capella AI Services and LlamaIndex."
+    "Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using Capella Model Services and LlamaIndex."
    ]
   },
   {
@@ -46,19 +46,21 @@
     "\n",
     "In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context. \n",
     "\n",
-    "Capella Model Service allows you to create both the embedding model and the LLM in the same VPC as your database. Currently, the service offers Llama 3.1 Instruct model with 8 Billion parameters as an LLM and the mistral model for embeddings. \n",
+    "Capella Model Service allows you to create both the embedding model and the LLM in the same VPC as your database. There are multiple options for both the Embedding & Large Language Models, along with Value Adds to the models.\n",
     "\n",
-    "Create the models using the Capella AI Services interface. While creating the model, it is possible to cache the responses (both standard and semantic cache) and apply guardrails to the LLM responses.\n",
+    "Create the models using the Capella Model Services interface. While creating the model, it is possible to cache the responses (both standard and semantic cache) and apply guardrails to the LLM responses.\n",
     "\n",
-    "For more details, please refer to the [documentation](https://preview2.docs-test.couchbase.com/ai/get-started/about-ai-services.html#model).\n"
+    "For more details, please refer to the [documentation](https://docs.couchbase.com/ai/build/model-service/model-service.html). These models are compatible with the [Haystack OpenAI integration](https://haystack.deepset.ai/integrations/openai).\n",
+    "\n",
+    "After the models are deployed, please create the API keys for them and whitelist the keys on the IP on which the tutorial is being run. For more details, please refer to the documentation on [generating the API keys](https://docs.couchbase.com/ai/api-guide/api-start.html#model-service-keys).\n"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "# Installing Necessary Libraries\n",
-    "To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, LlamaIndex handles AI model integrations, and we will use the OpenAI SDK for generating embeddings and calling the LLM in Capella AI services.\n"
+    "To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, LlamaIndex handles AI model integrations, and we will use the OpenAI SDK (compatible with Capella Model Services) for generating embeddings and calling language models.\n"
    ]
   },
   {
@@ -68,7 +70,7 @@
    "outputs": [],
    "source": [
     "# Install required packages\n",
-    "%pip install datasets llama-index-vector-stores-couchbase==0.4.0 llama-index-embeddings-openai==0.3.1 llama-index-llms-openai-like==0.3.5 llama-index==0.12.37"
+    "%pip install datasets llama-index-vector-stores-couchbase==0.6.0 llama-index-embeddings-openai==0.5.1 llama-index-llms-openai-like==0.5.3 llama-index==0.14.10"
    ]
   },
   {
@@ -86,7 +88,6 @@
    "outputs": [],
    "source": [
     "import getpass\n",
-    "import base64\n",
     "import logging\n",
     "import sys\n",
     "import time\n",
@@ -116,16 +117,16 @@
     "\n",
     "The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.\n",
     "\n",
-    "CAPELLA_AI_ENDPOINT is the Capella AI Services endpoint found in the models section.\n",
+    "CAPELLA_MODEL_SERVICES_ENDPOINT is the Capella Model Services endpoint found in the models section.\n",
     "\n",
-    "> Note that the Capella AI Endpoint also requires an additional `/v1` from the endpoint shown on the UI if it is not shown on the UI.\n",
+    "> Note that the Capella Model Services Endpoint also requires an additional `/v1` from the endpoint shown on the UI if it is not shown on the UI.\n",
     "\n",
     "INDEX_NAME is the name of the search index we will use for the vector search."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -135,15 +136,19 @@
     "CB_BUCKET_NAME = input(\"Couchbase Bucket: \")\n",
     "SCOPE_NAME = input(\"Couchbase Scope: \")\n",
     "COLLECTION_NAME = input(\"Couchbase Collection: \")\n",
-    "INDEX_NAME = input(\"Vector Search Index: \")\n",
-    "CAPELLA_AI_ENDPOINT = input(\"Enter your Capella AI Services Endpoint: \")\n",
+    "INDEX_NAME = \"vector_search\" # need to be matched with the search index name in the search_index.json file\n",
     "\n",
-    "# Check if the variables are correctly loaded\n",
-    "if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, CAPELLA_AI_ENDPOINT]):\n",
-    "    raise ValueError(\"All configuration variables must be provided.\")\n",
+    "# Get Capella AI endpoint\n",
+    "CAPELLA_MODEL_SERVICES_ENDPOINT = input(\"Enter your Capella Model Services Endpoint: \")\n",
+    "LLM_MODEL_NAME = input(\"Enter the LLM name\")\n",
+    "LLM_API_KEY = getpass.getpass(\"Enter your Couchbase LLM API Key: \")\n",
+    "EMBEDDING_MODEL_NAME = input(\"Enter the Embedding Model name:\")\n",
+    "EMBEDDING_API_KEY = getpass.getpass(\"Enter your Couchbase Embedding Model API Key: \")\n",
     "\n",
-    "# Generate a Capella AI key from the username and password\n",
-    "CAPELLA_AI_KEY = base64.b64encode(f\"{CB_USERNAME}:{CB_PASSWORD}\".encode(\"utf-8\")).decode(\"utf-8\")"
+    "# Check if the variables are correctly loaded\n",
+    "if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, \n",
+    "CAPELLA_MODEL_SERVICES_ENDPOINT, LLM_MODEL_NAME, LLM_API_KEY, EMBEDDING_MODEL_NAME, EMBEDDING_API_KEY]):\n",
+    "    raise ValueError(\"All configuration variables must be provided.\")"
    ]
   },
   {
@@ -156,7 +161,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -264,7 +269,7 @@
    "outputs": [],
    "source": [
     "# Create search index from search_index.json file at scope level\n",
-    "with open('fts_index.json', 'r') as search_file:\n",
+    "with open('search_index.json', 'r') as search_file:\n",
     "    search_index_definition = SearchIndex.from_json(json.load(search_file))\n",
     "    \n",
     "    # Update search index definition with user inputs\n",
@@ -287,7 +292,7 @@
     "        existing_index = scope_search_manager.get_index(search_index_name)\n",
     "        print(f\"Search index '{search_index_name}' already exists at scope level.\")\n",
     "    except Exception as e:\n",
-    "        print(f\"Search index '{search_index_name}' does not exist at scope level. Creating search index from fts_index.json...\")\n",
+    "        print(f\"Search index '{search_index_name}' does not exist at scope level. Creating search index from search_index.json...\")\n",
     "        scope_search_manager.upsert_index(search_index_definition)\n",
     "        print(f\"Search index '{search_index_name}' created successfully at scope level.\")"
    ]
@@ -370,8 +375,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Creating Embeddings using Capella AI Service\n",
-    "Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use Capella AI's OpenAI-compatible API to create embeddings with the intfloat/e5-mistral-7b-instruct model. This model transforms our text data into vector representations that can be efficiently searched, with a batch size of 30 for optimal processing.\n"
+    "# Creating Embeddings using Capella Model Service\n",
+    "Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use the model deployed on Capella Model Services to create high-quality embeddings. This model transforms our text data into vector representations that can be efficiently searched, with a batch size of 30 for optimal processing.\n"
    ]
   },
   {
@@ -383,9 +388,9 @@
     "try:\n",
     "    # Set up the embedding model\n",
     "    embed_model = OpenAIEmbedding(\n",
-    "        api_key=CAPELLA_AI_KEY,\n",
-    "        api_base=CAPELLA_AI_ENDPOINT,\n",
-    "        model_name=\"intfloat/e5-mistral-7b-instruct\",\n",
+    "        api_key=EMBEDDING_API_KEY,\n",
+    "        api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,\n",
+    "        model_name=EMBEDDING_MODEL_NAME,\n",
     "        embed_batch_size=30\n",
     "    )\n",
     "    \n",
@@ -528,12 +533,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Using the Large Language Model (LLM) in Capella AI\n",
-    "Language language models are AI systems that are trained to understand and generate human language. We'll be using the `Llama3.1-8B-Instruct` large language model via the Capella AI services inside the same network as the Capella operational database to process user queries and generate meaningful responses. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.\n",
+    "# Using Capella Model Services Large Language Model (LLM)\n",
+    "Large language models are AI systems that are trained to understand and generate human language. We'll be using the model deployed on Capella Model Services to process user queries and generate meaningful responses based on the retrieved context from our Couchbase vector store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating the LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.\n",
     "\n",
     "The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.\n",
     "\n",
-    "The LLM has been created using the LangChain OpenAI provider as well with the model name, URL and the API key based on the Capella AI Services."
+    "The LLM is configured using LlamaIndex's OpenAI-like provider with your Capella Model Services API key for seamless integration."
    ]
   },
   {
@@ -545,10 +550,9 @@
     "try:\n",
     "    # Set up the LLM\n",
     "    llm = OpenAILike(\n",
-    "        api_base=CAPELLA_AI_ENDPOINT,\n",
-    "        api_key=CAPELLA_AI_KEY,\n",
-    "        model=\"meta-llama/Llama-3.1-8B-Instruct\",\n",
-    "        \n",
+    "        api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,\n",
+    "        api_key=LLM_API_KEY,\n",
+    "        model=LLM_MODEL_NAME,\n",
     "    )\n",
     "    \n",
     "    \n",
@@ -620,7 +624,7 @@
     "\n",
     "    # Display search results\n",
     "    print(f\"\\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):\")\n",
-    "    print(response)\n",
+    "    print(response.response)\n",
     "\n",
     "except RecursionError as e:\n",
     "    raise RuntimeError(f\"Error performing semantic search: {e}\")"
@@ -685,13 +689,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## LLM Guardrails in Capella AI Services\n",
-    "\n",
-    "Capella AI services also provide input and response moderation using configurable LLM guardrails. These services can integrate with the LlamaGuard3-8B model from Meta.\n",
-    "- Categories to be blocked can be configured during the model creation process.\n",
-    "- Helps prevent unsafe or undesirable interactions with the LLM.\n",
-    "\n",
-    "By implementing caching and moderation mechanisms, Capella AI services ensure an efficient, cost-effective, and responsible approach to AI-powered recommendations."
+    "# LLM Guardrails in Capella Model Services\n",
+    "Capella Model services also have the ability to moderate the user inputs and the responses generated by the LLM. Capella Model Services can be configured to use the [Llama 3.1 NemoGuard 8B safety model](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety/modelcard) guardrails model from Meta. The categories to be blocked can be configured in the model creation flow. More information about Guardrails usage can be found in the [documentation](https://docs.couchbase.com/ai/build/model-service/configure-guardrails-security.html#guardrails).\n",
+    " \n",
+    "Here is an example of the Guardrails in action"
    ]
   },
   {
@@ -727,7 +728,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "base",
+   "display_name": "haystack",
    "language": "python",
    "name": "python3"
   },
@@ -741,7 +742,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.7"
+   "version": "3.12.4"
   }
  },
  "nbformat": 4,

diff --git a/capella-model-services/llamaindex/README.md → ...ervices/llamaindex/search_based/README.md b/capella-model-services/llamaindex/README.md → ...ervices/llamaindex/search_based/README.md
diff --git a/capella-model-services/llamaindex/search_based/frontmatter.md b/capella-model-services/llamaindex/search_based/frontmatter.md
@@ -0,0 +1,21 @@
+---
+# frontmatter
+path: "/tutorial-capella-model-services-llamaindex-rag-with-search-vector-index"
+title: "RAG with LlamaIndex, Capella Model Services and Couchbase Search Vector Index"
+short_title: "RAG with LlamaIndex, Capella Model Services and Couchbase Search Vector Index"
+description:
+  - Learn how to build a semantic search engine using Couchbase Search Vector Index.
+  - This tutorial demonstrates how LlamaIndex integrates Couchbase vector search capabilities with embeddings generated by Capella Model Services.
+  - Perform Retrieval-Augmented Generation (RAG) using LlamaIndex with Couchbase and Capella Model Services.
+content_type: tutorial
+filter: sdk
+technology:
+  - vector search
+tags:
+  - Artificial Intelligence
+  - LlamaIndex
+  - Search Vector Index
+sdk_language:
+  - python
+length: 60 Mins
+---
diff --git a/...-model-services/llamaindex/fts_index.json → ...llamaindex/search_based/search_index.json b/...-model-services/llamaindex/fts_index.json → ...llamaindex/search_based/search_index.json
@@ -48,7 +48,7 @@
                   {
                     "vector_index_optimized_for": "recall",
                     "docvalues": true,
-                    "dims": 4096,
+                    "dims": 1024,
                     "include_in_all": false,
                     "include_term_vectors": false,
                     "index": true,