dimensions update

HeidiSteen · HeidiSteen · commit 46348e07ae55 · 2025-01-09T09:29:25.000-08:00
diff --git a/articles/search/cognitive-search-skill-azure-openai-embedding.md b/articles/search/cognitive-search-skill-azure-openai-embedding.md
@@ -9,7 +9,7 @@ ms.custom:
   - ignite-2023
   - build-2024
 ms.topic: reference
-ms.date: 10/16/2024
+ms.date: 01/09/2025
 ---
 
 #	Azure OpenAI Embedding skill
@@ -46,7 +46,7 @@ Parameters are case-sensitive.
 | `deploymentId`   | The name of the deployed Azure OpenAI embedding model. The model should be an embedding model, such as text-embedding-ada-002. See the [List of Azure OpenAI models](/azure/ai-services/openai/concepts/models) for supported models.|
 | `authIdentity`   | A user-managed identity used by the search service for connecting to Azure OpenAI. You can use either a [system or user managed identity](search-howto-managed-identities-data-sources.md). To use a system managed identity, leave `apiKey` and `authIdentity` blank. The system-managed identity is used automatically. A managed identity must have [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions to send text to Azure OpenAI. |
 | `modelName` | This property is required if your skillset is created using the 2024-05-01-preview or 2024-07-01 REST API. Set this property to the deployment name of an Azure OpenAI embedding model deployed on the provider specified through `resourceUri` and identified through `deploymentId`. Currently, the supported values are `text-embedding-ada-002`, `text-embedding-3-large`, and `text-embedding-3-small`.  |
-| `dimensions` | (Optional, introduced in the 2024-05-01-preview REST API). The dimensions of embeddings that you would like to generate if the model supports reducing the embedding dimensions. Supported ranges are listed below. Defaults to the maximum dimensions for each model if not specified. For skillsets created using the 2023-10-01-preview, dimensions are fixed at 1536. |
+| `dimensions` | (Optional, starting in the 2024-05-01-preview REST API.) The number of dimensions for the generated embeddings, assuming the model supports a range of dimensions. Supported ranges are listed below, and currently only apply to the text-embedding-3 model series. The default is the maximum dimensions for each model. For skillsets created using earlier REST API versions dating back to the 2023-10-01-preview, dimensions are fixed at 1536. When setting the `dimensions` property on a skill, make sure to set the `dimensions` property on the [vector field definition](vector-search-how-to-create-index.md#add-a-vector-field-to-the-fields-collection) to the same value. |
 
 ## Supported dimensions by `modelName`
 
diff --git a/articles/search/tutorial-rag-build-solution-index-schema.md b/articles/search/tutorial-rag-build-solution-index-schema.md
@@ -106,6 +106,8 @@ A minimal index for LLM is designed to store chunks of content. It typically inc
 
    Like the basic schema, it's organized around chunks. The `chunk_id` uniquely identifies each chunk. The `text_vector` field is an embedding of the chunk. The nonvector `chunk` field is a readable string. The `title` maps to a unique metadata storage path for the blobs. The `parent_id` is the only parent-level field, and it's a base64-encoded version of the parent file URI. 
 
+   In integrated vectorization workloads, the `dimensions` property on your vector fields should be identical to the number of `dimensions` generated by the embedding skill you're using to vectorize your data. In this tutorial, the embedding skill is the Azure OpenAI embedding skill that calls the text-embedding-3-large model. The skill is specified in the next tutorial.
+
    The schema also includes a `locations` field for storing generated content that's created by the [indexing pipeline](tutorial-rag-build-solution-pipeline.md).
 
    ```python
diff --git a/articles/search/tutorial-rag-build-solution-pipeline.md b/articles/search/tutorial-rag-build-solution-pipeline.md
@@ -10,7 +10,7 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2024
 ms.topic: tutorial
-ms.date: 11/19/2024
+ms.date: 01/09/2025
 ---
 
 # Tutorial: Build an indexing pipeline for RAG on Azure AI Search
@@ -51,7 +51,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
 
 Open or create a Jupyter notebook (`.ipynb`) in Visual Studio Code to contain the scripts that comprise the pipeline. Initial steps install packages and collect variables for the connections. After you complete the setup steps, you're ready to begin with the components of the indexing pipeline. 
 
-Let's start with the index schema from the [previous tutorial](tutorial-rag-build-solution-index-schema.md). It's organized around vectorized and nonvectorized chunks. It includes a `locations` field that stores AI-generated content created by the skillset.  
+Let's start with the index schema from the [previous tutorial](tutorial-rag-build-solution-index-schema.md). It's organized around vectorized and nonvectorized chunks. It includes a `locations` field that stores AI-generated content created by the skillset.
 
 ```python
 from azure.identity import DefaultAzureCredential
diff --git a/articles/search/tutorial-rag-build-solution-query.md b/articles/search/tutorial-rag-build-solution-query.md
@@ -10,7 +10,7 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2024
 ms.topic: tutorial
-ms.date: 10/04/2024
+ms.date: 01/09/2025
 ---
 
 # Tutorial: Search your data using a chat model (RAG in Azure AI Search)
@@ -167,16 +167,10 @@ search_results = search_client.search(
     vector_queries= [vector_query],
     filter="search.ismatch('ice*', 'locations', 'full', 'any')",
     select=["title", "chunk", "locations"],
-    top=5,
+    top=5
 )
 
 sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])
-
-search_results = search_client.search(
-    search_text=query,
-    top=10,
-    filter="search.ismatch('ice*', 'locations', 'full', 'any')",
-    select="title, chunk, locations"
 ```
 
 Results from the filtered query should now look similar to the following response. Notice the emphasis on ice cover.
diff --git a/articles/search/vector-search-how-to-create-index.md b/articles/search/vector-search-how-to-create-index.md
@@ -9,7 +9,7 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2024
 ms.topic: how-to
-ms.date: 08/05/2024
+ms.date: 01/09/2025
 ---
 
 # Create a vector index
@@ -281,7 +281,7 @@ Vector fields are characterized by [their data type](/rest/api/searchservice/sup
 1. Define a vector field with the following attributes. You can store one generated embedding per field. For each vector field:
 
    + `type` must be one of the [vector data types](/rest/api/searchservice/supported-data-types#edm-data-types-for-vector-fields). `Collection(Edm.Single)` is the most common for embedding models.
-   + `dimensions` is the number of dimensions generated by the embedding model. For text-embedding-ada-002, it's 1536.
+   + `dimensions` is the number of dimensions generated by the embedding model. For text-embedding-ada-002, it's fixed at 1536. For the text-embedding-3 model series, there's a range of values. If you're using integrated vectorization and an embedding skill to generate vectors, make sure this property is set to the [same dimensions value](cognitive-search-skill-azure-openai-embedding.md#supported-dimensions-by-modelname) used by the embedding skill.
    + `vectorSearchProfile` is the name of a profile defined elsewhere in the index.
    + `searchable` must be true.
    + `retrievable` can be true or false. True returns the raw vectors (1536 of them) as plain text and consumes storage space. Set to true if you're passing a vector result to a downstream app.
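
The requirement this commit documents, that an embedding skill and the vector field it populates declare the same `dimensions`, can be checked before anything is deployed. The following is a minimal sketch and not part of the changed articles. It compares plain dictionaries shaped like the skill and field definitions. The helper name `dimensions_match`, the deployment and profile names, and the value 1024 are illustrative assumptions, and the sketch calls no Azure SDK.

```python
# Skill type string for the Azure OpenAI Embedding skill.
EMBEDDING_SKILL_TYPE = "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill"


def dimensions_match(skill: dict, field: dict) -> bool:
    """Return True when the skill and the vector field declare the same dimensions.

    Hypothetical helper for illustration only.
    """
    if skill.get("@odata.type") != EMBEDDING_SKILL_TYPE:
        raise ValueError("expected an Azure OpenAI Embedding skill definition")
    skill_dims = skill.get("dimensions")
    if skill_dims is None:
        # When omitted, the skill defaults to the model's maximum, which this
        # sketch doesn't know; require an explicit value instead of guessing.
        raise ValueError("set 'dimensions' explicitly on the skill to compare")
    return skill_dims == field.get("dimensions")


# Hypothetical definitions, trimmed to the properties relevant here.
skill = {
    "@odata.type": EMBEDDING_SKILL_TYPE,
    "deploymentId": "my-embedding-deployment",  # placeholder name
    "modelName": "text-embedding-3-large",
    "dimensions": 1024,  # illustrative value
}
field = {
    "name": "text_vector",
    "type": "Collection(Edm.Single)",
    "dimensions": 1024,  # must equal the skill's value
    "vectorSearchProfile": "my-profile",  # placeholder name
    "searchable": True,
}

print(dimensions_match(skill, field))
```

A check like this could run in the notebook before the index and skillset are created, so a mismatch surfaces as a local error instead of a failed indexer run.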