
Commit 51fd298

Doc fixes
1 parent 0dfacbf commit 51fd298

2 files changed: +55 −37 lines


docs/deploy_features.md

Lines changed: 2 additions & 29 deletions
@@ -6,7 +6,7 @@ You should typically enable these features before running `azd up`. Once you've
 * [Using different chat completion models](#using-different-chat-completion-models)
 * [Using reasoning models](#using-reasoning-models)
 * [Using different embedding models](#using-different-embedding-models)
-* [Enabling GPT vision feature](#enabling-gpt-vision-feature)
+* [Enabling multimodal embeddings and answering](#enabling-multimodal-embeddings-and-answering)
 * [Enabling media description with Azure Content Understanding](#enabling-media-description-with-azure-content-understanding)
 * [Enabling client-side chat history](#enabling-client-side-chat-history)
 * [Enabling persistent chat history with Azure Cosmos DB](#enabling-persistent-chat-history-with-azure-cosmos-db)
@@ -226,34 +226,7 @@ If you have already deployed:
 When your documents include images, you can optionally enable this feature that can
 use image embeddings when searching and also use images when answering questions.
 
-To enable multimodal embeddings and answering, run:
-
-```shell
-azd env set USE_MULTIMODAL true
-```
-
-With this feature enabled, the data ingestion process will extract images from your documents
-using Document Intelligence, store the images in Azure Blob Storage, vectorize the images using the Azure AI Vision service, and store the image embeddings in the Azure AI Search index.
-
-During the RAG flow, the app will perform a multi-vector query using both text and image embeddings, and then send any images associated with the retrieved document chunks to the chat completion model for answering questions. This feature assumes that your chat completion model supports multimodal inputs, such as `gpt-4o` or `gpt-4o-mini`.
-
-You can customize the RAG flow approach with a few additional environment variables.
-
-To only use the text embeddings for the search step (no image embeddings), run:
-
-```shell
-azd env set RAG_VECTOR_FIELDS_DEFAULT "textEmbeddingOnly"
-```
-
-To only send text sources to the chat completion model (no images), run:
-
-```shell
-azd env set RAG_LLM_INPUTS_OVERRIDE "texts"
-```
-
-You can also modify those settings in the "Developer Settings" in the chat UI,
-to experiment with different options before committing to them.
-
+Learn more in the [multimodal guide](./multimodal.md).
 
 ## Enabling media description with Azure Content Understanding

docs/multimodal.md

Lines changed: 53 additions & 8 deletions
@@ -8,6 +8,13 @@ This repository includes an optional feature that uses the GPT vision model to g
 
 ## How it works
 
+
+With this feature enabled, the data ingestion process will extract images from your documents
+using Document Intelligence, store the images in Azure Blob Storage, vectorize the images using the Azure AI Vision service, and store the image embeddings in the Azure AI Search index.
+
+During the RAG flow, the app will perform a multi-vector query using both text and image embeddings, and then send any images associated with the retrieved document chunks to the chat completion model for answering questions. This feature assumes that your chat completion model supports multimodal inputs, such as `gpt-4o` or `gpt-4o-mini`.
+
+----OLD----
 When this feature is enabled, the following changes are made to the application:
 
 * **Search index**: We added a new field to the Azure AI Search index to store the embedding returned by the multimodal Azure AI Vision API (while keeping the existing field that stores the OpenAI text embeddings).
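
The multi-vector query described in the lines added above can be sketched as a direct call to the Azure AI Search REST API. This is only an illustration, not code from the repository: the service endpoint, index name, `embedding` text field, API version, and the truncated placeholder vectors are assumptions (only the `imageEmbedding` field name appears in the existing docs), and real queries use full embedding vectors produced by the text embedding model and the Azure AI Vision vectorize API.

```shell
# Sketch only: substitute your own service name, index name, key, and real embedding
# vectors. Field names here are assumptions about the index schema.
curl -X POST "https://<your-search-service>.search.windows.net/indexes/<your-index>/docs/search?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <your-query-key>" \
  -d '{
    "search": "What does the financial report say about revenue?",
    "top": 3,
    "vectorQueries": [
      { "kind": "vector", "vector": [0.01, 0.02], "fields": "embedding", "k": 50 },
      { "kind": "vector", "vector": [0.03, 0.04], "fields": "imageEmbedding", "k": 50 }
    ]
  }'
```

Azure AI Search fuses the results of the keyword search and both vector queries, and the app then sends any images associated with the retrieved chunks to the chat completion model.
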
@@ -32,31 +39,69 @@ For more details on how this feature works, read [this blog post](https://techco
 
 ### Deployment
 
-1. **Enable GPT vision approach:**
+1. **Enable multimodal capabilities:**
 
 First, make sure you do *not* have integrated vectorization enabled, since that is currently incompatible:
 
 ```shell
 azd env set USE_FEATURE_INT_VECTORIZATION false
 ```
 
-Then set the environment variable for enabling vision support:
+Then set the azd environment variable to enable the multimodal feature:
+
+```shell
+azd env set USE_MULTIMODAL true
+```
+
+3. **Provision the multimodal resources:**
+
+Either run `azd up` if you haven't run it before, or run `azd provision` to provision the multimodal resources. This will create a new Azure AI Vision account and update the Azure AI Search index to include the new image embedding field.
+
+4. **Re-index the data:**
+
+If you have already indexed data, you will need to re-index it to include the new image embeddings.
+We recommend creating a new Azure AI Search index to avoid conflicts with the existing index.
 
 ```shell
-azd env set USE_GPT4V true
+azd env set AZURE_SEARCH_INDEX multimodal-index
 ```
 
-When set, that flag will provision a Azure AI Vision resource and gpt-4o model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI.
+Then run the data ingestion process again to re-index the data:
 
-2. **Clean old deployments (optional):**
-Run `azd down --purge` for a fresh setup.
+Linux/Mac:
 
-3. **Start the application:**
-Execute `azd up` to build, provision, deploy, and initiate document preparation.
+```shell
+./scripts/prepdocs.sh
+```
+
+Windows:
+
+```shell
+.\scripts\prepdocs.ps1
+```
 
 4. **Try out the feature:**
 ![GPT4V configuration screenshot](./images/gpt4v.png)
 * Access the developer options in the web app and select "Use GPT vision model".
 * New sample questions will show up in the UI that are based on the sample financial document.
 * Try out a question and see the answer generated by the GPT vision model.
 * Check the 'Thought process' and 'Supporting content' tabs.
+
+5. **Customize the multimodal approach:**
+
+You can customize the RAG flow approach with a few additional environment variables.
+
+To only use the text embeddings for the search step (no image embeddings), run:
+
+```shell
+azd env set RAG_VECTOR_FIELDS_DEFAULT "textEmbeddingOnly"
+```
+
+To only send text sources to the chat completion model (no images), run:
+
+```shell
+azd env set RAG_LLM_INPUTS_OVERRIDE "texts"
+```
+
+You can also modify those settings in the "Developer Settings" in the chat UI,
+to experiment with different options before committing to them.
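
Taken together, the updated Deployment steps in the diff above amount to roughly the following sequence. This is a sketch for Linux/Mac (use `.\scripts\prepdocs.ps1` on Windows), and `multimodal-index` is just the example index name used above:

```shell
# Sketch of the updated deployment sequence (Linux/Mac)
azd env set USE_FEATURE_INT_VECTORIZATION false   # integrated vectorization is currently incompatible
azd env set USE_MULTIMODAL true                   # enable multimodal embeddings and answering
azd provision                                     # or `azd up` if you have never deployed this environment
azd env set AZURE_SEARCH_INDEX multimodal-index   # optional: use a fresh index to avoid conflicts
./scripts/prepdocs.sh                             # re-run ingestion so image embeddings are indexed
```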

0 commit comments
