
Commit a5c4cce

Use gpt-4o for vision approach (#1656)
* Update readme
* GPT-4o changes
* Default to gpt-4o for vision
* Revert unneeded change
* GPT-4 version
* Merge
1 parent f16c26b commit a5c4cce

File tree

4 files changed: +14 -15 lines changed

app/frontend/src/components/GPT4VSettings/GPT4VSettings.tsx

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ export const GPT4VSettings = ({ updateGPT4VInputs, updateUseGPT4V, isUseGPT4V, g

     return (
         <Stack className={styles.container} tokens={{ childrenGap: 10 }}>
-            <Checkbox checked={useGPT4V} label="Use GPT-4 Turbo with Vision" onChange={onuseGPT4V} />
+            <Checkbox checked={useGPT4V} label="Use GPT vision model" onChange={onuseGPT4V} />
             {useGPT4V && (
                 <Dropdown
                     selectedKey={vectorFieldOption}

docs/customization.md

Lines changed: 2 additions & 2 deletions
@@ -45,7 +45,7 @@ The `system_message_chat_conversation` variable is currently tailored to the sam

 ##### Chat with vision

-If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable the GPT-4 Vision model and then select "Use GPT-4 Turbo with Vision", then the chat tab will use the `chatreadretrievereadvision.py` approach instead. This approach is similar to the `chatreadretrieveread.py` approach, with a few differences:
+If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable a GPT Vision model and then select "Use GPT vision model", then the chat tab will use the `chatreadretrievereadvision.py` approach instead. This approach is similar to the `chatreadretrieveread.py` approach, with a few differences:

 1. Step 1 is the same as before, except it uses the GPT-4 Vision model instead of the default GPT-3.5 model.
 2. For this step, it also calculates a vector embedding for the user question using [the Computer Vision vectorize text API](https://learn.microsoft.com/azure/ai-services/computer-vision/how-to/image-retrieval#call-the-vectorize-text-api), and passes that to the Azure AI Search to compare against the `imageEmbeddings` fields in the indexed documents. For each matching document, it downloads the image blob and converts it to a base 64 encoding.
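For illustration, here is a minimal Python sketch of what that vectorize-text call looks like. The environment variable names and the use of `requests` are assumptions for this example, not the repo's actual code (the real logic lives in `chatreadretrievereadvision.py`):

```python
# Minimal sketch (not the repo's code) of the Computer Vision vectorize-text call.
# AZURE_VISION_ENDPOINT / AZURE_VISION_KEY are illustrative variable names.
import os

import requests


def vectorize_text(text: str) -> list[float]:
    endpoint = os.environ["AZURE_VISION_ENDPOINT"].rstrip("/")
    response = requests.post(
        f"{endpoint}/computervision/retrieval:vectorizeText",
        params={"api-version": "2023-02-01-preview"},
        headers={"Ocp-Apim-Subscription-Key": os.environ["AZURE_VISION_KEY"]},
        json={"text": text},
        timeout=30,
    )
    response.raise_for_status()
    # The service responds with {"vector": [...], "modelVersion": "..."}
    return response.json()["vector"]
```

The resulting vector is what gets passed to Azure AI Search as a vector query against the `imageEmbeddings` fields.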
@@ -62,7 +62,7 @@ The `system_chat_template` variable is currently tailored to the sample data sin

 #### Read with vision

-If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable the GPT-4 Vision model and then select "Use GPT-4 Turbo with Vision", then the ask tab will use the `retrievethenreadvision.py` approach instead. This approach is similar to the `retrievethenread.py` approach, with a few differences:
+If you followed the instructions in [docs/gpt4v.md](gpt4v.md) to enable the GPT-4 Vision model and then select "Use GPT vision model", then the ask tab will use the `retrievethenreadvision.py` approach instead. This approach is similar to the `retrievethenread.py` approach, with a few differences:

 1. For this step, it also calculates a vector embedding for the user question using [the Computer Vision vectorize text API](https://learn.microsoft.com/azure/ai-services/computer-vision/how-to/image-retrieval#call-the-vectorize-text-api), and passes that to the Azure AI Search to compare against the `imageEmbeddings` fields in the indexed documents. For each matching document, it downloads the image blob and converts it to a base 64 encoding.
 2. When it combines the search results and user question, it includes the base 64 encoded images, and sends along both the text and images to the GPT4 Vision model (similar to this [documentation example](https://platform.openai.com/docs/guides/vision/quick-start)). The model generates a response that includes citations to the images, and the UI renders the base64 encoded images when a citation is clicked.
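As a rough sketch of that second step, assuming the `openai` Python package (1.x) and placeholder endpoint, key, deployment name, and image file, the combined text-plus-image request might look like this:

```python
# Sketch of sending text plus a base64-encoded image to a GPT vision deployment.
# Endpoint, key, deployment name, and image path are all placeholders.
import base64

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-openai>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-02-15-preview",
)

with open("page1.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

response = client.chat.completions.create(
    model="gpt-4o",  # deployment name, matching the new infra/main.bicep defaults
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this page say about revenue?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)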

docs/gpt4v.md

Lines changed: 8 additions & 9 deletions
@@ -21,11 +21,11 @@ This repository now includes an example of integrating GPT-4 Turbo with Vision w
 |`gpt-4`|`vision-preview`|

 - Ensure that you can deploy the Azure OpenAI resource group in [a region where all required components are available](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#model-summary-table-and-region-availability):
-  - Azure OpenAI models
-    - gpt-35-turbo
-    - text-embedding-ada-002
-    - gpt-4v
-  - [Azure AI Vision](https://learn.microsoft.com/azure/ai-services/computer-vision/)
+  - Azure OpenAI models
+    - gpt-35-turbo
+    - text-embedding-ada-002
+    - gpt-4v
+  - [Azure AI Vision](https://learn.microsoft.com/azure/ai-services/computer-vision/)

 ### Setup and Usage

@@ -46,20 +46,19 @@ This repository now includes an example of integrating GPT-4 Turbo with Vision w
    azd env set USE_GPT4V true
    ```

-   When set, that flag will provision a Computer Vision resource and GPT-4-vision model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI.
+   When set, that flag will provision a Computer Vision resource and gpt-4o model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI.

 3. **Clean old deployments (optional):**
    Run `azd down --purge` for a fresh setup.

 4. **Start the application:**
    Execute `azd up` to build, provision, deploy, and initiate document preparation.

-
 5. **Web Application Usage:**
    ![GPT4V configuration screenshot](./images/gpt4v.png)
-   - Access the developer options in the web app and select "Use GPT-4 Turbo with Vision".
+   - Access the developer options in the web app and select "Use GPT vision model".
    - Sample questions will be updated for testing.
    - Interact with the questions to view responses.
-   - The 'Thought Process' tab shows the retrieved data and its processing by GPT-4 Turbo with Vision.
+   - The 'Thought Process' tab shows the retrieved data and its processing by the GPT vision model.

 Feel free to explore and contribute to enhancing this feature. For questions or feedback, use the repository's issue tracker.
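For context on the `imageEmbedding` field mentioned above: it is populated at ingestion time with image embeddings from the Computer Vision image-retrieval API, mirroring the vectorize-text call used on the query side. A minimal sketch, again with illustrative variable names rather than the repo's actual code:

```python
# Sketch (not the repo's code) of computing an image embedding for the
# `imageEmbedding` field via the Computer Vision vectorize-image endpoint.
import os

import requests


def vectorize_image(image_bytes: bytes) -> list[float]:
    endpoint = os.environ["AZURE_VISION_ENDPOINT"].rstrip("/")
    response = requests.post(
        f"{endpoint}/computervision/retrieval:vectorizeImage",
        params={"api-version": "2023-02-01-preview"},
        headers={
            "Ocp-Apim-Subscription-Key": os.environ["AZURE_VISION_KEY"],
            "Content-Type": "application/octet-stream",  # raw image bytes in the body
        },
        data=image_bytes,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["vector"]
```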

infra/main.bicep

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -114,9 +114,9 @@ var embedding = {
114114
dimensions: embeddingDimensions != 0 ? embeddingDimensions : 1536
115115
}
116116

117-
param gpt4vModelName string = 'gpt-4'
118-
param gpt4vDeploymentName string = 'gpt-4v'
119-
param gpt4vModelVersion string = 'vision-preview'
117+
param gpt4vModelName string = 'gpt-4o'
118+
param gpt4vDeploymentName string = 'gpt-4o'
119+
param gpt4vModelVersion string = '2024-05-13'
120120
param gpt4vDeploymentCapacity int = 10
121121

122122
param tenantId string = tenant().tenantId
