@@ -6,11 +6,11 @@
     "source": [
     "# Image Understanding with RAG using OpenAI's Vision & Responses APIs\n",
     "\n",
-    "Welcome! This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using OpenAI’s Vision and Responses APIs. It focuses on multimodal data, specifically, combining image and text inputs to analyze customer experiences. The system leverages GPT-4.1 and integrates image understanding with file search to provide context-aware responses.\n",
+    "Welcome! This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using OpenAI’s Vision and Responses APIs. It focuses on multimodal data, combining image and text inputs to analyze customer experiences. The system leverages GPT-4.1 and integrates image understanding with file search to provide context-aware responses.\n",
     "\n",
     "Multimodal datasets are increasingly common, particularly in domains like healthcare, where records often contain both visual data (e.g. radiology scans) and accompanying text (e.g. clinical notes). Real-world datasets also tend to be noisy, with incomplete or missing information, making it critical to analyze multiple modalities in tandem.\n",
     "\n",
-    "This guide focuses on a customer service use case: evaluating customer feedback that may include screenshots, photos, and written complaints. You’ll learn how to synthetically generate both image and text inputs, use file search for context retrieval, and apply the Evals API to assess how incorporating image understanding impacts overall performance.\n",
+    "This guide focuses on a customer service use case: evaluating customer feedback that may include photos and written reviews. You’ll learn how to synthetically generate both image and text inputs, use file search for context retrieval, and apply the Evals API to assess how incorporating image understanding impacts overall performance.\n",
     "\n",
     "---\n",
     "\n",
|
@@ -251,7 +251,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-    "This example uses OpenAI's built-in vector store and file search capabilities to build a RAG system that can analyse customer experiences, from their feedback which can be both visual and text-based. We create two vector stores for comparisons, one with image understanding and one without."
+    "This example uses OpenAI's built-in vector store and file search capabilities to build a RAG system that can analyse customer experiences from their feedback, which can be both visual and text-based. We create two vector stores for comparison, one with image understanding and one without."
     ]
     },
     {
|
@@ -323,7 +323,7 @@
     "outputs": [],
     "source": [
     "upload_files_to_vector_store(text_image_vector_store_id, df)\n",
-    "upload_files_to_vector_store(text_vector_store_id, df, column_name=\"text\") "
+    "upload_files_to_vector_store(text_vector_store_id, df, column_name=\"text\")"
     ]
     },
     {
|
|
@@ -332,14 +332,9 @@
     "source": [
     "# Retrieval and Filtering\n",
     "\n",
-    "We can analyse our dataset with natural language queries with the help of File Search. For the text-only dataset, we see that information is missing that could inform our analysis.\n"
-    ]
-    },
-    {
-    "cell_type": "markdown",
-    "metadata": {},
-    "source": [
-    "The only positive review for spaghetti in July has visual feedback and we can see the RAG system with only text based context available is uncertain about positive details. However with image context provided the second RAG system is able to provide a more accurate response."
+    "We can analyse our dataset using natural language queries with the help of File Search. For the text-only dataset, we see that information that could inform our analysis is missing.\n",
+    "\n",
+    "The only positive review for spaghetti in July includes visual feedback, and we can see that the RAG system with only text-based context available is uncertain about the positive details. However, with image context provided, the second RAG system is able to give a more accurate response.\n"
     ]
     },
     {
|
@@ -461,7 +456,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-    "Likewise we can test this for negative reviews in June."
+    "Likewise, we can test this for negative reviews in June concerning burnt pizza."
     ]
     },
     {
|
@@ -665,6 +660,8 @@
     "metadata": {},
     "outputs": [],
     "source": [
+    "# You may need to wait a few seconds for the eval runs to finish before running this cell\n",
+    "\n",
     "text_only_run_output_items = client.evals.runs.output_items.list(eval_id=eval_id, run_id=text_only_run_id)\n",
     "text_image_run_output_items = client.evals.runs.output_items.list(eval_id=eval_id, run_id=text_image_run_id)"
     ]
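With both lists of output items fetched, a simple way to compare the two runs is the share of items whose graders all passed. The sketch below assumes each output item carries a `results` list whose entries have a boolean `passed` field, matching the Evals API's run output-item records; the field names and the helper itself are assumptions, not code from the notebook.

```python
def pass_rate(output_items):
    """Fraction of eval output items where every grader result passed."""
    if not output_items:
        return 0.0
    passed = sum(
        1
        for item in output_items
        if all(r.get("passed", False) for r in item.get("results", []))
    )
    return passed / len(output_items)
```

Comparing `pass_rate` across the text-only and text-plus-image runs gives a single headline number for how much image understanding helps, before digging into individual failing items.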
|
|
@@ -772,13 +769,6 @@
     ")\n",
     "print(deleted_vector_store)"
     ]
-    },
-    {
-    "cell_type": "code",
-    "execution_count": null,
-    "metadata": {},
-    "outputs": [],
-    "source": []
     }
 ],
 "metadata": {
|