eval driven system design cookbook updates merge

shikhar-cyber · shikhar-cyber · commit 45c09f13d409 · 2025-06-03T09:41:57.000-07:00
diff --git a/examples/partners/eval_driven_system_design/receipt_inspection.ipynb b/examples/partners/eval_driven_system_design/receipt_inspection.ipynb
@@ -112,6 +112,7 @@
    "source": [
     "## Project Lifecycle\n",
     "\n",
+<<<<<<< HEAD
     "Not every project will proceed in the same way, but projects generally have some \n",
     "important components in common.\n",
     "\n",
@@ -121,6 +122,10 @@
     "represents the ongoing nature of problem understanding - uncovering more about\n",
     "the customer domain will influence every step of the process. We will examine \n",
     "several of these iterative cycles of refinement in detail below. \n",
+=======
+    "Not every project will proceed in the same way, but projects generally have some common\n",
+    "important components.\n",
+>>>>>>> origin/main
     "\n",
     "### 1. Understand the Problem\n",
     "\n",
@@ -140,11 +145,18 @@
     "It's very rare that a real-world project will start with all the data necessary to get\n",
     "to a satisfactory solution, much less to establish confidence.\n",
     "\n",
+<<<<<<< HEAD
     "In our case, we're going to assume that we have a decent sample of system *inputs*, \n",
     "in the form of receipt images, but start without any fully annotated data. We find \n",
     "this is a not-unusual situation when automating an existing process. Instead, \n",
     "we'll walk through the process of building that out as we go along by collaborating with\n",
     "domain experts, and make our evals progressively more comprehensive.\n",
+=======
+    "In our case, we're going to assume that we have a decent sample of system *inputs*\n",
+    "(here, photographs of receipts), but start without any fully annotated data. We'll walk\n",
+    "through the process of incrementally expanding our test and training sets as we go along\n",
+    "and make our evals progressively more comprehensive.\n",
+>>>>>>> origin/main
     "\n",
     "### 3. Build an End-to-End V0 System\n",
     "\n",
@@ -402,7 +414,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+<<<<<<< HEAD
     "![Walmart_image](../../../images/Supplies_20240322_220858_Raven_Scan_3_jpeg.rf.50852940734939c8838819d7795e1756.jpg)"
+=======
+    "<img src=\"../../../images/Supplies_20240322_220858_Raven_Scan_3_jpeg.rf.50852940734939c8838819d7795e1756.jpg\" alt=\"Walmart_image\" width=\"400\"/>"
+>>>>>>> origin/main
    ]
   },
   {
@@ -505,6 +521,7 @@
    "source": [
     "### Action Decision\n",
     "\n",
+<<<<<<< HEAD
     "Next, we need to close the loop and get to an actual decision based on receipts. \n",
     "\n",
     "Ordinarily one would start with the most capable model - `o3`, at this time - for a \n",
@@ -521,6 +538,10 @@
     "\n",
     "Otherwise, this is pretty similar to the last, so we'll present the code without \n",
     "further comment."
+=======
+    "Next, we need to close the loop and get to an actual decision based on receipts. This\n",
+    "looks pretty similar, so we'll present the code without comment."
+>>>>>>> origin/main
    ]
   },
   {
@@ -909,10 +930,14 @@
    "metadata": {},
    "source": [
     "After you run that eval you'll be able to view it in the UI, and should see something\n",
+<<<<<<< HEAD
     "like the below. \n",
     "\n",
     "(Note, if you have a Zero-Data-Retention agreement, this data is not stored\n",
     "by OpenAI, so it will not be available in this interface.)\n",
+=======
+    "like:\n",
+>>>>>>> origin/main
     "\n",
     "![Summary UI](../../../images/partner_summary_ui.png)\n",
     "\n",
@@ -1642,7 +1667,11 @@
     "ARE NOT TRAVEL-RELATED, THEN IT MUST BE AUDITED.\n",
     "```\n",
     "\n",
+<<<<<<< HEAD
     "4. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
+=======
+    "3. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
+>>>>>>> origin/main
     "\n",
     "With our prompt revisions, we'll regenerate the data to evaluate and re-run the same\n",
     "eval to compare our results:"
diff --git a/registry.yaml b/registry.yaml
@@ -9,13 +9,18 @@
   date: 2025-06-01
   authors:
     - shikhar-cyber
+<<<<<<< HEAD
     - moredatarequired
     - tooluser
     - eddiesiegel
   tags:
     - evals
     - API Flywheel
     - completions
+=======
+  tags:
+    - evals
+>>>>>>> origin/main
     - responses
     - functions
     - tracing