|
112 | 112 | "source": [
|
113 | 113 | "## Project Lifecycle\n",
|
114 | 114 | "\n",
|
115 |
| - "Not every project will proceed in the same way, but projects generally have some common\n", |
116 |
| - "important components.\n", |
| 115 | + "Not every project will proceed in the same way, but projects generally have some \n", |
| 116 | + "important components in common.\n", |
| 117 | + "\n", |
| 118 | + "\n", |
| 119 | + "\n", |
| 120 | + "The solid arrows show the primary progressions or steps, while the dotted line \n", |
| 121 | + "represents the ongoing nature of problem understanding - uncovering more about\n", |
| 122 | + "the customer domain will influence every step of the process. We will examine \n", |
| 123 | + "several of these iterative cycles of refinement in detail below. \n", |
117 | 124 | "\n",
|
118 | 125 | "### 1. Understand the Problem\n",
|
119 | 126 | "\n",
|
|
133 | 140 | "It's very rare that a real-world project will start with all the data necessary to get\n",
|
134 | 141 | "to a satisfactory solution, much less to establish confidence.\n",
|
135 | 142 | "\n",
|
136 |
| - "In our case, we're going to assume that we have a decent sample of system *inputs*\n", |
137 |
| - "(here, photographs of receipts), but start without any fully annotated data. We'll walk\n", |
138 |
| - "through the process of incrementally expanding our test and training sets as we go along\n", |
139 |
| - "and make our evals progressively more comprehensive.\n", |
| 143 | + "In our case, we're going to assume that we have a decent sample of system *inputs*, \n", |
| 144 | + "in the form of receipt images, but start without any fully annotated data. We find \n", |
| 145 | + "this is a not-unusual situation when automating an existing process. Instead, \n", |
| 146 | + "we'll walk through the process of building that out as we go along by collaborating with\n", |
| 147 | + "domain experts, and make our evals progressively more comprehensive.\n", |
140 | 148 | "\n",
|
141 | 149 | "### 3. Build an End-to-End V0 System\n",
|
142 | 150 | "\n",
|
|
394 | 402 | "cell_type": "markdown",
|
395 | 403 | "metadata": {},
|
396 | 404 | "source": [
|
397 |
| - "<img src=\"../../../images/Supplies_20240322_220858_Raven_Scan_3_jpeg.rf.50852940734939c8838819d7795e1756.jpg\" alt=\"Walmart_image\" width=\"400\"/>" |
| 405 | + "" |
398 | 406 | ]
|
399 | 407 | },
|
400 | 408 | {
|
|
497 | 505 | "source": [
|
498 | 506 | "### Action Decision\n",
|
499 | 507 | "\n",
|
500 |
| - "Next, we need to close the loop and get to an actual decision based on receipts. This\n", |
501 |
| - "looks pretty similar, so we'll present the code without comment." |
| 508 | + "Next, we need to close the loop and get to an actual decision based on receipts. \n", |
| 509 | + "\n", |
| 510 | + "Ordinarily one would start with the most capable model (`o3`, at this time) for a \n", |
| 511 | + "first pass. Once correctness is established, one would experiment with other models\n", |
| 512 | + "to analyze the business impact of any tradeoffs, and potentially consider whether \n", |
| 513 | + "they can be remedied with iteration. A client may be willing to take a certain accuracy \n", |
| 514 | + "hit for lower latency or cost, or it may be more effective to change the architecture\n", |
| 515 | + "to hit cost, latency, and accuracy goals. We'll get into how to make these tradeoffs\n", |
| 516 | + "explicitly and objectively later on. \n", |
| 517 | + "\n", |
| 518 | + "For this cookbook, `o3` might be too good. We'll use `o4-mini` for our first pass, so \n", |
| 519 | + "that we get a few reasoning errors we can use to illustrate the means of addressing\n", |
| 520 | + "them when they occur.\n", |
| 521 | + "\n", |
| 522 | + "Otherwise, this is pretty similar to the previous section, so we'll present the code \n", |
| 523 | + "without further comment." |
502 | 524 | ]
|
503 | 525 | },
|
504 | 526 | {
|
|
887 | 909 | "metadata": {},
|
888 | 910 | "source": [
|
889 | 911 | "After you run that eval you'll be able to view it in the UI, and should see something\n",
|
890 |
| - "like:\n", |
| 912 | + "like the below. \n", |
| 913 | + "\n", |
| 914 | + "(Note: if you have a Zero-Data-Retention agreement, this data is not stored\n", |
| 915 | + "by OpenAI, so it will not be available in this interface.)\n", |
891 | 916 | "\n",
|
892 | 917 | "\n",
|
893 | 918 | "\n",
|
|
1617 | 1642 | "ARE NOT TRAVEL-RELATED, THEN IT MUST BE AUDITED.\n",
|
1618 | 1643 | "```\n",
|
1619 | 1644 | "\n",
|
1620 |
| - "3. We added three examples, JSON input/output pairs wrapped in XML tags.\n", |
| 1645 | + "4. We added three examples, JSON input/output pairs wrapped in XML tags.\n", |
1621 | 1646 | "\n",
|
1622 | 1647 | "With our prompt revisions, we'll regenerate the data to evaluate and re-run the same\n",
|
1623 | 1648 | "eval to compare our results:"
|
|