|
6 | 6 | "source": [
|
7 | 7 | "# Evals API: Image Inputs\n",
|
8 | 8 | "\n",
|
9 | | - "OpenAI’s Evals API now supports image inputs, in its step toward multimodal functionality! API users can use OpenAI's Evals API to evaluate their image use cases to see how their LLM integration is performing and improve it.\n",
10 | | - "\n",
11 | | - "In this cookbook, we'll walk through an image example with the Evals API. More specifically, we will use Evals API to evaluate model-generated responses to an image and its corresponding prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score those model responses against the image and reference answer.\n",
12 | | - "\n",
13 | | - "Based on your use case, you might only need the sampling functionality or the model grader, and you can revise what you pass in during the eval and run creation to fit your needs. "
| 9 | + "In this cookbook, we will use the Evals API to grade model-generated responses to an image and prompt, using **sampling** to generate model responses and **model grading** (LLM as a Judge) to score the model responses against the image, prompt, and reference answer."
14 | 10 | ]
|
15 | 11 | },
|
16 | 12 | {
|
|
180 | 176 | },
|
181 | 177 | {
|
182 | 178 | "cell_type": "code",
|
183 | | - "execution_count": 22,
| 179 | + "execution_count": null,
184 | 180 | "metadata": {},
|
185 | 181 | "outputs": [],
|
186 | 182 | "source": [
|
187 | 183 | "from openai import OpenAI\n",
|
188 | 184 | "import os\n",
|
189 | 185 | "\n",
|
190 | 186 | "client = OpenAI(\n",
|
191 | | - "    api_key=os.getenv(\"OPENAI_API_KEY\"),\n",
192 | | - "    base_url=\"https://api.openai.com/v1\",\n",
| 187 | + "    api_key=os.getenv(\"OPENAI_API_KEY\")\n",
193 | 188 | ")"
|
194 | 189 | ]
|
195 | 190 | },
|
|
289 | 284 | "cell_type": "markdown",
|
290 | 285 | "metadata": {},
|
291 | 286 | "source": [
|
292 | | - "To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message trajectory we'd like for sampling to get the model response. While we won't dive into it in this cookbook, EvalsAPI also supports stored completions containing images as a data source. \n",
| 287 | + "To create the run, we pass in the eval object id and the data source (i.e., the data we compiled earlier) in addition to the chat message input we'd like for sampling to get the model response. While we won't dive into it in this cookbook, the Evals API also supports stored completions containing images as a data source.\n",
293 | 288 | "\n",
|
294 | | - "Here's the sampling message trajectory we'll use for this example."
| 289 | + "Here's the sampling message input we'll use for this example."
295 | 290 | ]
|
296 | 291 | },
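The sampling message input described above can be sketched as a chat-style template whose `{{item.*}}` placeholders are filled in from each row of the data source at run time. This is a hedged sketch, not the notebook's exact cell: the `input_text`/`input_image` content types follow the Responses-style message format, and the placeholder names (`item.instruction`, `item.image_url`) are assumed column names, not taken from this diff.

```python
# Hedged sketch of a sampling message template for an image eval run.
# The {{item.*}} placeholders are assumed column names in the data
# source; the Evals API substitutes them per row before sampling.
sampling_messages = [
    {
        "role": "user",
        "type": "message",
        "content": {
            "type": "input_text",
            "text": "{{item.instruction}}",  # text prompt paired with the image
        },
    },
    {
        "role": "user",
        "type": "message",
        "content": {
            "type": "input_image",            # the image the model must look at
            "image_url": "{{item.image_url}}",
            "detail": "auto",
        },
    },
]
```

A run would then pass this template as the sampling input of the data source when creating the run; check the current Evals API reference for the exact field names before relying on them.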
|
297 | 292 | {
|
|