Commit d093a1f

Cleanup
1 parent d1bfe0e


examples/evaluation/Building_resilient_prompts_using_an_evaluation_flywheel.md

Lines changed: 0 additions & 4 deletions
@@ -48,7 +48,6 @@ It answers questions from prospective renters, such as:
 
 Suppose we have a specific prompt within our application that we’d like to analyze. We can get started in the OpenAI Platform by adding in our prompt and uploading our input and output data to our Dataset (learn more about how to do this in [our docs](platform.openai.com/docs/evaluations-getting-started)).
 
-<!-- TODO: insert image -->
 ![Leasing agent data](/images/dataset.png)
 
 With our prompt and traces loaded in, we’re ready to analyze prompt effectiveness.
@@ -76,7 +75,6 @@ For our apartment leasing assistant, our initial open codes might look like this
 
 These specific, grounded-in-data labels become the raw material for the next step.
 
-<!-- TODO: insert image -->
 ![Open coding](/images/open-coding.png)
 
 Here's our dataset after open coding.
@@ -113,8 +111,6 @@ Our formatting grader is a fairly straightforward directive.
 Our availability accuracy grader will reference additional input columns we’ve added to our dataset to capture business hours and day availability.
 ![Creating availability grader](/images/creating-availability-grader.png)
 ![Ground truth columns](/images/ground-truth-columns.png)
-<!-- TODO: insert image -->
-<!-- TODO: insert image -->
 
 With automated graders in place, we can easily evaluate our performance on any change to our system — an updated prompt, updated model parameters, or newly discovered edge cases.
