
Commit e4b3cc9

fix error in evals example (#852)
1 parent ef522e8 commit e4b3cc9

File tree

1 file changed: +2 -1 lines changed

docs/testing-evals.md

Lines changed: 2 additions & 1 deletion
@@ -400,6 +400,7 @@ Now we want a way to quantify the success of the SQL generation so we can judge
 We can use [`Agent.override`][pydantic_ai.agent.Agent.override] to replace the system prompt with a custom one that uses a subset of examples, and then run the application code (in this case `user_search`). We also run the actual SQL from the examples and compare the "correct" result from the example SQL to the SQL generated by the agent. (We compare the results of running the SQL rather than the SQL itself since the SQL might be semantically equivalent but written in a different way).

 To get a quantitative measure of performance, we assign points to each run as follows:
+
 * **-100** points if the generated SQL is invalid
 * **-1** point for each row returned by the agent (so returning lots of results is discouraged)
 * **+5** points for each row returned by the agent that matches the expected result
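
For readers scanning the hunk above, a minimal self-contained sketch of the point scheme those bullets describe; the function name and parameters here are illustrative assumptions, not the documented example code:

```python
# Illustrative only: applies the scoring rules quoted in the hunk above.
def score_run(sql_is_valid: bool, agent_rows: list[dict], expected_rows: list[dict]) -> int:
    if not sql_is_valid:
        return -100  # invalid generated SQL
    score = -len(agent_rows)  # -1 point per row the agent returns
    score += 5 * sum(1 for row in agent_rows if row in expected_rows)  # +5 per matching row
    return score
```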
@@ -426,7 +427,7 @@ async def main():
     conn = DatabaseConn()
     scores = []
 
-    for i, fold in enumerate(folds, start=1):
+    for i, fold in enumerate(folds):
         fold_score = 0
         # build all other folds into a list of examples
         other_folds = list(chain(*(f for j, f in enumerate(folds) if j != i)))
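
The second hunk removes `start=1` so the outer index matches the 0-based index used by the inner `enumerate(folds)`: with a 1-based `i`, the filter `j != i` kept the current fold's own examples and dropped a neighbouring fold instead. A standalone sketch (with made-up fold contents) showing the difference:

```python
from itertools import chain

folds = [['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']]  # toy folds, not the real examples

# Before the fix: i is 1-based, j is 0-based, so the wrong fold is excluded.
for i, fold in enumerate(folds, start=1):
    other = list(chain(*(f for j, f in enumerate(folds) if j != i)))
    assert fold[0] in other  # the current fold's own examples leak into the "other" set

# After the fix: both indices are 0-based and each fold is held out correctly.
for i, fold in enumerate(folds):
    other = list(chain(*(f for j, f in enumerate(folds) if j != i)))
    assert fold[0] not in other  # current fold excluded as intended
```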
