
Commit e4b3cc9

fix error in evals example (#852)
1 parent ef522e8 commit e4b3cc9

File tree

1 file changed: +2 -1 lines changed

docs/testing-evals.md

Lines changed: 2 additions & 1 deletion
@@ -400,6 +400,7 @@ Now we want a way to quantify the success of the SQL generation so we can judge
 We can use [`Agent.override`][pydantic_ai.agent.Agent.override] to replace the system prompt with a custom one that uses a subset of examples, and then run the application code (in this case `user_search`). We also run the actual SQL from the examples and compare the "correct" result from the example SQL to the SQL generated by the agent. (We compare the results of running the SQL rather than the SQL itself since the SQL might be semantically equivalent but written in a different way).

 To get a quantitative measure of performance, we assign points to each run as follows:
+
 * **-100** points if the generated SQL is invalid
 * **-1** point for each row returned by the agent (so returning lots of results is discouraged)
 * **+5** points for each row returned by the agent that matches the expected result
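
For readers scanning the hunk above, a minimal self-contained sketch of the point scheme those bullets describe; the function name and parameters here are illustrative assumptions, not the documented example code:

```python
# Illustrative only: applies the scoring rules quoted in the hunk above.
def score_run(sql_is_valid: bool, agent_rows: list[dict], expected_rows: list[dict]) -> int:
    if not sql_is_valid:
        return -100  # invalid generated SQL
    score = -len(agent_rows)  # -1 point per row the agent returns
    score += 5 * sum(1 for row in agent_rows if row in expected_rows)  # +5 per matching row
    return score
```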
@@ -426,7 +427,7 @@ async def main():
     conn = DatabaseConn()
     scores = []
 
-    for i, fold in enumerate(folds, start=1):
+    for i, fold in enumerate(folds):
         fold_score = 0
         # build all other folds into a list of examples
         other_folds = list(chain(*(f for j, f in enumerate(folds) if j != i)))
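
The second hunk removes `start=1` so the outer index matches the 0-based index used by the inner `enumerate(folds)`: with a 1-based `i`, the filter `j != i` kept the current fold's own examples and dropped a neighbouring fold instead. A standalone sketch (with made-up fold contents) showing the difference:

```python
from itertools import chain

folds = [['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']]  # toy folds, not the real examples

# Before the fix: i is 1-based, j is 0-based, so the wrong fold is excluded.
for i, fold in enumerate(folds, start=1):
    other = list(chain(*(f for j, f in enumerate(folds) if j != i)))
    assert fold[0] in other  # the current fold's own examples leak into the "other" set

# After the fix: both indices are 0-based and each fold is held out correctly.
for i, fold in enumerate(folds):
    other = list(chain(*(f for j, f in enumerate(folds) if j != i)))
    assert fold[0] not in other  # current fold excluded as intended
```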
