Commit 34135e1

docs(cpt): Add description about the reasoning behind the default judge threshold
1 parent caac0be commit 34135e1

File tree

2 files changed: +6 −1 lines changed


docs/apis-tools/testing/configuration.md

Lines changed: 5 additions & 0 deletions
@@ -889,6 +889,11 @@ Unless noted otherwise, properties in the provider tables are required.
 | `judge.threshold` | `double` | `0.5` | Confidence threshold (0.0 to 1.0) for the judge to pass. |
 | `judge.custom-prompt` | `string` | | Custom evaluation prompt replacing the default criteria. |

+The default threshold of `0.5` treats a response as acceptable when it is at least partially satisfied according to the
+judge rubric. This is a practical default for AI-generated output, where wording and level of detail may vary between
+runs even when the response is still useful. Increase the threshold when your assertion needs stricter semantic
+agreement.
+
 #### Chat model settings

 <Tabs groupId="provider" defaultValue="openai" queryString values={[
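The table in this hunk suggests how a stricter global threshold might look in a configuration file. A minimal sketch, assuming the `judge.threshold` key from the table above sits under a `judge:` block; the exact nesting and prefix are an assumption here, not taken from the documented configuration:

```yaml
# Illustrative only: `judge.threshold` comes from the properties table above;
# the surrounding structure is a hypothetical example, not the documented prefix.
judge:
  threshold: 0.8 # stricter than the 0.5 default: roughly "mostly satisfied" or better
```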

docs/apis-tools/testing/testing-agentic-processes.md

Lines changed: 1 addition & 1 deletion
@@ -174,7 +174,7 @@ The judge evaluates matches using the following scoring scale:
 | 0.25 | Mostly not satisfied. Only marginal relevance. |
 | 0.0 | Not satisfied at all, or the actual value is empty. |

-The LLM may return any value between these anchor points (for example, 0.6 or 0.85). The default threshold is 0.5. You can change it globally in the [judge configuration](configuration.md#judge-configuration) or per assertion using `withJudgeConfig`.
+The LLM may return any value between these anchor points (for example, 0.6 or 0.85). The default threshold is 0.5. This means the assertion passes when the response is at least partially satisfied according to the rubric, which is a practical default for AI-generated output that may vary in wording or completeness across runs. Use a higher threshold when the response must satisfy stricter semantic requirements. You can change the threshold globally in the [judge configuration](configuration.md#judge-configuration) or per assertion using `withJudgeConfig`.

 ### Set up an LLM provider
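The pass/fail rule described in this hunk can be sketched as a simple comparison of the judge's score against the threshold. This is an illustrative model only; the class and method names below are hypothetical and not part of the Camunda Process Test API:

```java
// Illustrative sketch: models the threshold comparison described above.
// `JudgeThresholdSketch` and `passes` are hypothetical names, not CPT API.
public class JudgeThresholdSketch {
    static final double DEFAULT_THRESHOLD = 0.5;

    // The judge may return any value between the rubric anchor points.
    public static boolean passes(double score, double threshold) {
        return score >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(passes(0.6, DEFAULT_THRESHOLD));  // true: at least partially satisfied
        System.out.println(passes(0.25, DEFAULT_THRESHOLD)); // false: mostly not satisfied
        System.out.println(passes(0.85, 0.8));               // true even under a stricter threshold
    }
}
```

Under this model, raising the threshold from 0.5 to 0.8 changes a score of 0.6 from passing to failing, which is the stricter semantic agreement the docs recommend for exacting assertions.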
