Commit 34135e1

docs(cpt): Add description about the reasoning behind the default judge threshold
1 parent caac0be commit 34135e1

File tree

2 files changed: +6 −1 lines changed


docs/apis-tools/testing/configuration.md

Lines changed: 5 additions & 0 deletions
@@ -889,6 +889,11 @@ Unless noted otherwise, properties in the provider tables are required.
 | `judge.threshold` | `double` | `0.5` | Confidence threshold (0.0 to 1.0) for the judge to pass. |
 | `judge.custom-prompt` | `string` | | Custom evaluation prompt replacing the default criteria. |

+The default threshold of `0.5` treats a response as acceptable when it is at least partially satisfied according to the
+judge rubric. This is a practical default for AI-generated output, where wording and level of detail may vary between
+runs even when the response is still useful. Increase the threshold when your assertion needs stricter semantic
+agreement.
+
 #### Chat model settings

 <Tabs groupId="provider" defaultValue="openai" queryString values={[
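The table in this hunk suggests how a stricter global threshold might look in a configuration file. A minimal sketch, assuming the `judge.threshold` key from the table above sits under a `judge:` block; the exact nesting and prefix are an assumption here, not taken from the documented configuration:

```yaml
# Illustrative only: `judge.threshold` comes from the properties table above;
# the surrounding structure is a hypothetical example, not the documented prefix.
judge:
  threshold: 0.8 # stricter than the 0.5 default: roughly "mostly satisfied" or better
```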

docs/apis-tools/testing/testing-agentic-processes.md

Lines changed: 1 addition & 1 deletion
@@ -174,7 +174,7 @@ The judge evaluates matches using the following scoring scale:
 | 0.25 | Mostly not satisfied. Only marginal relevance. |
 | 0.0 | Not satisfied at all, or the actual value is empty. |

-The LLM may return any value between these anchor points (for example, 0.6 or 0.85). The default threshold is 0.5. You can change it globally in the [judge configuration](configuration.md#judge-configuration) or per assertion using `withJudgeConfig`.
+The LLM may return any value between these anchor points (for example, 0.6 or 0.85). The default threshold is 0.5. This means the assertion passes when the response is at least partially satisfied according to the rubric, which is a practical default for AI-generated output that may vary in wording or completeness across runs. Use a higher threshold when the response must satisfy stricter semantic requirements. You can change the threshold globally in the [judge configuration](configuration.md#judge-configuration) or per assertion using `withJudgeConfig`.

 ### Set up an LLM provider
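The pass/fail rule described in this hunk can be sketched as a simple comparison of the judge's score against the threshold. This is an illustrative model only; the class and method names below are hypothetical and not part of the Camunda Process Test API:

```java
// Illustrative sketch: models the threshold comparison described above.
// `JudgeThresholdSketch` and `passes` are hypothetical names, not CPT API.
public class JudgeThresholdSketch {
    static final double DEFAULT_THRESHOLD = 0.5;

    // The judge may return any value between the rubric anchor points.
    public static boolean passes(double score, double threshold) {
        return score >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(passes(0.6, DEFAULT_THRESHOLD));  // true: at least partially satisfied
        System.out.println(passes(0.25, DEFAULT_THRESHOLD)); // false: mostly not satisfied
        System.out.println(passes(0.85, 0.8));               // true even under a stricter threshold
    }
}
```

Under this model, raising the threshold from 0.5 to 0.8 changes a score of 0.6 from passing to failing, which is the stricter semantic agreement the docs recommend for exacting assertions.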
