CHANGELOG.md
+10 -1 (10 additions & 1 deletion)
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [1.1.23] - 2025-08-06
+
+### Changed
+
+- Updated `TLMOptions` to support `disable_trustworthiness` parameter
+- Skips trustworthiness scoring when `disable_trustworthiness` is True, assuming either custom evaluation criteria (TLM) or RAG Evals (TrustworthyRAG) are provided
+
+

 ## [1.1.22] - 2025-07-29

 ### Added
@@ -291,7 +299,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
src/cleanlab_tlm/tlm.py
+8 -0 (8 additions & 0 deletions)
@@ -613,12 +613,14 @@ class TLMOptions(TypedDict):
 num_self_reflections (int, default = 3): the number of different evaluations to perform where the LLM reflects on the response, a factor affecting trust scoring.
 The maximum number currently supported is 3. Lower values can reduce runtimes.
 Reflection helps quantify aleatoric uncertainty associated with challenging prompts and catches responses that are noticeably incorrect/bad upon further analysis.
+This parameter has no effect when `disable_trustworthiness` is True.

 num_consistency_samples (int, default = 8): the amount of internal sampling to measure LLM response consistency, a factor affecting trust scoring.
 Must be between 0 and 20. Lower values can reduce runtimes.
 Measuring consistency helps quantify the epistemic uncertainty associated with
 strange prompts or prompts that are too vague/open-ended to receive a clearly defined 'good' response.
 TLM measures consistency via the degree of contradiction between sampled responses that the model considers plausible.
+This parameter has no effect when `disable_trustworthiness` is True.

 similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "discrepancy"): how the
 trustworthiness scoring's consistency algorithm measures similarity between alternative responses considered plausible by the model.
@@ -633,6 +635,11 @@ class TLMOptions(TypedDict):
 You can auto-improve responses by increasing this parameter, but at higher runtimes/costs.
 This parameter must be between 1 and 20. It has no effect on `TLM.score()`.
 When this parameter is 1, `TLM.prompt()` simply returns a standard LLM response and does not attempt to auto-improve it.
+This parameter has no effect when `disable_trustworthiness` is True.
+
+disable_trustworthiness (bool, default = False): if True, trustworthiness scoring is disabled and TLM will not compute trust scores for responses.
+This is useful when you only want to use custom evaluation criteria or when you want to minimize computational overhead and only need the base LLM response.
+The following parameters will be ignored when `disable_trustworthiness` is True: `num_consistency_samples`, `num_self_reflections`, `num_candidate_responses`, `reasoning_effort`, `similarity_measure`.
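
A minimal sketch of how the ignored-parameter rule in the `disable_trustworthiness` docstring could be checked on the caller's side. The helper `ignored_options` and the constant `IGNORED_WHEN_DISABLED` are hypothetical names introduced for illustration and are not part of `cleanlab_tlm`; the options are modelled as a plain dict rather than the real `TLMOptions` type, and only the parameter names are taken from the docstring.

```python
from typing import Any

# Parameter names the docstring lists as ignored when disable_trustworthiness is True.
IGNORED_WHEN_DISABLED = (
    "num_consistency_samples",
    "num_self_reflections",
    "num_candidate_responses",
    "reasoning_effort",
    "similarity_measure",
)


def ignored_options(options: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return the keys in `options` that would have no effect."""
    if not options.get("disable_trustworthiness", False):
        return []  # scoring is enabled, so every parameter is still honoured
    return [key for key in IGNORED_WHEN_DISABLED if key in options]


# Example: two scoring parameters are set but scoring itself is disabled.
options = {
    "disable_trustworthiness": True,
    "num_consistency_samples": 4,
    "num_self_reflections": 2,
}
print(ignored_options(options))
```

A check like this may help surface option combinations that silently do nothing, assuming the caller builds the options dict before passing it on.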
 options ([TLMOptions](../tlm/#class-tlmoptions), optional): a typed dict of advanced configurations you can optionally specify.
 The "custom_eval_criteria" key for [TLM](../tlm/#class-tlm) is not supported for `TrustworthyRAG`, you can instead specify `evals`.
+The "disable_trustworthiness" key is only supported for `TrustworthyRAG` when it's set to run `Evals`. See the `evals` argument description below for how evaluations are determined.

 timeout (float, optional): timeout (in seconds) to apply to each request.
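
The constraint on the "disable_trustworthiness" key for `TrustworthyRAG` (it is only supported when Evals are configured to run) could be guarded with a pre-flight check along these lines. This is a sketch under stated assumptions: `check_scoring_config` and its `has_evals` flag are hypothetical, the library's own validation and error messages may differ, and the options are again a plain dict.

```python
from typing import Any, Optional


def check_scoring_config(options: Optional[dict[str, Any]], has_evals: bool) -> None:
    """Hypothetical pre-flight check, not part of cleanlab_tlm.

    Raises ValueError if trust scoring is disabled and no Evals are configured,
    since nothing would be left to evaluate.
    """
    disabled = bool(options and options.get("disable_trustworthiness", False))
    if disabled and not has_evals:
        raise ValueError(
            "disable_trustworthiness=True requires at least one configured Eval"
        )


# Accepted: scoring is disabled, but Evals will still run.
check_scoring_config({"disable_trustworthiness": True}, has_evals=True)

# Accepted: no options given, so trust scoring stays enabled by default.
check_scoring_config(None, has_evals=False)
```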