
Commit 97b441a

Add details to probabilistic metrics
1 parent 540df87 commit 97b441a

File tree

2 files changed: +47, -103 lines

docs/src/content/docs/metrics/_custom.md

Lines changed: 0 additions & 100 deletions
This file was deleted.

docs/src/content/docs/metrics/probabilistic_metrics.md

Lines changed: 47 additions & 3 deletions
@@ -6,7 +6,50 @@ sidebar:

Probabilistic LLM metrics are LLM-as-a-Judge metrics that provide score distributions with associated confidence levels, making it possible to assess how certain the model is of its evaluation. These distributions are derived from the model's token-level log probabilities.
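To make that derivation concrete, here is a minimal, library-independent sketch of how token-level log probabilities become a score distribution. The token strings and log-probability values are invented for illustration; they are not output of the library.

```python
import math

# Illustrative top_logprobs for the single score token, as returned by an
# OpenAI-style completion with logprobs enabled (values are made up).
top_logprobs = {"1": -18.9, "2": -12.4, "3": -0.01}

# A log probability lp maps to a probability exp(lp); the total mass may
# not sum exactly to 1 because only the top candidate tokens are kept.
distribution = {int(tok): math.exp(lp) for tok, lp in top_logprobs.items()}

print(distribution)  # ~{1: 6.2e-09, 2: 4.1e-06, 3: 0.99}
```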

## Custom Probabilistic Metric

Similar to the custom LLM-as-a-Judge metric, you can define your own probabilistic metric with the `ProbabilisticCustomMetric` class.
```python
from continuous_eval.metrics.base.metric import Arg
from continuous_eval.metrics.base.response_type import Integer
from continuous_eval.metrics.custom import ProbabilisticCustomMetric

rubric = """1: The joke is not funny or inappropriate.
2: The joke is somewhat funny and appropriate.
3: The joke is very funny and appropriate."""

metric = ProbabilisticCustomMetric(
    name="FunnyJoke",
    criteria="Joke is funny and appropriate",
    rubric=rubric,
    arguments={"joke": Arg(type=str, description="The joke to evaluate.")},
    response_format=Integer(ge=1, le=3),
)

print(metric(
    joke="""Scientists released a new way to measure AI performance.
It's so accurate, even the AI said, 'Finally, someone understands me!'"""
))
```
Optionally, you can also add examples to the metric.
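A purely hypothetical sketch of what that could look like is below. The `examples` argument and its shape are assumptions, not the library's confirmed API; check the actual `ProbabilisticCustomMetric` signature before relying on this.

```python
# Hypothetical: assumes ProbabilisticCustomMetric accepts an `examples`
# argument pairing sample inputs with expected scores. Verify against the
# real signature in continuous_eval before using.
metric_with_examples = ProbabilisticCustomMetric(
    name="FunnyJoke",
    criteria="Joke is funny and appropriate",
    rubric=rubric,
    arguments={"joke": Arg(type=str, description="The joke to evaluate.")},
    response_format=Integer(ge=1, le=3),
    examples=[  # assumed keyword and shape
        {"joke": "I told my computer a joke. It didn't laugh.", "score": 1},
    ],
)
```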
> Note: See the [limitations section](#current-limitations) for more information about the response format.
#### Example Output

```py
{
    'FunnyJoke_score': 3,
    'FunnyJoke_reasoning': 'The joke is clever as it plays on the idea of AI having feelings.',
    'FunnyJoke_probabilities': {1: 0.0, 2: 4.22e-06, 3: 0.99}
}
```
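Because the probabilities come back alongside the score, you can post-process them directly. For example, a library-independent expected score over the rubric values (the `weighted_score` helper that appears later on this page plays a similar role):

```python
result = {
    'FunnyJoke_score': 3,
    'FunnyJoke_probabilities': {1: 0.0, 2: 4.22e-06, 3: 0.99},
}

# Expected score under the judge's distribution; divide by the total mass
# because the retained top tokens may not sum exactly to 1.
probs = result['FunnyJoke_probabilities']
expected = sum(score * p for score, p in probs.items()) / sum(probs.values())
print(round(expected, 4))  # ~3.0
```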
## Define a new Probabilistic Metric

Sometimes the criteria, rubric and examples are not enough to define the metric. In this case, you can define your own probabilistic metric by extending the `ProbabilisticMetric` class.
### Classification
@@ -119,5 +162,6 @@ print({"weighted_score": metric.prompt.response_format.weighted_score(result['Se
## Current limitations
1. The `response_format` must be a **single token value**. A few are predefined (`GoodOrBad`, `YesOrNo`, `Boolean`, and `Integer`), but it is possible to define your own. For integer scoring, negative values are not supported (they tokenize as two tokens), nor are values greater than 9.
2. Arbitrary JSON output formats are not yet supported for probabilistic metrics.
3. At the moment, **only OpenAI models are supported for probabilistic metrics**.
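As an illustration of the single-token constraint, a binary judgment can use the predefined `YesOrNo` type in place of `Integer`. This is a minimal sketch under the assumption that the predefined response types are instantiated with no arguments (only `Integer(ge=..., le=...)` is confirmed on this page):

```python
from continuous_eval.metrics.base.metric import Arg
from continuous_eval.metrics.base.response_type import YesOrNo
from continuous_eval.metrics.custom import ProbabilisticCustomMetric

# Binary variant of the joke metric: "Yes"/"No" answers stay within the
# single-token limit, so token-level probabilities remain well defined.
binary_metric = ProbabilisticCustomMetric(
    name="IsFunny",
    criteria="The joke is funny and appropriate",
    rubric="Yes: the joke is funny and appropriate.\nNo: it is not.",
    arguments={"joke": Arg(type=str, description="The joke to evaluate.")},
    response_format=YesOrNo(),  # assumption: no-argument constructor
)
```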
