Commit 9ac6a92

feat: Add Hallucination Risk Calculator to OpenAIChatGenerator (#359)
* Initial commit for adding hallucination calculation
* Fixes
* Some cleaning
* Use our OpenAIChatGenerator instead of their OpenAI backend
* Refactoring, remove unneeded parts
* More refactoring
* Slight updates
* Add MIT License
* Update license headers
* Create config object
* Formatting
* Ignore license headers of hallucination risk calculator
* Refactoring
* License header and reformatting
* Small update
* Formatting
* Linting
* Fix typing issues
* PR comments
* Remove example script to move to cookbook repo
* Update readme
* Add pydocs
* Add docstrings and integration test
* Fix docstring
1 parent 14e9ba8 commit 9ac6a92

File tree

14 files changed: +1011 −7 lines


LICENSE-MIT.txt

Lines changed: 20 additions & 0 deletions

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 10 additions & 7 deletions

@@ -41,13 +41,14 @@ that includes it. Once it reaches the end of its lifespan, the experiment will b

### Active experiments

| Name | Type | Expected End Date | Dependencies | Cookbook | Discussion |
|---------------------------------------|--------------------------------|-------------------|--------------|----------|---------------|
| [`InMemoryChatMessageStore`][1] | Memory Store | December 2024 | None | <a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/conversational_rag_using_memory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | [Discuss][4] |
| [`ChatMessageRetriever`][2] | Memory Component | December 2024 | None | <a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/conversational_rag_using_memory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | [Discuss][4] |
| [`ChatMessageWriter`][3] | Memory Component | December 2024 | None | <a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/conversational_rag_using_memory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | [Discuss][4] |
| [`QueryExpander`][5] | Query Expansion Component | October 2025 | None | None | [Discuss][6] |
| [`EmbeddingBasedDocumentSplitter`][8] | EmbeddingBasedDocumentSplitter | August 2025 | None | None | [Discuss][7] |
| [`OpenAIChatGenerator`][9] | Chat Generator Component | November 2025 | None | None | [Discuss][10] |

[1]: https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/chat_message_stores/in_memory.py
[2]: https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/retrievers/chat_message_retriever.py

@@ -57,6 +58,8 @@ that includes it. Once it reaches the end of its lifespan, the experiment will b

[6]: https://github.com/deepset-ai/haystack-experimental/discussions/346
[7]: https://github.com/deepset-ai/haystack-experimental/discussions/356
[8]: https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/preprocessors/embedding_based_document_splitter.py
[9]: https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/generators/chat/openai.py
[10]: https://github.com/deepset-ai/haystack-experimental/discussions/XXX

### Adopted experiments

| Name | Type | Final release |
Lines changed: 30 additions & 0 deletions

loaders:
  - type: haystack_pydoc_tools.loaders.CustomPythonLoader
    search_path: [../../../]
    modules:
      [
        "haystack_experimental.components.generators.chat.openai",
      ]
    ignore_when_discovered: ["__init__"]
processors:
  - type: filter
    expression:
    documented_only: true
    do_not_filter_modules: false
    skip_empty_modules: true
  - type: smart
  - type: crossref
renderer:
  type: haystack_pydoc_tools.renderers.ReadmeCoreRenderer
  excerpt: Enables text generation using LLMs.
  category_slug: experiments-api
  title: Generators
  slug: experimental-generators-api
  order: 42
  markdown:
    descriptive_class_title: false
    classdef_code_block: false
    descriptive_module_title: true
    add_method_class_prefix: true
    add_member_class_prefix: false
  filename: experimental_generators_api.md
Lines changed: 3 additions & 0 deletions

# SPDX-FileCopyrightText: 2022-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0
Lines changed: 18 additions & 0 deletions

# SPDX-FileCopyrightText: 2022-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0

import sys
from typing import TYPE_CHECKING

from lazy_imports import LazyImporter

_import_structure = {
    "openai": ["OpenAIChatGenerator"],
}

if TYPE_CHECKING:
    from .openai import OpenAIChatGenerator as OpenAIChatGenerator

else:
    sys.modules[__name__] = LazyImporter(name=__name__, module_file=__file__, import_structure=_import_structure)
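The `LazyImporter` above replaces the package module in `sys.modules` so that `from ... import OpenAIChatGenerator` only pays the import cost of the `openai` submodule on first use. As a rough stdlib-only sketch of that deferred-import idea (illustrative stand-in only; the real `lazy_imports.LazyImporter` has more machinery), a `ModuleType` subclass can resolve exported names on first attribute access:

```python
# Minimal sketch of lazy submodule exports, assuming the same
# {"submodule": ["ExportedName", ...]} import-structure mapping.
import importlib
import types


class LazyModule(types.ModuleType):
    """Module proxy that imports a submodule attribute on first access."""

    def __init__(self, name: str, import_structure: dict[str, list[str]]):
        super().__init__(name)
        # Map exported attribute name -> submodule that defines it.
        self._attr_to_module = {
            attr: submodule
            for submodule, attrs in import_structure.items()
            for attr in attrs
        }

    def __getattr__(self, attr: str):
        # Only called when `attr` is not yet in the module's __dict__.
        if attr in self._attr_to_module:
            submodule = importlib.import_module(
                f"{self.__name__}.{self._attr_to_module[attr]}"
            )
            value = getattr(submodule, attr)
            setattr(self, attr, value)  # cache so later lookups skip __getattr__
            return value
        raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")


# Demo against a real stdlib package: resolve collections.abc.Mapping lazily.
lazy = LazyModule("collections", {"abc": ["Mapping"]})
mapping_cls = lazy.Mapping
```

The same effect can also be had with a module-level `__getattr__` (PEP 562); the `sys.modules` replacement approach used in the commit works on the importing side without touching each submodule.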
Lines changed: 193 additions & 0 deletions

# SPDX-FileCopyrightText: 2022-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0

from dataclasses import replace
from typing import Any, Optional, Union

from haystack import component
from haystack.components.generators.chat.openai import OpenAIChatGenerator as BaseOpenAIChatGenerator
from haystack.dataclasses import ChatMessage, StreamingCallbackT
from haystack.tools import Tool, Toolset

from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.utils.hallucination_risk_calculator.openai_planner import calculate_hallucination_metrics


@component
class OpenAIChatGenerator(BaseOpenAIChatGenerator):
    """
    An OpenAI chat-based text generator component that supports hallucination risk scoring.

    This is based on the paper
    [LLMs are Bayesian, in Expectation, not in Realization](https://arxiv.org/abs/2507.11768).

    ## Usage Example:

    ```python
    from haystack.dataclasses import ChatMessage

    from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
    from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator

    # Evidence-based Example
    llm = OpenAIChatGenerator(model="gpt-4o")
    rag_result = llm.run(
        messages=[
            ChatMessage.from_user(
                text="Task: Answer strictly based on the evidence provided below.\n"
                "Question: Who won the Nobel Prize in Physics in 2019?\n"
                "Evidence:\n"
                "- Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2).\n"
                "Constraints: If evidence is insufficient or conflicting, refuse."
            )
        ],
        hallucination_score_config=HallucinationScoreConfig(skeleton_policy="evidence_erase"),
    )
    print(f"Decision: {rag_result['replies'][0].meta['hallucination_decision']}")
    print(f"Risk bound: {rag_result['replies'][0].meta['hallucination_risk']:.3f}")
    print(f"Rationale: {rag_result['replies'][0].meta['hallucination_rationale']}")
    print(f"Answer:\n{rag_result['replies'][0].text}")
    print("---")
    ```
    """

    @component.output_types(replies=list[ChatMessage])
    def run(
        self,
        messages: list[ChatMessage],
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        *,
        tools: Optional[Union[list[Tool], Toolset]] = None,
        tools_strict: Optional[bool] = None,
        hallucination_score_config: Optional[HallucinationScoreConfig] = None,
    ) -> dict[str, list[ChatMessage]]:
        """
        Invokes chat completion based on the provided messages and generation parameters.

        :param messages:
            A list of ChatMessage instances representing the input messages.
        :param streaming_callback:
            A callback function that is called when a new token is received from the stream.
        :param generation_kwargs:
            Additional keyword arguments for text generation. These parameters will
            override the parameters passed during component initialization.
            For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
        :param tools:
            A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
            `tools` parameter set during component initialization. This parameter can accept either a list of
            `Tool` objects or a `Toolset` instance.
        :param tools_strict:
            Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
            the schema provided in the `parameters` field of the tool definition, but this may increase latency.
            If set, it will override the `tools_strict` parameter set during component initialization.
        :param hallucination_score_config:
            If provided, the generator will evaluate the hallucination risk of its responses using
            the OpenAIPlanner and annotate each response with hallucination metrics.
            This involves generating multiple samples and analyzing their consistency, which may increase
            latency and cost. Use this option when you need to assess the reliability of the generated content
            in scenarios where accuracy is critical.
            For details, see the [research paper](https://arxiv.org/abs/2507.11768).

        :returns:
            A dictionary with the following key:
            - `replies`: A list containing the generated responses as ChatMessage instances. If hallucination
              scoring is enabled, each message will include additional metadata:
              - `hallucination_decision`: "ANSWER" if the model decided to answer, "REFUSE" if it abstained.
              - `hallucination_risk`: The EDFL hallucination risk bound.
              - `hallucination_rationale`: The rationale behind the hallucination decision.
        """
        if len(messages) == 0:
            return {"replies": []}

        # Call parent implementation
        result = super(OpenAIChatGenerator, self).run(
            messages=messages,
            streaming_callback=streaming_callback,
            generation_kwargs=generation_kwargs,
            tools=tools,
            tools_strict=tools_strict,
        )
        completions = result["replies"]

        # Add hallucination scoring if configured
        if hallucination_score_config and messages[-1].text:
            hallucination_meta = calculate_hallucination_metrics(
                prompt=messages[-1].text, hallucination_score_config=hallucination_score_config, chat_generator=self
            )
            completions = [replace(m, _meta={**m.meta, **hallucination_meta}) for m in completions]

        return {"replies": completions}

    @component.output_types(replies=list[ChatMessage])
    async def run_async(
        self,
        messages: list[ChatMessage],
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        *,
        tools: Optional[Union[list[Tool], Toolset]] = None,
        tools_strict: Optional[bool] = None,
        hallucination_score_config: Optional[HallucinationScoreConfig] = None,
    ) -> dict[str, list[ChatMessage]]:
        """
        Asynchronously invokes chat completion based on the provided messages and generation parameters.

        This is the asynchronous version of the `run` method. It has the same parameters and return values
        but can be used with `await` in async code.

        :param messages:
            A list of ChatMessage instances representing the input messages.
        :param streaming_callback:
            A callback function that is called when a new token is received from the stream.
            Must be a coroutine.
        :param generation_kwargs:
            Additional keyword arguments for text generation. These parameters will
            override the parameters passed during component initialization.
            For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
        :param tools:
            A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
            `tools` parameter set during component initialization. This parameter can accept either a list of
            `Tool` objects or a `Toolset` instance.
        :param tools_strict:
            Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
            the schema provided in the `parameters` field of the tool definition, but this may increase latency.
            If set, it will override the `tools_strict` parameter set during component initialization.
        :param hallucination_score_config:
            If provided, the generator will evaluate the hallucination risk of its responses using
            the OpenAIPlanner and annotate each response with hallucination metrics.
            This involves generating multiple samples and analyzing their consistency, which may increase
            latency and cost. Use this option when you need to assess the reliability of the generated content
            in scenarios where accuracy is critical.
            For details, see the [research paper](https://arxiv.org/abs/2507.11768).

        :returns:
            A dictionary with the following key:
            - `replies`: A list containing the generated responses as ChatMessage instances. If hallucination
              scoring is enabled, each message will include additional metadata:
              - `hallucination_decision`: "ANSWER" if the model decided to answer, "REFUSE" if it abstained.
              - `hallucination_risk`: The EDFL hallucination risk bound.
              - `hallucination_rationale`: The rationale behind the hallucination decision.
        """
        if len(messages) == 0:
            return {"replies": []}

        # Call parent implementation
        result = await super(OpenAIChatGenerator, self).run_async(
            messages=messages,
            streaming_callback=streaming_callback,
            generation_kwargs=generation_kwargs,
            tools=tools,
            tools_strict=tools_strict,
        )
        completions = result["replies"]

        # Add hallucination scoring if configured
        if hallucination_score_config and messages[-1].text:
            hallucination_meta = calculate_hallucination_metrics(
                prompt=messages[-1].text, hallucination_score_config=hallucination_score_config, chat_generator=self
            )
            completions = [replace(m, _meta={**m.meta, **hallucination_meta}) for m in completions]

        return {"replies": completions}
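The `replace(m, _meta={**m.meta, **hallucination_meta})` step works because `dataclasses.replace` builds a new instance rather than mutating the original, which is how you attach metadata to immutable message objects. A self-contained toy of that merge pattern (the `Message` class, `_meta` field, and `annotate` helper here are illustrative stand-ins, not the real haystack `ChatMessage` API):

```python
# Sketch: merging new metadata into frozen dataclass instances via replace().
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a frozen chat message with private metadata."""

    text: str
    _meta: dict[str, Any] = field(default_factory=dict)

    @property
    def meta(self) -> dict[str, Any]:
        return self._meta


def annotate(completions: list[Message], extra_meta: dict[str, Any]) -> list[Message]:
    # replace() returns new frozen instances; the dict merge keeps the
    # existing metadata and overlays the hallucination metrics on top.
    return [replace(m, _meta={**m.meta, **extra_meta}) for m in completions]


replies = [Message(text="James Peebles, Michel Mayor and Didier Queloz.", _meta={"model": "gpt-4o"})]
annotated = annotate(replies, {"hallucination_decision": "ANSWER", "hallucination_risk": 0.05})
```

The originals are left untouched, so callers holding references to the un-annotated replies are unaffected.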
Lines changed: 3 additions & 0 deletions

# SPDX-FileCopyrightText: 2022-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0
Lines changed: 5 additions & 0 deletions

# ruff: noqa: D103
# Original code Copyright (c) 2024 Hassana Labs
# Licensed under the MIT License (see LICENSE-MIT).
# Modified by deepset, 2025.
# Licensed under the Apache License, Version 2.0 (see LICENSE-APACHE).
