Commit 3fc1950

smodlich, Sebastian Modlich, and shahules786 authored
Fix adapt issue for keyphrase extractor (#766)
Hi, this fixes [issue #625](#625). The issue occurs when testset generation is called after adaptation. After adaptation, the keyphrase extractor returns nothing, which leads to a failure later on. The keyphrases are in fact extracted correctly (as can be seen when debugging), but they are not returned properly from the function: after adaptation, the key "topics" doesn't exist in the dict; the key is "keyphrases" instead. This PR works both with and without adaptation.
---------
Co-authored-by: Sebastian Modlich <[email protected]>
Co-authored-by: Shahules786 <[email protected]>
1 parent 0d1223c commit 3fc1950
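
The failure mode described in the commit message can be reproduced in isolation. The following is a minimal sketch, not ragas code: `parsed_output` is a hypothetical stand-in for the dict the JSON loader returns after adaptation, assuming, as the message states, that it is keyed by "keyphrases" rather than "topics".

```python
# Hypothetical stand-in for the parsed LLM output after adaptation.
parsed_output = {"keyphrases": ["Photosynthesis", "Chlorophyll"]}

# Old lookup: the key is absent, so the default (an empty list) is returned.
old_result = parsed_output.get("topics", [])

# Fixed lookup: reads the key that is actually present.
new_result = parsed_output.get("keyphrases", [])

print(old_result)  # []
print(new_result)  # ['Photosynthesis', 'Chlorophyll']
```

Because `dict.get` falls back silently to its default, the extractor reports no keyphrases instead of raising, which is consistent with the failure surfacing only later in testset generation.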

3 files changed, +18 -21 lines changed


src/ragas/testset/evolutions.py

Lines changed: 1 addition & 1 deletion
@@ -298,7 +298,7 @@ async def _aevolve(
         results = await self.generator_llm.generate(
             prompt=self.seed_question_prompt.format(
                 context=merged_node.page_content,
-                topic=rng.choice(np.array(merged_node.keyphrases), size=1)[0],
+                keyphrase=rng.choice(np.array(merged_node.keyphrases), size=1)[0],
             )
         )
         seed_question = results.generations[0][0].text

src/ragas/testset/extractor.py

Lines changed: 7 additions & 10 deletions
@@ -6,10 +6,7 @@
 from dataclasses import dataclass, field
 
 from ragas.llms.json_load import json_loader
-from ragas.testset.prompts import (
-    keyphrase_extraction_prompt,
-    main_topic_extraction_prompt,
-)
+from ragas.testset.prompts import keyphrase_extraction_prompt
 
 if t.TYPE_CHECKING:
     from ragas.llms.base import BaseRagasLLM
@@ -43,30 +40,30 @@ def save(self, cache_dir: t.Optional[str] = None) -> None:
 
 
 @dataclass
 class KeyphraseExtractor(Extractor):
-    keyphrase_extraction_prompt: Prompt = field(
-        default_factory=lambda: main_topic_extraction_prompt
+    extractor_prompt: Prompt = field(
+        default_factory=lambda: keyphrase_extraction_prompt
     )
 
     async def extract(self, node: Node, is_async: bool = True) -> t.List[str]:
-        prompt = self.keyphrase_extraction_prompt.format(text=node.page_content)
+        prompt = self.extractor_prompt.format(text=node.page_content)
         results = await self.llm.generate(prompt=prompt, is_async=is_async)
         keyphrases = await json_loader.safe_load(
             results.generations[0][0].text.strip(), llm=self.llm, is_async=is_async
         )
         keyphrases = keyphrases if isinstance(keyphrases, dict) else {}
         logger.debug("topics: %s", keyphrases)
-        return keyphrases.get("topics", [])
+        return keyphrases.get("keyphrases", [])
 
     def adapt(self, language: str, cache_dir: t.Optional[str] = None) -> None:
         """
         Adapt the extractor to a different language.
         """
-        self.keyphrase_extraction_prompt = keyphrase_extraction_prompt.adapt(
+        self.extractor_prompt = self.extractor_prompt.adapt(
             language, self.llm, cache_dir
         )
 
     def save(self, cache_dir: t.Optional[str] = None) -> None:
         """
         Save the extractor prompts to a path.
         """
-        self.keyphrase_extraction_prompt.save(cache_dir)
+        self.extractor_prompt.save(cache_dir)

src/ragas/testset/prompts.py

Lines changed: 10 additions & 10 deletions
@@ -335,22 +335,22 @@
     instruction="Generate a question that can be fully answered from given context. The question should be formed using topic",
     examples=[
         {
-            "context": "The ecosystem of the Amazon rainforest is incredibly diverse, hosting thousands of species that are not found anywhere else on Earth. This biodiversity is crucial for the stability of the global climate and helps regulate the Earth's air and water cycles.",
-            "topic": "biodiversity in the Amazon rainforest",
-            "question": "Why is the biodiversity in the Amazon rainforest considered crucial for global climate stability?",
+            "context": "Photosynthesis in plants involves converting light energy into chemical energy, using chlorophyll and other pigments to absorb light. This process is crucial for plant growth and the production of oxygen.",
+            "keyphrase": "Photosynthesis",
+            "question": "What is the role of photosynthesis in plant growth?",
         },
         {
-            "context": "Quantum computing represents a significant leap forward in computational capability, utilizing the principles of quantum mechanics to process information in ways that traditional computers cannot. This technology has the potential to revolutionize various fields by performing complex calculations at unprecedented speeds.",
-            "topic": "potential applications of quantum computing",
-            "question": "What fields could potentially be revolutionized by the applications of quantum computing?",
+            "context": "The Industrial Revolution, starting in the 18th century, marked a major turning point in history as it led to the development of factories and urbanization.",
+            "keyphrase": "Industrial Revolution",
+            "question": "How did the Industrial Revolution mark a major turning point in history?",
         },
         {
-            "context": "Renewable energy sources, such as solar and wind power, are essential for transitioning to a more sustainable energy system. They offer the potential to reduce greenhouse gas emissions and dependency on fossil fuels, addressing key environmental and economic challenges.",
-            "topic": "benefits of renewable energy sources",
-            "question": "What are the primary benefits of transitioning to renewable energy sources?",
+            "context": "The process of evaporation plays a crucial role in the water cycle, converting water from liquid to vapor and allowing it to rise into the atmosphere.",
+            "keyphrase": "Evaporation",
+            "question": "Why is evaporation important in the water cycle?",
         },
     ],
-    input_keys=["context", "topic"],
+    input_keys=["context", "keyphrase"],
     output_key="question",
     output_type="string",
 )
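
The `input_keys` change in this file and the keyword rename in evolutions.py have to move together. Below is a minimal sketch of why, using a hypothetical `MiniPrompt` class. It is not the ragas `Prompt` implementation, whose validation may differ; it only assumes that a prompt rejects keyword arguments that do not match its declared input keys.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MiniPrompt:
    """Hypothetical stand-in for a prompt template; not the ragas Prompt class."""

    instruction: str
    input_keys: List[str]

    def format(self, **kwargs: str) -> str:
        # Require exactly the declared input keys, no more and no fewer.
        if set(kwargs) != set(self.input_keys):
            raise ValueError(
                f"expected keys {sorted(self.input_keys)}, got {sorted(kwargs)}"
            )
        body = "\n".join(f"{key}: {kwargs[key]}" for key in self.input_keys)
        return f"{self.instruction}\n{body}"


seed_prompt = MiniPrompt(
    instruction="Generate a question that can be fully answered from given context.",
    input_keys=["context", "keyphrase"],
)

# Matches the declared keys, as in the updated call site.
text = seed_prompt.format(
    context="Evaporation converts liquid water to vapor.",
    keyphrase="Evaporation",
)

# The pre-rename keyword no longer matches the declared keys and is rejected.
error: Optional[str] = None
try:
    seed_prompt.format(
        context="Evaporation converts liquid water to vapor.",
        topic="Evaporation",
    )
except ValueError as exc:
    error = str(exc)

print(text)
print(error)
```

Under this assumption, renaming the key in only one of the two places would make every seed-question call fail, which is why the commit touches both files.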
