Commit 69c4560

refine: improve accuracy evaluation prompt definitions
1 parent ffe827e

File tree

1 file changed (+76, −28)

graphgen/templates/evaluation/kg/accuracy_evaluation.py

Lines changed: 76 additions & 28 deletions
@@ -1,15 +1,27 @@
ENTITY_EVALUATION_PROMPT_ZH = """你是一个知识图谱质量评估专家。你的任务是从给定的文本块和提取的实体列表,评估实体提取的质量。

评估维度:
-1. ACCURACY (准确性, 权重: 40%): 提取的实体是否正确,是否有误提取或错误识别
-2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要实体
-3. PRECISION (精确性, 权重: 20%): 提取的实体是否精确,命名是否准确
+1. ACCURACY (准确性, 权重: 40%): 提取的实体是否真实存在于文本中,是否存在误提取(False Positive)
+   - 检查:实体是否在文本中实际出现,是否将非实体文本误识别为实体
+   - 示例:文本提到"蛋白质A",但提取了文本中不存在的"蛋白质B" → 准确性低
+   - 示例:将"研究显示"这样的非实体短语提取为实体 → 准确性低
+
+2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要实体(Recall)
+   - 检查:文本中的重要实体是否都被提取,是否存在遗漏(False Negative)
+   - 示例:文本提到5个重要蛋白质,但只提取了3个 → 完整性低
+   - 示例:所有关键实体都被提取 → 完整性高
+
+3. PRECISION (精确性, 权重: 20%): 提取的实体命名是否精确、边界是否准确、类型是否正确
+   - 检查:实体名称是否完整准确,边界是否正确,实体类型分类是否正确
+   - 示例:应提取"人类胰岛素受体蛋白",但只提取了"胰岛素" → 精确性低(边界不准确)
+   - 示例:应分类为"蛋白质",但分类为"基因" → 精确性低(类型错误)
+   - 示例:应提取"COVID-19",但提取了"冠状病毒" → 精确性低(命名不够精确)

评分标准(每个维度 0-1 分):
-- EXCELLENT (0.8-1.0): 高质量提取
-- GOOD (0.6-0.79): 良好质量,有少量问题
-- ACCEPTABLE (0.4-0.59): 可接受,有明显问题
-- POOR (0.0-0.39): 质量差,需要改进
+- EXCELLENT (0.8-1.0): 高质量提取,错误率 < 20%
+- GOOD (0.6-0.79): 良好质量,有少量问题,错误率 20-40%
+- ACCEPTABLE (0.4-0.59): 可接受,有明显问题,错误率 40-60%
+- POOR (0.0-0.39): 质量差,需要改进,错误率 > 60%

综合评分 = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision
@@ -38,15 +50,27 @@
Your task is to evaluate the quality of entity extraction from a given text block and extracted entity list.

Evaluation Dimensions:
-1. ACCURACY (Weight: 40%): Whether the extracted entities are correct, and if there are any false extractions or misidentifications
-2. COMPLETENESS (Weight: 40%): Whether important entities from the text are missing
-3. PRECISION (Weight: 20%): Whether the extracted entities are precise and accurately named
+1. ACCURACY (Weight: 40%): Whether the extracted entities actually exist in the text, and if there are any false extractions (False Positives)
+   - Check: Do entities actually appear in the text? Are non-entity phrases incorrectly identified as entities?
+   - Example: Text mentions "Protein A", but "Protein B" (not in text) is extracted → Low accuracy
+   - Example: Phrases like "research shows" are extracted as entities → Low accuracy
+
+2. COMPLETENESS (Weight: 40%): Whether important entities from the text are missing (Recall, False Negatives)
+   - Check: Are all important entities from the text extracted? Are there any omissions?
+   - Example: Text mentions 5 important proteins, but only 3 are extracted → Low completeness
+   - Example: All key entities are extracted → High completeness
+
+3. PRECISION (Weight: 20%): Whether extracted entities are precisely named, have correct boundaries, and correct types
+   - Check: Are entity names complete and accurate? Are boundaries correct? Are entity types correctly classified?
+   - Example: Should extract "Human Insulin Receptor Protein", but only "Insulin" is extracted → Low precision (incorrect boundary)
+   - Example: Should be classified as "Protein", but classified as "Gene" → Low precision (incorrect type)
+   - Example: Should extract "COVID-19", but "Coronavirus" is extracted → Low precision (naming not precise enough)

Scoring Criteria (0-1 scale for each dimension):
-- EXCELLENT (0.8-1.0): High-quality extraction
-- GOOD (0.6-0.79): Good quality with minor issues
-- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues
-- POOR (0.0-0.39): Poor quality, needs improvement
+- EXCELLENT (0.8-1.0): High-quality extraction, error rate < 20%
+- GOOD (0.6-0.79): Good quality with minor issues, error rate 20-40%
+- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues, error rate 40-60%
+- POOR (0.0-0.39): Poor quality, needs improvement, error rate > 60%

Overall Score = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision
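
For reference, here is a minimal Python sketch (not part of this commit) of how the weighted formula and the error-rate bands defined above could be applied once a judge model returns the three dimension scores; the function names are illustrative, not taken from the repository:

def overall_score(accuracy: float, completeness: float, precision: float) -> float:
    # Overall Score = 0.4 * Accuracy + 0.4 * Completeness + 0.2 * Precision
    return 0.4 * accuracy + 0.4 * completeness + 0.2 * precision

def grade(score: float) -> str:
    # Map a 0-1 score onto the bands named in the prompt.
    if score >= 0.8:
        return "EXCELLENT"   # error rate < 20%
    if score >= 0.6:
        return "GOOD"        # error rate 20-40%
    if score >= 0.4:
        return "ACCEPTABLE"  # error rate 40-60%
    return "POOR"            # error rate > 60%

print(grade(overall_score(0.9, 0.5, 0.8)))  # 0.72 -> GOOD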
@@ -74,15 +98,27 @@
RELATION_EVALUATION_PROMPT_ZH = """你是一个知识图谱质量评估专家。你的任务是从给定的文本块和提取的关系列表,评估关系抽取的质量。

评估维度:
-1. ACCURACY (准确性, 权重: 40%): 提取的关系是否正确,关系描述是否准确
-2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要关系
-3. PRECISION (精确性, 权重: 20%): 关系描述是否精确,是否过于宽泛
+1. ACCURACY (准确性, 权重: 40%): 提取的关系是否真实存在于文本中,是否存在误提取(False Positive)
+   - 检查:关系是否在文本中实际表达,是否将不存在的关系误识别为关系
+   - 示例:文本中A和B没有关系,但提取了"A-作用于->B" → 准确性低
+   - 示例:将文本中的并列关系误识别为因果关系 → 准确性低
+
+2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要关系(Recall)
+   - 检查:文本中表达的重要关系是否都被提取,是否存在遗漏(False Negative)
+   - 示例:文本明确表达了5个关系,但只提取了3个 → 完整性低
+   - 示例:所有关键关系都被提取 → 完整性高
+
+3. PRECISION (精确性, 权重: 20%): 关系描述是否精确,关系类型是否正确,是否过于宽泛
+   - 检查:关系类型是否准确,关系描述是否具体,是否使用了过于宽泛的关系类型
+   - 示例:应提取"抑制"关系,但提取了"影响"关系 → 精确性低(类型不够精确)
+   - 示例:应提取"直接结合",但提取了"相关" → 精确性低(描述过于宽泛)
+   - 示例:关系方向是否正确(如"A激活B" vs "B被A激活")→ 精确性检查

评分标准(每个维度 0-1 分):
-- EXCELLENT (0.8-1.0): 高质量提取
-- GOOD (0.6-0.79): 良好质量,有少量问题
-- ACCEPTABLE (0.4-0.59): 可接受,有明显问题
-- POOR (0.0-0.39): 质量差,需要改进
+- EXCELLENT (0.8-1.0): 高质量提取,错误率 < 20%
+- GOOD (0.6-0.79): 良好质量,有少量问题,错误率 20-40%
+- ACCEPTABLE (0.4-0.59): 可接受,有明显问题,错误率 40-60%
+- POOR (0.0-0.39): 质量差,需要改进,错误率 > 60%

综合评分 = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision
@@ -111,15 +147,27 @@
Your task is to evaluate the quality of relation extraction from a given text block and extracted relation list.

Evaluation Dimensions:
-1. ACCURACY (Weight: 40%): Whether the extracted relations are correct and the relation descriptions are accurate
-2. COMPLETENESS (Weight: 40%): Whether important relations from the text are missing
-3. PRECISION (Weight: 20%): Whether the relation descriptions are precise and not overly broad
+1. ACCURACY (Weight: 40%): Whether the extracted relations actually exist in the text, and if there are any false extractions (False Positives)
+   - Check: Do relations actually appear in the text? Are non-existent relations incorrectly identified?
+   - Example: Text shows no relation between A and B, but "A-acts_on->B" is extracted → Low accuracy
+   - Example: A parallel relationship in text is misidentified as a causal relationship → Low accuracy
+
+2. COMPLETENESS (Weight: 40%): Whether important relations from the text are missing (Recall, False Negatives)
+   - Check: Are all important relations expressed in the text extracted? Are there any omissions?
+   - Example: Text explicitly expresses 5 relations, but only 3 are extracted → Low completeness
+   - Example: All key relations are extracted → High completeness
+
+3. PRECISION (Weight: 20%): Whether relation descriptions are precise, relation types are correct, and not overly broad
+   - Check: Are relation types accurate? Are relation descriptions specific? Are overly broad relation types used?
+   - Example: Should extract "inhibits" relation, but "affects" is extracted → Low precision (type not precise enough)
+   - Example: Should extract "directly binds", but "related" is extracted → Low precision (description too broad)
+   - Example: Is relation direction correct (e.g., "A activates B" vs "B is activated by A") → Precision check

Scoring Criteria (0-1 scale for each dimension):
-- EXCELLENT (0.8-1.0): High-quality extraction
-- GOOD (0.6-0.79): Good quality with minor issues
-- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues
-- POOR (0.0-0.39): Poor quality, needs improvement
+- EXCELLENT (0.8-1.0): High-quality extraction, error rate < 20%
+- GOOD (0.6-0.79): Good quality with minor issues, error rate 20-40%
+- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues, error rate 40-60%
+- POOR (0.0-0.39): Poor quality, needs improvement, error rate > 60%

Overall Score = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision
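
A hypothetical usage sketch follows, showing how a caller might select among these templates by extraction type and language. The import path mirrors the file location in this commit; ENTITY_EVALUATION_PROMPT_EN and RELATION_EVALUATION_PROMPT_EN are assumed names paralleling the _ZH constants visible in the diff:

# Assumed import path, based on graphgen/templates/evaluation/kg/accuracy_evaluation.py.
from graphgen.templates.evaluation.kg import accuracy_evaluation as prompts

def pick_prompt(kind: str, lang: str) -> str:
    # kind: "entity" or "relation"; lang: "zh" or "en"
    table = {
        ("entity", "zh"): prompts.ENTITY_EVALUATION_PROMPT_ZH,
        ("entity", "en"): prompts.ENTITY_EVALUATION_PROMPT_EN,      # assumed name
        ("relation", "zh"): prompts.RELATION_EVALUATION_PROMPT_ZH,
        ("relation", "en"): prompts.RELATION_EVALUATION_PROMPT_EN,  # assumed name
    }
    return table[(kind, lang)]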
