ENTITY_EVALUATION_PROMPT_ZH = """你是一个知识图谱质量评估专家。你的任务是从给定的文本块和提取的实体列表,评估实体提取的质量。

评估维度:
-1. ACCURACY (准确性, 权重: 40%): 提取的实体是否正确,是否有误提取或错误识别
-2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要实体
-3. PRECISION (精确性, 权重: 20%): 提取的实体是否精确,命名是否准确
+1. ACCURACY (准确性, 权重: 40%): 提取的实体是否真实存在于文本中,是否存在误提取(False Positive)
+ - 检查:实体是否在文本中实际出现,是否将非实体文本误识别为实体
+ - 示例:文本提到"蛋白质A",但提取了文本中不存在的"蛋白质B" → 准确性低
+ - 示例:将"研究显示"这样的非实体短语提取为实体 → 准确性低
+
+2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要实体(Recall)
+ - 检查:文本中的重要实体是否都被提取,是否存在遗漏(False Negative)
+ - 示例:文本提到5个重要蛋白质,但只提取了3个 → 完整性低
+ - 示例:所有关键实体都被提取 → 完整性高
+
+3. PRECISION (精确性, 权重: 20%): 提取的实体命名是否精确、边界是否准确、类型是否正确
+ - 检查:实体名称是否完整准确,边界是否正确,实体类型分类是否正确
+ - 示例:应提取"人类胰岛素受体蛋白",但只提取了"胰岛素" → 精确性低(边界不准确)
+ - 示例:应分类为"蛋白质",但分类为"基因" → 精确性低(类型错误)
+ - 示例:应提取"COVID-19",但提取了"冠状病毒" → 精确性低(命名不够精确)

评分标准(每个维度 0-1 分):
-- EXCELLENT (0.8-1.0): 高质量提取
-- GOOD (0.6-0.79): 良好质量,有少量问题
-- ACCEPTABLE (0.4-0.59): 可接受,有明显问题
-- POOR (0.0-0.39): 质量差,需要改进
+- EXCELLENT (0.8-1.0): 高质量提取,错误率 < 20%
+- GOOD (0.6-0.79): 良好质量,有少量问题,错误率 20-40%
+- ACCEPTABLE (0.4-0.59): 可接受,有明显问题,错误率 40-60%
+- POOR (0.0-0.39): 质量差,需要改进,错误率 > 60%

综合评分 = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision

Your task is to evaluate the quality of entity extraction from a given text block and extracted entity list.

Evaluation Dimensions:
-1. ACCURACY (Weight: 40%): Whether the extracted entities are correct, and if there are any false extractions or misidentifications
-2. COMPLETENESS (Weight: 40%): Whether important entities from the text are missing
-3. PRECISION (Weight: 20%): Whether the extracted entities are precise and accurately named
+1. ACCURACY (Weight: 40%): Whether the extracted entities actually exist in the text, and if there are any false extractions (False Positives)
+ - Check: Do entities actually appear in the text? Are non-entity phrases incorrectly identified as entities?
+ - Example: Text mentions "Protein A", but "Protein B" (not in text) is extracted → Low accuracy
+ - Example: Phrases like "research shows" are extracted as entities → Low accuracy
+
+2. COMPLETENESS (Weight: 40%): Whether important entities from the text are missing (Recall, False Negatives)
+ - Check: Are all important entities from the text extracted? Are there any omissions?
+ - Example: Text mentions 5 important proteins, but only 3 are extracted → Low completeness
+ - Example: All key entities are extracted → High completeness
+
+3. PRECISION (Weight: 20%): Whether extracted entities are precisely named, have correct boundaries, and correct types
+ - Check: Are entity names complete and accurate? Are boundaries correct? Are entity types correctly classified?
+ - Example: Should extract "Human Insulin Receptor Protein", but only "Insulin" is extracted → Low precision (incorrect boundary)
+ - Example: Should be classified as "Protein", but classified as "Gene" → Low precision (incorrect type)
+ - Example: Should extract "COVID-19", but "Coronavirus" is extracted → Low precision (naming not precise enough)

Scoring Criteria (0-1 scale for each dimension):
-- EXCELLENT (0.8-1.0): High-quality extraction
-- GOOD (0.6-0.79): Good quality with minor issues
-- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues
-- POOR (0.0-0.39): Poor quality, needs improvement
+- EXCELLENT (0.8-1.0): High-quality extraction, error rate < 20%
+- GOOD (0.6-0.79): Good quality with minor issues, error rate 20-40%
+- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues, error rate 40-60%
+- POOR (0.0-0.39): Poor quality, needs improvement, error rate > 60%

Overall Score = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision

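The weighted formula and band thresholds in the rubric above can be sketched as a small Python helper. This is a minimal sketch; `overall_score` and `quality_band` are illustrative names, not functions from the diffed file.

```python
def overall_score(accuracy: float, completeness: float, precision: float) -> float:
    """Weighted overall score per the rubric:
    0.4 x Accuracy + 0.4 x Completeness + 0.2 x Precision."""
    return 0.4 * accuracy + 0.4 * completeness + 0.2 * precision


def quality_band(score: float) -> str:
    """Map a 0-1 score to the rubric's quality bands."""
    if score >= 0.8:
        return "EXCELLENT"
    if score >= 0.6:
        return "GOOD"
    if score >= 0.4:
        return "ACCEPTABLE"
    return "POOR"
```

The same 0.8/0.6/0.4 cut-offs apply whether the score is a single dimension or the weighted overall value, so `quality_band(overall_score(0.9, 0.7, 0.8))` lands in the EXCELLENT band.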
RELATION_EVALUATION_PROMPT_ZH = """你是一个知识图谱质量评估专家。你的任务是从给定的文本块和提取的关系列表,评估关系抽取的质量。

评估维度:
-1. ACCURACY (准确性, 权重: 40%): 提取的关系是否正确,关系描述是否准确
-2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要关系
-3. PRECISION (精确性, 权重: 20%): 关系描述是否精确,是否过于宽泛
+1. ACCURACY (准确性, 权重: 40%): 提取的关系是否真实存在于文本中,是否存在误提取(False Positive)
+ - 检查:关系是否在文本中实际表达,是否将不存在的关系误识别为关系
+ - 示例:文本中A和B没有关系,但提取了"A-作用于->B" → 准确性低
+ - 示例:将文本中的并列关系误识别为因果关系 → 准确性低
+
+2. COMPLETENESS (完整性, 权重: 40%): 是否遗漏了文本中的重要关系(Recall)
+ - 检查:文本中表达的重要关系是否都被提取,是否存在遗漏(False Negative)
+ - 示例:文本明确表达了5个关系,但只提取了3个 → 完整性低
+ - 示例:所有关键关系都被提取 → 完整性高
+
+3. PRECISION (精确性, 权重: 20%): 关系描述是否精确,关系类型是否正确,是否过于宽泛
+ - 检查:关系类型是否准确,关系描述是否具体,是否使用了过于宽泛的关系类型
+ - 示例:应提取"抑制"关系,但提取了"影响"关系 → 精确性低(类型不够精确)
+ - 示例:应提取"直接结合",但提取了"相关" → 精确性低(描述过于宽泛)
+ - 示例:关系方向是否正确(如"A激活B" vs "B被A激活")→ 精确性检查

评分标准(每个维度 0-1 分):
-- EXCELLENT (0.8-1.0): 高质量提取
-- GOOD (0.6-0.79): 良好质量,有少量问题
-- ACCEPTABLE (0.4-0.59): 可接受,有明显问题
-- POOR (0.0-0.39): 质量差,需要改进
+- EXCELLENT (0.8-1.0): 高质量提取,错误率 < 20%
+- GOOD (0.6-0.79): 良好质量,有少量问题,错误率 20-40%
+- ACCEPTABLE (0.4-0.59): 可接受,有明显问题,错误率 40-60%
+- POOR (0.0-0.39): 质量差,需要改进,错误率 > 60%

综合评分 = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision

Your task is to evaluate the quality of relation extraction from a given text block and extracted relation list.

Evaluation Dimensions:
-1. ACCURACY (Weight: 40%): Whether the extracted relations are correct and the relation descriptions are accurate
-2. COMPLETENESS (Weight: 40%): Whether important relations from the text are missing
-3. PRECISION (Weight: 20%): Whether the relation descriptions are precise and not overly broad
+1. ACCURACY (Weight: 40%): Whether the extracted relations actually exist in the text, and if there are any false extractions (False Positives)
+ - Check: Do relations actually appear in the text? Are non-existent relations incorrectly identified?
+ - Example: Text shows no relation between A and B, but "A-acts_on->B" is extracted → Low accuracy
+ - Example: A parallel relationship in text is misidentified as a causal relationship → Low accuracy
+
+2. COMPLETENESS (Weight: 40%): Whether important relations from the text are missing (Recall, False Negatives)
+ - Check: Are all important relations expressed in the text extracted? Are there any omissions?
+ - Example: Text explicitly expresses 5 relations, but only 3 are extracted → Low completeness
+ - Example: All key relations are extracted → High completeness
+
+3. PRECISION (Weight: 20%): Whether relation descriptions are precise, relation types are correct, and not overly broad
+ - Check: Are relation types accurate? Are relation descriptions specific? Are overly broad relation types used?
+ - Example: Should extract "inhibits" relation, but "affects" is extracted → Low precision (type not precise enough)
+ - Example: Should extract "directly binds", but "related" is extracted → Low precision (description too broad)
+ - Example: Is relation direction correct (e.g., "A activates B" vs "B is activated by A") → Precision check

Scoring Criteria (0-1 scale for each dimension):
-- EXCELLENT (0.8-1.0): High-quality extraction
-- GOOD (0.6-0.79): Good quality with minor issues
-- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues
-- POOR (0.0-0.39): Poor quality, needs improvement
+- EXCELLENT (0.8-1.0): High-quality extraction, error rate < 20%
+- GOOD (0.6-0.79): Good quality with minor issues, error rate 20-40%
+- ACCEPTABLE (0.4-0.59): Acceptable with noticeable issues, error rate 40-60%
+- POOR (0.0-0.39): Poor quality, needs improvement, error rate > 60%

Overall Score = 0.4 × Accuracy + 0.4 × Completeness + 0.2 × Precision
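The prompts frame ACCURACY as the absence of false positives and COMPLETENESS as recall. When a gold-standard annotation is available, those two dimensions can be computed directly instead of being judged by an LLM. A minimal sketch, assuming items are represented as sets of strings; the helper names are illustrative, not part of the diffed file:

```python
def accuracy_vs_gold(extracted: set, gold: set) -> float:
    """Share of extracted items that really occur in the gold annotation
    (1 - false-positive rate). Defined as 1.0 for an empty extraction."""
    if not extracted:
        return 1.0
    return len(extracted & gold) / len(extracted)


def completeness_vs_gold(extracted: set, gold: set) -> float:
    """Recall: share of gold items that were extracted
    (1 - false-negative rate). Defined as 1.0 for an empty gold set."""
    if not gold:
        return 1.0
    return len(extracted & gold) / len(gold)


# Mirrors the rubric's examples: extracting a non-entity phrase
# ("research shows") lowers accuracy, while missing a gold entity
# ("Protein B") lowers completeness.
extracted = {"Protein A", "research shows"}
gold = {"Protein A", "Protein B"}
# accuracy_vs_gold(extracted, gold) == 0.5
# completeness_vs_gold(extracted, gold) == 0.5
```

PRECISION (boundaries, types, naming) has no such set-based shortcut, which is one reason the rubric still weights an LLM judgment for it.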