Commit 56fd51d

fix&perf: correct power law R² calculation in structure evaluator (#142)
* fix: correct power law R² calculation in structure evaluator
* docs: add KG evaluation metrics to README
* simplify StructureEvaluator
* pylint: clean up whitespace
1 parent f2796f5 commit 56fd51d

3 files changed, +17 -6 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 
 ## 📌 Latest Updates
 
+- **2025.12.26**: Added comprehensive knowledge graph evaluation metrics including accuracy assessment (entity/relation extraction quality), consistency assessment (conflict detection), and structural robustness assessment (noise ratio, connectivity, degree distribution).
 - **2025.12.16**: Added [rocksdb](https://github.com/facebook/rocksdb) for key-value storage backend and [kuzudb](https://github.com/kuzudb/kuzu) for graph database backend support.
 - **2025.12.16**: Added [vllm](https://github.com/vllm-project/vllm) for local inference backend support.
 - **2025.12.16**: Refactored the data generation pipeline using [ray](https://github.com/ray-project/ray) to improve the efficiency of distributed execution and resource management.

README_zh.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ GraphGen 首先根据源文本构建细粒度的知识图谱,然后利用期
 在数据生成后,您可以使用[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)[xtuner](https://github.com/InternLM/xtuner)对大语言模型进行微调。
 
 ## 📌 最新更新
+- **2025.12.26**: 新增知识图谱评估指标,包括准确度评估(实体/关系抽取质量)、一致性评估(冲突检测)和结构鲁棒性评估(噪声比、连通性、度分布)。
 - **2025.12.16**:新增 [rocksdb](https://github.com/facebook/rocksdb) 作为键值存储后端, [kuzudb](https://github.com/kuzudb/kuzu) 作为图数据库后端的支持。
 - **2025.12.16**:新增 [vllm](https://github.com/vllm-project/vllm) 作为本地推理后端的支持。
 - **2025.12.16**:使用 [ray](https://github.com/ray-project/ray) 重构了数据生成 pipeline,提升了分布式执行和资源管理的效率。

graphgen/models/evaluator/kg/structure_evaluator.py

Lines changed: 15 additions & 6 deletions
@@ -1,3 +1,4 @@
+from collections import Counter
 from typing import Any, Dict, Optional
 
 import numpy as np
@@ -81,14 +82,22 @@ def _calculate_powerlaw_r2(degree_map: Dict[str, int]) -> Optional[float]:
             return None
 
         try:
-            # Fit power law: log(y) = a * log(x) + b
-            log_degrees = np.log(degrees)
-            sorted_log_degrees = np.sort(log_degrees)
-            x = np.arange(1, len(sorted_log_degrees) + 1)
-            log_x = np.log(x)
+            degree_counts = Counter(degrees)
+            degree_values, frequencies = zip(*sorted(degree_counts.items()))
+
+            if len(degree_values) < 3:
+                logger.warning(
+                    f"Insufficient unique degrees ({len(degree_values)}) for power law fitting. "
+                    f"Graph may be too uniform."
+                )
+                return None
+
+            # Fit power law: log(frequency) = a * log(degree) + b
+            log_degrees = np.log(degree_values)
+            log_frequencies = np.log(frequencies)
 
             # Linear regression on log-log scale
-            r_value, *_ = stats.linregress(log_x, sorted_log_degrees)
+            r_value, *_ = stats.linregress(log_degrees, log_frequencies)
             r2 = r_value**2
 
             return float(r2)
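
The change above replaces a regression of sorted log-degrees against log-rank with a regression of log(frequency) against log(degree), which is the usual log-log test for a power-law degree distribution. A minimal standalone sketch of that idea follows. It is not the project's code: the function name `powerlaw_r2_sketch` is hypothetical, the zero-degree filter and the `None` return for constant frequencies are assumptions, and `stats.linregress` is swapped for a hand-computed Pearson correlation so the example needs only the standard library.

```python
import math
from collections import Counter
from typing import Dict, Optional


def powerlaw_r2_sketch(degree_map: Dict[str, int]) -> Optional[float]:
    """Hypothetical standalone version of the degree-frequency fit."""
    # Assumption: zero-degree nodes are dropped, since log(0) is undefined.
    degrees = [d for d in degree_map.values() if d > 0]

    # Count how many nodes have each distinct degree.
    counts = Counter(degrees)
    if len(counts) < 3:
        return None  # too few distinct degrees for a meaningful fit

    distinct = sorted(counts)
    xs = [math.log(k) for k in distinct]          # log(degree)
    ys = [math.log(counts[k]) for k in distinct]  # log(frequency)

    # Squared Pearson correlation, computed by hand instead of
    # calling scipy.stats.linregress.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # assumption: treat an undefined correlation as "no fit"
    return (sxy * sxy) / (sxx * syy)


# 16 nodes of degree 1, 4 of degree 2, 1 of degree 4: frequency = 16 * degree**-2
demo = {f"n{i}": d for i, d in enumerate([1] * 16 + [2] * 4 + [4])}
print(powerlaw_r2_sketch(demo))
```

For the demo graph the frequencies follow an exact inverse-square law, so the printed R² should be very close to 1.0. A graph in which every node has the same degree yields `None`, mirroring the "fewer than 3 unique degrees" guard in the diff.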
