Merged
34 commits
fbc8d52
wip: refactor data evaluators & add kg evaluators
ChenZiHong-Gavin Dec 17, 2025
18be127
feat: add KG quality evaluation module
CHERRY-ui8 Dec 23, 2025
a44b1f3
refactor: removed repeated calculations and remove hardcoded params
CHERRY-ui8 Dec 23, 2025
6c77734
add: add kg_evaluate config file for params
CHERRY-ui8 Dec 23, 2025
93abd00
fix: correct relation acc evaluation logic
CHERRY-ui8 Dec 23, 2025
777cb25
refactor: enhance KG evaluator to use llm-as judge; remove evaluate_k…
CHERRY-ui8 Dec 23, 2025
5bfdc0a
fix: fix format and clean up imports
CHERRY-ui8 Dec 23, 2025
8ef5f47
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Dec 24, 2025
42693df
wip: refactor evaluator structure
ChenZiHong-Gavin Dec 24, 2025
09072f0
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Dec 24, 2025
a257246
wip: add annotations
ChenZiHong-Gavin Dec 24, 2025
41015a2
refactor: refactor proj structure & configs
ChenZiHong-Gavin Dec 25, 2025
978b76c
wip: split prompts
ChenZiHong-Gavin Dec 25, 2025
77bb00d
refactor: refactor base_evaluator
ChenZiHong-Gavin Dec 25, 2025
19510d9
refactor: refactor LengthEvaluator
ChenZiHong-Gavin Dec 25, 2025
028b043
refactor: refactor MTLDEvaluator
ChenZiHong-Gavin Dec 25, 2025
c161358
refactor: refactor NLTKHelper
ChenZiHong-Gavin Dec 25, 2025
58ede2e
refactor: refactor RewardEvaluator
ChenZiHong-Gavin Dec 25, 2025
f3a0391
refactor: refactor UniEvaluator
ChenZiHong-Gavin Dec 25, 2025
2a3f09f
refactor: refactor evaluator structure
ChenZiHong-Gavin Dec 25, 2025
a4d7993
refactor: change evaluation methods in acc and consistency to sync
CHERRY-ui8 Dec 25, 2025
3ae2321
refactor: streamline evaluation functions for accuracy, consistency, …
CHERRY-ui8 Dec 25, 2025
f5b2254
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Dec 25, 2025
86fa173
wip: perf evaluate_service
ChenZiHong-Gavin Dec 25, 2025
8d7e6b4
merge
ChenZiHong-Gavin Dec 25, 2025
06fc6e3
perf: perf evaluate_service
ChenZiHong-Gavin Dec 25, 2025
f9d6dc3
fix: fix output node
ChenZiHong-Gavin Dec 26, 2025
4d022fb
merge
CHERRY-ui8 Dec 26, 2025
084cb08
feat: add KGQualityEvaluator and integrate into EvaluateService for K…
CHERRY-ui8 Dec 26, 2025
98968e6
refactor: remove KGQualityEvaluator and restructure KG evaluation int…
CHERRY-ui8 Dec 26, 2025
71ebba2
pylints
CHERRY-ui8 Dec 26, 2025
f6cce9b
feat: add kg_structure evaluation
ChenZiHong-Gavin Dec 26, 2025
4f0350b
feat: add kg_structure evaluation
ChenZiHong-Gavin Dec 26, 2025
e10b391
feat: add kg_accuracy & kg_consistency metrics
ChenZiHong-Gavin Dec 26, 2025
6 changes: 6 additions & 0 deletions examples/evaluate_kg/evaluate_kg.sh
@@ -0,0 +1,6 @@
python3 -m graphgen.operators.evaluate_kg.evaluate_kg \
    --working_dir cache \
    --graph_backend kuzu \
    --kv_backend rocksdb \
    --sample_size 100 \
    --max_concurrent 10
8 changes: 7 additions & 1 deletion graphgen/models/__init__.py
@@ -1,4 +1,10 @@
from .evaluator import LengthEvaluator, MTLDEvaluator, RewardEvaluator, UniEvaluator
from .evaluator import (
    KGQualityEvaluator,
    LengthEvaluator,
    MTLDEvaluator,
    RewardEvaluator,
    UniEvaluator,
)
from .generator import (
AggregatedGenerator,
AtomicGenerator,
1 change: 1 addition & 0 deletions graphgen/models/evaluator/__init__.py
@@ -1,3 +1,4 @@
from .kg_quality_evaluator import KGQualityEvaluator
from .length_evaluator import LengthEvaluator
from .mtld_evaluator import MTLDEvaluator
from .reward_evaluator import RewardEvaluator
117 changes: 117 additions & 0 deletions graphgen/models/evaluator/kg/README.md
@@ -0,0 +1,117 @@
# KG Quality Evaluation Module

This module provides comprehensive quality evaluation for knowledge graphs built by GraphGen.

## Module Structure

The evaluation functionality has been split into modular components:

- **`accuracy_evaluator.py`**: Entity/relation/triple accuracy evaluation using LLM-as-judge
- **`consistency_evaluator.py`**: Attribute value conflict detection
- **`structure_evaluator.py`**: Graph structural robustness metrics
- **`utils.py`**: Utility functions (NetworkX conversion, text retrieval, sampling)
- **`kg_quality_evaluator.py`**: Main evaluator class that integrates all modules

## Features

### 1. Accuracy Assessment
- **Entity Recognition Accuracy**: Samples entities and validates them with an LLM judge
- **Relation Extraction Accuracy**: Samples relations and validates them with an LLM judge
- **Triple Validation (RLC)**: Samples triples and validates them with an LLM judge
- Calculates Precision, Recall, and F1 scores for each metric
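
The metric arithmetic behind these reports can be sketched as follows. The `tp`/`fp`/`fn` counts themselves come from the LLM judge's verdicts; the helper name and the idea of deriving false negatives from a reference set are illustrative assumptions, not the module's actual API:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> dict:
    """P/R/F1 from judged counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "true_positives": tp, "false_positives": fp}

# e.g. 42 of 50 sampled entities judged valid, no known misses
scores = precision_recall_f1(tp=42, fp=8, fn=0)
```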

### 2. Consistency Assessment
- Detects attribute value conflicts (same entity, same attribute, different values)
- Calculates conflict rate: `conflict_entities_count / total_entities`
- Returns detailed conflict information
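
The conflict check can be sketched with plain dictionaries; the helper name and the `(entity_id, attribute, value)` triple format here are illustrative, not the module's actual interface:

```python
from collections import defaultdict

def detect_conflicts(attribute_triples):
    """attribute_triples: list of (entity_id, attribute, value) tuples."""
    seen = defaultdict(set)
    for eid, attr, val in attribute_triples:
        seen[(eid, attr)].add(val)
    # A conflict is the same entity+attribute with more than one value
    conflicts = {key: vals for key, vals in seen.items() if len(vals) > 1}
    all_entities = {eid for eid, _, _ in attribute_triples}
    conflict_entities = {eid for eid, _ in conflicts}
    rate = len(conflict_entities) / len(all_entities) if all_entities else 0.0
    return rate, conflicts

attribute_triples = [
    ("e1", "founded", "1998"),
    ("e1", "founded", "1999"),  # same entity, same attribute, different value
    ("e2", "color", "red"),
]
rate, conflicts = detect_conflicts(attribute_triples)
```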

### 3. Structural Robustness Assessment
- **Noise Ratio**: Isolated nodes / total nodes (threshold: < 15%)
- **Largest Connected Component Ratio**: Largest CC nodes / total nodes (threshold: > 90%)
- **Average Node Degree**: Average degree across all nodes (threshold: 2-5)
- **Power Law Distribution R²**: Degree distribution fit (threshold: > 0.75)
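
A dependency-free sketch of the first three metrics (the real module uses NetworkX, and the scipy-based power-law R² fit is omitted here):

```python
from collections import Counter

def structure_metrics(nodes, edges):
    """Noise ratio, largest-CC ratio, and average degree of an undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Isolated (degree-0) nodes count as noise
    noise_ratio = sum(1 for n in nodes if deg[n] == 0) / len(nodes)

    # Union-find with path halving to size connected components
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    component_sizes = Counter(find(n) for n in nodes)
    largest_cc_ratio = max(component_sizes.values()) / len(nodes)

    avg_degree = sum(deg[n] for n in nodes) / len(nodes)
    return {"noise_ratio": noise_ratio,
            "largest_cc_ratio": largest_cc_ratio,
            "avg_degree": avg_degree}

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]  # "d" and "e" are isolated
m = structure_metrics(nodes, edges)
```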

## Usage

### Command Line Usage

```bash
# Run all evaluations
python -m graphgen.operators.evaluate_kg.evaluate_kg --working_dir cache

# Run specific evaluation
python -m graphgen.operators.evaluate_kg.evaluate_kg --working_dir cache --accuracy_only

# Custom configuration
python -m graphgen.operators.evaluate_kg.evaluate_kg \
    --working_dir cache \
    --sample_size 200 \
    --graph_backend networkx \
    --kv_backend json_kv
```

### Shell Script Usage

```bash
# Basic usage
bash examples/evaluate_kg/evaluate_kg.sh

# With custom options
bash examples/evaluate_kg/evaluate_kg.sh \
    --working_dir cache \
    --sample_size 200 \
    --accuracy_only
```

## Requirements

- **NetworkX**: Required for structural evaluation
- **scipy**: Required for power law distribution fitting
- **numpy**: Required for numerical calculations
- **LLM Client**: Required for accuracy evaluation (configure via `TRAINEE_*` env vars)

## Output Format

The evaluation returns a dictionary with the following structure:

```python
{
    "accuracy": {
        "entity_accuracy": {
            "precision": float,
            "recall": float,
            "f1": float,
            "true_positives": int,
            "false_positives": int,
            "sample_size": int
        },
        "relation_accuracy": { ... },
        "triple_accuracy": { ... }
    },
    "consistency": {
        "conflict_rate": float,
        "conflict_entities_count": int,
        "total_entities": int,
        "conflicts": [ ... ]
    },
    "structure": {
        "total_nodes": int,
        "total_edges": int,
        "noise_ratio": float,
        "largest_cc_ratio": float,
        "avg_degree": float,
        "powerlaw_r2": float | None,
        "thresholds": {
            "noise_ratio": { "value": float, "threshold": float, "pass": bool },
            ...
        }
    }
}
```

## Notes

- Accuracy evaluation requires LLM API access; cost and runtime grow with the sample size
- Structural evaluation automatically converts Kuzu storage to a NetworkX graph for analysis
- Each evaluation handles its own errors and reports an error message in its result instead of raising
- The evaluator loads graph and chunk storage from the working directory automatically
14 changes: 14 additions & 0 deletions graphgen/models/evaluator/kg/__init__.py
@@ -0,0 +1,14 @@
from .accuracy_evaluator import AccuracyEvaluator
from .consistency_evaluator import ConsistencyEvaluator
from .structure_evaluator import StructureEvaluator
from .utils import convert_to_networkx, get_relevant_text, get_source_text, sample_items

__all__ = [
"AccuracyEvaluator",
"ConsistencyEvaluator",
"StructureEvaluator",
"convert_to_networkx",
"get_relevant_text",
"get_source_text",
"sample_items",
]