
Commit 9691eee

add complexity score and metrics page (PR #294)
2 parents 900a09f + b969aef

8 files changed: +195 −4 lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions

@@ -1,5 +1,10 @@
 ## Changelog

+### v1.4.10 (December 8, 2025)
+- add complexity score
+- add documentation for metrics
+- bug fixes in Ontologizer
+
 ### v1.4.9 (December 8, 2025)
 - add retriever collection
 - add documentation for retrievers

CITATION.cff

Lines changed: 1 addition & 1 deletion

@@ -31,5 +31,5 @@ keywords:
 - Large Language Models
 - Text-to-ontology
 license: MIT
-version: 1.4.9
+version: 1.4.10
 date-released: '2025'

docs/source/index.rst

Lines changed: 1 addition & 0 deletions

@@ -186,6 +186,7 @@ or GitHub repository:
    ontologizer/ontology_hosting
    ontologizer/new_ontologies
    ontologizer/metadata
+   ontologizer/metrics

 .. toctree::
    :maxdepth: 1
docs/source/ontologizer/metrics.rst

Lines changed: 110 additions & 0 deletions

@@ -0,0 +1,110 @@
Metrics
=======

.. sidebar:: Metric Space

   There is a dedicated Hugging Face space for `OntoLearner Benchmark Metrics <https://huggingface.co/spaces/SciKnowOrg/OntoLearner-Benchmark-Metrics>`_ with analysis and live plots.

The ``Analyzer`` class in OntoLearner provides a unified interface for computing **ontology metrics**, which fall into two main categories: **Topology Metrics** (capturing the structural characteristics of the ontology graph) and **Dataset Metrics** (assessing the quality and distribution of the extracted learning datasets). Additionally, a **complexity score** can be derived from these metrics to summarize the overall richness and complexity of an ontology.

Topology Metrics
----------------

Topology metrics describe the structure and organization of an ontology. The ``Analyzer`` computes the following key metrics:

- **Total nodes** (``total_nodes``): Total number of nodes in the ontology graph.
- **Total edges** (``total_edges``): Total number of edges representing relations between nodes.
- **Root nodes** (``num_root_nodes``): Nodes with no incoming edges, representing top-level concepts.
- **Leaf nodes** (``num_leaf_nodes``): Nodes with no outgoing edges, representing bottom-level concepts.
- **Classes** (``num_classes``): Number of distinct ontology classes.
- **Properties** (``num_properties``): Number of distinct properties (object or datatype properties).
- **Individuals** (``num_individuals``): Number of instances associated with classes.
- **Depth metrics**:

  - ``max_depth``: Maximum hierarchical depth in the ontology.
  - ``min_depth``: Minimum hierarchical depth.
  - ``avg_depth``: Average hierarchical depth across all nodes.
  - ``depth_variance``: Variance of the depth distribution.

- **Breadth metrics**:

  - ``max_breadth``: Maximum number of nodes at any single hierarchy level.
  - ``min_breadth``: Minimum number of nodes at any hierarchy level.
  - ``avg_breadth``: Average number of nodes per hierarchy level.
  - ``breadth_variance``: Variance of the breadth distribution.

Dataset Metrics
---------------

Dataset metrics evaluate the characteristics of the machine-learning datasets extracted from the ontology. These metrics include:

- **Number of term-type mappings** (``num_term_types``): Number of terms associated with types.
- **Number of taxonomic (is-a) relations** (``num_taxonomic_relations``): Count of hierarchical relations.
- **Number of non-taxonomic relations** (``num_non_taxonomic_relations``): Count of semantic associations outside the hierarchy.
- **Average terms per type** (``avg_terms``): Measures dataset balance across classes.
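Both groups of metrics are returned as objects; since ``compute_complexity_score`` reads them via ``__dict__`` (see the ``analyzer.py`` diff below), the fields listed above should be accessible as attributes. A minimal sketch, assuming attribute-style access:

.. code-block:: python

   from ontolearner.tools import Analyzer
   from ontolearner.ontology import Wine

   ontology = Wine()
   ontology.build_graph()
   analyzer = Analyzer()

   # Assumption: the metric objects expose the fields listed above as
   # attributes (compute_complexity_score reads them through __dict__).
   topology_metrics = analyzer.compute_topology_metrics(ontology)
   print(topology_metrics.total_nodes, topology_metrics.max_depth)

   dataset_metrics = analyzer.compute_dataset_metrics(ontology)
   print(dataset_metrics.num_term_types, dataset_metrics.avg_terms)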
Complexity Score
----------------

The **complexity score** combines topology and dataset metrics into a single normalized score in ``[0, 1]``. First, metrics are **log-normalized** and weighted by category:

.. list-table::
   :header-rows: 1
   :widths: 25 50 25

   * - Metric Category
     - Example Metrics
     - Weight
   * - Graph structure
     - ``total_nodes``, ``total_edges``, ``num_root_nodes``, ``num_leaf_nodes``
     - 0.30
   * - Knowledge coverage
     - ``num_classes``, ``num_properties``, ``num_individuals``
     - 0.25
   * - Hierarchy
     - ``max_depth``, ``min_depth``, ``avg_depth``, ``depth_variance``
     - 0.10
   * - Breadth
     - ``max_breadth``, ``min_breadth``, ``avg_breadth``, ``breadth_variance``
     - 0.20
   * - Dataset (LLMs4OL)
     - ``num_term_types``, ``num_taxonomic_relations``, ``num_non_taxonomic_relations``, ``avg_terms``
     - 0.15

Next, the weighted sum of metrics is passed through a **logistic function** to normalize the final complexity score.
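In formula terms (matching the ``compute_complexity_score`` implementation in the ``ontolearner/tools/analyzer.py`` diff below), with metric values :math:`x_m`, weights :math:`w_m`, steepness :math:`a = 0.4`, center :math:`b = 6.0`, and :math:`\epsilon = 10^{-12}`:

.. math::

   s = \frac{\sum_m w_m \, \log(1 + x_m)}{\sum_m w_m},
   \qquad
   \text{complexity} = \frac{1}{1 + e^{-a\,(s - b) + \epsilon}}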
Example Usage
-------------

Here is a simple example demonstrating how to compute metrics and complexity for an ontology:

.. code-block:: python

   from ontolearner.tools import Analyzer
   from ontolearner.ontology import Wine

   # Step 1 — Load ontology
   ontology = Wine()
   ontology.build_graph()

   # Step 2 — Create the analyzer
   analyzer = Analyzer()

   # Step 3 — Compute topology and dataset metrics
   topology_metrics = analyzer.compute_topology_metrics(ontology)
   dataset_metrics = analyzer.compute_dataset_metrics(ontology)

   # Step 4 — Compute overall complexity score
   complexity_score = analyzer.compute_complexity_score(
       topology_metrics=topology_metrics,
       dataset_metrics=dataset_metrics
   )

   # Step 5 — Display results
   print("Topology Metrics:", topology_metrics)
   print("Dataset Metrics:", dataset_metrics)
   print("Ontology Complexity Score:", complexity_score)

This workflow allows ontology engineers and researchers to **quantify structural quality, dataset richness, and overall complexity**, providing actionable insights for ontology evaluation, benchmarking, and improvement.

examples/complexity_score.py

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
from ontolearner.tools import Analyzer
from ontolearner.ontology import Wine

# Step 1 — Load ontology
ontology = Wine()
ontology.build_graph()

# Step 2 — Create the analyzer
analyzer = Analyzer()

# Step 3 — Compute topology and dataset metrics
topology_metrics = analyzer.compute_topology_metrics(ontology)
dataset_metrics = analyzer.compute_dataset_metrics(ontology)

# Step 4 — Compute overall complexity score
complexity_score = analyzer.compute_complexity_score(
    topology_metrics=topology_metrics,
    dataset_metrics=dataset_metrics
)

# Step 5 — Display results
print("Topology Metrics:", topology_metrics)
print("Dataset Metrics:", dataset_metrics)
print("Ontology Complexity Score:", complexity_score)

ontolearner/VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.4.9
+1.4.10

ontolearner/base/ontology.py

Lines changed: 2 additions & 2 deletions
@@ -372,7 +372,7 @@ def _update_metrics_space(self, metrics_file_path: Path, metrics: OntologyMetric
         # Save updated metrics
         df.to_excel(metrics_file_path, index=False)

-    def is_valid_label(label: str) -> Any:
+    def is_valid_label(self, label: str) -> Any:
         invalids = ['root', 'thing']
         if label.lower() in invalids:
             return None

@@ -522,7 +522,7 @@ def check_if_class(self, entity):
             return True
         return False

-    def _is_anonymous_id(label: str) -> bool:
+    def _is_anonymous_id(self, label: str) -> bool:
         """Check if a label represents an anonymous class identifier."""
         if not label:
             return True
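For context on these two hunks: a method defined without ``self`` still receives the instance as its implicit first argument, so any call on an instance fails. A minimal illustration (hypothetical class, not from this repository):

class Example:
    def is_valid_label(label: str):  # missing self
        return label.lower() not in ('root', 'thing')

Example().is_valid_label("wine")
# TypeError: is_valid_label() takes 1 positional argument but 2 were given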

ontolearner/tools/analyzer.py

Lines changed: 51 additions & 0 deletions
@@ -14,6 +14,7 @@

 import logging
 import time
+import numpy as np
 from abc import ABC
 from rdflib import RDF, RDFS, OWL
 from collections import defaultdict

@@ -186,6 +187,56 @@ def compute_topology_metrics(ontology: BaseOntology) -> TopologyMetrics:

         return metrics

+    @staticmethod
+    def compute_complexity_score(
+            topology_metrics: TopologyMetrics,
+            dataset_metrics: DatasetMetrics,
+            a: float = 0.4,
+            b: float = 6.0,
+            eps: float = 1e-12
+    ) -> float:
+        """
+        Compute a single normalized complexity score for an ontology.
+
+        This function combines structural topology metrics and dataset quality metrics
+        into a weighted aggregate score, then applies a logistic transformation to
+        normalize it to the range [0, 1]. The score reflects overall ontology complexity,
+        considering graph structure, hierarchy, breadth, coverage, and dataset richness.
+
+        Args:
+            topology_metrics (TopologyMetrics): Precomputed structural metrics of the ontology graph.
+            dataset_metrics (DatasetMetrics): Precomputed metrics of extracted learning datasets.
+            a (float, optional): Steepness parameter for the logistic normalization function. Default is 0.4.
+            b (float, optional): Centering parameter for the logistic function, should be tuned to match the scale of aggregated metrics. Default is 6.0.
+            eps (float, optional): Small epsilon to prevent numerical issues in logistic computation. Default is 1e-12.
+
+        Returns:
+            float: Normalized complexity score in [0, 1], where higher values indicate more complex ontologies.
+
+        Notes:
+            - Weights are assigned to different metric categories: graph metrics, coverage metrics,
+              hierarchy metrics, breadth metrics, and dataset metrics (term-types, taxonomic,
+              non-taxonomic relations).
+            - Metrics are log-normalized before weighting to reduce scale differences.
+            - The logistic transformation ensures the final score is bounded and interpretable.
+        """
+        # Define metric categories with their weights
+        metric_categories = {
+            0.3: ["total_nodes", "total_edges", "num_root_nodes", "num_leaf_nodes"],
+            0.25: ["num_classes", "num_properties", "num_individuals"],
+            0.10: ["max_depth", "min_depth", "avg_depth", "depth_variance"],
+            0.20: ["max_breadth", "min_breadth", "avg_breadth", "breadth_variance"],
+            0.15: ["num_term_types", "num_taxonomic_relations", "num_non_taxonomic_relations", "avg_terms"]
+        }
+        weights = {metric: weight for weight, metrics in metric_categories.items() for metric in metrics}
+        metrics = [metric for _, metric_list in metric_categories.items() for metric in metric_list]
+        onto_metrics = {**topology_metrics.__dict__, **dataset_metrics.__dict__}
+        norm_weighted_values = [np.log1p(onto_metrics[m]) * weights[m] for m in metrics if m in onto_metrics]
+        total_weight = sum(weights[m] for m in metrics if m in onto_metrics)
+        weighted_sum = sum(norm_weighted_values) / total_weight if total_weight > 0 else 0.0
+        complexity_score = 1.0 / (1.0 + np.exp(-a * (weighted_sum - b) + eps))
+        return complexity_score
+
+
     @staticmethod
     def compute_dataset_metrics(ontology: BaseOntology) -> DatasetMetrics:
         """
